[a product of Bestiario]

Quadrigram is the first visual programming language developed specifically for creating and sharing interactive data visualizations in your browser. We have built a library of hundreds of modules for this purpose. These modules range from resources that gather data (whether loading local files or importing from APIs), to operators and controls (which allow data to be filtered), to modules devoted specifically to data visualization (from basic bar charts to 3D image networks).

One of Quadrigram’s distinguishing features is its ability to process text, drawing on R to perform semantic and syntactic analysis or to build predictive models. Our advanced visualizers also let users create their own approaches to data visualization by working with points, shapes, lines, and other primitives, so they are not limited to a conventional set of charts. A rich module library lowers the barrier to entry for non-programmers interested in making data visualizations, while at the same time enabling non-linear approaches to data processing. In addition, Quadrigram gives programmers a way to prototype solutions more quickly.
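
Quadrigram wires these modules together visually, but the flow it describes maps onto a conventional script. A minimal sketch of the same gather → filter → visualize pipeline in Python with pandas and matplotlib (the CSV file and its column names are hypothetical):

    # Gather -> filter -> visualize, mirroring Quadrigram's module chain.
    import pandas as pd
    import matplotlib.pyplot as plt

    cities = pd.read_csv("cities.csv")               # data-gathering module
    big = cities[cities["population"] > 1_000_000]   # operator/control: filter
    big.plot.bar(x="city", y="population")           # visualization module
    plt.show()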

Tools and methods for capturing Twitter data during natural disasters, by Axel Bruns and Yuxian Eugene Liang. First Monday, Volume 17, Number 4 - 2 April 2012

Abstract: During the course of several natural disasters in recent years, Twitter has been found to play an important role as an additional medium for many–to–many crisis communication. Emergency services are successfully using Twitter to inform the public about current developments, and are increasingly also attempting to source first–hand situational information from Twitter feeds (such as relevant hashtags). The further study of the uses of Twitter during natural disasters relies on the development of flexible and reliable research infrastructure for tracking and analysing Twitter feeds at scale and in close to real time, however. This article outlines two approaches to the development of such infrastructure: one which builds on the readily available open source platform yourTwapperkeeper to provide a low–cost, simple, and basic solution; and, one which establishes a more powerful and flexible framework by drawing on highly scaleable, state–of–the–art technology.
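
For a sense of scale, the “low-cost, simple, and basic” end of that spectrum amounts to little more than appending matching tweets to a flat archive. A minimal sketch using the tweepy library against Twitter’s streaming API as both existed around the article’s publication (both interfaces have since been retired; the credentials and the #eqnz example hashtag are placeholders):

    import tweepy

    # Placeholder OAuth credentials.
    auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
    auth.set_access_token("ACCESS_TOKEN", "ACCESS_SECRET")

    class ArchiveListener(tweepy.StreamListener):
        def on_data(self, raw_tweet):
            # Append each raw JSON tweet as one line of a flat archive,
            # in the spirit of yourTwapperkeeper's simple logging.
            with open("tweets.jsonl", "a") as f:
                f.write(raw_tweet.strip() + "\n")
            return True  # keep the stream open

    # Track a disaster-related hashtag in close to real time.
    tweepy.Stream(auth, ArchiveListener()).filter(track=["#eqnz"])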

A data-driven organization acquires, processes, and leverages data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.

There are many ways to assess whether an organization is data driven. Some like to talk about how much data they generate. Others like to talk about the sophistication of data they use, or the process of internalizing data. I prefer to start by highlighting organizations that use data effectively.

I’ve found that the strongest data-driven organizations all live by the motto “if you can’t measure it, you can’t fix it” (a motto I learned from one of the best operations people I’ve worked with). This mindset gives you a fantastic ability to deliver value to your company by:

  • Instrumenting and collecting as much data as you can. Whether you’re doing business intelligence or building products, if you don’t collect the data, you can’t use it (a minimal instrumentation sketch follows this list).
  • Measuring in a proactive and timely way. Are your products and strategies succeeding? If you don’t measure the results, how do you know?
  • Getting many people to look at data. Any problems that may be present will become obvious more quickly — “with enough eyes all bugs are shallow.”
  • Fostering increased curiosity about why the data has changed or is not changing. In a data-driven organization, everyone is thinking about the data.
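
Instrumentation, the first point above, can start very small: an append-only event log that any part of the product writes to. A minimal sketch (the event names and fields are invented for the example):

    import json
    import time

    def log_event(name, **fields):
        # Append one timestamped event as a JSON line to an event log.
        event = {"event": name, "ts": time.time(), **fields}
        with open("events.jsonl", "a") as f:
            f.write(json.dumps(event) + "\n")

    # Instrument actions as they happen so they can be measured later.
    log_event("signup", plan="free")
    log_event("report_viewed", report_id=42, ms_to_render=180)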

It’s easy to pretend that you’re data driven. But if you get into the mindset to collect and measure everything you can, and think about what the data you’ve collected means, you’ll be ahead of most of the organizations that claim to be data driven. And while I have a lot to say about professional data scientists later in this post, keep in mind that data isn’t just for the professionals. Everyone should be looking at the data.

[data science teams]

The silos that have traditionally separated data people from engineering, from design, and from marketing, don’t work when you’re building data products. I would contend that it is questionable whether those silos work for any kind of product development. But with data, it never works to have a waterfall process in which one group defines the product, another builds visual mock-ups, a data scientist preps the data, and finally a set of engineers builds it to some specification document. We’re not building Microsoft Office, or some other product where there’s 20-plus years of shared wisdom about how interfaces should work. Every data project is a new experiment, and design is a critical part of that experiment. It’s similar for operations: data products present entirely different stresses on a network and storage infrastructure than traditional sites. They capture much more data: petabytes and even exabytes. They deliver results that mash up data from many sources, some internal, some not. You’re unlikely to create a data product that is reliable and that performs reasonably well if the product team doesn’t incorporate operations from the start. This isn’t a simple matter of pushing the prototype from your laptop to a server farm …

Understanding collaboration in Wikipedia, by Royce Kimmons. First Monday, Volume 16, Number 12 - 5 December 2011

Abstract: Wikipedia stands as an undeniable success in online participation and collaboration. However, previous attempts at studying collaboration within Wikipedia have focused on simple metrics like rigor (i.e., the number of revisions in an article’s revision history) and diversity (i.e., the number of authors that have contributed to a given article) or have made generalizations about collaboration within Wikipedia based upon the content validity of a few select articles. By looking more closely at metrics associated with each extant Wikipedia article (N=3,427,236) along with all revisions (N=225,226,370), this study attempts to understand what collaboration within Wikipedia actually looks like under the surface. Findings suggest that typical Wikipedia articles are not rigorous, in a collaborative sense, and do not reflect much diversity in the construction of content and macro–structural writing, leading to the conclusion that most articles in Wikipedia are not reflective of the collaborative efforts of the community but, rather, represent the work of relatively few contributors.
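
The two baseline metrics the abstract names are straightforward to compute once revision histories are in tabular form. A minimal pandas sketch with invented data (the column names are hypothetical):

    import pandas as pd

    # One row per revision: which article was edited, and by whom.
    revisions = pd.DataFrame({
        "article": ["A", "A", "A", "B", "B"],
        "editor":  ["u1", "u2", "u1", "u3", "u3"],
    })

    per_article = revisions.groupby("article").agg(
        rigor=("editor", "size"),         # number of revisions
        diversity=("editor", "nunique"),  # number of distinct contributors
    )
    print(per_article)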

The data-citizen driven city is a project created for the 4th Advanced Architecture Contest: Shaping our environment with real-time data, where it received an honorable mention. Project team: Sara Alvarellos, Cesar García, Jorge Medal, Sara Thomson.

Understanding reality with data, changing personal habits.
Using open source technologies, like Arduino-based sensor units or mobile apps, data-citizens will be able to gather their own real-time data on the issues they care most about, such as air quality, noise levels, street deficiencies, pest infestations, etc. All data will be shared in open public repositories, like Pachube, available to everyone. Long-term data archival will allow citizens to gain a better understanding of the urban environment and to improve their daily personal habits.
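
Publishing a reading to such a repository is a single HTTP call. A sketch following the shape of Pachube’s v2 REST API as documented at the time (the service was later renamed Cosm, then Xively); the feed ID, datastream name, and API key are all placeholders:

    import requests

    FEED_ID = "12345"                 # placeholder feed
    API_KEY = "YOUR_PACHUBE_API_KEY"  # placeholder key

    # Push one air-quality reading to an open Pachube feed as CSV:
    # each line is "datastream_id,current_value".
    resp = requests.put(
        "https://api.pachube.com/v2/feeds/%s.csv" % FEED_ID,
        headers={"X-PachubeApiKey": API_KEY},
        data="air_quality,42\n",
    )
    resp.raise_for_status()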

Collective intelligence and critical mass. Social cohesion.
Once there is a critical mass of participants, distributed citizen sensor networks will reveal new emerging patterns that will lead to a collective intelligence. Citizens will soon become aware of the political power of data and will begin to organize into local working groups to develop new strategies to improve their neighbourhoods. The massive adoption of sensors will bring their price down, allowing anyone to participate in the extension of this smart city data layer, regardless of their income.

Renewal of the social contract. Emerging collective actions.
Involvement and commitment will be part of a new social contract in which the rights and obligations of the citizens and the institutions will be redefined.
The maintenance and development of local resources will be delegated to neighbours who will feel engaged in the improvement of the urban ecosystem. Alarm warnings will not be handled in isolation; a holistic approach based on data modelling will provide a global solution that takes all the gathered data into account. Open data governance and accountability will be enforced through civil actions. The mission of local institutions will be to support these local processes and to develop long-term plans.

Conclusion: A more sustainable and democratic city.
By the year 2020, citizens will participate in direct democratic processes at a local scale to transform the city into a more sustainable and efficient environment. Data will enable new uses of public spaces, offering streamlined solutions. People will feel highly engaged with their neighbours and surroundings, in contrast to their previously detached attitudes. The success of radically open, transparent processes will constitute a genuine milestone in the transformation of 21st century public institutions.

We’d like to think of ourselves as dynamic, unpredictable individuals, but according to new research, that’s not the case at all. In a study published in last week’s Science, researchers looked at customer location data culled from cellular service providers. By looking at how customers moved around, the authors of the study found that it may be possible to predict human movement patterns and location up to 93 percent of the time. These findings may be useful in multiple fields, including city planning, mobile communication resource management, and anticipating the spread of viruses.

Science, 2010. DOI: 10.1126/science.1177170
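
The 93 percent figure is an upper bound, derived from the entropy of each user’s location sequence via Fano’s inequality rather than from any particular prediction algorithm. A simplified sketch of that calculation, using plain visit-frequency entropy in place of the paper’s sequence-entropy estimator:

    import math
    from collections import Counter

    def predictability_bound(locations):
        # Upper bound on the fraction of moves a perfect algorithm could
        # predict, via Fano's inequality (simplified: visit-frequency
        # entropy, not the paper's Lempel-Ziv sequence estimator).
        counts = Counter(locations)
        total, n = len(locations), len(counts)
        if n < 2:
            return 1.0
        # Shannon entropy (bits) of the visit distribution.
        s = -sum(c / total * math.log2(c / total) for c in counts.values())
        # Solve S = H(p) + (1 - p) * log2(n - 1) for p by bisection;
        # the right-hand side decreases as p grows.
        def fano(p):
            h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
            return h + (1 - p) * math.log2(n - 1)
        lo, hi = 1.0 / n, 1.0 - 1e-9
        for _ in range(60):
            mid = (lo + hi) / 2
            if fano(mid) > s:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # A toy trace of hourly locations.
    print(predictability_bound(["home", "work", "home", "work", "cafe", "home"]))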

… Cities are sitting on data goldmines. If we can overcome privacy concerns then academics and other nerds could do some really interesting research. The GIS mapping crew can map until the end of time but we could be more ambitious and make some progress on some causal questions.

Mayor Bloomberg, who knows something about the value of information, is leading the charge here. Will Mayors of smaller cities follow the leader? Suppose that the Mayor makes available the electricity bill of every commercial building in the city. Energy efficiency businesses could contact the buildings whose per-square-foot consumption is high to see if these buildings are being “wasteful” or whether the activities taking place there (plasma TV?) just use a lot of power. The “green jobs” weatherizers should not be throwing darts at a map in choosing whom to target. They should focus on high electricity consumers, but do these consumers know that they are “high”? Equal, easy access to information would make my answer “yes”.
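
If those bills were published, the targeting described here reduces to a simple query over the disclosure data. A sketch with invented numbers (all building names and figures are hypothetical):

    import pandas as pd

    # Hypothetical disclosure data: one row per commercial building.
    bills = pd.DataFrame({
        "building": ["100 Main St", "200 Elm St", "300 Oak Ave"],
        "kwh_year": [1_200_000, 450_000, 2_500_000],
        "sq_ft":    [80_000, 60_000, 90_000],
    })

    # Normalize by floor area, then flag the heaviest quartile.
    bills["kwh_per_sqft"] = bills["kwh_year"] / bills["sq_ft"]
    threshold = bills["kwh_per_sqft"].quantile(0.75)
    targets = bills[bills["kwh_per_sqft"] >= threshold]
    print(targets[["building", "kwh_per_sqft"]])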

Lists of restaurants that have poisoned people recently will lead to improved sanitation and public health (see Phil Leslie’s work). This is just the tip of the iceberg. Information is a public good and government does have a role here in providing it. I have been told that Milton Friedman opposed this role but I have never understood his view here…

Beautiful Data: The Stories Behind Elegant Data Solutions, by Toby Segaran and Jeff Hammerbacher. O’Reilly Media, Inc. (2009)

In this insightful book, you’ll learn from the best data practitioners in the field just how wide-ranging - and beautiful - working with data can be. Join 39 contributors as they explain how they developed simple and elegant solutions on projects ranging from the Mars lander to a Radiohead video. With “Beautiful Data”, you will: explore the opportunities and challenges involved in working with the vast number of datasets made available by the Web; learn how to visualize trends in urban crime, using maps and data mashups; discover the challenges of designing a data processing system that works within the constraints of space travel; learn how crowdsourcing and transparency have combined to advance the state of drug research; understand how new data can automatically trigger alerts when it matches or overlaps pre-existing data; and learn about the massive infrastructure required to create, capture, and process DNA data. That’s only a small sample of what you’ll find in “Beautiful Data”. For anyone who handles data, this is a truly fascinating book. Contributors include: Nathan Yau; Jonathan Follett and Matt Holm; J.M. Hughes; Raghu Ramakrishnan, Brian Cooper, and Utkarsh Srivastava; Jeff Hammerbacher; Jason Dykes and Jo Wood; Jeff Jonas and Lisa Sokol; Jud Valeski; Alon Halevy and Jayant Madhavan; Aaron Koblin and Valdean Klump; Michal Migurski; Jeff Heer; Coco Krumme; Peter Norvig; Matt Wood and Ben Blackburne; Jean-Claude Bradley, Rajarshi Guha, Andrew Lang, Pierre Lindenbaum, Cameron Neylon, Antony Williams, and Egon Willighagen; Lukas Biewald and Brendan O’Connor; Hadley Wickham, Deborah Swayne, and David Poole; Andrew Gelman, Jonathan P. Kastellec, and Yair Ghitza; and Toby Segaran.