
Archive for October 10th, 2013

A curious problem of Big Data

10 Oct

Simon DeDeo, a researcher in applied mathematics and complex systems at the Santa Fe Institute, had a problem, as reported in Wired magazine.

He was collaborating on a new project to analyze 300 years of data from the archives of the Old Bailey, London's central criminal court for England and Wales.

But the data were not clean, that is, structured in a simple spreadsheet format with variables such as indictment, trial, and sentence for each case; instead, there were about 10 million written words spread over just under 200,000 trials.

How could this data be analyzed? As DeDeo explained, it was not the size of the data set that made it hard to find patterns; the size was quite manageable. It was the enormous complexity and lack of formal structure of this "big data" that posed the problem.
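To make the contrast concrete, here is a minimal sketch in Python of a naive first pass over such an unstructured corpus: counting word frequencies across transcripts to surface candidate patterns. The file name and one-transcript-per-line layout are assumptions invented for illustration; this is not DeDeo's actual method, only an indication of how raw text differs from a spreadsheet of labeled variables.

from collections import Counter
import re

# Hypothetical input: one trial transcript per line of a plain-text file.
# (The Old Bailey archive is not actually distributed this way; this
# layout is assumed purely for illustration.)
def word_frequencies(path):
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        for transcript in f:
            # Lowercase and split on runs of letters: a crude tokenizer.
            counts.update(re.findall(r"[a-z]+", transcript.lower()))
    return counts

if __name__ == "__main__":
    freqs = word_frequencies("old_bailey_transcripts.txt")
    # The most common words hint at structure (charges, verdicts, sentences)
    # that a spreadsheet of labeled variables would have given us for free.
    for word, n in freqs.most_common(20):
        print(f"{word}\t{n}")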

This situation does not quite fit the classic research paradigm, which involves formulating a hypothesis, deciding precisely what to measure, and then building an apparatus to make that measurement as accurately as possible, as in physics, where the variables are controlled and the amount of data is limited.

Alessandro Vespignani, a physicist at Northeastern University who specializes in harnessing the power of social networks to model disease outbreaks, stock-market behavior, collective social dynamics, and election results, has collected many terabytes of data from social networks such as Twitter, and this approach may also help in handling texts written outside of social networks.
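As a hedged illustration of the kind of network-based modeling described above, the sketch below simulates a simple SIR (susceptible-infected-recovered) epidemic on a random contact graph using the networkx library. The graph size, infection probability, and recovery time are made-up parameters chosen for illustration, not values from Vespignani's work.

import random
import networkx as nx

def simulate_sir(graph, beta=0.05, recovery_steps=3, seed=0):
    """Run a discrete-time SIR epidemic on a contact graph.

    beta is the per-contact, per-step infection probability;
    recovery_steps is how long a node stays infectious.
    All parameter values here are illustrative assumptions.
    """
    rng = random.Random(seed)
    patient_zero = rng.choice(list(graph.nodes))
    infected = {patient_zero: recovery_steps}   # node -> steps left
    recovered = set()
    history = []

    while infected:
        newly_infected = {}
        for node, steps_left in infected.items():
            # Each infectious node may transmit to each susceptible neighbor.
            for neighbor in graph.neighbors(node):
                if (neighbor not in infected and neighbor not in recovered
                        and neighbor not in newly_infected
                        and rng.random() < beta):
                    newly_infected[neighbor] = recovery_steps
            if steps_left == 1:
                recovered.add(node)
        # Advance the clock: decrement timers, drop the newly recovered.
        infected = {n: s - 1 for n, s in infected.items() if s > 1}
        infected.update(newly_infected)
        history.append((len(infected), len(recovered)))
    return history

if __name__ == "__main__":
    contacts = nx.erdos_renyi_graph(n=500, p=0.02, seed=1)
    for step, (i, r) in enumerate(simulate_sir(contacts)):
        print(f"step {step}: infected={i} recovered={r}")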

Scientists like DeDeo and Vespignani make good use of this piecemeal approach to the analysis of big data, but Ronald Coifman, a mathematician at Yale University, says that what is really needed is the big-data equivalent of a Newtonian revolution, comparable to the invention of calculus in the 17th century, and he believes such a revolution is already underway.

Coifman says, "We have all the pieces of the puzzle. Now how do we actually assemble them?" That is, we still have to move forward in dealing with scattered data.