Big Data

Analyzing big data with TDA

Thomas Bosshard
A centuries-old method of visualization is getting new attention.

It is a common misconception that business intelligence technologies can be used to find a diamond in a coal mine. Business analytics is more for confirmation than for random exploration. In other words, the experienced analyst first formulates a hypothesis that two or more things are related or that some underlying behavior is driving a pattern and then collects the data.

But can your data actually tell a story without your first having to formulate a query or hypothesis? Until recently, this seemed hardly possible for large, high-dimensional data sets, more commonly referred to as “big data.” However, a branch of mathematics called topology is emerging as a method to help identify otherwise invisible structures in big data sets.

Topology is a branch of mathematics, closely related to geometry, that studies the properties of shapes; its origins date back to the 18th century. Gunnar Carlsson, a mathematician at Stanford University, has been using topological data analysis (TDA) to represent big data sets according to their topological features. The result is an intuitive map of the data based on the similarity of data points: a measure of distance between points serves as the input, and the output is a topological shape or network. The more similar two data points are, the closer together they appear on the resulting map. The more different they are, the further apart they appear.
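The idea of turning distances into a network can be illustrated with a small sketch. The Python code below is not Carlsson's method or any particular TDA product; it is a deliberately simplified example that assumes Euclidean distance as the similarity measure and a hypothetical threshold, epsilon, below which two points count as "similar". The function names and the toy data are made up for illustration. Points closer than the threshold are linked, and the groups that emerge are the connected pieces of the resulting network.

```python
import math
from itertools import combinations


def similarity_network(points, epsilon):
    """Link every pair of points whose Euclidean distance is at most epsilon."""
    edges = []
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= epsilon:
            edges.append((i, j))
    return edges


def connected_components(n, edges):
    """Group node indices 0..n-1 into connected components using union-find."""
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for node in range(n):
        groups.setdefault(find(node), []).append(node)
    return sorted(groups.values())


# Hypothetical toy data: two tight clusters that lie far apart.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
edges = similarity_network(points, epsilon=0.5)
components = connected_components(len(points), edges)
print(components)  # [[0, 1, 2], [3, 4]]
```

No question was asked of the data beyond "which points are close to each other?", yet two groups emerge on their own. Real TDA tools work with far more dimensions and more refined constructions, but the principle of letting distance define the shape is the same.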

TDA has already yielded useful results in the field of cancer therapy. An analysis of a breast cancer data set spanning 11 years revealed previously hidden subpopulations of survivors, whose characteristics can now be used to inform research and therapy.

One of the biggest challenges in big data today is asking the right questions of the data in order to turn large and complex data sets into useful knowledge. Recognizing shapes (patterns) in data can help to extract meaningful insights, and TDA seems to be a promising method for assembling the pieces of the puzzle so we can see the big picture without having to formulate the questions first.

