Data Science, Complexity, and The Future

Big data is important because it provides information the world has never had access to before. The claim may seem obvious, but it is the central function of big data and its main contribution to society. Data science is crucial because raw numbers are useless without the tools to synthesize them. Big data holds the key to the universe, and data science is the Rosetta Stone needed to unlock the true meaning of the data. In my view, data science holds great potential to advance our understanding of human development and other complex systems across the globe. It expands what society is able to understand, because the human mind is limited. With the computing power now available, data science has proven time and again to be effective in outlining problems and synthesizing solutions, because complex systems can be modeled. I therefore have hope that big data and data science can usher in a new age of human well-being if harnessed correctly. Many obstacles remain even with data science on the side of progress, but with proper oversight and care, progress can be made.

Humans do not have the capacity to compute and comprehend the complexity of the systems related to human development. If one hands another person a table of the average GDP of East Asian countries over the past twenty years, little will be accomplished. The reader may detect an overall increase across the region or in a particular country, but will almost certainly miss an interplay between the GDPs of two countries, or a linear relationship between those of another two. Even if such a pattern were found, finding it would take a substantial amount of time and brainpower. Data science, with its algorithms and methods, can collect the data and draw conclusions that humans are unable to envision. It gives us the means to find patterns that would otherwise be undetectable, which frees people to focus on acting.
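
As a rough illustration of the GDP example, the short Python sketch below checks every pair of countries for a linear relationship using the Pearson correlation coefficient. The country names and GDP figures are invented for illustration and are not real data, and the helper names `pearson_r` and `pairwise_correlations` are hypothetical. This is only one simple way to look for such a pattern, not a full analysis.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(series):
    """Correlation for every unordered pair of named series."""
    names = sorted(series)
    return {
        (a, b): pearson_r(series[a], series[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Invented GDP figures (NOT real data) for three hypothetical countries.
gdp = {
    "A": [100.0, 104.0, 109.0, 115.0, 118.0, 126.0],
    "B": [50.0, 52.5, 54.0, 58.0, 59.5, 63.0],
    "C": [80.0, 79.0, 83.0, 78.0, 82.0, 80.0],
}

correlations = pairwise_correlations(gdp)

# List the pairs from the strongest linear relationship to the weakest.
for pair, r in sorted(correlations.items(), key=lambda kv: -abs(kv[1])):
    print(pair, round(r, 3))
```

With three countries there are only three pairs to compare, but the number of pairs grows quickly as countries are added, which is where a computer outpaces a person reading a table. A strong correlation found this way still says nothing about why the two series move together.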

To illustrate, there are many occasions on which people have employed data science to understand complex systems. For instance, Anderson, in his paper about the end of theory, discusses a biologist who used a supercomputer to sequence samples collected from ecosystems and discovered thousands of unknown species of bacteria. The biologist never examined the physical characteristics of any of the organisms; he simply investigated the data he had collected and found discrepancies pointing to novel creatures. Taxonomy and species classification are notoriously complex subjects in biology, but the data science methods applied handled them with ease. Additionally, scientists have been able to model the climate system and its potential impacts on agriculture in Africa over the next 70 years. Another article I read on a similar topic repeatedly noted that the “climate is chaotic” and, as a result, exceedingly difficult to predict. Time will tell whether the model is an accurate estimate of what will really happen, but even an approximation of such an intricate system is impressive. There are countless other examples of similarly complex modeling, from measuring economic activity in China based on road density to discovering the Higgs boson with the Large Hadron Collider. Humans are simply unable to synthesize the inconceivable amount of data that was used to reach these conclusions. Without data science, we would be floundering.

The demonstrated capabilities of big data and data science give me hope for the future of understanding complex systems and addressing human development problems. First, systems “acquiesce” to modeling. Although finding them may be challenging from the outset, patterns in complex systems often exist, and they can be found. For example, West found a consistent relationship in which larger cities are more efficient (because less infrastructure is needed per person) and in which innovation, crime, and disease all increase as city populations rise. These correlations are reliably constant across the globe and scalable within a nation. We are able to harness these and similar patterns with data science. Second, there are examples of demonstrated good made possible by big data. As presented in the Blumenstock article, organizations have been using satellite data to determine which houses in developing countries have thatched roofs. This allows them to allocate resources and aid more efficiently to families who have thatched roofs and are presumably poorer than those who do not. Researchers have also been able to map areas with low birth weights and high stunting rates in children, both of which are signs of malnutrition. The consequence of the study is twofold. As in the previous example, the scientists have found areas with higher rates of malnutrition and can better allocate aid to the regions where it is most needed. Furthermore, they found that low birth weights and high rates of stunting followed periods of drought. This means the scientists may have discovered a way to reduce malnutrition: improving the resilience of farmers to climatic stress. It is heartening that, using data science, we have the ability to discover such patterns and produce solutions that would otherwise be unobtainable.

Nevertheless, the ability to find patterns does not mean improvements are guaranteed to occur. The Blumenstock article also pointed out examples of people pretending to live in a small house with a thatched roof when in reality they live in the nicer house next door. In this way, people acquire resources they do not need, to the detriment of those who desperately require them. In addition, knowing the solution to a challenge has no bearing on whether it will be enacted. Corruption and lack of concern, among other things, mean that many data science discoveries are never acted upon. Data give people the resources to act, but data can never solve a problem alone, so data science provides only the hope of progress, not a guarantee.

Moreover, we cannot rely solely on data even to produce patterns and solutions. Anderson believes that, with the sheer amount of information available and the unparalleled processing capacity of computers, everything will become apparent through data science and no human need be involved. This, however, is a fallacy. West admits that without data science his book would have no basis, since the patterns he found would have been impossible to detect. However, he also believes that a human-developed framework and theory are required for any data collected and synthesized to be of use. Not all data moves the world forward. Some data can confound findings by presenting a pattern with no causation behind it, and some can suggest a relationship that damages the perception of a group by leading to the wrong conclusion. Kitchen discussed how data reveal that black Americans on average make less money than white Americans and live in segregated neighborhoods. Based on this alone, one might conclude that black Americans are lazy or undesirable to live near, but a person who understands the historical context and systemic issues that led to the disparity found in the data could easily dispel these false assumptions. A framework and human involvement are needed for the proper application of data science findings. Finally, a person must be involved in the implementation of a solution. To address systemic inequalities, someone must be on hand to navigate the politics and motivate other people to work toward the same goal.

Ultimately, data science provides people with the tools to resolve the complex issues the world is facing, chiefly development challenges, but care must be taken. Data cannot be our only approach, and we must be wary of abuse of power; however, I believe data science is the key to change in the future. We may not be in the midst of a data science revolution, but data science will certainly be the catalyst for the next revolution.