Big Data Complexity: a Tangle of Connections

New digital devices generate vast and seemingly incomprehensible amounts of data, and data analysts try to whittle them down into something digestible. This dynamic is now known by the infamous tagline: “Big Data”. New data sources are indeed complex to study because they result from interactions between users, companies and devices, which together form an intricate web of networks.

This type of phenomenon, where actors are interacting in an entanglement of causes and consequences, has been studied by Edgar Morin, whose book Science with Conscience (Science avec Conscience, Fayard, 1982) offers readers the tools for understanding such complex systems. And even at more than 30 years old, the concepts he outlines hold truer than ever.

The world is a tangle of connections…

In this book, Edgar Morin outlines what he calls “Complex Thought” – in other words, the various ways through which analysts, economists and philosophers alike look at and make sense of complex realities. He teaches us that rather than trying to oversimplify or get around complexity as if it were an obstacle, one must accept complexity for all the richness that it affords.

We do indeed live in a complex world: one where technology allows each and every one of us to add our own cog to the engine. With user-generated content – like a restaurant review, an uploaded video, a shared article or simply a like on a business’ social media page – each of us is contributing to an avalanche of data and ushering in a revolution that will be both economic and societal. Big Data therefore becomes the ideal conceptual tool for analyzing a world where individuals interact through a tangle of connections.

Philosophically speaking, this is what Edgar Morin calls the principle of self-eco-organization: a system where creation and destruction meet, and where the principle of emergence is essential. Sites and content disappear as quickly as they went viral, and the prominence of one piece of content rather than another is more due to a butterfly effect than any kind of centralized control.

…where the individual is indistinguishable from the whole

According to Edgar Morin, complex systems must be analyzed as a whole: "It is impossible to know the parts without knowing the whole." In other words, the user and his or her network are one and no longer separate. It is impossible to understand one without understanding the other.

Two constraints make complex systems notoriously difficult to analyze. On the one hand, when we want to analyze the system as a whole, we face a technological impossibility, as we cannot process such large volumes. On the other hand, if we limit ourselves to reducing a mass of data to simple statistics, we fail to grasp the reality of the system in all its richness and complexity.

At least for now, no computer has the technical capacity to handle all of today’s user-generated data. In fact, according to the CNRS, Twitter and Facebook alone generate 17 terabytes of information every day, and this already huge volume is increasing all the time! When appliances, telephones, biometric meters, scientific instruments and more are generating thousands of pieces of data every second, processing everything that we have at our disposal is almost unthinkable.

On the other hand, conventional statistical methods, based on population samples, are rarely appropriate in the context of the Internet. Plus, network samples are very complicated because they need to take into account not only individuals, but also the connections between them. Not to mention the fact that the complexity of the Internet – where users can express their individuality in multiple ways – makes it almost impossible to take a truly representative sample. As Edgar Morin would say, "the principle of identity is complex. It includes the heterogeneity and plurality in one unity." This is the Unitas Multiplex principle: individuals have poly-identity characters.

For example, an airline looking to understand the behavior of its potential customers might look at Paul, a young consultant in an international firm who regularly travels business class between Asia and France. But Paul is also the father of two children who is working on putting aside a nest egg for his family. So when he goes on holiday, he is careful to look for the cheapest flight. Paul could be a potential client for this airline, but his behavior – sometimes investing in business class tickets and other times looking for the cheapest fare – would seem erratic.

His behavior becomes clearer, however, as soon as we start studying his social media activity. On his family network (e.g. Facebook), we can see when he is travelling with his family. On his professional network (e.g. LinkedIn), we can see when he is travelling for professional reasons.

With this example, it’s clear that by studying the individuals and the networks as a whole, we have a better picture of the reality.

2.0 dialogue creates a causality loop

Within the paradigm of complexity, Edgar Morin talks about the paradigmatic split between subject and object. A classic approach would be to think that a subject observes an object in order to have an impact on the object. But Edgar Morin emphasizes that because the latter also knows it is being observed, both must appear to be linked. Subject and object have a causal relationship but cannot be understood unilaterally: this causality "loop" is what Morin calls a Dialogic Loop.

For example, suppose an analyst is trying to understand a PC tablet manufacturer’s pricing policy. If the manufacturer announces the launch of a new version of its tablet, clients will be more likely to wait for the release of this new version before deciding which product to buy. Since the analyst can anticipate a decrease in sales over the short term, he or she is likely to suggest a lower price point for the older product.

This phenomenon, well known in econometrics, often leads to counter-intuitive correlations. In the case cited above, one sees a positive relationship between price and demand: the new product is more expensive than the former but sells better. Should we jump to the conclusion that the more expensive product is more attractive?
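The reversal described above can be illustrated with a small Python sketch. The figures are invented purely for illustration (they are not data from this article), and the names `old_tablet`, `new_tablet` and `pearson` are hypothetical. Within each product, lower prices go with higher sales, yet pooling the two products yields a positive price-sales correlation because the new tablet is both pricier and more popular.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split(observations):
    """Turn [(price, units), ...] into two parallel lists."""
    prices = [price for price, _ in observations]
    units = [sold for _, sold in observations]
    return prices, units


# Hypothetical (price, units sold) observations, invented for illustration.
old_tablet = [(300, 120), (280, 130), (260, 145), (240, 150)]
new_tablet = [(500, 400), (480, 420), (460, 450), (440, 470)]

pooled_r = pearson(*split(old_tablet + new_tablet))
old_r = pearson(*split(old_tablet))
new_r = pearson(*split(new_tablet))

print(f"pooled correlation:     {pooled_r:+.2f}")
print(f"old tablet correlation: {old_r:+.2f}")
print(f"new tablet correlation: {new_r:+.2f}")
```

Looking only at the pooled figure would suggest that higher prices attract buyers; separating the products shows the opposite sign within each one.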

We observe a "dialogic loop" as clients anticipate the actions of companies and act accordingly. In turn, analysts can anticipate the market reactions and also act accordingly. In reality, there is no subject and object. This is a perfect example of Edgar Morin’s Ecology of Action principle: when individuals or groups enter into complex interactions, the logic of said interactions is sometimes reversed.

Analyst in a complex world? Have a holistic view!

From a methodological point of view, modern techniques help us to understand observed complexities. Network analysis techniques can identify the role of "nodes" (individuals) within the network not as isolated elements but as parts of a whole. This is Edgar Morin’s paradigm of complexity put into action. Bayesian hierarchical models can handle the poly-identity character of individuals (Unitas Multiplex) by taking account of variations within the same individual’s behavior. Finally, economic models like Structural Models can reflect the anticipations of actors and their inter-connectivity (Dialogic Loop).
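As a minimal sketch of the network analysis idea, the Python snippet below computes two standard centrality measures on a toy graph. The graph is an assumption made for illustration: Paul from the earlier example sits between a family circle and a professional circle, and every other name is hypothetical. Degree centrality counts a node's direct ties, while closeness centrality reflects how near a node is to everyone else; both describe a node only in relation to the whole network.

```python
from collections import defaultdict, deque

# Hypothetical ties: a family circle and a professional circle sharing Paul.
edges = [
    ("Paul", "Anna"), ("Paul", "Leo"), ("Anna", "Leo"),   # family
    ("Paul", "Mia"), ("Paul", "Sam"), ("Mia", "Sam"),     # colleagues
]


def build_graph(edge_list):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = defaultdict(set)
    for a, b in edge_list:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Share of the other nodes that each node is directly tied to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """Reachable node count divided by the sum of shortest-path lengths."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            current = queue.popleft()
            for neighbor in graph[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


graph = build_graph(edges)
degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
most_central = max(degree, key=degree.get)
print(most_central, degree[most_central], closeness[most_central])
```

In this toy network Paul comes out as the most central node because he is the only link between the two circles, which is exactly what an analysis of either circle in isolation would miss.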

Edgar Morin’s philosophy reinforces the importance of a multidisciplinary approach: understanding the complex world in which we live requires the collaboration of engineers, economists, psychologists, sociologists and the various skills that they bring to the table. The challenges of tomorrow should not be analyzed in terms of a single discipline at a time, but by taking a holistic view. 

This article is the fruit of a collaboration between the Edgar Morin Complexity Chair and the Accenture Chair in Strategic Business Analytics.
