Complex networks, data mining, causality, and beyond

Over the last few weeks Innaxis has published two papers that may be of interest to air transport researchers, among others.

The first paper is an extensive review of the combined use of complex network theory and data mining. Complex network analysis and data mining share the same general goal, that of extracting information from complex systems to ultimately create a new, compact, quantifiable representation, and they often address similar problems as well. Despite these commonalities, surprisingly few researchers take advantage of both methodologies, as many conclude that the two fields are either largely redundant or totally antithetic. In this review we challenge this perception: we show that this state of affairs stems from contingent rather than conceptual differences, and that the two fields can in fact be used advantageously in a synergistic manner. The review starts by presenting an overview of both fields and illustrating some of their fundamental concepts. A variety of contexts in which complex network theory and data mining have been used in a synergistic manner are then presented. Finally, all the concepts discussed are illustrated with worked examples through a series of hands-on sections, which we hope will help the reader to put these ideas into practice. If you have ever wondered how a real-world problem can be tackled with these two techniques, you should definitely read this review!

The second paper addresses the common confusion between correlation and causality. To tell the two apart, many causality metrics have been proposed in the literature, all sharing the same drawback: they are defined for time series. In other words, the system (or systems) under analysis must display a time evolution. Associating causality with the temporal domain is intuitive, given the way the human brain incorporates time into its perception of causality; nevertheless, this association creates some rather important problems.

For instance, suppose one is trying to detect whether there is a causal relation between the workload of an air traffic controller and the appearance of loss of separation events. These events are only defined at one point in time. To illustrate, one can detect an instance of a loss of separation and check the corresponding workload; then do the same for another event; and so forth. In the end, the researcher would get two vectors of features that do not encode any temporal evolution; in other words, consecutive values are not correlated. So, in this situation, how can we detect whether a true causality (and not just a correlation) is present?

In this paper we propose a novel metric able to detect causality within static data sets, by analysing how extreme events in one element correspond to the appearance of extreme events in a second element (refer to the picture above for a graphical representation). The metric is able to detect non-linear causalities, to analyse both cross-sectional and longitudinal data sets, and to discriminate between real causalities and correlations caused by confounding factors.
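To give a feel for what "extreme events in one element corresponding to extreme events in a second element" can look like on static data, here is a minimal Python sketch. It is not the metric proposed in the paper: it only measures how often extreme values of two features co-occur in the same record, which is an association and says nothing about direction or confounding factors. The function names, the 0.9 quantile threshold and the synthetic "workload" and "risk" values are all illustrative assumptions.

```python
import random


def quantile_threshold(values, q):
    """Return the value below which roughly a fraction q of the data lies."""
    ordered = sorted(values)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


def conditional_extreme_rate(first, second, q=0.9):
    """Fraction of records with an extreme `first` value whose `second`
    value is also extreme. 'Extreme' means at or above the q-quantile
    (an assumed definition, chosen only for this illustration)."""
    first_thr = quantile_threshold(first, q)
    second_thr = quantile_threshold(second, q)
    extreme_records = [i for i, v in enumerate(first) if v >= first_thr]
    if not extreme_records:
        return 0.0
    hits = sum(1 for i in extreme_records if second[i] >= second_thr)
    return hits / len(extreme_records)


# Synthetic, order-free records (no time evolution is encoded).
rng = random.Random(0)
workload = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Hypothetical risk score that depends on workload plus noise.
risk_dependent = [w + rng.gauss(0.0, 0.5) for w in workload]
# Hypothetical risk score unrelated to workload.
risk_independent = [rng.gauss(0.0, 1.0) for _ in range(2000)]

rate_dependent = conditional_extreme_rate(workload, risk_dependent)
rate_independent = conditional_extreme_rate(workload, risk_independent)
print("dependent:", rate_dependent)
print("independent:", rate_independent)
```

If the two features were unrelated, the rate would be expected to stay close to 1 - q (here 0.1); a much higher value indicates that extremes tend to appear together. Going from this kind of co-occurrence to a causal statement is exactly the part that the paper's metric addresses and this sketch does not.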

If you are interested in these ideas, feel free to have a look at these two papers:

M. Zanin et al., Combining complex networks and data mining: why and how. Physics Reports (2016), pp. 1-44. http://authors.elsevier.com/a/1T3yF_8QfbYE-k. Also available at: http://arxiv.org/abs/1604.08816
M. Zanin, On causality of extreme events. PeerJ. Also available at: http://arxiv.org/abs/1601.07054

If you have questions about them, please contact M. Zanin at [email protected]

Finally, Seddik Belkoura is going to present a paper at the forthcoming ICRAT 2016, Philadelphia, about the use of the static causality metric to study delay propagation. You can find the paper on the official website of the conference (http://www.icrat.org/), and also by contacting him at [email protected].


Complexity Science, Data Science