Infrastructure needed for Aviation Data Analytics
Author: Jens Krueger
Safety is key in aviation. To reach maximum safety, stakeholders are collecting a large amount of data for analytics. Ultimately, researchers want to not only evaluate the causal dependencies of safety critical events, but to also enhance operational efficiency.
Presently, such data is stored in isolated data silos. The goal of SafeClouds.eu is twofold: to advance data-driven analytics for safety and efficiency, and to work with data outside of the silos so that it can be shared and merged between different stakeholders, including data owners. At the same time, the infrastructure must ensure that personal or confidential data is not leaked to third parties while maintaining data sharing capabilities.
In order to address the requirements for data protection and analysis, the SafeClouds.eu infrastructure must enable the following data analysis paradigms:
- Fusion of identified confidential data streams into a single de-identified data stream. Identified data is data that contains information that could be used to directly or indirectly (e.g. via linking attacks) expose personal data linked to a specific group of people or individuals.
- Access to the de-identified data streams for SafeClouds.eu data analysis.
- Information sharing of the analysis of restricted and confidential data from aviation stakeholders (airlines, ANSPs) for blind benchmarking.
- Access governance: the specifics of data access, which should be continuously monitored, and its limitations must be defined.
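The first paradigm, fusing identified streams into a single de-identified stream, can be sketched as follows. This is only an illustration under stated assumptions, not the SafeClouds.eu implementation: the field names (`tail_number`, `crew_id`, `timestamp`) and the choice of keyed hashing (HMAC-SHA256) are hypothetical, since the post does not describe the actual schemas or de-identification method. Note also that replacing direct identifiers does not by itself protect against linking attacks through other fields.

```python
import hashlib
import hmac

# Hypothetical field names; the real data schemas are not described in this post.
DIRECT_IDENTIFIERS = ("tail_number", "crew_id")


def pseudonymise(value: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def de_identify(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record with direct identifiers replaced by pseudonyms."""
    out = dict(record)
    for field in DIRECT_IDENTIFIERS:
        if field in out:
            out[field] = pseudonymise(str(out[field]), secret_key)
    return out


def fuse_streams(streams, secret_key: bytes) -> list:
    """Merge several identified streams into one de-identified stream ordered by timestamp."""
    merged = [de_identify(record, secret_key) for stream in streams for record in stream]
    merged.sort(key=lambda record: record["timestamp"])
    return merged
```

Because the same key yields the same pseudonym, records about one aircraft can still be joined across streams after fusion, while the original identifier never leaves the owner's premises. The key itself would have to stay with the data owner.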
The infrastructure architecture must reflect data protection requirements in order to guarantee the different data confidentiality levels. The physically-independent components are as follows:
Local system:
The local system sits at the premises of the participating companies (e.g. airlines and ANSPs) and stores raw datasets from different source systems. This data is combined with other sources, which the global cloud system should provide, to form a 360-scenario dataset with enhanced informational context and processing. Finally, the dataset is de-identified and made accessible. Authorised third parties are allowed access only for data management and administrative tasks.
Dedicated private cloud:
Each participating party will be provided with a private segment of the cloud infrastructure that is logically and physically independent. It is used for de-identified data storage and analytics. Data scientists from SafeClouds.eu official partners will have access to the de-identified data under the data protection agreements.
Global cloud system:
The global cloud system is divided into two parts: global storage and global processing. The global storage will hold all open datasets (Meteo, ADS-B, SWIM, Radar). It will also ensure dataset quality and accessibility through pre-processing, and it will be accessible from the local systems and the dedicated private clouds. The global processing infrastructure performs analytics on joint datasets from all dedicated private clouds.
Figure 1: Hierarchical architecture of the SafeClouds.eu infrastructure
The SafeClouds.eu Cloud Infrastructure
The SafeClouds.eu cloud infrastructure is built on Amazon Web Services (AWS). One of the main advantages of AWS is that it consists of many datacenters located around the world. This enables SafeClouds.eu to reduce communication latencies by choosing the most appropriate datacenter locations. AWS datacenters are grouped into regions, and each region comprises several Availability Zones, each consisting of one or more datacenters. Each Availability Zone is attached to a different part of the power grid to limit the damage of a potential power outage. Any distributed cloud application running in AWS must weigh a tradeoff: placing nodes in different Availability Zones improves fault tolerance, whereas keeping computational resources as close together as possible improves performance.
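The placement tradeoff can be made concrete with a small sketch. It makes no AWS calls and is not how SafeClouds.eu places its nodes; the zone names are illustrative and the round-robin strategy is an assumption chosen for simplicity. It compares how many nodes survive the loss of the most heavily populated zone under different placements.

```python
from collections import Counter
from itertools import cycle


def place_nodes(node_count: int, zones: list[str]) -> list[str]:
    """Assign nodes to zones round-robin and return the zone of each node."""
    zone_cycle = cycle(zones)
    return [next(zone_cycle) for _ in range(node_count)]


def worst_case_survivors(placement: list[str]) -> int:
    """Number of nodes left if the most heavily populated zone fails."""
    if not placement:
        return 0
    counts = Counter(placement)
    return len(placement) - max(counts.values())


# Illustrative zone names only.
spread = place_nodes(6, ["eu-central-1a", "eu-central-1b", "eu-central-1c"])
packed = place_nodes(6, ["eu-central-1a"])
```

With six nodes, the spread placement keeps four nodes after a single zone failure, while the packed placement keeps none. The packed placement, in turn, keeps all traffic inside one zone, which is the performance side of the tradeoff.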
For SafeClouds.eu, AWS enables the infrastructure to horizontally scale with an increasing number of stakeholders or increased processing or storage requirements.
To ensure security, AWS Identity and Access Management (IAM), virtual private clouds (VPC), and encryption for data in motion and at rest are used.
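As an illustration of how IAM can express such rules, the sketch below builds a policy document that grants read-only access to one storage bucket and denies any request not sent over an encrypted connection. It is an assumption-laden example rather than a SafeClouds.eu policy: the post does not name the storage service, so Amazon S3 is assumed here, and the bucket name and statement IDs are hypothetical.

```python
import json


def read_only_policy(bucket: str) -> dict:
    """Build an IAM policy document: read-only access, TLS required (assumes Amazon S3)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    resources = [bucket_arn, f"{bucket_arn}/*"]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing the bucket and reading its objects.
                "Sid": "ReadDeidentifiedData",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": resources,
            },
            {
                # Deny every S3 request that is not sent over TLS.
                "Sid": "DenyUnencryptedTransport",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": resources,
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


# Hypothetical bucket name, used only for illustration.
policy_json = json.dumps(read_only_policy("example-deidentified-data"), indent=2)
```

Generating policies from code rather than editing them by hand is one way to keep access rules reviewable and repeatable. Encryption at rest would be configured separately on the storage itself.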
The SafeClouds.eu infrastructure enables data protection, data sharing and flexibility. Data safety and security are key to gaining trust from data providers; without that trust, the success of the overall project is at risk. This blog post stresses the importance of a distributed and secure infrastructure and gives a first look into how the overall infrastructure architecture is designed. However, although the base infrastructure technology supports scalability, security, and other factors, the most important challenge is to leverage and implement those technological capabilities. The main security threats include human failure, bugs, and incorrect implementations. To account for user error, the infrastructure must be as automated as possible and rely on clearly defined, deterministic processes. In addition, each entry point must be defined and encapsulated while preserving accessibility and usability. SafeClouds.eu will be using this infrastructure for aviation data analytics, and will share its findings with the aviation and data science communities.