# Route clustering: reducing complexity not alternatives

Written by Innaxis on . Posted in Domino.

**AUTHOR: Tanja Bolic**

The objective of Domino is to study the impact of introduction of new solutions in the ATM system, using the Agent Based Modelling platform. An important part of the platform is the input data for the modelling, especially when it comes to the choice of routes. Domino allows the model to run using a pool of routes between the different origin and destinations. In order to provide the recent information on route choices to Domino platform, which are varied, but where the sets are limited, we performed the route clustering taking into account two 28 -day periods from 2017, namely AIRACs 1702 (February, winter season) and 1709 (September, summer season). These two AIRACs are selected to account for possible seasonal differences in the trajectory choices.

The two AIRACs contained 29 460 Origin-Destination (OD) pairs, with 1 284 560 flights being eligible for route clustering. These flights were further filtered to exclude:

· Military flights, as these are not scheduled flights

· Flights with origin or destination airports being “ZZZZ” or “AFIL”, as these indicate incomplete flight data thus obtaining a data set for clustering.

Route clustering was performed using the DBSCAN algorithm, which groups elements that are in close proximity, i.e., elements in a ε-neighbourhood and surrounded by a minimum number of neighbours. The algorithm requires two parameters: the maximum radius of the neighbourhood ε and the minimum number of elements *m* required for a cluster. The DBSCAN does not require to be initialized with the number of clusters to create, but it autonomously finds the number of clusters suitable for the problem. This property fits our scenario since we cannot estimate the correct number of typical trajectories apriori. We set maximum radius of the neighbourhood ε = 0.3 (which corresponds to 30km) and minimum number of elements *m = 1* as parameters of the DBSCAN algorithm. In order to run the algorithm, the trajectories of the flights in the data set were transformed into a geometry format, to speed up the clustering algorithm. For each OD pair, the Hausdorff distance[1] was calculated between all the trajectories belonging to the pair, which is then used by the DBSCAN algorithm to perform the clustering. In practice, the algorithm with these parameters places the trajectories for which the Hausdorff distance is less than 30 km in the same cluster, or different cluster if it is more than 30 KM.

The clustering decreased the number of viable routes between the OD pairs. The result of the clustering is set of routes for each OD pair, expressed as the total flight distance, and the entry and exit points in different airspace elements. From the results it can be seen that for a good portion of OD pairs, chosen trajectories are usually the same: 29% have one cluster, and 18% have two clusters. Furthermore, the number of OD pairs decreases with the increase of the number of clusters.

Most of the OD pairs with only one cluster are the ones with short flights. The figure below shows the resulting cluster for the EBBR-LFRS OD pair. There were 105 flights flown in the two AIRACs used in the analysis, and all of them flew along the same route. The flight along this route takes about an hour and 15 minutes (~350 NM).

For the longer flights, the situation is different – the number of possible clusters increases with the distance between the OD pairs. Figure below depicts the resulting seven clusters for EGSS-LGAV pair that contained 96 flights in the two AIRACs.