CACTUS
Problem
Building a 360-degree view of a personalised health path requires the ability to ingest data from different sources, handle missing data, work with small datasets, and integrate continuous and categorical data in the same model to support continuous learning.

Solution
Figure: Internal functioning of CACTUS.
Each continuous feature in the dataset is partitioned into two flips, representing high (_U) or low (_D) values; categorical features are kept unaltered. The threshold used to partition each continuous feature is found using a receiver operating characteristic (ROC) curve. Building the ROC curve, and thus finding the most appropriate threshold to abstract a feature, requires two populations. These are created by splitting the available AMD classes into two groups (e.g., healthy individuals as one population and all the AMD groups as the other). In previous versions, the user had to select a universal threshold to divide the classes, which required prior domain knowledge and multiple comparisons. CACTUS has since been improved to select the most appropriate threshold for each feature, a more sensitive approach that accounts for how well each individual feature distinguishes between a particular set of classes.
The resulting discrete values are used to build a knowledge graph for each class, in which the elements are linked by their conditional probabilities. The centralities computed on each graph form the significance profile of that class.
Each individual is then classified by comparing it against the available class representations and assigning it to the most similar one. Applying this classification to all available individuals assesses the model's performance. Finally, the significance of each flip in each class is analysed to identify the elements driving the classification and to explain the model's behaviour.
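The per-feature threshold search and flip abstraction described above can be sketched in Python as follows. This is a minimal sketch, not CACTUS's own code: the text does not say which point on the ROC curve counts as "most appropriate", so the sketch assumes Youden's J statistic (TPR minus FPR, taken in absolute value so that a feature which is lower in the second population is also handled). The function names `roc_threshold` and `to_flips`, the 0/1 population labels and the example values are hypothetical. Missing values (None) are skipped when choosing the threshold and stay missing after abstraction.

```python
from typing import List, Optional, Sequence


def roc_threshold(values: Sequence[Optional[float]], labels: Sequence[int]) -> float:
    """Pick the cut-off that maximises |Youden's J| = |TPR - FPR|.

    Assumption (not from CACTUS itself): labels are 1 for one population
    (e.g. all AMD groups) and 0 for the other (e.g. healthy individuals).
    Missing values (None) are ignored.
    """
    pairs = [(v, y) for v, y in zip(values, labels) if v is not None]
    positives = sum(1 for _, y in pairs if y == 1)
    negatives = sum(1 for _, y in pairs if y == 0)
    if positives == 0 or negatives == 0:
        raise ValueError("both populations need at least one observed value")

    best_threshold, best_j = pairs[0][0], -1.0
    # Every observed value is a candidate point on the ROC curve;
    # values >= candidate are treated as "predicted population 1".
    for candidate in sorted({v for v, _ in pairs}):
        tpr = sum(1 for v, y in pairs if v >= candidate and y == 1) / positives
        fpr = sum(1 for v, y in pairs if v >= candidate and y == 0) / negatives
        # abs() lets a feature that is lower in population 1 still find a cut-off.
        j = abs(tpr - fpr)
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold


def to_flips(name: str, values: Sequence[Optional[float]],
             threshold: float) -> List[Optional[str]]:
    """Abstract a continuous feature into high (_U) / low (_D) flips; None stays missing."""
    return [
        None if v is None else (f"{name}_U" if v >= threshold else f"{name}_D")
        for v in values
    ]
```

For example, `roc_threshold([1.2, 0.8, 3.5, 4.1, None, 2.9], [0, 0, 1, 1, 1, 0])` returns 3.5, and `to_flips("marker", ...)` with that threshold labels 3.5 and 4.1 as `marker_U`, the other observed values as `marker_D`, and leaves the missing value as None. Because the threshold is searched separately for each feature, no single universal cut-off has to be chosen by hand.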

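The graph step can be pictured with a short sketch. Each individual is represented as a set of discrete elements (flips and categorical values), so a missing measurement is simply absent from the set. Within one class, every ordered pair of elements that co-occur in an individual gets a directed edge a -> b weighted by the estimated conditional probability P(b | a). The text does not name the centrality measure, so this sketch assumes a weighted PageRank computed by power iteration; `conditional_graph`, `weighted_pagerank` and `significance_profiles` are hypothetical names, not CACTUS's API.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def conditional_graph(individuals: List[Set[str]]) -> Tuple[Set[str], Dict[Edge, float]]:
    """Build one class's graph: edge a -> b weighted by P(b | a) within that class."""
    single: Counter = Counter()
    pair: Counter = Counter()
    for items in individuals:
        single.update(items)
        pair.update(permutations(items, 2))  # every ordered pair (a, b) with a != b
    edges = {(a, b): n / single[a] for (a, b), n in pair.items()}
    return set(single), edges


def weighted_pagerank(nodes: Iterable[str], edges: Dict[Edge, float],
                      damping: float = 0.85, iterations: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank over weighted directed edges (assumed centrality)."""
    ordered = sorted(nodes)
    n = len(ordered)
    if n == 0:
        return {}
    out_weight: Counter = Counter()
    for (a, _), w in edges.items():
        out_weight[a] += w
    rank = {v: 1.0 / n for v in ordered}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[v] for v in ordered if out_weight[v] == 0)
        new = {v: (1 - damping + damping * dangling) / n for v in ordered}
        for (a, b), w in edges.items():
            new[b] += damping * rank[a] * w / out_weight[a]
        rank = new
    return rank


def significance_profiles(data: Dict[str, List[Set[str]]]) -> Dict[str, Dict[str, float]]:
    """Map each class label to the centrality of every element in its graph."""
    return {label: weighted_pagerank(*conditional_graph(rows)) for label, rows in data.items()}
```

Each profile maps elements to scores that sum to 1, which keeps profiles built from classes of different sizes on a comparable scale; this matters when, as described above, only small datasets are available.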

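The classification, evaluation and explanation steps can likewise be sketched. The text only says that each individual is assigned to the "most similar" class representation, so this sketch assumes that similarity is the sum of the significance scores of the individual's elements in each class profile, and that the flips driving the classification are those whose significance differs most between two classes. `classify`, `accuracy`, `top_drivers` and the toy profiles are hypothetical; any per-class mapping from elements to significance scores can be plugged in.

```python
from typing import Dict, List, Set, Tuple

Profile = Dict[str, float]


def classify(individual: Set[str], profiles: Dict[str, Profile]) -> str:
    """Assign the class whose profile gives the individual's elements the highest total score."""
    def score(label: str) -> float:
        profile = profiles[label]
        # Elements absent from a class profile contribute nothing.
        return sum(profile.get(item, 0.0) for item in individual)

    # Iterating over sorted labels makes ties resolve to the alphabetically first class.
    return max(sorted(profiles), key=score)


def accuracy(rows: List[Tuple[Set[str], str]], profiles: Dict[str, Profile]) -> float:
    """Fraction of individuals whose predicted class matches their known class."""
    hits = sum(classify(items, profiles) == label for items, label in rows)
    return hits / len(rows)


def top_drivers(profiles: Dict[str, Profile], a: str, b: str, k: int = 3) -> List[str]:
    """Elements whose significance differs most between classes a and b."""
    elements = set(profiles[a]) | set(profiles[b])
    diff = {e: profiles[a].get(e, 0.0) - profiles[b].get(e, 0.0) for e in elements}
    # Largest absolute difference first; ties broken alphabetically.
    return sorted(diff, key=lambda e: (-abs(diff[e]), e))[:k]
```

With the toy profiles `{"healthy": {"m_D": 0.6, "n_D": 0.4}, "amd": {"m_U": 0.5, "n_D": 0.2, "m_D": 0.3}}`, an individual with `{"m_U", "n_D"}` scores 0.4 for healthy and about 0.7 for amd, so it is assigned to amd, and `top_drivers(profiles, "healthy", "amd", k=2)` returns `["m_U", "m_D"]`.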