In supervised classification, a model learns from a single table that includes the outcome feature. In particular, this means that each entity to analyse or predict can occupy only one line.
Many features fit on a single line: a customer's name, their age, whether they have already bought a given type of product, and so on. That's why, when you press the GET INSIGHTS button, the first model is fitted on the central table data only and gives a first set of insights.
Here is a preview of the central table used in the Step by step articles.
However, a large amount of raw data that is easily linked to the entity to analyse or predict (here, the customer) is discarded because it does not fit the "one line" constraint. For example: all of the customers' reactions to the emails sent by the company.
We can see that the first client of the central table, identified by the KEY 0000168fc71e2e... and named Lawrence Ross, has been sent several emails. Each event is listed on its own line with a short description.
The usual way to integrate information from a peripheral table such as this one is to compute aggregates of its features, that is, to summarise each feature for every entity to be analysed and/or predicted. For example: the most frequent value of campaign_label for each client, or the sum of nb_of_days_since_event for each client. Each aggregate adds one feature to the central table, i.e. one value per line, per customer.
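As an illustration of the two aggregates mentioned above, here is a minimal sketch in plain Python. The column names KEY, campaign_label and nb_of_days_since_event come from the article, but the rows, the customer keys ("c1", "c2", "c3") and the helper names aggregate_events and add_aggregates are hypothetical, invented for this example. It collapses the many event lines of each customer into one summary, then appends those summaries to the central table so that it keeps exactly one line per customer.

```python
from collections import Counter, defaultdict

# Hypothetical rows mimicking the peripheral email table: one line per event.
# Keys and values are invented for illustration, not taken from the dataset.
email_events = [
    {"KEY": "c1", "campaign_label": "spring_sale", "nb_of_days_since_event": 12},
    {"KEY": "c1", "campaign_label": "spring_sale", "nb_of_days_since_event": 40},
    {"KEY": "c1", "campaign_label": "newsletter", "nb_of_days_since_event": 75},
    {"KEY": "c2", "campaign_label": "newsletter", "nb_of_days_since_event": 3},
]

# Hypothetical central table: exactly one line per customer.
central = [
    {"KEY": "c1", "name": "Customer 1"},
    {"KEY": "c2", "name": "Customer 2"},
    {"KEY": "c3", "name": "Customer 3"},  # this customer has no email events
]


def aggregate_events(events):
    """Collapse many event lines into one summary dict per KEY."""
    label_counts = defaultdict(Counter)
    day_sums = defaultdict(int)
    for event in events:
        label_counts[event["KEY"]][event["campaign_label"]] += 1
        day_sums[event["KEY"]] += event["nb_of_days_since_event"]
    return {
        key: {
            # Most frequent campaign_label; ties go to the first label seen.
            "mode_campaign_label": counts.most_common(1)[0][0],
            "sum_nb_of_days_since_event": day_sums[key],
        }
        for key, counts in label_counts.items()
    }


def add_aggregates(central_rows, aggregates):
    """Append the aggregate columns to the central table, one value per line."""
    # Assumed defaults for customers without any event in the peripheral table.
    empty = {"mode_campaign_label": None, "sum_nb_of_days_since_event": 0}
    return [{**row, **aggregates.get(row["KEY"], empty)} for row in central_rows]


enriched = add_aggregates(central, aggregate_events(email_events))
for row in enriched:
    print(row)
```

The default values for customers with no events (None for the most frequent label, 0 for the sum) are a design choice of this sketch; other conventions, such as an explicit "missing" category, are equally possible.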
PredicSis.ai automates the discovery of relevant aggregates. The user only has to request a NUMBER of smart aggregates and watch the performance of the successive models increase.
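To give an idea of why this discovery is worth automating, the sketch below enumerates many candidate aggregates (one count of events, several numeric functions per numeric column, one count per category value) and keeps the requested number of them. This is not the PredicSis.ai algorithm, whose selection criterion is not described here: as a simple stand-in, each candidate is scored by its absolute Pearson correlation with a binary outcome. The data, the set of aggregate functions and the names candidate_aggregates, abs_pearson, top_aggregates and the parameter number are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical peripheral table (one line per email event) and binary outcome
# per customer KEY; all values are invented for illustration.
events = [
    {"KEY": "c1", "campaign_label": "promo", "nb_of_days_since_event": 5},
    {"KEY": "c1", "campaign_label": "promo", "nb_of_days_since_event": 9},
    {"KEY": "c2", "campaign_label": "newsletter", "nb_of_days_since_event": 60},
    {"KEY": "c3", "campaign_label": "promo", "nb_of_days_since_event": 2},
    {"KEY": "c3", "campaign_label": "newsletter", "nb_of_days_since_event": 30},
    {"KEY": "c4", "campaign_label": "newsletter", "nb_of_days_since_event": 90},
]
target = {"c1": 1, "c2": 0, "c3": 1, "c4": 0}

# Assumed, illustrative set of numeric aggregate functions.
NUMERIC_AGGS = {
    "sum": sum,
    "min": min,
    "max": max,
    "mean": lambda values: sum(values) / len(values),
}


def candidate_aggregates(rows, numeric_cols, categorical_cols):
    """Return {feature_name: {KEY: value}} for every candidate aggregate."""
    per_key = defaultdict(list)
    for row in rows:
        per_key[row["KEY"]].append(row)
    features = {"count(events)": {k: len(rs) for k, rs in per_key.items()}}
    for col in numeric_cols:
        for name, fn in NUMERIC_AGGS.items():
            features[f"{name}({col})"] = {
                k: fn([r[col] for r in rs]) for k, rs in per_key.items()
            }
    for col in categorical_cols:
        # One count feature per observed category value.
        for value in sorted({r[col] for r in rows}):
            features[f"count({col}={value})"] = {
                k: sum(r[col] == value for r in rs) for k, rs in per_key.items()
            }
    return features


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; 0.0 when either variable is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def top_aggregates(features, target, number):
    """Keep the `number` candidates most correlated with the outcome."""
    scores = {}
    for name, values in features.items():
        keys = [k for k in sorted(target) if k in values]
        scores[name] = abs_pearson([values[k] for k in keys],
                                   [target[k] for k in keys])
    return sorted(scores, key=scores.get, reverse=True)[:number]


candidates = candidate_aggregates(events, ["nb_of_days_since_event"],
                                  ["campaign_label"])
print(len(candidates), "candidates")
print(top_aggregates(candidates, target, number=3))
```

Even with one numeric and one categorical column, seven candidates appear; with real peripheral tables the number of possible aggregates grows quickly, which is what makes manual exploration tedious. On four customers the ranking is of course meaningless, and a real system would need a more robust criterion evaluated against overfitting.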
Please download this dataset here (≈95 MB).
Then start the step by step workflow again, this time with several tables.
Suggested articles to read:
2) Get insights