Machine learning models can be developed, deployed and applied using the AutoML module in the Thinkwise Platform.
To enable this feature, ensure Indicium is configured to support Automated Machine Learning. More info here.
The current selection of AutoML models can be trained in the Software Factory and used in a production environment. At the moment, AutoML models can be trained for two types of problems. The Thinkwise Platform automatically chooses the right type of model.
Classification problems focus on choosing a type, group or other predefined classification based on the provided input. Often, this is a domain with elements, such as a risk level or priority.
Regression problems focus on determining a numerical value based on the provided input. Think of numerical values such as total profit or response time.
More types of problems, such as forecasting, will be supported by AutoML in future releases.
Training a model
An AutoML model is based on a certain table, view or mqt. One of the columns can be used as the column to predict by the AutoML model. Other columns of this table can be used as the input that determines the outcome.
The trained AutoML model will be usable for any scenario that provides the required input. Predictions can also be made for different tables or ad-hoc input. The table used to create the AutoML model is merely used for the training data.
The data from this table will be used to train the model. Historical production data is ideal for training. If a subset of the records in the table is required, a view can be used to limit the training set.
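As an illustration of limiting the training set with a view, the sketch below selects only houses whose sale price is already known. It is written as plain T-SQL and borrows the house table and predictor columns from the template example later on this page. The view name house_training_data and the presence of a sale_price column on house are assumptions made for this example, not names prescribed by the platform.

```sql
-- Hypothetical view that limits the training set to records with a known target value.
-- Assumes the house table holds both the predictors and the sale_price target.
create view house_training_data
as
select h.id,
       h.above_grade_living_area,
       h.alley,
       h.basement_condition,
       h.basement_exposure,
       h.basement_quality,
       h.sale_price              -- target column to predict
from house h
where h.sale_price is not null;  -- skip records that cannot be used for training
```

The AutoML configuration would then be based on this view instead of the full table.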
The AutoML configuration screen
To successfully set up an AutoML configuration, one target and one or more predictors must be chosen. At any point during setup, the Load training data task can be used to retrieve the content of the table which will be used to train the model.
Nominal, ordinal, quantitative and binary columns
Every potential target and predictor has a type-classification that determines how the AutoML engine interprets the values of this column. This is based on how the column is modeled.
- Binary data has only two options. Examples are checkboxes and combos with two options.
- Nominal data is simply a set of labels. Every value for this column can occur in one or more records. There is no ordering in the values and the value itself cannot be used in calculations.
- Ordinal data is an ordered set of labels. Columns using domains that sort on the order number of the elements are considered ordinal.
- Quantitative data is numerical data that can be used in calculations. Not all numerical data is quantitative. Domains with elements, primary keys, foreign keys and identities are not quantitative.
The trained AutoML model will never be aware of nominal and ordinal values outside of the training set. For this reason, it doesn't make much sense to have free-text fields or non-base data look-ups used as predictors. Re-training might also be required when updating the elements of a domain.
Queueing and training
Once one or more predictors and the target have been chosen, and training data has been loaded, training can be queued. You can use the Queue AutoML model for training task to queue training.
Once the configuration has changed status to Queued for training, no further modifications can be made.
Reviewing training results
The Result models tabpage can be used to monitor the various types of models that will be included in the training process. It can take up to 30 seconds before any types of models are shown here.
The various types of models will automatically be queued for training. Once training starts, the AutoML configuration will change status to Training.
Types of models
When a type of model is done training, performance metrics will be shown to indicate the precision and accuracy of the trained model based on samples of the training data. The default sorting of this tabpage is set so that the best performing model will most commonly be shown on top after training has finished.
When all types of models have been trained, the AutoML configuration will change status from Training to Training finished.
Currently, only one model is trained at a time. When multiple projects or branches are training AutoML models, it might take longer to see results.
Activating a trained model
One of the trained types of models can be chosen as the active model for this AutoML configuration. Use the corresponding tasks at the top of the tabpage to select the desired model. Process actions can now use this AutoML configuration and will use the selected model to execute predictions.
The AutoML configuration is now Ready for use.
The selected model can be changed afterwards.
Running a prediction
The easiest way to perform predictions is to queue them in the database and use a scheduled system flow to pick up the queued predictions one by one.
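A minimal sketch of such a queue, written as plain T-SQL, could look as follows. The table name prediction_queue and the columns id, sale_price and failed follow the template example later on this page. The data types, the default and the sample id are assumptions for illustration only.

```sql
-- Hypothetical queue table for pending predictions.
create table prediction_queue
(
    id         int            not null primary key, -- id of the house to predict
    sale_price numeric(18, 2) null,                 -- predicted value, null while pending
    failed     bit            not null default 0    -- 1 when the prediction could not be made
);

-- Queue a prediction for one house (42 is an arbitrary example id).
insert into prediction_queue (id)
values (42);
```

A record with an empty sale_price is treated as pending, so the scheduled flow only needs to look for records where the target is still null.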
The Run AutoML model process action can only be used in scheduled process flows. Once process flows are available in Universal, this process action can also be used in non-scheduled process flows that a user performs in Universal.
Step 1: Create a scheduled process flow with only the Run AutoML model connector. Select the table the AutoML model has been trained on and select the AutoML model. Have Start point to the process action, and have the process action point to itself on success and to Stop on failure.
A simple flow using a process action to run an AutoML model prediction
Add a schedule to the process flow to have this process flow periodically run the AutoML model. Set the schedule to default if activation by an IAM administrator is not required.
Once an AutoML configuration is in use by a process action, the status will change from Ready for use to Active.
Step 2: Create process flow variables for the predictors, the target and the status code. Also create one or more process flow variables to store the ID of the queued item to predict.
Map the variables for the predictors, the target and the status code to the process action.
Step 3: Create the process logic. Mark the process action to use process logic and create a template.
The template should consist of two parts. The first part is to process the results of the previous execution. The second part is used to load a new item to predict from the queue.
The first part should contain the following statements:
- When the status code is -2, the AutoML service is not running. Inform the user accordingly.
- Save the result of the last item
The second part should contain the following statements:
- Load the id and the predictors for the next item from the queue
- Decide whether the process flow should continue
The template will look something like this:
-- Check the result of the last execution.
if @status_code = -2
begin
    -- The AutoML service is not running. Clear the queue and inform the user.
    update prediction_queue
    set sale_price = -1,
        failed = 1
    where sale_price is null;
end
else if @status_code = 0
begin
    -- Store the result of the previous prediction
    update prediction_queue
    set sale_price = coalesce(@sale_price, -1)
    where id = @id;
end;

-- Load primary key and the predictors for the next item to predict from the queue
select top 1
    @id = p.id,
    @above_grade_living_area = h.above_grade_living_area,
    @alley = h.alley,
    @basement_condition = h.basement_condition,
    @basement_exposure = h.basement_exposure,
    @basement_quality = h.basement_quality,
    ...
from prediction_queue p
join house h
    on h.id = p.id
   and p.sale_price is null;

-- Start prediction if a new item was loaded from the queue. Stop the process flow if not.
if @@ROWCOUNT = 0
begin
    set @automl_run_model_house_training_data_stop = 10;
    set @automl_run_model_house_training_data_automl_run_model_house_training_data = null;
end
else
begin
    set @automl_run_model_house_training_data_stop = null;
    set @automl_run_model_house_training_data_automl_run_model_house_training_data = 10;
end;
In this example, the AutoML process action is executed with empty predictors on the first run. The returned status code will be -3. This can be ignored.
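To check how the queue is progressing, a query along these lines could be used. It is only a sketch that relies on the conventions of the template above: an empty sale_price means the item is still pending, failed = 1 marks items cleared because the AutoML service was not running, and -1 is stored when no prediction was returned. The result column names are made up for this example.

```sql
-- Summarize the state of the prediction queue used in the example template.
select sum(case when p.sale_price is null then 1 else 0 end) as pending,
       sum(case when p.failed = 1 then 1 else 0 end)          as failed,
       sum(case when p.sale_price = -1
                 and coalesce(p.failed, 0) = 0 then 1 else 0 end) as no_prediction,
       sum(case when p.sale_price <> -1 then 1 else 0 end)    as predicted
from prediction_queue p;
```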
Step 4: Synchronize the model to IAM to activate the scheduled process flow.