

Azure Machine Learning Studio provides an easy-to-use interface for data scientists and developers to build, train, and productionise machine learning models. Another major benefit it provides is the ease of collaboration.

In this article, we will explore how to solve a machine learning problem with Azure Machine Learning Designer.

Defining the Problem

To solve the problem via Azure ML Studio, we need to complete the following steps:

  1. Create a pipeline
  2. Set the pipeline's compute target
  3. Import the data
  4. Transform the data
  5. Train the model
  6. Test the model
  7. Evaluate the model

 

Creating a Pipeline Using ML Designer

Azure Machine Learning pipelines are workflows of executable steps that enable users to complete end-to-end machine learning tasks. Executable steps in Azure pipelines include data import, transformation, feature engineering, model training, model optimisation, deployment, and more.

There are three ways of creating pipelines in Azure Machine Learning Studio:

  1. Using code (Python SDK) – a code sketch follows below
  2. Using AutoML
  3. Using the ML Designer

When we log in to Azure ML Studio, we see these options on the home page. Click Designer (Start Now) to create a new pipeline.
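For comparison, option 1 builds the same kind of pipeline in code. Below is a minimal sketch using the v1 azureml Python SDK; the script name train.py, the compute target name cpu-cluster, and the single-step structure are placeholder assumptions, not part of the Designer walkthrough.

```python
# Minimal sketch of option 1: building a pipeline with the v1 azureml SDK.
# Script name, compute target name and step structure are placeholders.
from azureml.core import Workspace
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

ws = Workspace.from_config()  # reads config.json downloaded from the portal

# A single executable step; a real pipeline would chain several of these
train_step = PythonScriptStep(
    name="train-model",
    script_name="train.py",        # placeholder training script
    compute_target="cpu-cluster",  # placeholder compute target name
    source_directory=".",
)

pipeline = Pipeline(workspace=ws, steps=[train_step])
pipeline.validate()  # checks the step graph before submission
```

The Designer builds the same kind of step graph visually, which is what the rest of this article uses.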

By default, the pipeline is given a name based on today's date; I have changed it to Automobile Prediction.

Setting Compute Target

A compute target is an Azure virtual machine instance that provides the processing power for our pipeline execution.

By default, a single compute target is used for the entire pipeline, but we can also assign separate compute targets to individual steps.

I created a compute instance earlier, so I can select the existing one.
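The compute target can also be provisioned in code rather than through the Studio UI. This is a rough sketch with the v1 azureml SDK; the cluster name and VM size are assumptions.

```python
# Sketch: reusing or provisioning an Azure ML compute target (v1 SDK).
# "cpu-cluster" and the VM size are placeholder choices.
from azureml.core import Workspace
from azureml.core.compute import AmlCompute, ComputeTarget

ws = Workspace.from_config()

if "cpu-cluster" in ws.compute_targets:
    compute_target = ws.compute_targets["cpu-cluster"]  # select the existing one
else:
    config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS11_V2",  # assumed size; pick what your quota allows
        min_nodes=0,
        max_nodes=1,
    )
    compute_target = ComputeTarget.create(ws, "cpu-cluster", config)
    compute_target.wait_for_completion(show_output=True)
```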

Importing Data

We can import the data from several sources. For this article, I will use sample datasets provided by Azure.

To explore what is in the dataset, we can click on it and go to Preview data.

This gives us a sneak preview of what's included in the data: for example, there are 205 rows and 26 columns. Clicking on a column provides key statistics about the data in that column. For example, if I click on the length column, I get a histogram showing the frequency of length values along with various other statistics.
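The same profiling can be done in code if you prefer. The sketch below assumes the Automobile price sample is available as a registered tabular dataset named "automobile-price-data"; that name is an assumption for illustration.

```python
# Sketch: inspecting the dataset with pandas instead of the Designer preview.
# The registered dataset name is an assumption.
from azureml.core import Workspace, Dataset

ws = Workspace.from_config()
df = Dataset.get_by_name(ws, "automobile-price-data").to_pandas_dataframe()

print(df.shape)                 # expect roughly (205, 26) for this sample
print(df["length"].describe())  # per-column statistics, like the Studio preview
df["length"].hist(bins=20)      # histogram of length values
```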

Transform Data

The data preview feature helps us understand the columns and identify the transformations the data needs before we can train our model.

Exclude a column

The Data Transformation section in the left-side menu provides several commonly used data transformation operations.

I want to remove the normalized-losses column, so I can drag and drop the Select Columns in Dataset module.

In the details of Select Columns in Dataset, I can then select all columns other than normalized-losses.
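In pandas terms, selecting every column except normalized-losses amounts to dropping that one column; a one-line sketch, assuming df is the dataframe from the preview sketch above:

```python
# Rough pandas equivalent of "Select Columns in Dataset" keeping everything
# except normalized-losses (df is the automobile dataframe from above).
df = df.drop(columns=["normalized-losses"])
```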

Clean Missing Data

After removing the normalized-losses column, our data still has missing values. To clean them, I can use the Clean Missing Data module from the left-side menu, so that our workspace looks like this:

In the details of Clean Missing Data, I can select All columns.
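Clean Missing Data offers several cleaning modes; the two simplest ones look roughly like this in pandas, again assuming df from the sketches above:

```python
# Rough pandas equivalents of "Clean Missing Data" applied to all columns.
# Option 1: remove every row that contains a missing value.
df_clean = df.dropna()

# Option 2: substitute a value instead (here: the column mean for numeric columns).
numeric_cols = df.select_dtypes("number").columns
df_imputed = df.copy()
df_imputed[numeric_cols] = df_imputed[numeric_cols].fillna(df[numeric_cols].mean())
```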

Splitting the Data

I want to divide my dataset rows into

  1. Training rows (training dataset)
  2. Testing rows (testing dataset)

I can use the Split Data module from the left-side menu, so that my pipeline will now look like this:

The Split Data module has two outputs. The left output will connect to Train Model, and the right output will provide the test data.

In the details of Split Data module, I can choose the ratio with which I want to split the data across training and testing.

In summary, we cleaned the dataset and then divided it into two separate datasets, as shown in the image below.
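Outside the Designer, the same split can be written with scikit-learn. The sketch below assumes the cleaned dataframe from the previous sketches and the 60/40 ratio used later in this article:

```python
# Sketch: a 60/40 train/test split, matching the ratio chosen in Split Data.
from sklearn.model_selection import train_test_split

train_df, test_df = train_test_split(df_clean, train_size=0.6, random_state=42)
print(len(train_df), len(test_df))
```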

Training the Model

To train any model, we need two things:

  1. The model
  2. The data on which the model is to be trained

In our case, we want to predict automobile prices using a Linear Regression model. The training data for the Linear Regression model is the portion we split from our overall dataset.

Therefore, we can train our model by combining the Linear Regression module and the Train Model module from the left-side menu.

The Train Model module requires a label, i.e. the column we want the model to predict. The label is the dependent variable, which the model learns to predict from the independent variables (the features).

[In y = mx + c, the label is y and x is a feature.]
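In code, the Linear Regression + Train Model combination with price as the label looks roughly like this; restricting the sketch to numeric feature columns is a simplifying assumption on my part, since the Designer handles categorical columns for you:

```python
# Sketch: training a linear regression with "price" as the label column.
# Only numeric features are used here, which is a simplifying assumption.
from sklearn.linear_model import LinearRegression

feature_cols = train_df.select_dtypes("number").columns.drop("price")
model = LinearRegression()
model.fit(train_df[feature_cols], train_df["price"])  # features -> label
```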

Testing the Model

In the Split Data step above, we used only 60% of our data for training the model and left the remaining 40% for testing. We can set up the testing now by using the Score Model module.

The Score Model module needs two inputs:

  1. What needs to be tested (output of our trained model)
  2. With what to test (test data from split)

This will look like:
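The scikit-learn counterpart of Score Model is simply predicting on the held-out rows, reusing model, feature_cols, and test_df from the sketches above:

```python
# Sketch: scoring the trained model on the 40% test split. The predictions
# play the role of the "Scored Labels" column produced by Score Model.
test_df = test_df.copy()
test_df["Scored Labels"] = model.predict(test_df[feature_cols])
print(test_df[["price", "Scored Labels"]].head())
```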

Evaluating the Model

Now we want to see how our model performed against the test dataset. We can use the Evaluate Model module and connect the Score Model module to it.

This will finish our pipeline creation. We now need to submit it.

Pipeline Submission

Submitting the pipeline requires an experiment name and a compute target.
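Had the pipeline been built with the SDK, submission would look roughly like the sketch below, reusing ws and pipeline from the earlier SDK sketches; the experiment name is a placeholder.

```python
# Sketch: submitting a pipeline run under an experiment name (v1 SDK).
# "automobile-prediction" is a placeholder experiment name.
from azureml.core import Experiment

run = Experiment(ws, "automobile-prediction").submit(pipeline)
run.wait_for_completion(show_output=True)  # blocks until the run finishes
```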

Understanding the Predictions

Azure ML takes a bit of time, especially if the experiment is run for the first time. You can view the progress by looking at the "Running" status (1) or by checking the status of each individual module (2, 3).

Once the model finishes its run, right-click on Score Model and select Visualize > Scored dataset.

In the Scored Labels column, you can see the predicted prices.

Understanding the Model's Efficiency

We can use Evaluate Model to see the efficiency of the trained model.

Right-click Evaluate Model -> Visualize -> Evaluation Results.

The following statistics are available:

  1. Mean Absolute Error
  2. Root Mean Squared Error
  3. Relative Absolute Error
  4. Relative Squared Error
  5. Coefficient of Determination
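For reference, these statistics can also be reproduced by hand from the scored test set; below is a sketch using the predictions from the scoring sketch above, with the relative errors written out from their standard definitions.

```python
# Sketch: computing the Evaluate Model statistics from the scored test set.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = test_df["price"]
y_pred = test_df["Scored Labels"]

mae = mean_absolute_error(y_true, y_pred)           # Mean Absolute Error
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # Root Mean Squared Error
rae = np.abs(y_true - y_pred).sum() / np.abs(y_true - y_true.mean()).sum()    # Relative Absolute Error
rse = ((y_true - y_pred) ** 2).sum() / ((y_true - y_true.mean()) ** 2).sum()  # Relative Squared Error
r2 = r2_score(y_true, y_pred)                       # Coefficient of Determination
print(mae, rmse, rae, rse, r2)
```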