Jul 27

INTERVIEW QUESTIONS FOR A JOB AS A MACHINE LEARNING ENGINEER PART 2

By Kamal Jacob

Welcome to Part-II of the machine learning interview question-and-answer series. We hope Part-I of this series was helpful to you.

Let’s get started.

**Q1) How do you distinguish between stochastic gradient descent (SGD) and gradient descent (GD)? Also, explain when you would use GD over SGD and vice versa.**

Although both gradient descent (GD) and stochastic gradient descent (SGD) are optimization algorithms used to find the values of the parameters/coefficients of a function that minimize a cost function, there is a significant difference between them.

The difference is that in gradient descent we evaluate all the training samples before each update of the parameters, while in stochastic gradient descent we evaluate only one training sample per update.

When we have a large amount of data, GD can be slow because one iteration of the algorithm requires a prediction for every instance in the training dataset. SGD, on the other hand, scales to large datasets because the coefficients are updated after each training instance rather than after a full pass over all instances.

That is why GD is preferable for small datasets, while SGD is preferable for larger ones.
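To make the difference concrete, here is a minimal sketch in plain Python. The toy data (points on the line y = 2x + 1), the learning rates, and the function names `batch_gd` and `sgd` are assumptions made for illustration; they are not part of the original answer.

```python
import random

# Hypothetical toy data lying exactly on y = 2x + 1.
data = [(i / 10, 2.0 * (i / 10) + 1.0) for i in range(20)]

def batch_gd(data, lr=0.1, epochs=500):
    """Gradient descent: one parameter update per pass over ALL samples."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Average gradient of the squared error over the whole dataset.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def sgd(data, lr=0.05, epochs=500, seed=0):
    """Stochastic gradient descent: one parameter update per SINGLE sample."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    samples = list(data)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            err = w * x + b - y
            w -= lr * 2 * err * x
            b -= lr * 2 * err
    return w, b

w_gd, b_gd = batch_gd(data)
w_sgd, b_sgd = sgd(data)
print(f"GD:  w={w_gd:.3f}, b={b_gd:.3f}")
print(f"SGD: w={w_sgd:.3f}, b={b_sgd:.3f}")
```

Both loops should end up close to w = 2 and b = 1 on this toy problem. The point of the sketch is the cost of one update: `batch_gd` touches every sample before it moves the parameters, while `sgd` moves them after each sample.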

Image Source: https://icons8.com/ouch/

**Q2) What do you mean by an imbalanced dataset, and how would you handle it?**

An imbalanced dataset, relevant primarily in the context of supervised machine learning involving two or more classes, is one in which the target categories occur in very unequal proportions.

For example, in a 1000-observation dataset in which we have to detect fraud, we have only 20 fraudulent observations and 980 non-fraudulent observations, which means only 2% of the observations are fraudulent.

The following are some useful options for handling imbalanced datasets:

**I. Change the evaluation metrics**

In the example given above, if we had a model that always made negative predictions, it would achieve an accuracy of 98%, but obviously, this model won’t provide any valuable information for us. Alternative evaluation metrics such as precision, recall and F1 score are more informative in this situation.
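The fraud example can be checked with a few lines of plain Python. This is a small sketch that counts the confusion-matrix cells by hand; treating precision, recall and F1 as 0 when their denominator is zero is a convention assumed here.

```python
# The example above: 980 non-fraudulent (0) and 20 fraudulent (1) observations.
y_true = [0] * 980 + [1] * 20
# A model that always predicts the negative class.
y_pred = [0] * 1000

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
# Assumed convention: report 0.0 when a denominator is zero.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(accuracy, precision, recall, f1)  # 0.98 0.0 0.0 0.0
```

The accuracy looks excellent, while recall shows that not a single fraudulent observation was caught.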

**II. Change the algorithm**

When working on an ML problem, it’s a good rule of thumb to try a variety of algorithms, and this is also beneficial with imbalanced datasets. Decision-tree-based algorithms often perform well on imbalanced data.

**III. Over-sampling minority class**

Over-sampling means adding more observations of the rare class, for example by duplicating existing ones. It is especially useful when the quantity of data is insufficient.

**IV. Under-sampling majority class**

Under-sampling means removing some observations of the abundant class. It is especially useful when the quantity of data is sufficient.
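Both resampling options can be sketched with the standard `random` module. The row tuples below are placeholder data invented for illustration, and random duplication or removal is only the simplest form of each technique.

```python
import random

rng = random.Random(42)

# Placeholder rows standing in for the 980 / 20 fraud example.
majority = [("non_fraud", i) for i in range(980)]
minority = [("fraud", i) for i in range(20)]

# Random over-sampling: draw minority rows WITH replacement
# until the minority class is as large as the majority class.
oversampled_minority = [rng.choice(minority) for _ in range(len(majority))]
balanced_over = majority + oversampled_minority

# Random under-sampling: keep a random subset of the majority class
# (WITHOUT replacement) that is as small as the minority class.
undersampled_majority = rng.sample(majority, len(minority))
balanced_under = undersampled_majority + minority

print(len(balanced_over), len(balanced_under))  # 1960 40
```

Over-sampling keeps all 980 majority rows and grows the dataset, whereas under-sampling discards most of them, which is why it only makes sense when there is plenty of data.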

**Q3) When will you choose classification over regression?**

Classification and regression are both prediction tasks: classification produces discrete values and predicts the ‘belonging’ to a class, whereas regression predicts a value from a continuous set, which allows us to better distinguish between individual points.

For example, the prediction of the price of a house can be a numerical value that depends on, say, ‘size’ and ‘location’, or it can be expressed in words like ‘costly’, ‘very costly’, ‘cheap’, ‘affordable’. The prediction as a numerical value is a regression problem, while the prediction in words is a classification problem.

We choose classification over regression when we want our result to reflect the membership of the data points in our dataset in certain explicit categories.

**Q4) Why is the Naïve Bayes classifier naïve?**

Naïve Bayes is called naïve because of the conditional independence assumption it makes: given the class Y, the components of X are assumed to be independent of each other. In the Gaussian variant, for example, X|Y is assumed to be normally distributed with zero covariance between any of the components of X. In simple words, the features/attributes/predictors that go into the model are assumed to be unrelated to each other within a class, i.e. a change in one variable does not directly affect another.

Since this assumption rarely holds in the real world, we refer to the Naïve Bayes classifier as naïve.
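Written out, the assumption says that the class-conditional likelihood factorizes over the individual features. The symbols below (features x_1, ..., x_n and class y) are notation chosen for this illustration:

```latex
P(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} P(x_i \mid y)
\quad\Longrightarrow\quad
P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

The classifier then predicts the class y with the largest value of the right-hand side.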

**Q5) What’s the general principle of an ensemble model? Can you also explain bagging and boosting methods?**

**General principle:**

To provide more robust predictions than those of any individual model, an ensemble model combines the predictions of multiple ‘individual’ models built with a given learning algorithm.

**Bagging Method:**

This ensemble method is also called the Bootstrap Aggregating method. It works in the following steps:

a) First, we create random samples of the training data set, drawn with replacement (bootstrap samples).

b) Next, we build a classifier for each of the samples.

c) Finally, the results of these multiple classifiers are combined. We can use averaging or majority voting to combine them.

The bagging method reduces the variance error.
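Steps a) to c) can be sketched in plain Python. The base classifier used here is a one-dimensional threshold rule (a decision stump), and the toy data, the helper names, and the choice of 25 models are all assumptions made for illustration.

```python
import random
from collections import Counter

def fit_stump(points):
    """Fit a 1-D threshold rule: predict `right_label` if x > threshold, else the other label."""
    best = None
    for threshold, _ in points:
        for right_label in (0, 1):
            errors = sum(
                1 for x, y in points
                if (right_label if x > threshold else 1 - right_label) != y
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, right_label)
    _, threshold, right_label = best
    return threshold, right_label

def stump_predict(stump, x):
    threshold, right_label = stump
    return right_label if x > threshold else 1 - right_label

def bagging_fit(points, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # a) bootstrap sample: same size as the data, drawn with replacement
        sample = [rng.choice(points) for _ in points]
        # b) one classifier per sample
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    # c) combine the classifiers by majority vote
    votes = Counter(stump_predict(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: label 1 when x > 5, plus one noisy label.
train = [(float(x), int(x > 5)) for x in range(11)]
train[2] = (2.0, 1)  # label noise

models = bagging_fit(train)
predictions = [bagging_predict(models, x) for x in (1.0, 4.0, 7.0, 9.0)]
print(predictions)
```

Individual stumps may land on slightly different thresholds depending on their bootstrap sample; the vote smooths out these differences, which is the variance reduction mentioned above.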

**Boosting Method:**

Boosting trains the predictors sequentially, i.e. each model that runs determines which training instances the next model will focus on. The first predictor learns on the whole data set; each following predictor learns on a training set that is adjusted (for example, reweighted) based on the performance of the previous one.

Boosting generally does better than bagging at reducing bias error, but it also tends to overfit more easily.
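One well-known instance of this idea is AdaBoost, sketched below with one-dimensional threshold rules as weak learners. The answer above does not name a specific algorithm, so the choice of AdaBoost, the toy data, and the helper names are assumptions made for illustration.

```python
import math

def fit_weighted_stump(points, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold rule. Labels are -1 / +1."""
    best = None
    for threshold, _ in points:
        for polarity in (1, -1):
            error = sum(
                w for (x, y), w in zip(points, weights)
                if (polarity if x > threshold else -polarity) != y
            )
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost_fit(points, n_rounds=3):
    n = len(points)
    weights = [1.0 / n] * n  # the first learner sees every sample equally
    ensemble = []
    for _ in range(n_rounds):
        error, threshold, polarity = fit_weighted_stump(points, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest,
        # so the next learner concentrates on the hard cases.
        new_weights = []
        for (x, y), w in zip(points, weights):
            pred = polarity if x > threshold else -polarity
            new_weights.append(w * math.exp(-alpha * y * pred))
        total = sum(new_weights)
        weights = [w / total for w in new_weights]
    return ensemble

def adaboost_predict(ensemble, x):
    score = sum(alpha * (pol if x > thr else -pol) for alpha, thr, pol in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single threshold can separate.
labels = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
train = [(float(x), y) for x, y in enumerate(labels)]

single = adaboost_fit(train, n_rounds=1)
boosted = adaboost_fit(train, n_rounds=3)
acc_single = sum(adaboost_predict(single, x) == y for x, y in train) / len(train)
acc_boosted = sum(adaboost_predict(boosted, x) == y for x, y in train) / len(train)
print(acc_single, acc_boosted)  # 0.7 1.0
```

A single threshold rule gets 7 of the 10 training points right, while three rules trained in sequence fit all of them. That is the bias reduction in action, and also a hint at why boosting can overfit noisy data.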

**Q6) Please explain the working of the SVM (Support Vector Machine) algorithm.**

SVM is a flexible supervised machine learning algorithm that is used for both classification and regression. The main goal of SVM is to find the MMH (Maximum Marginal Hyperplane) that divides the training dataset into classes.

The following two steps briefly explain the working of SVM:

Step 1: It generates candidate hyperplanes iteratively, each of which segregates the classes.

Step 2: Next, it chooses the hyperplane (called the MMH) that separates these classes correctly with the largest margin.
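The two steps can be illustrated literally with a hand-written list of candidate hyperplanes. This is only a sketch of the selection criterion: the toy points and the three candidates are invented for illustration, and practical SVM implementations solve an optimization problem rather than enumerating candidates.

```python
import math

# Hypothetical 2-D toy data: (point, label) with labels -1 / +1.
data = [((1.0, 1.0), -1), ((2.0, 1.0), -1), ((1.0, 2.0), -1),
        ((4.0, 4.0), 1), ((5.0, 4.0), 1), ((4.0, 5.0), 1)]

# Hypothetical candidate hyperplanes w.x + b = 0, written as (w, b).
candidates = [
    ((1.0, 0.0), -2.5),  # vertical line x = 2.5
    ((1.0, 1.0), -5.5),  # diagonal line x + y = 5.5
    ((0.0, 1.0), -4.5),  # horizontal line y = 4.5 (misclassifies a point)
]

def margin(w, b, data):
    """Smallest signed distance from the hyperplane to any point.

    The value is negative if at least one point lies on the wrong side.
    """
    norm = math.hypot(*w)
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) / norm for x, y in data)

# Choose the candidate whose closest point is farthest away.
best_w, best_b = max(candidates, key=lambda c: margin(c[0], c[1], data))
best_margin = margin(best_w, best_b, data)
print(best_w, best_b, round(best_margin, 3))  # (1.0, 1.0) -5.5 1.768
```

The vertical line also separates the classes, but its closest point is only 0.5 away, so the diagonal line wins.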

**Q7) What are support vectors and the margin in SVM?**

Support vectors are the data points that are closest to the hyperplane, whereas the margin is the gap between the two lines that pass through the closest data points of the different classes. It can be calculated from the perpendicular distance from the separating line to the support vectors.
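In symbols, for a hyperplane with weight vector w and offset b (notation assumed here), the perpendicular distance of a point and the resulting margin can be written as follows. The margin formula assumes the common scaling in which every support vector x_s satisfies |w^T x_s + b| = 1.

```latex
\text{distance}(x_i) = \frac{\lvert w^\top x_i + b \rvert}{\lVert w \rVert},
\qquad
\text{margin} = \frac{2}{\lVert w \rVert}
\quad \text{when } \lvert w^\top x_s + b \rvert = 1 \text{ for every support vector } x_s
```

Under that scaling each support vector sits at distance 1/‖w‖ from the hyperplane, and the gap between the two classes is twice that.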

**Q8) What is pruning and how to do it in decision trees?**

Pruning is the process of removing those parts of the decision tree that add little or nothing to the classification power of the tree. In other words, it is the process of adjusting the decision tree to minimize “misclassification error”.

Pruning reduces the complexity of the final classifier. It also improves predictive accuracy by reducing overfitting.

There are two techniques for pruning:

**a) Post-pruning**

It is also known as backward pruning. In the post-pruning technique, we first generate the complete tree and then adjust it with the aim of improving the classification accuracy on unseen instances. **Reduced error pruning, cost-based pruning and error complexity pruning** are some of the most popular algorithms used for post-pruning.
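As an illustration of post-pruning, here is a minimal sketch of reduced error pruning on a hand-built tree. The nested-dict tree format, the `majority` field on each internal node, and the validation data are all hypothetical choices made for this example.

```python
def predict(node, x):
    """Walk down the tree until a leaf is reached."""
    while "leaf" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["leaf"]

def accuracy(tree, X, y):
    return sum(predict(tree, x) == t for x, t in zip(X, y)) / len(y)

def reduced_error_prune(tree, node, X_val, y_val):
    """Bottom-up: replace a subtree by a leaf unless validation accuracy drops."""
    if "leaf" in node:
        return
    reduced_error_prune(tree, node["left"], X_val, y_val)
    reduced_error_prune(tree, node["right"], X_val, y_val)
    before = accuracy(tree, X_val, y_val)
    saved = dict(node)
    node.clear()
    node["leaf"] = saved["majority"]  # tentatively turn the subtree into a leaf
    if accuracy(tree, X_val, y_val) < before:
        node.clear()
        node.update(saved)            # pruning hurt accuracy, so undo it

# Hypothetical fully grown tree; the split at 8 was fitted to noise.
tree = {
    "feature": 0, "threshold": 5, "majority": 0,
    "left": {"leaf": 0},
    "right": {
        "feature": 0, "threshold": 8, "majority": 1,
        "left": {"leaf": 1},
        "right": {"leaf": 0},
    },
}

# Hypothetical held-out validation set.
X_val = [[1], [3], [6], [7], [9], [10]]
y_val = [0, 0, 1, 1, 1, 1]

acc_before = accuracy(tree, X_val, y_val)
reduced_error_prune(tree, tree, X_val, y_val)
acc_after = accuracy(tree, X_val, y_val)
print(acc_before, acc_after)
```

The noisy split at 8 is collapsed into a single leaf because doing so does not hurt validation accuracy, while the root split at 5 is kept because removing it would.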

**b) Pre-pruning**

It is also known as forward pruning. It prevents the generation of non-significant branches in the decision tree by using a ‘termination condition’. Based on this condition, the branches of the tree are terminated prematurely as the tree is generated. **Chi-square pruning and minimum number of objects pruning** are some of the most popular algorithms used for pre-pruning.

**Q9) Briefly explain clustering and its importance in machine learning.**

Clustering, one of the most useful machine learning methods, first finds the similarity and relationship patterns among data samples and then groups these samples into clusters whose members are similar based on their features.

The importance of clustering lies in the fact that it determines the intrinsic grouping within the given unlabeled data.
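A minimal sketch of one common clustering algorithm, k-means, on one-dimensional data is shown below. The answer above does not name a specific algorithm, so k-means, the toy values, and the starting centroids are assumptions made for illustration.

```python
def kmeans_1d(values, centroids, n_iters=10):
    """Plain k-means (Lloyd's algorithm) on 1-D data with given starting centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(n_iters):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data with two visible groups.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 6.0])
print(centroids)
print(clusters)
```

No labels are given to the algorithm; the two groups around 1 and 5 are recovered purely from the similarity of the values.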

**Q10) What is LOGLOSS?**

LOGLOSS, also called logistic regression loss or cross-entropy loss, is a performance metric for classification problems. It measures the performance of a classification model whose output is a probability value between 0 and 1. We can get a better understanding of it by comparing it with accuracy as follows:

a) Accuracy is the fraction of predictions in which the predicted value equals the actual value.

b) LOGLOSS is the amount of uncertainty of our prediction based on how much the predicted value varies from the actual value.
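The contrast between the two metrics can be sketched in plain Python for the binary case. The two sets of predicted probabilities are invented for illustration, and the 0.5 decision threshold and the clipping constant `eps` are assumptions.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy, averaged over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction equals the label."""
    return sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob)) / len(y_true)

y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]    # hypothetical model A
hesitant = [0.6, 0.4, 0.55, 0.45]   # hypothetical model B

print(accuracy(y_true, confident), accuracy(y_true, hesitant))
print(round(log_loss(y_true, confident), 4), round(log_loss(y_true, hesitant), 4))
```

Both models classify every sample correctly, so accuracy cannot tell them apart, but the hesitant model has a clearly higher log loss because its probabilities are further from the actual labels.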

And that’s a wrap for Part-II! Want to build a successful career and upskill yourself in machine learning? Have a look at Manipal’s Artificial Intelligence & Machine Learning course.