INTERVIEW QUESTIONS FOR A JOB AS A MACHINE LEARNING ENGINEER PART 1
By Kamal Jacob
As the world turns toward Artificial Intelligence (AI) for all sorts of decision making, both start-ups and big tech giants are recognizing the potential of Machine Learning. That is why the ML market is growing, and with it the demand for machine learning engineers. Companies are offering lucrative salaries to such skilled professionals. According to Payscale.com (2019 data), the median salary of a Machine Learning engineer is around ₹7 lakh in India and $112,000 in the USA.
While the demand is clear, breaking into the field can be challenging. It requires intense Machine Learning training and thorough preparation for the interview. In this two-part series of Machine Learning interview questions and answers, we cover the questions you are most likely to face in interviews for Machine Learning jobs.
So, let’s get started.
Q1) What are the different types of machine learning?
Supervised Learning: In supervised learning, the possible outcomes are already known, and the training data is labeled with the correct answers. The main objective of such ML algorithms is to learn a mapping between input samples and the corresponding outputs by training on many labeled instances.
Unsupervised Learning: In unsupervised learning, we have only input data; there are no corresponding output variables. To learn, the algorithm must discover interesting patterns in the data on its own. These methods are called unsupervised because, unlike in supervised learning, no correct answers and no "teacher" are provided.
Semi-supervised Learning: It falls between supervised and unsupervised learning. Such algorithms have a large amount of input data, but only some of it is labeled; the rest is unlabeled.
Reinforcement Learning: The main idea behind reinforcement learning is that, when exposed to an environment, the machine trains itself continually by trial and error. It learns from past experience and captures the best possible solution to the problem at hand.
Q2) What are the stages to build a model in machine learning?
There are three stages to building a model in machine learning:
a) Model building: This is the first stage, in which we choose a suitable algorithm and train our model on the training dataset.
b) Model testing: In this stage, we measure the accuracy of our model on the test dataset.
c) Applying the model: In this stage, we make any required changes to the tested model and deploy the final model in real-world projects.
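The three stages above can be sketched in a few lines, assuming scikit-learn is available; the dataset, algorithm, and sample point below are purely illustrative:

```python
# A minimal sketch of the three model-building stages with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Stage (a) Model building: choose an algorithm, train on the training set.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Stage (b) Model testing: measure accuracy on the held-out test set.
accuracy = model.score(X_test, y_test)

# Stage (c) Applying the model: predict on a new, unseen observation.
prediction = model.predict([[5.1, 3.5, 1.4, 0.2]])
```

In practice stage (c) also involves retraining on the full dataset and monitoring the deployed model, but the fit/score/predict cycle above is the core loop.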
Q3) What is overfitting with respect to machine learning and how can you avoid it?
Overfitting, or high variance, occurs when a machine learning model learns the specific patterns and noise in the training data to such an extent that it hurts the model's ability to generalize from the training data to unseen data.
The following are some methods to avoid overfitting:
A. Regularization: A broad range of techniques that discourage learning an overly complex or flexible model. The basic idea behind regularization is to penalize complex models.
B. Cross-validation: A powerful preventive measure against overfitting, in which we generate multiple mini train-test splits from the initial training data and use these splits to tune our model.
C. Early stopping (early termination): Stopping the training process before the model passes the point at which its ability to generalize starts weakening because it has begun to overfit the training data. When the validation error starts to increase, it is time to terminate training.
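The early-stopping mechanism can be sketched by hand with NumPy: train a linear model by gradient descent, track the validation error each epoch, and stop after it fails to improve for a few epochs (a "patience" window). The data, learning rate, and patience below are illustrative assumptions:

```python
# Hand-rolled early stopping: keep the weights that gave the best
# validation error, and stop when it has not improved for `patience` epochs.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.5, size=200)

# 70/30 train/validation split.
X_tr, X_val, y_tr, y_val = X[:140], X[140:], y[:140], y[140:]

w = np.zeros(5)
best_val, best_w = np.inf, w.copy()
patience, bad_epochs = 5, 0
for epoch in range(500):
    grad = 2 * X_tr.T @ (X_tr @ w - y_tr) / len(y_tr)
    w -= 0.05 * grad
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:          # validation error still improving
        best_val, best_w, bad_epochs = val_err, w.copy(), 0
    else:                           # validation error rising: count it
        bad_epochs += 1
        if bad_epochs >= patience:  # terminate training early
            break
w = best_w                          # roll back to the best checkpoint
```

On this well-specified toy problem the validation error mostly keeps improving, so the stop may never trigger; with a model complex enough to overfit, the same loop halts training once the validation error turns upward.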
Q4) How do you distinguish between L1 and L2 regularization?
The following comparison summarizes the differences between L1 and L2 regularization:

L2 regularization (performed by the Ridge regression technique):
- Modifies the loss function by adding a penalty equal to the sum of the squares of the weights (coefficients).
- Computationally efficient, since it has an analytical solution.
- Provides non-sparse outputs.
- No feature selection: it keeps all of the features in the model.

L1 regularization (performed by the Lasso regression technique):
- Modifies the loss function by adding a penalty equal to the sum of the absolute values of the weights (coefficients).
- Computationally inefficient in non-sparse cases.
- Provides sparse outputs.
- Performs built-in feature selection.
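The sparsity difference is easy to see empirically, assuming scikit-learn; the synthetic dataset and alpha values below are illustrative:

```python
# Contrast L2 (Ridge) and L1 (Lasso) penalties on data where only 5 of
# 20 features actually matter.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=1.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)  # L2: squared-weight penalty
lasso = Lasso(alpha=1.0).fit(X, y)  # L1: absolute-weight penalty

n_zero_ridge = int(np.sum(ridge.coef_ == 0))  # shrinks, but rarely to 0
n_zero_lasso = int(np.sum(lasso.coef_ == 0))  # drives weights to exactly 0
```

Ridge shrinks all coefficients toward zero without eliminating any, while Lasso zeroes out most of the uninformative ones, which is exactly the feature-selection behavior described above.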
Q5) What is the training and test set in machine learning? Based on the size of the training set, which classifier will you choose?
The training set in machine learning is the set of examples given to the model to analyze and learn from. The machine learning model is initially fit on this labeled training data.
The test set, on the other hand, is a dataset used to evaluate the accuracy of the hypothesis generated by our machine learning model. The test set's labels are withheld from the model during training and used only to check its predictions.
Usually, about 70% of the entire dataset is taken as the training dataset and the remaining 30% as the test dataset.
When we have a small training set, a model with high bias and low variance will perform better, because such models are less likely to overfit, e.g., the Naïve Bayes classifier.
In contrast, when we have a large training set, a model with low bias and high variance will perform better, because such models can capture complex relationships, e.g., the Decision Tree classifier.
Q6) What is bias-variance tradeoff? Explain.
Bias is error due to overly simplistic assumptions in the learning algorithm we use to fit our machine learning model. It is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays too little attention to the training data, leading to the model underfitting the data.
Variance is also an error, but one due to too much complexity in the learning algorithm. It is the variability of the model's prediction for a given data point. A model with high variance pays too much attention to the training data and does not generalize to unseen data, leading to the model overfitting the data.
There is no escaping the relationship between these two sources of error: decreasing the bias tends to increase the variance, and vice versa.
That is why there is a tradeoff at play between the two. The learning algorithm we choose, and the way we configure it, strike different balances in this tradeoff for our problem. We want neither high bias nor high variance in our ML model.
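The tradeoff can be illustrated with polynomial fits of different flexibility; the noisy sine data and the degrees chosen below are illustrative assumptions:

```python
# A too-simple model (high bias) vs. a too-flexible one (high variance),
# fit to noisy samples of a sine curve.
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 3, 30))
y = np.sin(2 * x) + rng.normal(scale=0.2, size=30)

def train_error(degree):
    """Mean squared error of a least-squares polynomial fit on the
    training points themselves."""
    coefs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))

err_simple = train_error(1)     # degree 1: underfits the sine (high bias)
err_flexible = train_error(10)  # degree 10: chases the noise (high variance)
```

The flexible model always achieves the lower training error, but that is precisely the warning sign: it is fitting the noise, and its error on fresh samples from the same curve would typically be worse than a moderate-degree fit.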
Q7) With respect to machine learning, can you explain the confusion matrix along with its associated terms?
A confusion matrix is a table used to measure the performance of a classifier (classification model) where the output can be of two or more classes. A confusion matrix, also called the error matrix, is a table with two dimensions, "Actual" and "Predicted". For a binary classifier, its cells fall into the following categories:
a) True Positives (TP) are the cases where both the actual and predicted class of the data point is 1, i.e., the observation is positive and is predicted to be positive.
b) True Negatives (TN) are the cases where both the actual and predicted class of the data point is 0, i.e., the observation is negative and is predicted to be negative.
c) False Positives (FP) are the cases where the actual class of the data point is 0 but the predicted class is 1, i.e., the observation is negative but is predicted to be positive.
d) False Negatives (FN) are the cases where the actual class of the data point is 1 but the predicted class is 0, i.e., the observation is positive but is predicted to be negative.
We can derive the following important measures from a confusion matrix:
A. Accuracy/Classification Rate: The fraction of predictions the model got right: (TP + TN) / (TP + TN + FP + FN).
B. Precision: When the model predicts positive, how often it is correct: TP / (TP + FP).
C. Recall or Sensitivity: The fraction of actual positives the model correctly identifies: TP / (TP + FN).
D. Specificity: The fraction of actual negatives the model correctly identifies: TN / (TN + FP).
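These measures can be computed by hand; the two small label lists below are made-up illustrative data (1 = positive, 0 = negative):

```python
# Compute the four confusion-matrix counts and the derived measures
# directly from actual vs. predicted labels.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

TP = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # 3
TN = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # 4
FP = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # 2
FN = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # 1

accuracy    = (TP + TN) / (TP + TN + FP + FN)  # 7/10 = 0.7
precision   = TP / (TP + FP)                   # 3/5  = 0.6
recall      = TP / (TP + FN)                   # 3/4  = 0.75
specificity = TN / (TN + FP)                   # 4/6  ≈ 0.667
```

Note how precision and recall tell different stories here: the model is right only 60% of the time it cries "positive", yet it still catches 75% of the real positives.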
Q8) How do you distinguish between Type I and Type II error?
A Type I error, or error of the first kind, is a false positive. In simple terms, this kind of error leads to the conclusion that something has happened when it has not. For example, a fire alarm going off when there is no fire, or a test telling a man he is pregnant.
A Type II error, or error of the second kind, is a false negative. This kind of error leads to the conclusion that nothing has happened when in fact something has. For example, a fire breaking out while the fire alarm stays silent, or a test telling a pregnant woman she is not carrying a baby.
Q9) How is the K-Nearest Neighbor (KNN) algorithm different from the K-Means algorithm?
K-Nearest Neighbor (KNN) and K-Means Clustering (K-Means) algorithms are often confused with each other. The following comparison highlights some differences between these two popular machine learning techniques:

K-Means:
- Unsupervised in nature; it is used for clustering.
- Here, 'K' represents the number of clusters the algorithm should split the data into.
- It partitions a dataset into K clusters such that points within a cluster are similar to each other and dissimilar to points in other clusters.
- In the training phase, the K initial cluster centers are selected arbitrarily.

K-Nearest Neighbors (KNN):
- Supervised in nature; it is mostly used for classification, and sometimes for regression.
- Here, 'K' represents the number of nearest neighbors used for comparison.
- It classifies an unlabeled observation based on its K nearest labeled neighbors.
- It involves minimal training and has no training phase as such, which is why it is known as a lazy learner.
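The contrast shows up directly in code, assuming scikit-learn; the two tiny point clouds below are illustrative:

```python
# Side by side: KNN needs labels, K-Means does not.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Two well-separated blobs of 2-D points.
X = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
              [5.0, 5.0], [5.1, 5.2], [5.2, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # class labels, used only by KNN

# KNN (supervised): K = number of neighbors consulted per prediction.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
knn_label = knn.predict([[0.1, 0.1]])[0]

# K-Means (unsupervised): K = number of clusters; labels never seen.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
same_cluster = (kmeans.predict([[0.1, 0.1]])[0]
                == kmeans.predict([[0.0, 0.0]])[0])
```

KNN returns an actual class label because it was trained on labeled data; K-Means can only say which cluster a point belongs to, and the cluster numbering itself is arbitrary.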
Q10) What do you mean by the Curse of Dimensionality? How can we deal with it?
The curse of dimensionality refers to the situation where the training data has many features or dimensions but the dataset does not have enough samples for an ML model to learn reliably from so many features. For example, given a training dataset of 50 samples with, say, 100 features, it would be very hard for the model to learn, because it will find spurious relationships between the features and the target.
The following are some ways to deal with the curse of dimensionality:
a) Feature Selection: Rather than using all the features to train our ML model, we can use a smaller subset of features to train it.
b) Dimensionality reduction: Principal Component Analysis (PCA), Factor Analysis, and Independent Component Analysis are some of the techniques we can use to reduce the dimensionality.
c) L1 Regularization: As this regularization technique provides sparse solutions, it helps to deal with high-dimensionality input.
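Option (b) can be sketched with PCA, assuming scikit-learn; the synthetic 20-feature data below, built to secretly live near a 3-D subspace, is an illustrative assumption:

```python
# Dimensionality reduction with PCA: project 100 samples with 20
# correlated features onto their top 3 principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 3))                    # 3 hidden factors
mixing = rng.normal(size=(3, 20))                     # spread over 20 features
X = latent @ mixing + 0.1 * rng.normal(size=(100, 20))

pca = PCA(n_components=3)
X_reduced = pca.fit_transform(X)                      # shape (100, 3)
explained = float(pca.explained_variance_ratio_.sum())
```

Because the 20 observed features are driven by only 3 latent factors, the 3 retained components capture almost all of the variance, so a model trained on X_reduced faces far fewer dimensions with little information lost.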
And that's a wrap for Part I! Want to build a successful career and upskill yourself in Machine Learning? Have a look at Manipal's Artificial Intelligence & Machine Learning course here!