15 Machine Learning Interview Questions (with Answers) for Data Scientists

Data science is a fast-growing field that deals with handling volumes of data that conventional software cannot process. Although machine learning is a vast field in itself, machine learning interview questions are a common part of a data scientist's job interview. Basic data scientist interview questions cover several aspects of the role, including statistics and programming; here, we will focus on the machine learning part of data science.

Machine Learning Interview Questions

1. Differentiate between supervised learning and unsupervised learning

These are some notable differences between the two:

Supervised Learning | Unsupervised Learning
Trained on a labeled dataset | Trained on an unlabeled dataset
Algorithms used: regression and classification | Algorithms used: clustering, association, and density estimation
Suited for predictions | Suited for analysis
Maps inputs to the known output labels | Finds hidden patterns and discovers the output

2. Define logistic regression with an example

Also known as the logit model, logistic regression is used to predict a binary outcome from a linear combination of predictor variables. For instance, predicting a politician's victory or defeat in an election is a binary outcome; the predictor variables could be the time spent campaigning and the total money spent on the campaign.
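As a rough illustration (the library choice, variable names, and campaign figures below are assumptions, not part of the original answer), such a model could be fit like this:

# A minimal sketch, assuming scikit-learn and made-up campaign data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Predictor variables: hours spent campaigning and money spent on the campaign.
X = np.array([[120, 10], [300, 45], [80, 5], [260, 30], [150, 12], [340, 50]])
# Binary outcome: 1 = won the election, 0 = lost.
y = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression()
model.fit(X, y)

# Predicted probability of victory for a new candidate.
print(model.predict_proba([[200, 25]])[0, 1])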

3. How do classification and regression machine learning techniques differ?

These are the key differences:

Classification | Regression
Target variables can have discrete values | Target variables can have continuous values, usually real numbers
Evaluated by measuring accuracy | Evaluated by measuring root mean square error

4. What is meant by collaborative filtering?

Collaborative filtering is a technique used by recommender systems: it predicts a user's interest in an item by collecting preferences from many similar users, on the assumption that users who agreed in the past will tend to agree in the future.

5. What are the steps involved in an analytics project?

These are the steps taken in an analytics project:

1. Comprehending the business problem.
2. Preparing the data for modeling: transforming variables, detecting outliers, and checking missing values.
3. Running the model, analyzing the outcome, and tweaking the approach until a good result is achieved.
4. Validating the model on a few held-out data sets, then implementing the model and monitoring its performance over a specific duration.

6. Explain in brief a few types of ensemble learning

There are several types of ensemble learning; below are some of the more common ones.

Boosting

An iterative technique that adjusts the weight of each observation based on the previous classification: if an observation is classified incorrectly, its weight is increased. Boosting helps build reliable predictive models because it reduces bias error, but it can also overfit the training data.

Bagging

Bagging fits the same learner on different bootstrap samples of the data and then averages the outputs. In generalized bagging, different learners can be fit on different samples. Bagging helps reduce some of the variance error.
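As a hedged sketch (the scikit-learn classes and the synthetic dataset are assumptions for illustration), the two techniques can be compared as follows:

# A minimal sketch of bagging vs. boosting, assuming scikit-learn and synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=42)

# Bagging: the same base learner (a decision tree by default) is fit on
# bootstrap samples, and the predictions are averaged.
bagging = BaggingClassifier(n_estimators=50, random_state=42)

# Boosting: each new learner gives more weight to observations the previous
# learners misclassified.
boosting = AdaBoostClassifier(n_estimators=50, random_state=42)

print("bagging accuracy: ", cross_val_score(bagging, X, y, cv=5).mean())
print("boosting accuracy:", cross_val_score(boosting, X, y, cv=5).mean())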

7. Describe the Box-Cox transformation

In a regression analysis, the dependent variable may not satisfy the assumptions of ordinary least squares regression: the residuals could be skewed, or could curve as the predictions increase. In such scenarios, transforming the response variable becomes necessary so that the data meets those assumptions.

The Box-Cox transformation is a statistical technique for transforming a non-normal dependent variable into a conventional (approximately normal) shape. This matters because many statistical techniques assume normality.

Once the Box-Cox transformation is applied, the numerous tests that rely on that normality assumption can be run on the transformed variable.
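As a small sketch (SciPy and the simulated skewed variable are assumptions for illustration):

# A minimal sketch, assuming SciPy and a skewed, strictly positive variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
skewed = rng.exponential(scale=2.0, size=1000)  # right-skewed, positive values

# boxcox returns the transformed values and the fitted lambda parameter.
transformed, fitted_lambda = stats.boxcox(skewed)
print("lambda:", fitted_lambda)
print("skewness before:", stats.skew(skewed), "after:", stats.skew(transformed))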

8. What is Random Forest, and how does it work?

Random Forest is a versatile machine learning method that can perform both classification and regression. It is also used for dimensionality reduction, treating missing values, and handling outlier values. It is an ensemble learning method in which a group of weak models combine to build a powerful model.

In a random forest, numerous decision trees are created instead of a single tree. To classify a new object based on its attributes, every tree provides a classification ("votes" for a class), and the forest selects the classification with the most votes over all the trees in the forest; for regression, the average output of the different trees is taken.

Working of Random Forest

This technique's main principle is that several weak learners combine to make a strong learner. The steps include:

1. Draw a bootstrap sample (a random sample with replacement) from the training data for each tree.
2. At each node, consider a random subset of the features and choose the best split among them.
3. Grow each tree fully, typically without pruning.
4. Aggregate the predictions of all trees: majority vote for classification, average for regression.
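As an illustrative sketch (scikit-learn and the synthetic dataset are assumptions, not part of the original answer):

# A minimal sketch of a random forest classifier, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

# Each of the 100 trees is grown on a bootstrap sample, considering a random
# subset of features at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=1)
forest.fit(X_train, y_train)

# For classification, the forest returns the majority vote of the trees.
print("test accuracy:", forest.score(X_test, y_test))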

9. If you were to train a model using a 10 GB data set and had only 4 GB of RAM, how would you approach this problem?

To start, it's best to ask what type of ML model needs to be trained.

For SVM (partial fit will suit best)

Follow these steps:

1. Start by dividing the large data set into smaller subsets.
2. Use the SVM's partial fit method, which only needs one subset of the full data set at a time.
3. Repeat the second step for the remaining subsets (see the sketch below).
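A minimal sketch of this idea follows. Note that scikit-learn's SVC does not expose partial_fit, so SGDClassifier with hinge loss (a linear SVM objective) stands in for it here; load_chunks is a hypothetical generator that yields the subsets from step 1:

# A minimal sketch, assuming scikit-learn. SVC has no partial_fit, so a linear SVM
# trained with SGDClassifier(loss="hinge") stands in for the incremental SVM idea.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier(loss="hinge")   # hinge loss corresponds to a linear SVM objective
classes = np.array([0, 1])            # all class labels must be declared up front

for chunk_X, chunk_y in load_chunks():  # hypothetical generator yielding small subsets
    model.partial_fit(chunk_X, chunk_y, classes=classes)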

For neural networks (a memory-mapped NumPy array plus a small batch size will do)

Follow these steps:

Load the full data set as a memory-mapped NumPy array. A memory map points to the data set on disk and does not load the full data set into memory.

To obtain the required data, pass an index (or slice) into the NumPy array.

Use this data to feed the neural network, maintaining a small batch size.
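A rough sketch of this approach, assuming the data set has been saved as a raw float32 binary file (the file name and array shape are made up for illustration):

# A minimal sketch, assuming a 10 GB data set stored as a float32 binary file.
import numpy as np

n_rows, n_features, batch_size = 10_000_000, 250, 512  # assumed shape, not from the text

# np.memmap maps the file on disk; rows are only read when they are indexed.
data = np.memmap("train_data.dat", dtype="float32", mode="r", shape=(n_rows, n_features))

for start in range(0, n_rows, batch_size):
    batch = np.asarray(data[start:start + batch_size])  # only this slice is loaded into RAM
    # pass `batch` to the neural network's training step here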

10. How are missing values treated in an analysis?

Once the variables with missing values have been identified, the extent of the missing values is also assessed. If any patterns are found in the missingness, the analyst has to pay attention to them, as these could lead to some significant and valuable business insights.

If no patterns are found, the missing values can be replaced with the mean or median, or they can simply be ignored. A default value can also be assigned, such as the minimum, maximum, or mean value. If the variable is categorical, the default value is assigned to the missing values.

If the incoming data is normally distributed, the mean value is assigned to the missing values. Also, if about 80% of a variable's values are missing, it is more reasonable to drop the variable than to treat the missing values.
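As a brief illustration (pandas and the toy columns are assumptions, not from the original answer):

# A minimal sketch of treating missing values, assuming pandas and toy data.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age":    [25, np.nan, 40, 35, np.nan],
    "income": [50000, 62000, np.nan, 58000, 61000],
})

# Drop a variable when the large majority (e.g. about 80%) of its values are missing,
# i.e. keep only columns that have at least ~20% non-missing values.
df = df.dropna(axis=1, thresh=int(0.2 * len(df)))

# Otherwise impute: mean for roughly normal data, median when the data is skewed.
df["age"] = df["age"].fillna(df["age"].median())
df["income"] = df["income"].fillna(df["income"].mean())
print(df)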

11. How are outlier values treated?

Outlier values can be detected with univariate analysis or another graphical method. If there are many outliers, the values can be replaced with either the 1st percentile or the 99th percentile value. If there are only a few outliers, they can be assessed individually.

It should be noted that not all outlier values are necessarily extreme values. To treat outlier values, they can either be modified and brought within range, or they can be discarded.
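A small sketch of percentile capping (NumPy and the simulated values are assumptions for illustration):

# A minimal sketch: cap outliers at the 1st and 99th percentile values.
import numpy as np

rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(50, 5, 1000), [250.0, -120.0]])  # two injected extremes

low, high = np.percentile(values, [1, 99])
capped = np.clip(values, low, high)   # bring the outliers back within range

print("before:", values.min(), values.max())
print("after: ", capped.min(), capped.max())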

12. Which cross-validation technique can be used on a time-series dataset?

Rather than using the K-Fold technique, one should recognize that a time series has an inherent chronological order and is not randomly distributed data. For time-series data, one can use the forward-chaining technique, where the model is trained on past data and then tested on the data that comes after it, fold by fold:

fold 1: training [1], test [2]
fold 2: training [1 2], test [3]
fold 3: training [1 2 3], test [4]
fold 4: training [1 2 3 4], test [5]
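In practice, forward chaining can be sketched with scikit-learn's TimeSeriesSplit (the toy series below is an assumption for illustration):

# A minimal sketch of forward chaining, assuming scikit-learn.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

X = np.arange(50).reshape(-1, 1)   # toy series, already ordered in time
y = np.arange(50)

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    # The training indices always precede the test indices, never the other way around.
    print(f"fold {fold}: train up to index {train_idx[-1]}, test {test_idx[0]}..{test_idx[-1]}")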

13. How often does an algorithm need to be updated?

An algorithm should be updated when:

1. The model needs to evolve as new data flows through the infrastructure.
2. The underlying data source is changing.
3. A case of non-stationarity shows up.
4. The algorithm underperforms, and its results lack good precision and accuracy.

14. List some drawbacks of the linear model

These are a few drawbacks of the linear model:

1. It cannot be used for binary or count outcomes.
2. It relies on the assumption of linearity of the errors, which is often violated.
3. It cannot solve overfitting problems.

15. Describe the SVM algorithm

SVM (Support Vector Machine) is a supervised machine learning algorithm used for both classification and regression. If the training data set has n features, SVM plots each data point in an n-dimensional space, where the value of each feature is the value of a particular coordinate. SVM then uses hyperplanes to segregate the distinct classes.
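As a minimal sketch (scikit-learn and the synthetic data are assumptions for illustration):

# A minimal sketch, assuming scikit-learn and a small synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=7)

# SVC fits the hyperplane that separates the classes with the widest possible
# margin (here in the feature space induced by the RBF kernel).
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))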
