According to Indeed.com, machine learning engineer is becoming one of the most sought-after jobs in the world, due to growing demand and high salaries. It offers nearly endless potential, since there’s a severe shortage of talent, and the demand for qualified applicants is already surpassing that for data scientists.
Gartner predicts that with the expansion of products and services related to Machine Learning, the industry will expand from $1.4 billion in 2017 to $8.8 billion by 2020, creating 2.3 million new jobs.
With Machine Learning job listings on the rise in areas such as natural language processing and deep learning, there is a place for everyone, whatever specialty they are interested in.
This article is a collection of 60 questions commonly asked during machine learning interviews, presented in a question/answer format. We have kept the answers as short as possible so that you can memorize them easily. Let’s get started!
1. What are the different techniques in machine learning?
- Supervised Learning
- Unsupervised Learning
- Semi-supervised Learning
- Reinforcement Learning
- Transduction
- Learning to Learn
2. What are the various approaches to machine learning?
- Concept Vs. Classification Learning
- Symbolic Vs. Statistical Learning
- Inductive Vs. Analytical Learning
3. What is the difference between supervised and unsupervised machine learning?
Supervised: Learning from labeled data using classification and regression models.
Unsupervised: Learning from unlabeled data using factor and cluster analysis models.
4. Explain the functions of Supervised Learning.
- Classification
- Speech recognition
- Regression
- Predict time series
- Annotate strings
5. What are the algorithms used for supervised learning?
- Linear Regression
- Logistic Regression
- Decision Tree
- Random Forest
- KNN
- Naive Bayes
6. What are the two methods used for calibration in supervised learning?
- Platt Calibration
- Isotonic Regression
7. Explain the functions of Unsupervised Learning.
- Find clusters of the data
- Find low-dimensional representations of the data
- Find interesting directions in data
- Interesting coordinates and correlations
- Find novel observations / database cleaning
8. What are the algorithms used for unsupervised learning?
- K-means
- Hierarchical Clustering
- t-SNE (used for low-dimensional embedding and visualization rather than clustering)
- DBSCAN Clustering
- Principal Component Analysis (PCA)
- Anomaly detection
9. What is the standard approach to supervised learning?
Split the set of examples into a training set and a test set.
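As a rough illustration of that split, here is a minimal sketch in plain Python. The function name `train_test_split`, the 80/20 ratio, and the fixed seed are assumptions made for this example, not a prescribed recipe.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the examples and split it into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

examples = [(x, 2 * x + 1) for x in range(10)]  # toy (feature, label) pairs
train, test = train_test_split(examples, test_fraction=0.2)
```

The model is then fitted on `train` only, and `test` is held back to estimate how well it generalizes.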
10. Explain the difference between K-Nearest Neighbours and K-Means Clustering.
K-Nearest Neighbours is a supervised machine learning algorithm that classifies a new point according to the labels of the k closest points in a labeled dataset. K-Means clustering is an unsupervised machine learning algorithm that groups the points of an unlabeled dataset into k clusters by assigning each point to the nearest cluster mean (centroid).
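The contrast can be sketched in a few lines of plain Python. This is a simplified illustration under assumptions of our own: Euclidean distance, majority voting for KNN, hand-picked starting centroids for K-Means, and the hypothetical helper names `knn_predict` and `kmeans`.

```python
import math
from collections import Counter

def knn_predict(labeled_points, query, k=3):
    """Supervised: vote among the k labeled points closest to the query."""
    nearest = sorted(labeled_points, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def kmeans(points, centroids, iterations=10):
    """Unsupervised: alternate between assigning points and updating means."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Each centroid becomes the mean of its cluster (unchanged if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

labeled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
prediction = knn_predict(labeled, (1.1, 0.9), k=3)  # uses the labels

points = [p for p, _ in labeled]  # labels are thrown away for K-Means
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (6.0, 6.0)])
```

Note that `knn_predict` needs the labels to answer, while `kmeans` only ever sees the coordinates.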
11. What are the five popular algorithms in Machine Learning?
- Decision Trees
- Neural Networks (backpropagation)
- Probabilistic networks
- Nearest Neighbor
- Support vector machines
12. What are the three stages of building a model?
- Model building
- Model testing
- Applying the model
13. What is the difference between classification and regression?
Classification assigns data to specific categories and produces discrete results. Regression predicts continuous numeric values.
14. Explain ‘Training set’ and ‘Test Set.’
The training set represents the dataset used to train the model. The testing set represents the dataset used to test the trained model.
15. What are parametric models?
Parametric models are those with a fixed, finite number of parameters that does not depend on the amount of training data, while non-parametric models are those whose number of parameters is unbounded and can grow with the data.
16. Why does overfitting happen?
Overfitting occurs when a model is too complex relative to the amount of training data, for example when it has too many parameters. The model then describes random error or noise instead of the underlying relationship.
17. How can you avoid overfitting?
- Keep the model simple.
- Choose fewer variables and parameters to reduce the noise.
- Use cross-validation techniques like K-folds cross-validation to keep overfitting under control.
- Follow regularization techniques like LASSO.
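To make the K-fold idea concrete, here is a minimal sketch of how the fold indices could be generated. The helper name `k_fold_indices` is hypothetical, and the sketch assumes the data has already been shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

folds = list(k_fold_indices(n_samples=10, k=3))
```

Each sample is used for validation exactly once, so averaging the validation score over the folds gives a less optimistic estimate than the training score alone.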
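18. Which method is frequently used to prevent overfitting?
- Cross-validation, usually combined with regularization (isotonic regression, sometimes given as the answer, is a calibration method; see question 6)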
19. What are the advantages and disadvantages of decision trees?
Decision trees are easy to interpret and have relatively few parameters to tune, but they are prone to overfitting.
20. What is the difference between stochastic gradient descent (SGD) and gradient descent (GD)?
Standard gradient descent evaluates all training samples to compute each update of the parameters that minimize a loss function, whereas stochastic gradient descent evaluates only one training sample per update.
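The difference shows up in a single update step. The sketch below assumes a one-parameter model y = w * x with a squared-error loss; the function names, learning rate, and toy data are illustrative choices.

```python
import random

def batch_gradient_step(w, data, lr):
    """One GD update: average the gradient over every training sample."""
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad

def stochastic_gradient_step(w, sample, lr):
    """One SGD update: use the gradient of a single training sample."""
    x, y = sample
    return w - lr * 2 * (w * x - y) * x

data = [(x, 3.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]  # noiseless toy data, y = 3x

w_gd = 0.0
for _ in range(200):
    w_gd = batch_gradient_step(w_gd, data, lr=0.01)

rng = random.Random(0)
w_sgd = 0.0
for _ in range(200):
    w_sgd = stochastic_gradient_step(w_sgd, rng.choice(data), lr=0.01)
```

Both estimates move toward w = 3, but each SGD step touches one sample instead of the whole dataset, which is why it scales better to large training sets.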
21. What is a classifier?
A classifier utilizes some training data to understand how given input variables relate to the class.
22. What is the key advantage of Naive Bayes?
Unlike discriminative models such as logistic regression, a naive Bayes classifier converges more quickly and therefore needs less training data.
23. What is Ensemble learning?
Ensemble learning means combining many base models, such as classifiers and regressors, to get better results. It is used to build component classifiers that are accurate and independent of each other.
24. Explain dimension reduction.
Dimension Reduction is the process of reducing the size of the feature matrix (such as the number of columns, either by combining columns or removing extra variables), to get a better feature set.
25. What are the best dimensionality reduction algorithms?
- Missing Value Ratio
- Low Variance Filter
- High Correlation Filter
- Random Forest
- Backward Feature Elimination
- Forward Feature Selection
- Factor Analysis
- Principal Component Analysis
- Independent Component Analysis
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- UMAP
26. What is ensemble learning?
Ensemble learning is a process of strategically generating and combining multiple models, such as classifiers or experts, to solve a particular computational problem.
27. Why is ensemble learning used?
To improve the classification, prediction, and function approximation of a model.
28. When should you use ensemble learning?
To build component classifiers that are more accurate and independent from each other.
29. What are the two paradigms of ensemble methods?
- Sequential ensemble methods
- Parallel ensemble methods
30. What is the general principle of an ensemble method?
Combine the predictions of several models and improve robustness over a single model.
31. What are ensemble techniques?
Basic: max voting, averaging, weighted averages
Advanced: Stacking, Blending, Bagging, Boosting
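The three basic techniques are simple enough to sketch directly. The function names and the example predictions below are made up for illustration.

```python
from collections import Counter

def max_vote(predictions):
    """Return the class label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

def average(predictions):
    """Plain mean of the models' numeric predictions."""
    return sum(predictions) / len(predictions)

def weighted_average(predictions, weights):
    """Mean in which more trusted models get a larger weight."""
    return sum(p * w for p, w in zip(predictions, weights)) / sum(weights)

label = max_vote(["spam", "ham", "spam"])
mean_score = average([0.5, 0.75, 1.0])
weighted_score = weighted_average([0.5, 0.75, 1.0], weights=[1, 1, 2])
```

Max voting suits class labels, while the two averages suit probabilities or regression outputs.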
32. What is bagging and boosting?
Bagging is an ensemble method for improving unstable estimation or classification schemes: it trains models on bootstrap samples of the data and combines their outputs, which mainly reduces variance. Boosting trains models sequentially, with each model focusing on the errors of its predecessors, which mainly reduces the bias of the combined model.
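A minimal sketch of bagging only (boosting is not shown), assuming a 1-nearest-neighbour regressor as the unstable base model and plain averaging as the aggregation step. All names and the toy data are illustrative.

```python
import random

def bootstrap_sample(data, rng):
    """Draw len(data) examples from data with replacement."""
    return [rng.choice(data) for _ in data]

def fit_one_nn(sample):
    """Base model: 1-nearest-neighbour regressor, sensitive to its training sample."""
    def predict(x):
        _, nearest_y = min(sample, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return predict

def bagged_predict(models, x):
    """Aggregate by averaging the base models' predictions."""
    return sum(model(x) for model in models) / len(models)

rng = random.Random(0)
data = [(float(x), 2.0 * x) for x in range(10)]  # toy data, y = 2x
models = [fit_one_nn(bootstrap_sample(data, rng)) for _ in range(25)]
prediction = bagged_predict(models, 4.0)
```

Each base model sees a slightly different resampled dataset, and averaging their outputs smooths out the differences between them.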
33. What are the best bagging algorithms?
- Bagging Meta-estimator
- Random Forest
34. What are the best boosting algorithms?
- AdaBoost
- Gradient boosting algorithm (GBM)
- Extreme gradient boosting (XGBoost)
- LightGBM
- CatBoost
35. What would you do if your model suffers from low bias and high variance?
Use a bagging algorithm such as a random forest.
36. What is the difference between a random forest and a gradient boosting algorithm?
Random forest uses bagging techniques to reduce variance, while gradient boosting uses boosting techniques to reduce bias and variance.
37. In which areas is pattern recognition used the most?
- Computer Vision
- Speech Recognition
- Data Mining
- Statistics
- Information Retrieval
- Bio-Informatics
38. What is genetic programming?
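Genetic programming is an evolutionary technique that generates a population of candidate programs or models, scores them with a fitness function, and evolves the best ones through selection, crossover, and mutation.
39. What is inductive logic programming (ILP)?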
ILP is a subfield in machine learning which uses logical programming to represent background knowledge and examples.
40. What is the difference between heuristics for rule learning and heuristics for decision trees?
Heuristics for decision trees evaluate the average quality of a number of disjoint sets. Heuristics for rule learning only evaluate the quality of the set of instances covered by the candidate rule.
41. What is perceptron?
A perceptron is a supervised learning algorithm for binary classification: it computes a weighted sum of the inputs and assigns one of two classes depending on whether that sum exceeds a threshold.
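Here is a minimal sketch of the classic single-layer perceptron learning rule, assuming binary labels 0 and 1 and a linearly separable toy problem (the logical AND gate). The function names, learning rate, and epoch count are illustrative assumptions.

```python
def train_perceptron(samples, epochs=10, lr=1.0):
    """Learn weights and bias for a binary linear classifier (labels 0 or 1)."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            predicted = 1 if activation > 0 else 0
            error = label - predicted  # 0 when correct, +1 or -1 when wrong
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

def perceptron_predict(weights, bias, features):
    """Apply the learned threshold rule to one input."""
    return 1 if sum(w * x for w, x in zip(weights, features)) + bias > 0 else 0

and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
outputs = [perceptron_predict(weights, bias, x) for x, _ in and_gate]
```

The weights change only when a sample is misclassified, so training stops moving once every sample is on the correct side of the decision boundary.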
42. Explain the two components of the Bayesian logic program.
A logical component, consisting of a set of Bayesian Clauses, which captures the qualitative structure of the domain.
A quantitative component that encodes the quantitative information about the domain.
43. What is Bayesian Network?
A Bayesian Network is a graphical model that represents the probabilistic relationships among a set of variables.
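As a small worked example, consider a hypothetical two-node network Rain -> WetGrass. The probabilities below are made-up illustrative values, and the sketch simply enumerates the joint distribution.

```python
# Two-node network: Rain -> WetGrass. All numbers are illustrative assumptions.
p_rain = 0.2
p_wet_given = {True: 0.9, False: 0.1}  # P(WetGrass=True | Rain)

def joint(rain, wet):
    """P(Rain=rain, WetGrass=wet) = P(rain) * P(wet | rain)."""
    p_r = p_rain if rain else 1 - p_rain
    p_w = p_wet_given[rain] if wet else 1 - p_wet_given[rain]
    return p_r * p_w

def posterior_rain_given_wet():
    """P(Rain=True | WetGrass=True) by enumerating the joint distribution."""
    evidence = joint(True, True) + joint(False, True)
    return joint(True, True) / evidence

total = sum(joint(r, w) for r in (True, False) for w in (True, False))
p = posterior_rain_given_wet()
```

The graph structure tells us how the joint probability factorizes, and the same enumeration idea extends to larger networks.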
44. Why are instance-based learning algorithms referred to as Lazy learning algorithms?
They delay the induction or generalization process until classification is performed.
45. What are the two classification methods that SVM (Support Vector Machine) can handle?
- Combining binary classifiers
- Modifying binary to incorporate multiclass learning
46. What is an incremental learning algorithm?
Incremental learning is the ability of an algorithm to learn from new data that becomes available after a classifier has already been generated from the existing dataset.
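One simple way to picture this is a nearest-centroid classifier whose class means are updated one example at a time. The class name `IncrementalCentroidClassifier` and its `partial_fit` method are hypothetical names chosen for this sketch.

```python
class IncrementalCentroidClassifier:
    """Nearest-centroid classifier updated one example at a time."""

    def __init__(self):
        self.centroids = {}
        self.counts = {}

    def partial_fit(self, features, label):
        count = self.counts.get(label, 0) + 1
        old = self.centroids.get(label, [0.0] * len(features))
        # Running mean: new_mean = old_mean + (x - old_mean) / count
        self.centroids[label] = [m + (x - m) / count for m, x in zip(old, features)]
        self.counts[label] = count

    def predict(self, features):
        def squared_distance(label):
            return sum((x - m) ** 2 for x, m in zip(features, self.centroids[label]))
        return min(self.centroids, key=squared_distance)

model = IncrementalCentroidClassifier()
model.partial_fit((0.0, 0.0), "low")
model.partial_fit((10.0, 10.0), "high")
before = model.predict((6.0, 6.0))

# New data arrives later, including a class the model has never seen.
model.partial_fit((5.0, 5.0), "mid")
model.partial_fit((2.0, 2.0), "low")
after = model.predict((6.0, 6.0))
```

The model never revisits old examples: it keeps only a running mean and a count per class, which is enough to absorb new data as it arrives.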
47. What are PCA, KPCA, and ICA used for?
PCA (Principal Component Analysis), KPCA (Kernel-based Principal Component Analysis), and ICA (Independent Component Analysis) are feature extraction techniques used for dimensionality reduction.
48. What is dimension reduction?
It is the process of reducing the number of random variables under consideration.
49. What are support vector machines?
Support vector machines are supervised learning algorithms used for classification and regression analysis.
50. What are the components of relational evaluation techniques?
- Data Acquisition
- Ground Truth Acquisition
- Cross-Validation Technique
- Query Type
- Scoring Metric
- Significance Test
51. What are the different methods in Sequential Supervised Learning?
- Sliding-window methods
- Recurrent sliding windows
- Hidden Markov models
- Maximum entropy Markov models
- Conditional random fields
- Graph transformer networks
52. What is PAC Learning?
PAC (Probably Approximately Correct) learning is a learning framework introduced to analyze learning algorithms and their statistical efficiency.
53. What are the different categories in the sequence learning process?
- Sequence prediction
- Sequence generation
- Sequence recognition
- Sequential decision
54. What is sequence learning?
Sequence learning is the task of learning from data in which the order of the elements matters, such as text, speech, or time series.
55. What are the three data preprocessing techniques to handle outliers?
- Winsorize
- Transform to reduce skew
- Remove outliers if you’re certain they are anomalies or measurement errors.
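Winsorizing, the first technique above, can be sketched as follows. This version assumes a simple nearest-rank percentile rule and illustrative cut-offs; the helper name `winsorize` and the sample readings are our own.

```python
def winsorize(values, lower_pct=0.05, upper_pct=0.95):
    """Clamp values outside the given percentile range to the boundary values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low = ordered[int(lower_pct * last)]   # nearest-rank lower bound (an assumption)
    high = ordered[int(upper_pct * last)]  # nearest-rank upper bound (an assumption)
    return [min(max(v, low), high) for v in values]

readings = [10, 12, 11, 13, 12, 11, 10, 12, 13, 500]
clipped = winsorize(readings, lower_pct=0.1, upper_pct=0.9)
```

The extreme reading is pulled back to the upper bound instead of being dropped, so the dataset keeps its size.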
56. What are the three ways of reducing dimensionality?
- Removing collinear features.
- Performing PCA, ICA, or other forms of algorithmic dimensionality reduction.
- Combining features with feature engineering.
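For the first option, a minimal sketch of a collinearity filter might look like this. It assumes Pearson correlation, an illustrative threshold of 0.95, and no constant columns (which would make the correlation undefined); the column names are invented.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)  # assumes neither column is constant

def drop_collinear(columns, threshold=0.95):
    """Keep a column only if it is not highly correlated with one already kept."""
    kept = {}
    for name, values in columns.items():
        if all(abs(pearson(values, other)) < threshold for other in kept.values()):
            kept[name] = values
    return kept

columns = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # nearly a rescaled copy of height_cm
    "age": [35.0, 22.0, 41.0, 29.0, 30.0],
}
selected = drop_collinear(columns)
```

Here `height_in` carries almost the same information as `height_cm`, so only one of the two survives the filter.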
57. What are the advantages and disadvantages of neural networks?
Neural networks are incredibly flexible and can learn patterns from unstructured datasets such as images, audio, and video. But they require a large amount of training data to converge, and choosing the right architecture is difficult.
58. What are the best techniques for the recommendation system?
- Content-based filtering
- Collaborative filtering
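A minimal sketch of user-based collaborative filtering, assuming cosine similarity over co-rated items and a similarity-weighted average as the prediction rule. The users, items, and ratings are made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None

ratings = {
    "ana": {"matrix": 5, "titanic": 1},
    "ben": {"matrix": 5, "titanic": 1, "alien": 5},
    "cat": {"matrix": 1, "titanic": 5, "alien": 1},
}
predicted = predict_rating(ratings, "ana", "alien")
```

Because ana's tastes match ben's far more closely than cat's, the predicted rating leans toward ben's score for the unseen item. Content-based filtering would instead compare item attributes such as genre.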
59. What are the different types of time series algorithms?
- Naive Approach
- Simple average
- Moving average
- Single Exponential smoothing
- Holt’s linear trend method
- Holt-Winters seasonal method
- ARIMA (Autoregressive Integrated Moving Average)
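The three simplest approaches in this list can be sketched in a few lines. The function names, the window size, the smoothing factor `alpha`, and the toy sales figures are assumptions made for this example.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    return sum(series[-window:]) / window

def exponential_smoothing_forecast(series, alpha):
    """Single exponential smoothing: level = alpha * y + (1 - alpha) * level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

sales = [10.0, 12.0, 14.0, 16.0]
naive = sales[-1]  # naive approach: repeat the last observation
ma = moving_average_forecast(sales, window=2)
ses = exponential_smoothing_forecast(sales, alpha=0.5)
```

None of these three models a trend, so on steadily rising data they all lag behind. That gap is what Holt’s method and ARIMA are designed to address.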
60. What are the different applications of machine learning?
- Bioinformatics
- Brain-machine interfaces
- Computer Networks
- Computer vision
- Credit-card fraud detection
- Financial market analysis
- Handwriting recognition
- Information retrieval
- Insurance
- Internet fraud detection
- Medical diagnosis
- Optimization
- Recommender systems
- Search engines
- Sentiment analysis
- Sequence mining
- Speech recognition
- Time series forecasting
- User behavior analytics