Strengths and weaknesses of ML methods in marketing

In the last two decades, artificial intelligence (AI) and machine learning (ML) have significantly transformed fields such as biology, education, engineering, finance, and healthcare. Marketing is no exception.

Thanks to abundant data and customers’ extensive digital footprints, companies today invest heavily in ML to enhance their marketing capabilities and to make customer interactions more personalized and ubiquitous.

Machine learning today has myriad applications. For example, sophisticated algorithms power the recommender systems at e-commerce websites and content platforms; deep learning engines analyze and tag the billions of images on social media sites; automated bidding algorithms examine a web surfer’s profile within milliseconds to determine the optimal bid for ad delivery; and chatbots engage in human-like conversations with customers to maintain relationships and loyalty.

Other applications include social media mining, sentiment analysis, customer churn prevention, predictive customer service, marketing automation, dynamic emails, predictive analytics, content generation, content curation, and social semantics.

According to BCC Research, the global market for machine learning-enabled solutions will grow at a 43.6 percent annual rate from now until 2022, reaching $8.8 billion. However, although the interest in ML is increasing rapidly, the use of ML methods in marketing is still at an early stage.

Machine learning methods (supervised, unsupervised, and reinforcement learning) can effectively process large-scale and unstructured data and have flexible structures that approximate complex functions. As a result, they often yield stronger predictive performance than the statistical and econometric models traditionally used in marketing.

However, significant differences exist between machine learning methods and the statistical and econometric models commonly used in quantitative marketing research. This post briefly discusses the relative strengths and weaknesses of machine learning methods used in marketing.

Strengths of ML methods

The first key strength of machine learning methods is that they can readily handle unstructured data (such as text, image, audio, and video), which has been the key driver of the data explosion in recent years. Furthermore, they can process data with complex structures, such as large-scale networks or tracking data. In addition, machine learning methods can accommodate hybrid formats, such as a combination of text, image, and structured data, in an integrated manner.

Second, machine learning methods can handle larger data volumes than econometric models. Data for econometric models is typically collected from hundreds or thousands of consumers, with a small number of variables and limited choice sets. Forward-looking models may use even smaller samples. In contrast, in machine learning, optimization algorithms such as stochastic gradient descent, together with parallel computing, enable efficient training on large datasets. Implementation is also facilitated by off-the-shelf tools with high-performance computing capabilities.
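
To make the idea of stochastic gradient descent concrete, here is a minimal sketch in plain Python. It fits a one-variable linear model on synthetic data by updating the parameters from small random batches instead of the full dataset at every step. The function name, learning rate, batch size, and data are illustrative assumptions, not taken from any particular marketing application.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=50, batch_size=16, seed=0):
    """Fit y ~ w * x + b with mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit the data in a new random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = 0.0
            grad_b = 0.0
            for i in batch:
                err = (w * xs[i] + b) - ys[i]
                grad_w += 2 * err * xs[i]  # derivative of squared error w.r.t. w
                grad_b += 2 * err          # derivative of squared error w.r.t. b
            # Step against the average gradient of this batch only.
            w -= lr * grad_w / len(batch)
            b -= lr * grad_b / len(batch)
    return w, b


# Synthetic data (assumed for illustration): y = 3x + 0.5 plus small noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(-1, 1) for _ in range(2000)]
ys = [3.0 * x + 0.5 + data_rng.gauss(0, 0.1) for x in xs]

w, b = sgd_linear_regression(xs, ys)
print(f"estimated slope={w:.2f}, intercept={b:.2f}")
```

Because each update touches only a small batch, the cost per step does not grow with the size of the dataset, which is one reason this family of algorithms scales to large samples.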

The third advantage is their flexibility, which starts with input construction through feature engineering. In econometric models, observed variables are usually entered directly for statistical inference, with manipulations limited to normalization, monotonic transformation, or the addition of selected interaction terms. Machine learning, on the other hand, encourages substantial upfront work to create and transform input variables. For example, one variable can be entered in its original form, in binned form, as higher-order terms, and in interaction terms. Additional transformations are carried out based on the domain knowledge of the researchers.
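
As a rough illustration of this kind of feature engineering, the sketch below expands a single variable into its original form, a binned (one-hot) form, a squared term, and an interaction with a second variable. The variable names (age, spend) and the bin edges are hypothetical choices made for the example.

```python
def engineer_features(age, spend, age_bins=(25, 35, 50)):
    """Expand one raw variable (age) into several model inputs."""
    # Count how many bin edges the value has passed to get its bin index.
    bin_index = sum(age >= edge for edge in age_bins)
    features = {
        "age": float(age),                        # original form
        "age_squared": float(age) ** 2,           # higher-order term
        "age_x_spend": float(age) * float(spend), # interaction term
    }
    # Binned form, one-hot encoded: exactly one indicator is 1.0.
    for k in range(len(age_bins) + 1):
        features[f"age_bin_{k}"] = 1.0 if bin_index == k else 0.0
    return features


example = engineer_features(age=30, spend=100.0)
for name, value in example.items():
    print(name, value)
```

A model trained on these inputs can pick up non-linear and threshold effects of age without the researcher committing to one functional form in advance.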

Flexibility is also reflected in the model structure. Machine learning methods strive for flexibility, whereas marketing models typically prescribe specific functional forms, such as linear utility functions or solutions to dynamic programming (DP) problems. A regression tree can carve out arbitrary regions of the feature space. A support vector machine (SVM) maps the original variables into a higher or even infinite-dimensional space. In deep neural networks, many layers lie between the input and the output, performing complex transformations. When feature engineering and a flexible model structure are combined, the chances of capturing the true linkage between input and output variables increase.
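
The regression tree idea can be sketched in a few lines. The toy implementation below works on a single feature, so the "regions" are simply intervals, and it picks each split by minimizing the sum of squared errors. The price and demand numbers are made up for illustration; a real analysis would normally rely on an established library rather than this hand-rolled version.

```python
def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def fit_tree(xs, ys, depth=2, min_leaf=2):
    """Fit a one-feature regression tree, returned as nested dicts."""
    if depth == 0 or len(xs) < 2 * min_leaf:
        return {"value": mean(ys)}
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, (pairs[i - 1][0] + pairs[i][0]) / 2, i)
    if best is None:
        return {"value": mean(ys)}
    _, threshold, i = best
    left_pairs, right_pairs = pairs[:i], pairs[i:]
    return {
        "threshold": threshold,
        "left": fit_tree([x for x, _ in left_pairs], [y for _, y in left_pairs],
                         depth - 1, min_leaf),
        "right": fit_tree([x for x, _ in right_pairs], [y for _, y in right_pairs],
                          depth - 1, min_leaf),
    }


def predict(tree, x):
    """Walk down the tree until a leaf is reached."""
    while "value" not in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]


# Hypothetical step-shaped demand curve: demand drops at two price points.
prices = list(range(1, 13))
demand = [10] * 4 + [5] * 4 + [1] * 4
tree = fit_tree(prices, demand, depth=2)
print(predict(tree, 2), predict(tree, 6), predict(tree, 11))
```

No functional form was specified here: the tree discovers the breakpoints from the data, which a single linear term in price could not represent.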

Fourth, machine learning methods excel at prediction, especially in real-world settings, due partly to the reasons stated above. Machine learning methods are evaluated based on their out-of-sample predictive accuracy, whereas econometric models typically focus on causal identification and interpretation. Teams that use machine learning methods such as deep neural networks or ensemble methods usually win popular open data competitions, such as Kaggle competitions. Prediction is becoming more important as products, markets, and decision contexts become more complex, which is another significant advantage of machine learning methods.
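
Out-of-sample evaluation can be illustrated with a simple hold-out split. In the sketch below, a one-nearest-neighbor predictor memorizes the training data, so its in-sample error is zero, while its error on held-out observations is not. The data-generating process, the split fraction, and the function names are assumptions made for the example.

```python
import random


def holdout_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def predict_1nn(train, x):
    """Predict with the outcome of the closest training observation."""
    return min(train, key=lambda row: abs(row[0] - x))[1]


def mean_squared_error(train, rows):
    errors = [(predict_1nn(train, x) - y) ** 2 for x, y in rows]
    return sum(errors) / len(errors)


# Synthetic data (assumed): y = 2x plus unit-variance noise.
rng = random.Random(1)
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2.0 * x + rng.gauss(0, 1.0)))

train, test = holdout_split(data)
in_sample = mean_squared_error(train, train)
out_of_sample = mean_squared_error(train, test)
print(f"in-sample MSE={in_sample:.3f}, out-of-sample MSE={out_of_sample:.3f}")
```

The gap between the two numbers is why machine learning practice judges a model on data it has not seen during training.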

Limitations of ML methods

The first fundamental limitation of ML methods is that they often lack interpretability, that is, a transparent model structure and a clear linkage between variables.

In econometric models, the type of model is chosen according to theory: for example, choice models represent utility-maximizing decisions, and dynamic programming models represent consumers’ intertemporal optimization. The functional form may also reflect theory; the Bass model, for instance, represents diffusion driven by innovators and imitators. Finally, variables are chosen based on theory and included so that statistical hypothesis testing can be used to interpret how they are related (for example, whether exposure to an ad increases utility) or to assess characteristics of interest, such as consumers’ risk aversion.
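
To show what a theory-driven functional form looks like, here is a minimal discrete-time sketch of the Bass model. Each parameter has a direct interpretation: p is the coefficient of innovation, q is the coefficient of imitation, and market_size is the total number of eventual adopters. The parameter values are illustrative assumptions, not estimates from data.

```python
def bass_adoptions(p, q, market_size, periods):
    """Return new adopters per period under a discrete-time Bass model."""
    cumulative = 0.0
    new_adopters = []
    for _ in range(periods):
        # Adoption hazard: innovators (p) plus imitation driven by
        # the share of the market that has already adopted (q).
        hazard = p + q * cumulative / market_size
        adopters = hazard * (market_size - cumulative)
        new_adopters.append(adopters)
        cumulative += adopters
    return new_adopters


sales = bass_adoptions(p=0.03, q=0.38, market_size=1000.0, periods=30)
peak_period = sales.index(max(sales))
print(f"first-period adopters={sales[0]:.1f}, peak at period {peak_period}")
```

Every quantity in this model maps to a behavioral story, which is the kind of transparency that flexible machine learning models usually give up.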

In contrast, machine learning methods rely on heavily engineered features and flexible model structures, resulting in a black box that delivers predictive accuracy but not the interpretive insights that marketing researchers have come to expect. Furthermore, these methods are often developed in an optimization framework, which makes parametric statistical hypothesis testing infeasible and presents another hurdle for interpretation.

Second, machine learning methods frequently uncover correlational rather than causal relationships. Because of their predictive focus, endogeneity concerns have received little attention in the development of machine learning methods. Selection, omitted variables, and simultaneity are all meticulously addressed in econometric models but are frequently overlooked in machine learning analysis. As a result, machine-learned functions cannot be assumed to be causal, and predictive performance may not hold in the event of a regime shift or policy change. Moreover, without causal grounding, it is difficult to perform counterfactual analysis, which is critical for designing and evaluating the marketing mix and other vital decisions such as segmentation and engagement. As a result, machine learning methods cannot be used as a primary marketing tool.
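
The gap between correlation and causation can be seen in a small simulation. In the sketch below, an unobserved trait (called loyalty here) raises both the chance of seeing an ad and the amount spent, so a naive comparison of exposed and unexposed customers overstates the ad effect. When exposure is randomized, the same comparison lands close to the true effect. All names and numbers are assumptions made for the example.

```python
import random

TRUE_AD_EFFECT = 1.0  # assumed causal lift in spend from seeing the ad


def estimated_ad_effect(targeted, n=20000, seed=0):
    """Difference in mean spend between exposed and unexposed customers."""
    rng = random.Random(seed)
    exposed, unexposed = [], []
    for _ in range(n):
        loyalty = rng.random()  # unobserved confounder in [0, 1)
        if targeted:
            saw_ad = rng.random() < loyalty  # loyal customers see more ads
        else:
            saw_ad = rng.random() < 0.5      # randomized exposure
        spend = 10.0 * loyalty + TRUE_AD_EFFECT * saw_ad + rng.gauss(0, 1)
        (exposed if saw_ad else unexposed).append(spend)
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)


naive = estimated_ad_effect(targeted=True)
randomized = estimated_ad_effect(targeted=False)
print(f"naive estimate={naive:.2f}, randomized estimate={randomized:.2f}")
```

A purely predictive model trained on the targeted data would predict spend well, yet it would give a misleading answer to the counterfactual question of what happens if the ad is shown to everyone.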

Third, the ability of machine learning methods to capture individual consumer-level heterogeneity and dynamics remains unproven. Traditionally, dynamic data has been handled by time series models or specific probabilistic graphical models (PGMs) such as hidden Markov models (HMMs), which are arguably more closely related to statistics than to machine learning. Even the most advanced models based on recurrent neural networks (RNNs) have succeeded mainly in tasks like speech recognition and translation, leaving their potential for capturing consumer dynamics largely untapped. In many areas of marketing research, accounting for unobserved heterogeneity and dynamics is critical, and it remains to be seen how machine learning methods can help in this regard.

Strengths and weaknesses of machine learning methods.

Strength

  • Ability to handle unstructured data and data of hybrid formats
  • Flexible model structure
  • Ability to handle large data volume
  • Strong predictive performance

Weakness

  • Not easy to interpret
  • Unproven on analyzing individual consumer level heterogeneity and dynamics
  • Relationships are typically correlational rather than causal