The Crucial Role of Exploratory Data Analysis in MMM

Mathieu Lepoutre
Published on
22/9/2025
In this article, Mathieu Lepoutre, Global media and measurement offer director at fifty-five, demonstrates the importance of Exploratory Data Analysis (EDA) in the MMM process, highlighting key tests and practical applications to enhance the reliability of your model.

Did you know that the quality of your marketing data can significantly impact the effectiveness of your marketing mix model (MMM)? As marketers increasingly rely on data-driven strategies for optimal ROI, ensuring the accuracy and completeness of this data is crucial. In this article, we will explore the importance of Exploratory Data Analysis (EDA) in the MMM process, highlighting key tests and practical applications to enhance the reliability of your model.

Exploratory Data Analysis (EDA) is an indispensable step in the MMM process. It is often underestimated or treated too lightly in MMM projects, even though it largely determines the final quality of the model. Its primary goal is to improve the reliability and accuracy of the model's output.

The first step is to verify that the data is accurate. That alone is not enough: the data must also be compatible with regression models. Without these checks, issues such as missing values, inconsistencies across sources, or unaddressed multicollinearity can severely bias the model results, leading to wrong interpretations and poor decisions.

1/ Data completeness and consistency checklist:

  • Identify missing or incomplete data by conducting a thorough data review. Graphs showing the percentage of data completeness by variable (channel) can be helpful.
  • Resolve data inconsistencies by confirming annual and monthly expenditures per channel.
  • Verify consistency across different levels of granularity (e.g., annual data vs. sum of weekly data).
  • Ensure metrics are not null/zero at the weekly level when the budget is not zero.
  • Check the consistency of weekly metric/budget variations by channel.
  • Confirm the number of campaigns (or active weeks) per channel over the period to ensure sufficient temporal variation.
  • Verify the quantity (and cost per quantity) for each channel, ensuring there are no null or zero quantities per week, per channel, and per geographic area. Additionally, check that the cost per quantity does not change by more than a factor of two from one week to the next, and compare it to benchmarks.
  • Detect outliers in expenditures and cost per quantity using methods like Z-Score, IQR, or Box Plot analysis.
  • Identify abnormally low observations for any time period.
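
Several of the checks above can be sketched in a few lines of Python. This is a minimal illustration that uses only the standard library, not a reference implementation: the `rows` data, the column names (`spend`, `impressions`), the function names, and the thresholds (`max_ratio=2.0`, `k=1.5`) are all assumptions chosen for the example.

```python
from statistics import quantiles

# Hypothetical weekly rows for a single channel; None marks a missing value.
rows = [
    {"week": "2025-01-06", "spend": 1000.0, "impressions": 200_000},
    {"week": "2025-01-13", "spend": 1200.0, "impressions": 240_000},
    {"week": "2025-01-20", "spend": 900.0, "impressions": 0},
    {"week": "2025-01-27", "spend": None, "impressions": 150_000},
    {"week": "2025-02-03", "spend": 1100.0, "impressions": 60_000},
    {"week": "2025-02-10", "spend": 9000.0, "impressions": 500_000},
]


def completeness_pct(rows, column):
    """Share of rows (in %) where `column` is present and not None."""
    filled = sum(1 for r in rows if r.get(column) is not None)
    return 100.0 * filled / len(rows)


def spend_without_delivery(rows, metric="impressions"):
    """Weeks with a non-zero budget but a null or zero metric."""
    return [
        r["week"]
        for r in rows
        if (r.get("spend") or 0) > 0 and not r.get(metric)
    ]


def cost_per_quantity_jumps(rows, metric="impressions", max_ratio=2.0):
    """Weeks whose cost per quantity moved by more than a factor of
    `max_ratio` (up or down) versus the previous week with a valid value."""
    flagged, previous = [], None
    for r in rows:
        spend, qty = r.get("spend"), r.get(metric)
        if not spend or not qty:
            continue  # cost per quantity is undefined for this week
        cpq = spend / qty
        if previous is not None:
            ratio = max(cpq, previous) / min(cpq, previous)
            if ratio > max_ratio:
                flagged.append(r["week"])
        previous = cpq
    return flagged


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


spends = [r["spend"] for r in rows if r["spend"] is not None]

print(completeness_pct(rows, "spend"))   # completeness of the spend column
print(spend_without_delivery(rows))      # budget spent, nothing delivered
print(cost_per_quantity_jumps(rows))     # week-over-week cost jumps
print(iqr_outliers(spends))              # unusually high or low spend
```

In a real project the same logic would typically run per channel and per geographic area, with thresholds tuned against your own benchmarks.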

2/ Verifying Correlations and media patterns

Checking correlations during EDA helps to identify relationships between the Key Performance Indicator (KPI), media variables, and control variables, which is vital for building an accurate and robust MMM.

Multicollinearity occurs when two or more independent variables (e.g., media and control variables) are highly correlated with each other. This can make it difficult for regression models to accurately calculate the individual impact of these collinear variables, leading to unstable and hard-to-interpret model coefficients. For instance, a media channel expected to have a positive impact might show a negative coefficient due to high correlation with another variable.

Correlation Checks:

  • Verify the correlation between the KPI, media, and control variables to identify unexpected relationships and multicollinearity issues.
  • Produce correlation graphs and use coefficients like Pearson, Spearman, or Kendall's Tau to measure relationships.
  • Identify seasonal patterns in sales and decide whether to create a control variable for the high season (with forward and backward carryover effects). Google’s Meridian adjusts for seasonality and trend automatically through a time-varying intercept, which makes separate seasonality variables optional; the knots of that intercept can be tuned to control how closely it follows the seasonal pattern.
  • Estimate the maximum lag for each media channel and compare it to benchmarks.
  • Select the appropriate quantitative variable (e.g., impressions, clicks, expenditures) for each media channel.

Analyzing Media Variable Patterns

  • Calculate the variance of channels relative to cost.
  • Identify channels that are potentially too small (e.g., <1% of total expenditures) or too large (e.g., >80%). It is sometimes recommended to group or remove low-expenditure channels as their posterior may remain very similar to their prior, especially if the data contains little information.
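
The channel-size screen above can be illustrated with a short sketch. The 1% and 80% thresholds come from the text; the spend figures are invented. Reading "variance relative to cost" as the coefficient of variation of weekly spend is an assumption made for this example, not a definition given in the article.

```python
from statistics import mean, pstdev


def flag_channel_shares(spend_by_channel, low=0.01, high=0.80):
    """Flag channels whose share of total spend is below `low` or above `high`."""
    total = sum(spend_by_channel.values())
    flags = {}
    for channel, spend in spend_by_channel.items():
        share = spend / total
        if share < low:
            flags[channel] = "too small"
        elif share > high:
            flags[channel] = "too large"
    return flags


def coefficient_of_variation(weekly_spend):
    """Standard deviation of weekly spend divided by its mean
    (one possible reading of 'variance relative to cost')."""
    return pstdev(weekly_spend) / mean(weekly_spend)


# Hypothetical total spend per channel over the modeling period.
totals = {"tv": 850_000, "search": 140_000, "display": 5_000, "radio": 5_000}
flags = flag_channel_shares(totals)
print(flags)

# Hypothetical weekly spend: an always-on channel versus a flighted one.
always_on = [100.0, 100.0, 100.0, 100.0]
flighted = [0.0, 400.0, 0.0, 0.0]
print(coefficient_of_variation(always_on), coefficient_of_variation(flighted))
```

A channel with almost no week-to-week variation gives the model little signal to learn from, which is why it is worth reviewing alongside the share of spend.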

3/ Regular Audits:

Open-source models such as Meridian do not provide built-in EDA, so users must build their own checks.

Numerous tests need to be conducted to ensure data quality and its compatibility with modeling. The more tests that are performed and the more detailed they are, the better the modeling will be. Unfortunately, all of these tests must be carried out with each new update of the model (usually every quarter, but potentially every month), which can be very time-consuming. If artificial intelligence is used, regularly audit AI outputs for unintended bias and anomalies to maintain data integrity.
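
Because the same tests have to be repeated at every model refresh, it helps to keep them in one place and run them as a batch. The sketch below shows one possible pattern. The check names, the data, and the `run_checks` helper are hypothetical and do not describe any particular product.

```python
def run_checks(data, checks):
    """Run every named check and keep only those that report findings."""
    report = {}
    for name, check in checks.items():
        findings = check(data)
        if findings:
            report[name] = findings
    return report


# Hypothetical weekly rows delivered with a new data refresh.
weekly = [
    {"week": "2025-01-06", "spend": 1000.0, "impressions": 0},
    {"week": "2025-01-13", "spend": None, "impressions": 5000},
]

# Each check takes the rows and returns the list of offending weeks.
checks = {
    "missing_spend": lambda rows: [
        r["week"] for r in rows if r["spend"] is None
    ],
    "spend_without_impressions": lambda rows: [
        r["week"] for r in rows if r["spend"] and not r["impressions"]
    ],
    "negative_spend": lambda rows: [
        r["week"] for r in rows if (r["spend"] or 0) < 0
    ],
}

report = run_checks(weekly, checks)
for name, weeks in report.items():
    print(f"{name}: {weeks}")
```

Adding a new test then means adding one entry to `checks`, and the same report can be produced at each quarterly or monthly update.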

To ensure maximum comprehensiveness in the tests conducted and to facilitate their execution, fifty-five has implemented data products that allow for the complete customization of each EDA for each project. In many of our projects, this phase is quickly internalized by the advertisers, who become autonomous in managing the quality of the data ingested by the MMM.

4/ Generative AI automation

GenAI represents a significant technological breakthrough for Exploratory Data Analysis (EDA). On one hand, it enables the deployment of numerous tests simultaneously and at scale, including tests that were sometimes not implemented before. On the other hand, agents now perform an initial level of analysis and synthesis, which allows them to independently launch deeper tests when they find anomalies or uncertainties.

This is a crucial element in enhancing the reliability of the data chain, which is becoming increasingly autonomous. It enables advertisers to internalize these processes, leading to greater speed and control over their data.

EDA is a critical prerequisite for obtaining reliable results from any MMM, including Meridian. By checking data completeness, consistency, and accuracy, and by reviewing correlations, marketers can improve the quality of their models and drive more effective marketing strategies. With GenAI, fifty-five can go further: tests run simultaneously and at scale, and agents conduct initial analyses and launch deeper investigations when anomalies arise. This improves the reliability of the data chain and gives advertisers more autonomy, speed, and control over their data processes.

Embrace dynamic EDA as a comprehensive health check for your marketing data, ensuring that all components are fresh, complete, and of high quality. 

Interested in learning more? Contact us to find out how efficient MMM can help you achieve your marketing goals.
