INSTACART GROCERY BASKET ANALYSIS
CONTEXT
Instacart is an American retail company that operates a grocery delivery and pick-up service in the United States and Canada. The company offers its services via a website and mobile app. The service allows customers to order groceries from participating retailers with the shopping being done by a personal shopper.
The Instacart stakeholders are interested in the variety of customers in their database along with their purchasing behaviors. They assume they can't target everyone using the same methods, and they’re considering a targeted marketing strategy. They want to target different customers with applicable marketing campaigns to see whether they have an effect on the sale of their products.
GOAL
The “Instacart Grocery Basket Analysis” aims to answer a list of key questions posed by the sales department and other project stakeholders. The main goal is to uncover customer sales patterns and to suggest strategies for better segmentation, based on insights gained from analyzing sales and customer data.
Project Duration: 3 weeks
Production Date: 2022
Tools:
PROJECT INFORMATION
TECHNIQUES
• Data Analytics with Python
• Data Cleaning
• Data Wrangling & Subsetting
• Data Consistency Checks
• Data Combining
• Deriving Variables
• Data Grouping & Aggregation
• Data Visualization with various libraries
• Customer Segmentation & Profiling
DATA CITATION
“The Instacart Online Grocery Shopping Dataset 2017”, Source: Instacart
Instacart Customer & Product Data Set (The data on product prices and customers used in this project are fabricated and therefore not subject to any PII (personally identifiable information) laws.)
PROCESS
I started the project by clearly defining its objectives and requirements. To do so, I reviewed the available data and the initial project information, including the key questions from the sales department and other stakeholders. As outlined above, the primary goal of the project is to find sales patterns in the data and to segment Instacart's customers based on their order history.
KEY QUESTIONS
• What are the busiest days of the week and hours of the day (most orders)?
• When are the fewest orders placed?
• Are there particular times of the day when people spend the most money?
• After organizing the products into different price groups, what types of products are most popular?
• What departments have the highest frequency of product orders?
• What’s the distribution among users with regard to their brand loyalty?
• Are there differences in ordering habits based on a customer’s loyalty status?
• Are there differences in ordering habits based on a customer’s region?
• Is there a connection between age and family status in terms of ordering habits?
• What different classifications does the demographic information suggest? (Age, Income, Products)
• What differences are there in ordering habits based on different customer profiles?
DATA WRANGLING & CONSISTENCY CHECKS
After formulating the objectives of the project, I examined the provided data sets for their sourcing, collection methods, content, and limitations, and combined them where needed. The goal was to identify early on the data most important for the project and to examine its reliability, including the potential for any data biases.
Following this process, I used Python (Jupyter Notebook) to run a set of data consistency checks. I examined the relevant data for completeness, uniqueness, and quality, and corrected any issues I identified during these tests (data cleaning). I then combined all necessary data sets and completed the step with an additional round of data cleaning. The final data set consists of around 32 million unique entries detailing customers' order history.
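The checks described above can be sketched as follows. This is a minimal illustration in plain standard-library Python, not the project's actual scripts: the sample rows are made up, the customer fields ("state", "age") are assumptions, and the column names are assumed to follow the public Instacart data set.

```python
from collections import Counter

# Made-up sample rows for illustration only.
orders = [
    {"order_id": 1, "user_id": 10, "order_dow": 0, "days_since_prior_order": None},
    {"order_id": 2, "user_id": 10, "order_dow": 3, "days_since_prior_order": 7.0},
    {"order_id": 2, "user_id": 10, "order_dow": 3, "days_since_prior_order": 7.0},  # exact duplicate
    {"order_id": 3, "user_id": 11, "order_dow": 6, "days_since_prior_order": 14.0},
]
# Hypothetical customer table, keyed by user_id.
customers = {
    10: {"state": "Ohio", "age": 34},
    11: {"state": "Texas", "age": 58},
}


def count_missing(rows):
    """Completeness check: count None values per column."""
    missing = Counter()
    for row in rows:
        for column, value in row.items():
            if value is None:
                missing[column] += 1
    return dict(missing)


def drop_duplicates(rows):
    """Uniqueness check: keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


missing_report = count_missing(orders)
clean_orders = drop_duplicates(orders)

# Combine orders with customer data (inner join on user_id).
merged = [
    {**order, **customers[order["user_id"]]}
    for order in clean_orders
    if order["user_id"] in customers
]

print(missing_report)
print(len(orders), "->", len(clean_orders), "rows after de-duplication")
```

A missing "days_since_prior_order" value is not necessarily an error: in this sketch it is treated as a customer's first order, which is why the row is reported rather than dropped.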
ANSWERING QUESTIONS & CUSTOMER SEGMENTATION (PROFILING)
My next step was to answer the key questions, starting with customer order frequency (the busiest and least busy days of the week and hours of the day) and the price of customer orders per weekday. After aggregating values with Python, I used different libraries to visualize the data and analyze the results further. I repeated these steps (aggregating and summarizing values, visualizing them, and analyzing the results) for most key questions. Additionally, I added identifier variables (flags) to the data set to help the marketing team find user groups more easily. For example, customers are sorted into different spending categories. Feel free to read through my complete findings in the final report.
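As a rough sketch of the aggregation and flagging steps, the example below counts orders per weekday and hour, averages the order value per weekday, and derives a spending flag per customer. The rows, the "order_total" field, the flag labels, and the 50.0 threshold are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from statistics import mean

# Made-up orders; "order_total" is a hypothetical per-order value.
orders = [
    {"user_id": 10, "order_dow": 0, "order_hour_of_day": 10, "order_total": 42.5},
    {"user_id": 10, "order_dow": 0, "order_hour_of_day": 14, "order_total": 61.0},
    {"user_id": 11, "order_dow": 0, "order_hour_of_day": 10, "order_total": 18.0},
    {"user_id": 11, "order_dow": 3, "order_hour_of_day": 10, "order_total": 22.0},
    {"user_id": 12, "order_dow": 6, "order_hour_of_day": 20, "order_total": 75.0},
]

# Order frequency per weekday and per hour of the day.
orders_per_day = Counter(o["order_dow"] for o in orders)
orders_per_hour = Counter(o["order_hour_of_day"] for o in orders)
busiest_day = orders_per_day.most_common(1)[0][0]
busiest_hour = orders_per_hour.most_common(1)[0][0]

# Average order value per weekday.
totals_by_day = defaultdict(list)
for o in orders:
    totals_by_day[o["order_dow"]].append(o["order_total"])
avg_price_per_day = {day: mean(values) for day, values in totals_by_day.items()}

# Derived flag: classify each customer by average order value.
SPENDING_THRESHOLD = 50.0  # hypothetical cut-off
totals_by_user = defaultdict(list)
for o in orders:
    totals_by_user[o["user_id"]].append(o["order_total"])
spending_flag = {
    user: "High spender" if mean(values) >= SPENDING_THRESHOLD else "Low spender"
    for user, values in totals_by_user.items()
}

print("Busiest weekday:", busiest_day, "| busiest hour:", busiest_hour)
print(avg_price_per_day)
print(spending_flag)
```

The per-weekday and per-hour counts are the kind of values that feed the bar charts listed below; the flag dictionary can be joined back onto the order rows by user ID.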
Customer Order Distribution by Weekday
Customer Order Distribution by Hour of the Day
Average Price per Customer Order per Weekday
Order Distribution by Department
Order Distribution by Age Group
Order Frequency by Age Group
To segment Instacart's customer base and formulate different profiles, I examined the app user data and order history more closely. Besides order details, the data contains information about each user's age, income, and number of household dependents. Based on these variables and the order details, I segmented the users by loyalty status, spending habits, order frequency, activity status, age group, income, and other characteristics. Using a combination of these characteristics, I was able to formulate basic customer profiles. Feel free to look at the final report, which includes detailed data cleaning documentation and my extensive answers to all questions, or have a look at all Python scripts used in this project on my GitHub.
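One way such profiles can be derived is sketched below. The age and income cut-offs, the profile labels, and the field names (including "n_dependants") are assumptions for illustration and do not reproduce the thresholds used in the final report.

```python
from collections import Counter

# Made-up customers for illustration only.
customers = [
    {"user_id": 10, "age": 24, "income": 38_000, "n_dependants": 0},
    {"user_id": 11, "age": 45, "income": 95_000, "n_dependants": 2},
    {"user_id": 12, "age": 67, "income": 160_000, "n_dependants": 0},
    {"user_id": 13, "age": 51, "income": 88_000, "n_dependants": 3},
]


def age_group(age):
    """Bucket age into three groups (hypothetical cut-offs)."""
    if age < 30:
        return "young adult"
    if age < 60:
        return "middle-aged"
    return "senior"


def income_band(income):
    """Bucket yearly income into three bands (hypothetical cut-offs)."""
    if income < 50_000:
        return "low income"
    if income < 130_000:
        return "middle income"
    return "high income"


def profile(customer):
    """Combine age group, income band, and household status into one label."""
    household = "with dependents" if customer["n_dependants"] > 0 else "no dependents"
    return " / ".join(
        [age_group(customer["age"]), income_band(customer["income"]), household]
    )


profiles = {c["user_id"]: profile(c) for c in customers}
profile_counts = Counter(profiles.values())

for label, count in profile_counts.most_common():
    print(f"{count} customer(s): {label}")
```

Keeping each characteristic in its own small function makes it easy to adjust a single cut-off without touching the rest of the profile logic.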
Income Types per US Region
Order Frequency by Income Type
Order Distribution by Age Group and Marital Status
Order Distribution by Household Income Status
Order Distribution by Household Size
Order Distribution by Spending Type and Age Group
SUMMARY
I successfully answered the key questions posed by the sales department and other stakeholders. Additionally, I formulated customer profiles and analyzed ordering habits, so the marketing team can target user groups with applicable campaigns and see whether these have an effect on product sales.
The process involved analyzing and interpreting large amounts of data, including customer order data and various demographic information. I used my expertise in data analysis and critical thinking skills to identify trends and patterns to segment the customer base and formulate multiple profiles. I also communicated effectively with stakeholders to ensure that the information I provided was accurate and met their expectations.
For the future, it is important to note that the results of targeted marketing campaigns should be analyzed to determine their success. For that, it is necessary to formulate designated KPIs and keep track of changes in collected data over time.
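As a small illustration of such tracking, the sketch below compares one possible KPI, the average order value per customer segment, before and after a campaign. The KPI choice, the segment names, and all figures are hypothetical.

```python
def relative_change(before, after):
    """Relative change of a KPI between two periods (0.05 means +5%)."""
    if before == 0:
        raise ValueError("baseline KPI must be non-zero")
    return (after - before) / before


# Hypothetical average order value per segment, before and after a campaign.
kpi_before = {"High spender": 60.0, "Low spender": 20.0}
kpi_after = {"High spender": 63.0, "Low spender": 25.0}

changes = {
    segment: relative_change(kpi_before[segment], kpi_after[segment])
    for segment in kpi_before
}

for segment, change in changes.items():
    print(f"{segment}: {change:+.1%}")
```

A change observed this way does not prove the campaign caused it; comparing against a segment that was not targeted would give a fairer baseline.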