Kakkle Consulting | Interview Preparation

Use Case 1 · Restaurant SQL Analytics DemoSchema setup, joins, KPIs, window functions

Work with a restaurant operations database to analyse reservations, orders, menu performance, table usage, and customer loyalty.

Background & Schema

Customers make reservations and place orders.
Tables are linked to both reservations and orders.
Orders contain menu items through the order-item bridge table.
Core tables: Customer, TableEntity, MenuItem, Reservation, OrderEntity, and OrderItem.

Database Assets

create_tables.txt — schema build script.
insert_customer.txt
insert_table.txt
insert_menu.txt
insert_reservation.txt
insert_order.txt
insert_order_item.txt

Recommended load order: create tables first, then insert customers, tables, menu items, reservations, orders, and order items.

Database Setup Instructions

Step 1: Create the Database and Tables
- Copy and paste the table creation script into sql-workbench.com or MySQL Workbench.
- Run the script to create all tables.
Step 2: Insert Data
- Copy and paste the data insertion scripts.
- Confirm that all tables contain data before continuing.

Business Questions

Identifying High-Value Customers: Which customers have spent the most in total, and how can they be classified as Gold (>300), Silver (between 150 and 300), or Bronze (<150) customers based on total spending?
Menu Performance Analysis: Which menu items generate the highest total revenue and how many times has each item been ordered?
Reservation No-Shows and Lost Opportunities: How many reservations did not result in any customer orders even after being confirmed?
Table Utilisation Efficiency: How efficiently are tables being used based on seating capacity versus number of guests?

Deliverables

SQL script with clear query sections for all 4 business questions.
Short business summary with key insights and recommendations.

Interview Questions (Task Review Style)

Customer Value Segmentation: Walk us through how you calculated total spend per customer and classified Gold/Silver/Bronze tiers. Why did you choose your aggregation and threshold logic, and what alternative segmentation could also work?
Menu Performance Query Design: Explain your SQL for item-level revenue and order frequency. How did you prevent double-counting when joining OrderEntity and OrderItem, and what checks did you run to validate totals?
No-Show / Lost Opportunity Analysis: Describe how you identified confirmed reservations that did not convert to orders. What join type and matching keys did you use, and what business risks exist if customer-table-time matching is incomplete?
Table Utilisation Logic: Explain your method for measuring utilisation using guests vs seating capacity. What assumptions did you make, how could those assumptions fail in real operations, and what additional fields would improve accuracy?
Query Quality and Performance: Which part of your solution was most expensive to run, and how would you optimise it for production-scale data (indexes, pre-aggregation tables, partitioning, or query rewrites)?

Use Case 2 · Excel Data Analysis DemoDynamic arrays, lookups, filters, pivots

Work through the workbook using dynamic array formulas, lookup functions, filtering logic, and pivot tables. Use the Data tab as your single source for every task, and do not complete tasks by copy/pasting raw data into analysis sheets.

The Dataset

Source file: HR Test.xlsx
Source tab: Data
Fields: Employment ID, Level 1, Level 2, Level 3, Organisational Unit, Current Posts, Start Date, Salary Annual, and Employee Summary: Job Titles.

Tasks

Method Rule for All Tasks: Complete each task using linked formulas and/or pivot tables that reference the Data tab directly. No manual copy/paste of source rows into result sheets.
Unique Sales Headcount and Levels: Create a unique count of all Employment IDs in the Sales team, then populate all employee levels from the Data tab into your solution.
Department Headcount by Level 1: Count how many employees there are in each department at Level 1 from the dataset.
Level 1 and Job Title Analysis: Create a unique list of all Level 1 values and job titles, then count how many employees there are for each job title at Level 1.
Average Salary by Department: Find the average salary for each department found at Level 1.
Engineer Filter by Unit: Filter out all employees with engineer in their post title for Expert Solutions, Mechanical & Electrical Services, and Cooling Systems.
2023 Starters by Level: Count the number of employees who started in 2023, between 01/01/2023 and 01/01/2024, for each level.
Role-Based Headcount: Count the number of employees with current post title among Engineer, Apprentice, or Technician.
Monthly Starters in 2023: Count the monthly number of starters in 2023 with current post title among Engineer or Technician.
Dynamic Organisational Unit Filter: Create a unique list of all organisational units and make it dynamically filterable with data-validation drop-downs for Levels 1 to 3 so the employee count updates based on the selected levels.
Pivot Summary View: Build at least one pivot table from the Data tab that summarises headcount and/or average salary by Level 1 (and optionally job title), then explain how it cross-checks your formula outputs.

Deliverables

A completed workbook with clear formulas, structured outputs, and separate tabs for each task or analysis area, including at least one pivot-based summary.
A short summary explaining the dynamic arrays, lookup functions, filtering logic, and pivot tables used, any assumptions made, and the main trends you observed.

Interview Questions (Task Review Style)

Formula and Pivot Design: Which dynamic array formulas and pivot tables did you use most often, and why were they the right choice for each task?
Headcount Logic: How did you ensure your unique counts and department totals were not double-counted, especially when filtering by level and job title?
Salary Analysis: How did you calculate average salary by department, and what checks did you use to make sure the results were accurate and not distorted by blanks or text values?
Filter and Validation Design: How did you build the engineer filter and the level-based drop-downs so the outputs remain dynamic when the source data changes?
Workbook Robustness: If the data grows or new departments and job titles are added, what parts of your workbook (formulas and pivots) will update automatically and what would need manual review?

Use Case 3 · Python Data Analysis DemoData cleaning, transformation, feature engineering

Choose one messy dataset, clean and transform it into analysis-ready data using Python (or R), and document your full data-cleaning journey.

The Mission

You are provided with two messy datasets and their corresponding data dictionaries.
Pick one dataset and convert it into analysis-ready format.
Use a reproducible workflow so the same cleaning can be repeated reliably.

Data Assets

Use one dataset + its matching data dictionary for your final submission.

Tasks

Ensure all variables have appropriate data types.
Identify and handle missing values, duplicates, and outliers.
Standardise text fields (for example, casing, spelling, labels).
Identify highly correlated features and address them appropriately.
Engineer relevant new features for downstream analysis.

Deliverables

The Cleaned Dataset — final cleaned file in .csv format.
The Cleaning Journey Presentation — PowerPoint showing your process and decisions.
Automation Code — data cleaning pipeline code that can be re-run on raw data.

Interview Questions (Task Review Style)

Dataset Choice & Scope: Which dataset did you choose and why? What trade-offs did you consider (business relevance, data quality complexity, timeline), and what did you intentionally leave out of scope for the 2–3 day exercise?
Data Type Choices: Which columns did you convert (for example text to number/date/category), what rules did you apply, and how did you confirm those choices using the data dictionary?
Missing, Duplicate, and Outlier Strategy: Walk us through your handling strategy for each issue type. Why did you impute/remove/flag specific records, what risks did each choice introduce, and what alternatives did you evaluate?
Standardisation and Correlation Handling: How did you standardise text categories and handle highly correlated features? Show one example where your decision improved interpretability or model readiness, and one case where it might reduce useful signal.
Pipeline Reproducibility and QA: Explain how your cleaning pipeline can be re-run on new raw files. What validation checks/tests did you add, and how would you monitor for schema drift or data quality regressions over time?

Use Case 4 · Visualisation & Statistical Analysis DemoBusiness questions, visuals, hypothesis testing

Use the provided house price dataset to answer business questions with visualisation and back up key assumptions using statistical testing.

Data Assets

Part A: Visualisation (The “What” and “How”)

Use BI tools such as Power BI, Tableau, Excel, or Python BI.

Formulate 3 business questions that can be answered using visualisation techniques.
For each business question, provide:
- The Business Question (for example, “Does marketing budget correlate with sales revenue?”)
- Business Importance — why answering it matters to the business.
- The Visualisation — image of your chart (for example, scatter plot, bar chart, boxplot).
- The Insight — brief interpretation of the visual pattern and answer.
Ensure all visuals are created from this use-case dataset and are suitable for stakeholder decision-making.

Part B: Statistical Testing (The “Proof”)

Formulate 2 assumptions (hypotheses) about the data that require statistical verification.
For each assumption, provide:
- The Assumption/Question (for example, “Is there a significant difference in salary between employees with a Master’s degree vs. a Bachelor’s degree?”)
- The Test Selected — name the statistical test (for example, T-test, ANOVA, Chi-squared, Pearson correlation) and why it is appropriate.
- The Conclusion — report the test output (for example, p-value) and your plain-English interpretation.

Deliverables

A report or slide deck covering Part A and Part B.
Three visual business questions with chart images, business importance, and insights.
Two statistically tested assumptions with selected test rationale, output, and conclusions.
BI file or notebook/script used to produce visuals and statistical tests.

Interview Questions (Task Review Style)

Question Framing: Why did you choose your three business questions, and how do they map to real business decisions?
Visual Choice and Integrity: Why were your chosen chart types appropriate, and how did you avoid misleading scales, aggregation errors, or omitted context?
Statistical Test Selection: Why did you choose each statistical test, and what assumptions did you check before trusting the results?
Interpreting Significance: How did you explain p-values and statistical significance to non-technical stakeholders without overstating causality?
Actionability: Based on your visuals and tests, what business actions would you recommend first and why?

Use Case 5 · BI Dashboard DemoPeople Analytics, data modelling, dashboards

You are a BI analyst in the people operations team. You are provided with three datasets containing employee, performance, and attrition data. Your task is to build a Power BI, Tableau, Python, or Excel dashboard that highlights key people metrics and trends.

The Objective

Load and model three connected datasets in your chosen BI tool.
Create calculated measures for key HR metrics.
Design an interactive dashboard for stakeholder decision-making.
Document key people insights based on the data.

Data Assets

Employee_Data.csv — employee records with ID, name, department, hire date, country.
Performance_Data.csv — annual performance ratings by employee ID and year.
Attrition_Data.csv — attrition records with termination dates and reasons.

Tasks

Data Modelling: Load the three datasets and create relationships based on EmployeeID. Validate that no data is lost during joins and check for data quality issues (missing IDs, duplicates, date inconsistencies).
Calculated Measures: Develop the following calculated fields/measures:
- Headcount (current employee count)
- Attrition Rate (%) — terminations divided by average headcount
- Average Performance Rating — across all employees
- Average Tenure (months) — years of service by employee
- Promotion Rate (%) — employees promoted year-over-year
- Female % by Department — gender diversity metric
Dashboard Design: Create an interactive dashboard that includes:
- KPI Cards: Headcount, attrition rate, average performance, average tenure at a glance.
- Trend Visuals: Line or area charts showing attrition, hires, and promotions over years.
- Breakdowns: Bar or stacked charts for metrics by department and by gender.
- Filter Panel: Dynamic slicers for Year, Department, and Country to enable drill-down exploration.
Key Insights Summary: Analyze the dashboard and document 3–5 key findings (e.g., "Attrition in Engineering has risen by X%; female representation in leadership is Y%").

Deliverables

BI File: Power BI (.pbix), Tableau workbook (.twb/.twbx), Python Jupyter notebook, or Excel workbook with completed dashboard.
Written Summary: A short summary document (max 300 words) highlighting 3–5 key insights from the dashboard data and any recommendations for HR leadership.

Interview Questions (Task Review Style)

Data Modelling Decisions: Walk us through how you joined the three datasets. What did you do with missing or orphaned records, and how did you verify the integrity of your relationships?
Calculated Measures Design: How did you define attrition rate and tenure? What assumptions did you make, and how would those change if the data grain or business rules shift?
Dashboard Usability: Why did you choose those KPIs, visuals, and filters? How would a non-technical HR leader navigate the dashboard to answer key business questions?
Insights and Recommendations: What is the most actionable insight you found, and what data or follow-up analysis would strengthen that finding?
Scalability and Maintenance: If the data refreshes monthly, what refresh strategy would you set up? How would you monitor for data quality issues or drift over time?

Use Case 6 · Time Series + ML Forecasting DemoDecomposition, models, backtesting, evaluation

You are the lead data analyst in the police department. Using monthly borough-level crime records, your job is to understand patterns in crime over time and build forecasting models that help allocate resources and plan interventions.

The Mission

You work inside the Metropolitan Police analytics unit.
Senior leadership wants to know: which boroughs are trending up, when crime peaks, and what the next 12 months may look like.
Your outputs will inform patrol scheduling, budget requests, and community safety reports.
You must justify every modelling decision as if presenting to a non-technical director.

The Dataset

Source: London Borough Monthly Crime Data — January 2008 to December 2018.
Columns: Borough (London borough name), year_month (month of recorded crime), count (number of crimes recorded).
Each row represents crime counts for one borough in one month.
london_boroughs_crime.csv — download the dataset to get started.

Tasks

Understand the Structure: Parse year_month into a proper datetime index. Check for missing months, gaps, and boroughs with incomplete records. Summarise total crimes by borough and by year.
Explore and Visualise Trends: Plot aggregate monthly crime over the full period. Identify the top 5 boroughs by total crime count and plot their individual trends. Highlight any structural breaks or anomalies.
Decompose the Series: Apply classical decomposition (additive and/or multiplicative) to separate trend, seasonality, and residual. Identify which months show the highest seasonal crime peaks and explain why that matters operationally.
Build Forecasting Models: Start with a naive seasonal baseline (e.g. seasonal naïve or moving average). Then train at least two additional models — options include SARIMA, Exponential Smoothing (ETS), Prophet, or a tree-based approach with lag features. Use a single borough or the city-wide aggregate as your primary series.
Evaluate and Compare: Use a train/test split (hold out the final 12 months). Score all models with MAE, RMSE, and MAPE. Present results in a comparison table. Identify which model performs best and explain why.
Interpret Results: Translate your best model's forecast into plain-language insights. Which boroughs are projected to see the highest crime counts? Are there months where demand for policing is forecast to spike?
Identify Next Steps: What additional data would improve forecast accuracy (e.g. weather, events, socioeconomic indicators)? What are the limitations of your current approach? How would you productionise and monitor this model?

Deliverables

Forecast Notebook or Code: A well-commented Jupyter notebook (or equivalent script) covering all tasks above — EDA, decomposition, model training, evaluation, and forecasting.
Visualisation Output: At least 4 charts — trend plot, decomposition plot, model forecast vs actuals, and model comparison bar/table chart.
Evaluation Results Table: Side-by-side MAE / RMSE / MAPE for all models tested.
Short Analyst Summary: A brief written or slide-based summary (3–5 points) of what you found, what you recommend, and what the caveats are — written for a non-technical audience.

Interview Questions (Task Review Style)

Data Preparation & Series Construction: How did you parse and validate the time index? What did you do with boroughs that had gaps or very low crime counts, and how could those decisions affect borough-level comparisons or aggregate forecasts?
Decomposition Interpretation: Walk us through what your decomposition revealed. Did you use additive or multiplicative and why? What does the seasonal pattern tell you about policing demand, and how would you communicate that to a non-statistical audience?
Model Selection Rationale: Why did you choose the specific models you trained? What assumptions does each model make about stationarity, seasonality, or non-linearity, and how did you check whether those assumptions hold for this dataset?
Evaluation and Backtesting Design: Describe your train/test split strategy. Why is a random split inappropriate for time series? How did you choose your evaluation metrics, and in the context of policing, which error type (over-forecast vs under-forecast) is more operationally costly?
Forecast Interpretation and Stakeholder Communication: How would you present a 12-month borough-level crime forecast to a police commander with no data science background? What caveats would you highlight, and what questions would you expect them to ask about the model's reliability?

Use Case 7 · Airline Satisfaction Classification DemoCleaning, encoding, classification, deployment

You are the customer experience analyst for an airline. Your task is to predict whether passengers are satisfied or dissatisfied and turn the model into a simple app the business can use to check new records.

The Mission

You support the airline’s customer insight team and operations leaders.
They want to understand which passenger segments are most likely to report satisfaction issues.
Your work should produce a usable classification model and a lightweight Streamlit demo for internal testing.
You must explain results clearly enough for non-technical stakeholders to act on them.

The Dataset

Source: Airline passenger satisfaction dataset.
Target: Satisfaction — satisfaction, neutral or dissatisfaction.
Feature types: passenger profile fields, flight details, delay times, and service-rating columns.
airline_satisfication_data.csv and airline_satisfaction_description.txt.

Tasks

Data Cleaning: Check for missing values, duplicates, inconsistent labels, and outliers. Decide how to handle delay fields, encoded service ratings, and any invalid rows before modelling.
Data Modelling: Prepare the data for classification by encoding categorical variables, splitting train/test sets, and scaling numerical fields where needed. Prevent leakage by making sure preprocessing is fit only on training data.
Machine Learning Models: Develop a systematic baseline and compare multiple classifiers such as logistic regression, random forest, gradient boosting, and XGBoost or LightGBM. Evaluate with accuracy, precision, recall, F1, ROC-AUC, and confusion matrix results.
Deployment Using Streamlit: Build a Streamlit app where a user can enter passenger details and get a satisfaction prediction with probability output and a short explanation of the main drivers.
Testing in Production: Validate the app with realistic test inputs, boundary cases, and missing or malformed data. Confirm that predictions, probability scores, and input validation behave as expected.
Monitoring in Production: Define what you would monitor after deployment — prediction drift, feature drift, class balance shifts, latency, and user feedback. Explain when you would retrain or roll back the model.

Deliverables

Cleaned Dataset: A ready-to-use file with all quality issues resolved.
Modelling Notebook: A notebook showing cleaning, preprocessing, model comparison, and final evaluation.
Streamlit App: A working app or codebase that demonstrates the classifier in a simple interface.
Production Test Notes: A checklist or short report showing how you tested the app before release.
Monitoring Plan: A short summary of the metrics, alerts, and retraining rules you would use in production.

Interview Questions (Task Review Style)

Cleaning Decisions: Which data quality issues did you find, and why did you impute, remove, or keep specific records? How would those decisions affect class balance and model bias?
Feature Engineering and Encoding: Which categorical variables needed encoding, which numerical variables benefited from scaling, and how did you avoid leakage during preprocessing?
Model Comparison: Which classifier performed best and why? How did you compare baseline performance against more advanced models, and which metric mattered most for the business problem?
Streamlit Deployment: How did you structure the app so a business user can enter data safely? What validation did you add, and how would you explain the prediction output in plain language?
Production Testing and Monitoring: What tests would you run before launch, and what signs would tell you the model is degrading after launch? How would you detect data drift or changing customer behaviour?

Use Case 8 · House Price Regression DemoCleaning, encoding, regression, deployment

You are the property analytics lead for a housing team. Your goal is to build a regression model that predicts sale price and package it into a Streamlit app that can support valuation checks and scenario testing.

The Mission

You help a housing business estimate prices using property and sale attributes.
The team wants a model that is accurate, explainable, and easy to use in a simple web app.
Your final solution should show how the model could be tested and monitored once it is live.
You must explain the limitations of the model and what could cause price predictions to drift over time.

The Dataset

Source: House price regression dataset.
Target: Sale price prediction.
Feature types: zoning, lot size, house style, basement size, first-floor area, bathrooms, year built, sale timing, and sale condition fields.
HousePricesDataSet.csv and HousePricesDataDescription.txt.

Tasks

Data Cleaning: Check for missing values, duplicates, invalid categories, and extreme outliers. Decide how to handle skewed numeric fields and categorical inconsistencies before modelling.
Data Modelling: Prepare the regression pipeline by encoding categorical variables, splitting train/test sets, and scaling or transforming numeric features where appropriate. Ensure the target variable is handled correctly and no leakage occurs.
Machine Learning Models: Build a baseline and compare several regression models such as linear regression, random forest, gradient boosting, and XGBoost or LightGBM. Evaluate with MAE, RMSE, and R², then compare error patterns across models.
Deployment Using Streamlit: Create a Streamlit app that accepts property features and returns a predicted house price with a confidence note, key drivers, and a user-friendly explanation.
Testing in Production: Test the app with typical homes, edge cases, and malformed user inputs. Check that the model handles unseen categories, missing fields, and extreme values safely.
Monitoring in Production: Define the checks you would track after deployment, including prediction drift, feature drift, error trends, and response times. Explain how you would know when the model needs retraining.

Deliverables

Cleaned Dataset: A cleaned file ready for modelling.
Modelling Notebook: A notebook showing preprocessing, model training, comparison, and evaluation.
Streamlit App: A simple property price prediction app for internal review.
Production Test Notes: A short testing checklist or report covering the app before launch.
Monitoring Plan: A clear plan for tracking model quality, drift, and retraining triggers after release.

Interview Questions (Task Review Style)

Cleaning and Outlier Handling: What data issues did you find, and how did you decide whether to cap, transform, or remove outliers? How might those choices affect sale price estimates?
Feature Encoding and Transformations: Which features were categorical, which needed scaling or log transforms, and how did you make the preprocessing pipeline reusable for new data?
Regression Model Choice: Which model gave the best balance of accuracy and interpretability? How did you compare it against the baseline, and which error metric mattered most to the business?
App Design and Testing: How did you structure the Streamlit app so a user can test scenarios safely? What boundary cases did you check, and how did you handle bad input values?
Monitoring and Retraining: What signals would tell you the house price model is no longer reliable? How would you detect drift, measure ongoing error, and decide when to retrain?

Interview Preparation

How to use this page

Use-Case Library

Background & Schema

Database Assets

Database Setup Instructions

Business Questions

Deliverables

Interview Questions (Task Review Style)

The Dataset

Tasks

Deliverables

Interview Questions (Task Review Style)

The Mission

Data Assets

Tasks

Deliverables

Interview Questions (Task Review Style)

Data Assets

Part A: Visualisation (The “What” and “How”)

Part B: Statistical Testing (The “Proof”)

Deliverables

Interview Questions (Task Review Style)

The Objective

Data Assets

Tasks

Deliverables

Interview Questions (Task Review Style)

The Mission

The Dataset

Tasks

Deliverables

Interview Questions (Task Review Style)

The Mission

The Dataset

Tasks

Deliverables

Interview Questions (Task Review Style)

The Mission

The Dataset

Tasks

Deliverables

Interview Questions (Task Review Style)

Request 1-on-1 Evaluation (£50)