Data Analyst Portfolio Projects Using Indian Datasets
Data Analyst Portfolio Projects Using Indian Datasets
Two candidates apply for the same analyst role in Hyderabad. Both list "SQL, Power BI, Python." One has a GitHub link with three real projects on datasets the recruiter instantly understands — kirana sales, UPI payments, cab demand. Guess who gets the call. A portfolio is the single best way to stand out in India's crowded analyst market, and using relatable Indian datasets makes your work memorable.
Below are three end-to-end projects. For each you get the business question, the data, the steps, and the insights to showcase. Build all three and you have a complete, hireable portfolio.
Why Indian datasets?
- Relatability — a hiring manager grasps "weekend surge in UPI food orders" faster than a generic Kaggle dataset.
- Storytelling — local context lets you frame real business decisions.
- Differentiation — most applicants reuse the same Titanic/Iris datasets. You won't.
You can generate realistic synthetic data with Python (faker, numpy) or adapt public sources like data.gov.in, RBI/NPCI summaries, and open municipal data.
Project 1 — Kirana Store Sales Analysis (Excel + SQL + Power BI)
Business question: Which products, days and outlets drive revenue, and where is the kirana chain leaking margin?
The data: a sales table — date, outlet_id, city, product, category, quantity, unit_price, cost.
Steps:
- Clean in Power Query: fix date formats, remove duplicate bills, handle negative quantities (returns).
- Model: build a star schema — Sales fact + Date, Product, Outlet dimensions.
- SQL analysis to answer specific questions:
SELECT city, category,
SUM(quantity * unit_price) AS revenue,
SUM(quantity * (unit_price - cost)) AS margin
FROM sales
GROUP BY city, category
ORDER BY margin DESC;
- DAX measures: Total Sales, Gross Margin %, Month-over-Month growth, Top-N products.
- Dashboard: revenue by outlet, category mix, weekday vs. weekend patterns, slow-moving stock.
Insights to highlight: "Top 20% of SKUs drove 68% of revenue; staples had high volume but thin margin while packaged snacks carried the margin — recommend shelf-space rebalancing."
Project 2 — UPI Transactions Trends (SQL + Python)
Business question: When and where do people transact most, and which merchant categories dominate?
The data: a transactions table — txn_id, timestamp, user_id, merchant, category, amount, city, status.
Steps:
- Load into Python with
pandas; parse timestamps to extract hour, day-of-week, month. - Clean: drop failed/duplicate transactions, flag outlier amounts.
- SQL/pandas analysis:
SELECT category,
COUNT(*) AS txns,
ROUND(AVG(amount),0) AS avg_ticket
FROM transactions
WHERE status = 'SUCCESS'
GROUP BY category
ORDER BY txns DESC;
- Visualize with
matplotlib/seaborn: hourly transaction heatmap, category share, success-rate trend. - Optional: a simple cohort of repeat users.
Insights to highlight: "Transactions peak at 8-9 PM; food-delivery payments surge 22% on weekends; failure rate spikes during month-end salary credit — a reliability concern worth flagging."
Project 3 — Cab Demand Analysis (Python + Power BI)
Business question: Where and when is ride demand highest, and how should driver supply be positioned?
The data: a rides table — ride_id, request_time, pickup_zone, drop_zone, city, fare, wait_time, cancelled.
Steps:
- EDA in Python: demand by hour, zone and day; cancellation patterns.
- Feature work: peak vs. off-peak flag, surge windows, zone-level demand density.
- Power BI dashboard: a map/matrix of demand by zone and hour, average wait time, cancellation hotspots, fare distribution.
- Add a what-if slicer (e.g., simulate adding drivers in a high-cancellation zone).
Insights to highlight: "Morning 9-11 AM office-corridor demand outstrips supply, driving a 31% cancellation rate in two zones; reallocating drivers there could recover an estimated ₹X in lost rides."
How to present your portfolio
- GitHub repo per project with a clear README: business question → data → approach → key insight.
- Publish dashboards to the Power BI Service (or share PDF/screenshots).
- Write a short LinkedIn post per project — visibility brings recruiters.
- Link everything from your resume header.
What makes a project "interview-ready"
- It answers a business question, not just "I made charts."
- It shows the full pipeline: clean → model → analyze → visualize → recommend.
- It ends with a decision or recommendation, with a number attached.
- You can explain it in 60 seconds and defend your choices.
Three solid projects beat ten half-finished ones. Pick these, build them properly, and you'll have a portfolio that does the talking before you even enter the room.
Related: Power BI Portfolio Projects That Get You Hired · Build your skill score
Don't just read. Prove your skill on DevWithData.
Shashikant
· Founder, DevWithDataData professional and Power BI instructor. Building DevWithData to help analysts prove their skills, not just collect certificates.
Reading is not enough. Prove your skill.
DevWithData measures your actual ability with the Data Readiness Index. Stop reading — start practicing.
Continue Learning
DAX Time Intelligence: The Complete Guide (YTD, MTD, QTD)
Time intelligence is where Power BI dashboards earn their keep, and where most break. This guide covers TOTALYTD, TOTALMTD and TOTALQTD, the non-negotiable date table requirement, marking it as a date table, and how to handle the Indian fiscal year that starts in April. Includes real, copy-paste DAX measures on Indian sales data and the pitfalls that silently return blanks.
8 min read12 SQL Interview Puzzles with Solutions
Twelve SQL puzzles that show up again and again in Indian data analyst interviews, each with a clean, worked solution and the reasoning behind it. Covers duplicates, second-highest values, running totals, gaps and islands, self-joins, top-N per group, and more. Built on relatable e-commerce and HR tables so you can practice the patterns, not just memorise answers.
8 min readThe Second-Highest Salary SQL Problem (5 Ways to Solve It)
The second-highest salary question is asked in almost every entry-level SQL interview in India. This post solves it five different ways: subquery with MAX, LIMIT/OFFSET, DENSE_RANK, a correlated subquery, and a CTE. Each approach is explained with its trade-offs around ties, NULLs, and what happens when there is no second salary, so you can pick the right one under pressure.
8 min read