Home|Blog|Data Analyst Portfolio Projects Using Indian Datasets
Sign in →
data-analyst
portfolio
projects
sql
power-bi

Data Analyst Portfolio Projects Using Indian Datasets

By Shashikant·27 June 2026·4 min read

Data Analyst Portfolio Projects Using Indian Datasets

Two candidates apply for the same analyst role in Hyderabad. Both list "SQL, Power BI, Python." One has a GitHub link with three real projects on datasets the recruiter instantly understands — kirana sales, UPI payments, cab demand. Guess who gets the call. A portfolio is the single best way to stand out in India's crowded analyst market, and using relatable Indian datasets makes your work memorable.

Below are three end-to-end projects. For each you get the business question, the data, the steps, and the insights to showcase. Build all three and you have a complete, hireable portfolio.

Why Indian datasets?

  • Relatability — a hiring manager grasps "weekend surge in UPI food orders" faster than a generic Kaggle dataset.
  • Storytelling — local context lets you frame real business decisions.
  • Differentiation — most applicants reuse the same Titanic/Iris datasets. You won't.

You can generate realistic synthetic data with Python (faker, numpy) or adapt public sources like data.gov.in, RBI/NPCI summaries, and open municipal data.

Project 1 — Kirana Store Sales Analysis (Excel + SQL + Power BI)

Business question: Which products, days and outlets drive revenue, and where is the kirana chain leaking margin?

The data: a sales table — date, outlet_id, city, product, category, quantity, unit_price, cost.

Steps:

  1. Clean in Power Query: fix date formats, remove duplicate bills, handle negative quantities (returns).
  2. Model: build a star schema — Sales fact + Date, Product, Outlet dimensions.
  3. SQL analysis to answer specific questions:
SELECT city, category,
       SUM(quantity * unit_price)         AS revenue,
       SUM(quantity * (unit_price - cost)) AS margin
FROM sales
GROUP BY city, category
ORDER BY margin DESC;
  1. DAX measures: Total Sales, Gross Margin %, Month-over-Month growth, Top-N products.
  2. Dashboard: revenue by outlet, category mix, weekday vs. weekend patterns, slow-moving stock.

Insights to highlight: "Top 20% of SKUs drove 68% of revenue; staples had high volume but thin margin while packaged snacks carried the margin — recommend shelf-space rebalancing."

Business question: When and where do people transact most, and which merchant categories dominate?

The data: a transactions table — txn_id, timestamp, user_id, merchant, category, amount, city, status.

Steps:

  1. Load into Python with pandas; parse timestamps to extract hour, day-of-week, month.
  2. Clean: drop failed/duplicate transactions, flag outlier amounts.
  3. SQL/pandas analysis:
SELECT category,
       COUNT(*)        AS txns,
       ROUND(AVG(amount),0) AS avg_ticket
FROM transactions
WHERE status = 'SUCCESS'
GROUP BY category
ORDER BY txns DESC;
  1. Visualize with matplotlib/seaborn: hourly transaction heatmap, category share, success-rate trend.
  2. Optional: a simple cohort of repeat users.

Insights to highlight: "Transactions peak at 8-9 PM; food-delivery payments surge 22% on weekends; failure rate spikes during month-end salary credit — a reliability concern worth flagging."

Project 3 — Cab Demand Analysis (Python + Power BI)

Business question: Where and when is ride demand highest, and how should driver supply be positioned?

The data: a rides table — ride_id, request_time, pickup_zone, drop_zone, city, fare, wait_time, cancelled.

Steps:

  1. EDA in Python: demand by hour, zone and day; cancellation patterns.
  2. Feature work: peak vs. off-peak flag, surge windows, zone-level demand density.
  3. Power BI dashboard: a map/matrix of demand by zone and hour, average wait time, cancellation hotspots, fare distribution.
  4. Add a what-if slicer (e.g., simulate adding drivers in a high-cancellation zone).

Insights to highlight: "Morning 9-11 AM office-corridor demand outstrips supply, driving a 31% cancellation rate in two zones; reallocating drivers there could recover an estimated ₹X in lost rides."

How to present your portfolio

  • GitHub repo per project with a clear README: business question → data → approach → key insight.
  • Publish dashboards to the Power BI Service (or share PDF/screenshots).
  • Write a short LinkedIn post per project — visibility brings recruiters.
  • Link everything from your resume header.

What makes a project "interview-ready"

  • It answers a business question, not just "I made charts."
  • It shows the full pipeline: clean → model → analyze → visualize → recommend.
  • It ends with a decision or recommendation, with a number attached.
  • You can explain it in 60 seconds and defend your choices.

Three solid projects beat ten half-finished ones. Pick these, build them properly, and you'll have a portfolio that does the talking before you even enter the room.

Related: Power BI Portfolio Projects That Get You Hired · Build your skill score

Don't just read. Prove your skill on DevWithData.

DevWithData

Shashikant

· Founder, DevWithData

Data professional and Power BI instructor. Building DevWithData to help analysts prove their skills, not just collect certificates.

Reading is not enough. Prove your skill.

DevWithData measures your actual ability with the Data Readiness Index. Stop reading — start practicing.

Continue Learning