SQL JOINs Visual Guide (INNER, LEFT, RIGHT, FULL)
SQL JOINs Visual Guide (INNER, LEFT, RIGHT, FULL)
JOINs combine rows from two or more tables based on a related column. They are the single most important SQL skill for a data analyst — and the most common source of silently wrong numbers. This guide walks through every join type with a visual mental model and a small worked result, then covers the traps that bite even experienced analysts: fan-out, anti-joins, self-joins, and the WHERE-vs-ON subtlety in outer joins.
We'll use two simple tables throughout: customers (who orders) and orders (what they ordered), in the style of a Flipkart or Zomato dataset.
The Sample Data
Keep these two tiny tables in your head — every result below is computed from them.
customers
| customer_id | customer_name | city |
|---|---|---|
| 1 | Aarav | Mumbai |
| 2 | Diya | Delhi |
| 3 | Rohan | Bengaluru |
| 4 | Meera | Pune |
orders
| order_id | customer_id | order_amount |
|---|---|---|
| 101 | 1 | 500 |
| 102 | 1 | 250 |
| 103 | 2 | 900 |
| 104 | 99 | 150 |
Notice two deliberate quirks: customer 3 (Rohan) and 4 (Meera) have no orders, and order 104 references customer_id = 99, which has no matching customer (an orphan row). These edge cases are exactly what separate the join types.
The Mental Model
Picture two overlapping circles (a Venn diagram). The left circle is customers, the right is orders. The overlap is where a customer's customer_id matches an order's customer_id. Each join type keeps a different region of this picture.
INNER JOIN — Only the Overlap
INNER JOIN returns rows where the key exists in both tables. Customers with no orders, and orders with no matching customer, are dropped.
SELECT
c.customer_name,
o.order_id,
o.order_amount
FROM customers c
INNER JOIN orders o
ON c.customer_id = o.customer_id;
Result — only the three matched rows survive (Rohan, Meera, and the orphan order 104 all vanish):
| customer_name | order_id | order_amount |
|---|---|---|
| Aarav | 101 | 500 |
| Aarav | 102 | 250 |
| Diya | 103 | 900 |
Use INNER JOIN when you only care about matched records — e.g., "show customers and their actual orders." If you wanted to count all customers, INNER JOIN will undercount.
LEFT JOIN — Everything on the Left
LEFT JOIN (or LEFT OUTER JOIN) keeps every row from the left table. Where there's no match on the right, the right-side columns come back as NULL.
SELECT
c.customer_name,
o.order_id,
o.order_amount
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id;
Result — all four customers appear; Rohan and Meera get NULL order details:
| customer_name | order_id | order_amount |
|---|---|---|
| Aarav | 101 | 500 |
| Aarav | 102 | 250 |
| Diya | 103 | 900 |
| Rohan | NULL | NULL |
| Meera | NULL | NULL |
This is the most-used join for analysts because it preserves your base population. Want to count orders per customer including zeros?
SELECT
c.customer_name,
COUNT(o.order_id) AS order_count -- counts NULLs as 0
FROM customers c
LEFT JOIN orders o
ON c.customer_id = o.customer_id
GROUP BY c.customer_name;
This returns Aarav 2, Diya 1, Rohan 0, Meera 0. COUNT(o.order_id) ignores NULLs, so non-ordering customers correctly show 0. If you wrote COUNT(*), they'd show 1 — a classic bug.
RIGHT JOIN — Everything on the Right
RIGHT JOIN is the mirror of LEFT JOIN: it keeps every row from the right table. In practice it's rare, because you can always swap table order and use a LEFT JOIN, which reads more naturally.
SELECT c.customer_name, o.order_id, o.order_amount
FROM customers c
RIGHT JOIN orders o
ON c.customer_id = o.customer_id;
Result — all four orders appear; the orphan order 104 has NULL on the customer side:
| customer_name | order_id | order_amount |
|---|---|---|
| Aarav | 101 | 500 |
| Aarav | 102 | 250 |
| Diya | 103 | 900 |
| NULL | 104 | 150 |
This keeps all orders, even any with a missing/invalid customer_id — handy for spotting orphaned data. Most teams standardize on LEFT JOIN and avoid RIGHT JOIN for consistency.
FULL OUTER JOIN — Both Sides, Matched or Not
FULL OUTER JOIN keeps every row from both tables, matching where possible and filling NULLs where not. It's the whole Venn diagram.
SELECT
c.customer_name,
o.order_id,
o.order_amount
FROM customers c
FULL OUTER JOIN orders o
ON c.customer_id = o.customer_id;
Result — every customer and every order, matched where the key lines up:
| customer_name | order_id | order_amount |
|---|---|---|
| Aarav | 101 | 500 |
| Aarav | 102 | 250 |
| Diya | 103 | 900 |
| Rohan | NULL | NULL |
| Meera | NULL | NULL |
| NULL | 104 | 150 |
Use it for reconciliation — comparing two systems where you need to see records missing from either side. (MySQL doesn't support FULL OUTER JOIN directly; emulate it with a LEFT JOIN ... UNION ... RIGHT JOIN.)
CROSS JOIN — Every Combination
CROSS JOIN returns the Cartesian product: every row of the left table paired with every row of the right. With 4 customers and 4 orders you'd get 16 rows. There's no ON clause.
SELECT c.customer_name, o.order_id
FROM customers c
CROSS JOIN orders o;
You rarely want this on raw fact tables — an accidental cross join (forgetting the ON condition) is a common way to blow up row counts. But it's genuinely useful for generating scaffolds: e.g., a calendar of every date × every store_id so you can left-join sales onto it and see days with zero sales.
Anti-Joins — Finding What's Missing
An anti-join returns rows from one table that have no match in another. Perfect for "Which customers have never placed an order?" There's no ANTI JOIN keyword; you use LEFT JOIN ... WHERE ... IS NULL or NOT EXISTS.
-- Customers with zero orders
SELECT c.customer_name
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.customer_id
WHERE o.order_id IS NULL;
The LEFT JOIN brings all customers; non-ordering ones get NULL on the orders side; the WHERE ... IS NULL keeps only those. Result: Rohan and Meera. The NOT EXISTS version is often clearer and handles NULLs more safely:
SELECT c.customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.customer_id
);
Avoid NOT IN with a subquery that can return NULLs — a single NULL makes NOT IN return zero rows, a notorious gotcha.
Self-Joins — A Table Joined to Itself
A self-join joins a table to itself, using two aliases. The classic case is a hierarchy stored in one table — say an employees table where each row has a manager_id pointing at another row's emp_id.
SELECT
e.emp_name AS employee,
m.emp_name AS manager
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.emp_id;
Use LEFT JOIN here so the top boss (whose manager_id is NULL) still appears with a NULL manager. Self-joins also help compare rows within the same table — e.g., finding customers in the same city, or pairing each day's sales with the previous day (though a window function like LAG() is usually cleaner for that).
Multi-Table Joins
Real queries chain several joins. Stack them — each JOIN ... ON reads left to right, building on the running result.
SELECT
c.customer_name,
o.order_id,
p.product_name,
oi.quantity
FROM customers c
INNER JOIN orders o ON o.customer_id = c.customer_id
INNER JOIN order_items oi ON oi.order_id = o.order_id
INNER JOIN products p ON p.product_id = oi.product_id;
Two tips: (1) join on indexed key columns for speed, and (2) be deliberate about INNER vs LEFT at each step — one stray INNER JOIN in a chain can silently drop your base rows. If you start from customers and want to keep them all, every join after it must be a LEFT JOIN.
The Fan-Out Trap
This is the bug that quietly inflates revenue reports. Fan-out (row multiplication) happens when you join to a table that has multiple matching rows per key, and then aggregate.
Suppose each order has multiple line items in order_items. You join orders to items, then sum order_amount:
-- WRONG: order_amount is duplicated per line item
SELECT
SUM(o.order_amount) AS total_revenue -- inflated!
FROM orders o
JOIN order_items i ON i.order_id = o.order_id;
If order 101 (order_amount = ₹500) has 3 line items, the join produces 3 rows each showing ₹500, and SUM returns ₹1500. Your revenue triples silently.
How to detect it: compare row counts before and after the join. If SELECT COUNT(*) FROM orders is 4 but SELECT COUNT(*) after the join is 9, the join is one-to-many and any sum over an order-level column is now wrong. Another tell: the same order_id appearing on multiple rows when you expected it once.
Fixes:
- Aggregate the many-side before joining, using a CTE:
WITH item_counts AS (
SELECT order_id, COUNT(*) AS item_count
FROM order_items
GROUP BY order_id
)
SELECT
o.order_id,
o.order_amount,
ic.item_count
FROM orders o
LEFT JOIN item_counts ic ON ic.order_id = o.order_id;
- Or, if you only need one total, sum the order-level value over distinct orders first, then join the detail you need.
Always ask: "Is this join one-to-one or one-to-many?" If many, expect fan-out before aggregating.
JOIN vs Subquery vs EXISTS — Performance Notes
The same question can often be answered three ways. They are not always equivalent in result or speed.
- JOIN is best when you need columns from both tables in the output. Modern optimizers handle joins extremely well on indexed keys.
EXISTSis ideal for "does at least one match exist?" checks. It can short-circuit — the database stops scanning the inner table as soon as it finds the first match — which makes it fast for existence and anti-join logic. PreferNOT EXISTSoverNOT INwhenever NULLs are possible.IN (subquery)is fine for small, NULL-free lists, but a correlated subquery in theSELECTlist that runs once per outer row can be slow on large tables.- A JOIN can accidentally fan out and change your counts;
EXISTSnever does, because it only tests for existence and returns each outer row at most once. If you're filtering rather than fetching columns,EXISTSis often both safer and faster.
Rule of thumb: need data from the other table → JOIN; need a yes/no filter → EXISTS; and always check the query plan (EXPLAIN) on large tables rather than guessing.
Best Practices
- Always qualify columns with table aliases (
c.customer_id) to avoid ambiguity. - Default to LEFT JOIN to preserve your base table, then verify row counts before and after.
- Check cardinality to dodge fan-out: confirm whether the join key is unique on the right side.
- Use NOT EXISTS for anti-joins, not
NOT IN, when NULLs are possible. - Keep outer-join filters on the right table in the
ONclause, notWHERE(see the FAQ).
FAQ
What is the difference between INNER and LEFT JOIN?
INNER JOIN returns only rows where the key matches in both tables. LEFT JOIN returns every row from the left (first) table and fills the right-side columns with NULL when there's no match. Use INNER when you want matches only; use LEFT when you must keep your full base population — for example, counting orders per customer including customers with zero orders.
Why does my JOIN return duplicate rows?
Almost always because the join is one-to-many: the table on the other side has more than one row per join key, so each left row is repeated once per match. This is fan-out. Check with a COUNT(*) before and after the join, and confirm the join key is unique on the side you expected to be unique. If it isn't, pre-aggregate that table in a CTE before joining, or use EXISTS if you only need a filter.
WHERE vs ON in a LEFT JOIN — what's the difference?
A condition on the right-hand table placed in the ON clause filters before the join preserves unmatched rows, so non-matching left rows still appear with NULLs. The same condition in WHERE runs after the join and discards those NULL rows — silently turning your LEFT JOIN into an INNER JOIN. So LEFT JOIN orders o ON o.customer_id = c.customer_id AND o.order_amount > 300 keeps all customers, whereas moving o.order_amount > 300 to WHERE drops everyone without a qualifying order.
How do I count rows that have no match (an anti-join)?
Use LEFT JOIN ... WHERE right_table.key IS NULL, or NOT EXISTS. For example, customers with no orders: LEFT JOIN orders o ON o.customer_id = c.customer_id WHERE o.order_id IS NULL. Avoid NOT IN with a subquery that can contain NULLs, because a single NULL makes the whole NOT IN return zero rows.
Is RIGHT JOIN ever necessary?
Not really. Any RIGHT JOIN can be rewritten as a LEFT JOIN by swapping the table order, which reads more naturally and keeps your codebase consistent. Most teams standardize on LEFT JOIN and rarely write RIGHT JOIN at all.
Does the order of tables matter in an INNER JOIN?
For the result, no — A INNER JOIN B returns the same rows as B INNER JOIN A. For outer joins, order matters a lot (LEFT keeps the first table, RIGHT the second). For performance, the optimizer usually reorders tables itself, so write whichever order is clearest to read.
Summary
INNER keeps matches only; LEFT keeps the left table whole; RIGHT mirrors LEFT; FULL keeps everything; CROSS pairs every row with every row. The skills that separate beginners from pros are spotting fan-out before it inflates your sums, writing clean anti-joins to surface what's missing, and knowing when EXISTS beats a JOIN. Sketch the Venn diagram in your head, check your row counts, mind WHERE vs ON, and your numbers will stay honest.
Related: SQL + Power BI Challenge for Data Analysts · Practise SQL problems
Don't just read. Prove your skill on DevWithData.
Shashikant
· Founder, DevWithDataData professional and Power BI instructor. Building DevWithData to help analysts prove their skills, not just collect certificates.
Reading is not enough. Prove your skill.
DevWithData measures your actual ability with the Data Readiness Index. Stop reading — start practicing.
Continue Learning
Data Analyst Roadmap for India (2026)
If you are starting from zero and want a data analyst job in India, you do not need a degree in statistics or two years of courses. You need the right five skills in the right order: Excel, SQL, Power BI, a little Python, and a portfolio. This is a realistic 5-6 month roadmap built for Indian freshers and career-switchers, with month-by-month goals, the exact tools to learn, and the projects that get you shortlisted.
9 min read12 SQL Interview Puzzles with Solutions
Twelve SQL puzzles that show up again and again in Indian data analyst interviews, each with a clean, worked solution and the reasoning behind it. Covers duplicates, second-highest values, running totals, gaps and islands, self-joins, top-N per group, and more. Built on relatable e-commerce and HR tables so you can practice the patterns, not just memorise answers.
8 min readThe Second-Highest Salary SQL Problem (5 Ways to Solve It)
The second-highest salary question is asked in almost every entry-level SQL interview in India. This post solves it five different ways: subquery with MAX, LIMIT/OFFSET, DENSE_RANK, a correlated subquery, and a CTE. Each approach is explained with its trade-offs around ties, NULLs, and what happens when there is no second salary, so you can pick the right one under pressure.
8 min read