To find records that exist in one table but not in another, either the LEFT JOIN ... WHERE ... IS NULL approach or the NOT EXISTS subquery, depending on performance and readability.
For example, let’s say I have two tables — Customers and Orders. If I want to find customers who haven’t placed any orders, I’d write something like:
SELECT c.CustomerID, c.CustomerName
FROM Customers c
LEFT JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.CustomerID IS NULL;
This query works by joining all customers with orders and then filtering out those who have matching records in the Orders table. The ones with NULL on the right side are the customers with no orders.
Alternatively, I sometimes use NOT EXISTS, which can perform better, especially when the dataset is large and indexes are properly set:
SELECT c.CustomerID, c.CustomerName
FROM Customers c
WHERE NOT EXISTS (
SELECT 1
FROM Orders o
WHERE o.CustomerID = c.CustomerID
);
In practice, I’ve used this approach for validating data integrity between systems — for instance, checking records that were inserted into a staging table but not yet processed into the main table. In one project, we used it to identify transactions that failed to sync between ERP and reporting databases.
A challenge I’ve faced is performance when both tables are huge — joins or subqueries can become slow. To handle that, I’ve used indexed columns for joins or sometimes relied on temporary or staging tables to pre-filter data.
As for limitations, NOT IN can be tricky because if the subquery returns NULL, the entire result becomes empty, so I generally avoid that for safety. Between the two main methods, NOT EXISTS is usually more reliable and scalable.
