SQL DELETE DUPLICATE ROWS
Duplicate rows can often appear in a database due to data entry errors or merging tables. The DELETE statement in SQL can be used to remove these duplicates while keeping one instance of each unique row. This process often involves using ROW_NUMBER(), CTEs (Common Table Expressions), or subqueries.
The general approach involves identifying duplicates based on a subset of columns and then deleting rows with higher row numbers or IDs.
Example
This example removes duplicate rows from the Orders table, keeping only one instance of each product.
Code Example
-- Identify duplicates using ROW_NUMBER()
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Product) AS RowNum
FROM Orders
)
-- Delete rows where RowNum is greater than 1
DELETE FROM Orders
WHERE OrderID IN (
SELECT OrderID FROM CTE WHERE RowNum > 1
);
-- View the updated table
SELECT * FROM Orders;
Output
Product | Category | Price |
---|---|---|
Laptop | Electronics | 1300.00 |
Phone | Mobile Devices | 800.00 |
Tablet | Gadgets | 600.00 |
Smartwatch | Gadgets | 250.00 |
Explanation
- The ROW_NUMBER() function assigns a unique row number to each duplicate group.
- Only rows with RowNum > 1 are deleted.
- The updated table contains only unique rows for each product.