:::: MENU ::::

How to delete duplicate rows from a table

  • Jan 07 / 2010
  • 1
dbDigger, T-SQL Interview Questions, T-SQL Scripts, T-SQL Tips and Tricks

How to delete duplicate rows from a table

A DBA may be often required to delete duplicate rows from a SQL Server database table. Several methods can be used keeping in view the resources and exact situation. Here I will create a specific scenario. After going through this scenario, you will be able to remove the duplicate rows in any given scenario on ground.
Run following script to generate duplicate data in a table named “dups” created in any database that you select. Here I will use “pubs”.

 
-- Create table in any database  
 create table Dups  
 (col1 varchar(50),  
 col2 varchar(50),  
 col3 varchar(50))  
 GO  
 -- Populate the table for duplicate rows  
 INSERT INTO dups  
 SELECT 'Val1','val2','Val3'  
 UNION ALL  
 SELECT 'Val1','Val2','Val13'  
 UNION ALL  
 SELECT 'Val1','Val2','Val3'  
 UNION ALL  
 SELECT 'Val1','Val2','Val13'  
 UNION ALL  
 SELECT 'Val11','Val12','Val3'  
 UNION ALL  
 SELECT 'Val16','Val18','Val20'  
 UNION ALL  
 SELECT 'Val11','Val12','Val3'  
 UNION ALL  
 SELECT 'Val16','Val18','Val20'  
 UNION ALL  
 SELECT 'Val6','Val7','Val6'  
 GO  

Table is created and populated with duplicate rows in it now.
Total rows are 9
Distinct rows are 5, which we have to preserve at the end.
Now we have to remove all duplicate rows, leaving behind just 5 distinct rows in three columns.

Execute the following script to complete the operation

 
-- Remove duplicate rows  
 ALTER TABLE dups  
 ADD ser SMALLINT PRIMARY KEY identity(1,1)  
 GO  
 DELETE FROM dups  
 WHERE ser not in (SELECT max(ser) FROM dups  
 GROUP BY col1,col2,col3)  
 GO  

At the end now we have all duplicate rows removed and our table is with an identity column “ser”. You may also drop this column by using

 -- Drop the temporary column created  
 ALTER TABLE dups  
 DROP COLUMN ser  
 GO  

but for that you have to first drop the constraint attached with it. As in my case it is as follows

 -- Drop PK constraint  
 ALTER TABLE dups  
 DROP PK__dups__286302EC  
 GO  

So it is over. You may get include any number of columns in your distinct rows wish list. You may modify the ‘group by’ parameters to meet any number of columns to consider in duplicate row . Query uses the magic of ‘group by.’ Here it is appropriate to mention that “distinct” and “group by” both produce same result and almost same resource utilization. But one of important difference between both is that group by makes you able to apply an aggregate function on each group of data separately. That fact has used here with “max” in our duplicate removal query.

Click here to read an excellent article on this topic by Arshad Ali. It covers different ways to deal with duplicate rows.

  • Anonymous

    This question is mandatory in all beginner level jobs related to SQL.

Consult us to explore the Databases. Contact us