Finding duplicate values in a SQL table

Alex Source

It's easy to find duplicates with one field:

SELECT name, COUNT(email) 
FROM users
GROUP BY email
HAVING COUNT(email) > 1

So if we have a table

ID   NAME   EMAIL
1    John   [email protected]
2    Sam    [email protected]
3    Tom    [email protected]
4    Bob    [email protected]
5    Tom    [email protected]

This query will give us John, Sam, Tom, Tom because they all have the same email.

However, what I want is to get duplicates with the same email and name.

That is, I want to get "Tom", "Tom".

The reason I need this: I made a mistake, and allowed to insert duplicate name and email values. Now I need to remove/change the duplicates, so I need to find them first.

sqlduplicates

Answers

answered 8 years ago gbn #1

SELECT
    name, email, COUNT(*)
FROM
    users
GROUP BY
    name, email
HAVING 
    COUNT(*) > 1

Simply group on both of the columns.

Note: the older ANSI standard is to have all non-aggregated columns in the GROUP BY but this has changed with the idea of "functional dependency":

In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.

Support is not consistent:

answered 8 years ago Chris Van Opstal #2

Try this:

SELECT name, email
FROM users
GROUP BY name, email
HAVING ( COUNT(*) > 1 )

answered 8 years ago KM. #3

try this:

declare @YourTable table (id int, name varchar(10), email varchar(50))

INSERT @YourTable VALUES (1,'John','John-email')
INSERT @YourTable VALUES (2,'John','John-email')
INSERT @YourTable VALUES (3,'fred','John-email')
INSERT @YourTable VALUES (4,'fred','fred-email')
INSERT @YourTable VALUES (5,'sam','sam-email')
INSERT @YourTable VALUES (6,'sam','sam-email')

SELECT
    name,email, COUNT(*) AS CountOf
    FROM @YourTable
    GROUP BY name,email
    HAVING COUNT(*)>1

OUTPUT:

name       email       CountOf
---------- ----------- -----------
John       John-email  2
sam        sam-email   2

(2 row(s) affected)

if you want the IDs of the dups use this:

SELECT
    y.id,y.name,y.email
    FROM @YourTable y
        INNER JOIN (SELECT
                        name,email, COUNT(*) AS CountOf
                        FROM @YourTable
                        GROUP BY name,email
                        HAVING COUNT(*)>1
                    ) dt ON y.name=dt.name AND y.email=dt.email

OUTPUT:

id          name       email
----------- ---------- ------------
1           John       John-email
2           John       John-email
5           sam        sam-email
6           sam        sam-email

(4 row(s) affected)

to delete the duplicates try:

DELETE d
    FROM @YourTable d
        INNER JOIN (SELECT
                        y.id,y.name,y.email,ROW_NUMBER() OVER(PARTITION BY y.name,y.email ORDER BY y.name,y.email,y.id) AS RowRank
                        FROM @YourTable y
                            INNER JOIN (SELECT
                                            name,email, COUNT(*) AS CountOf
                                            FROM @YourTable
                                            GROUP BY name,email
                                            HAVING COUNT(*)>1
                                        ) dt ON y.name=dt.name AND y.email=dt.email
                   ) dt2 ON d.id=dt2.id
        WHERE dt2.RowRank!=1
SELECT * FROM @YourTable

OUTPUT:

id          name       email
----------- ---------- --------------
1           John       John-email
3           fred       John-email
4           fred       fred-email
5           sam        sam-email

(4 row(s) affected)

answered 5 years ago gaurav singh #4

Try the following:

SELECT * FROM
(
    SELECT Id, Name, Age, Comments, Row_Number() OVER(PARTITION BY Name, Age ORDER By Name)
        AS Rank 
        FROM Customers
) AS B WHERE Rank>1

answered 4 years ago xDBA #5

In case you work with Oracle, this way would be preferable:

create table my_users(id number, name varchar2(100), email varchar2(100));

insert into my_users values (1, 'John', '[email protected]');
insert into my_users values (2, 'Sam', '[email protected]');
insert into my_users values (3, 'Tom', '[email protected]');
insert into my_users values (4, 'Bob', '[email protected]');
insert into my_users values (5, 'Tom', '[email protected]');

commit;

select *
  from my_users
 where rowid not in (select min(rowid) from my_users group by name, email);

answered 4 years ago shekhar singh #6

If you wish to see if there is any duplicate rows in your table, I used below Query:

create table my_table(id int, name varchar(100), email varchar(100));

insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (1, 'shekh', '[email protected]');
insert into my_table values (2, 'Aman', '[email protected]');
insert into my_table values (3, 'Tom', '[email protected]');
insert into my_table values (4, 'Raj', '[email protected]');


Select COUNT(1) As Total_Rows from my_table 
Select Count(1) As Distinct_Rows from ( Select Distinct * from my_table) abc 

answered 4 years ago Tanmay Nehete #7

try this code

WITH CTE AS

( SELECT Id, Name, Age, Comments, RN = ROW_NUMBER()OVER(PARTITION BY Name,Age ORDER BY ccn)
FROM ccnmaster )
select * from CTE 

answered 4 years ago naveed #8

 select emp.ename, emp.empno, dept.loc 
          from emp
 inner join dept 
          on dept.deptno=emp.deptno
 inner join
    (select ename, count(*) from
    emp
    group by ename, deptno
    having count(*) > 1)
 t on emp.ename=t.ename order by emp.ename
/

answered 4 years ago Muhammad Tahir #9

How we can count the duplicated values?? either it is repeated 2 times or greater than 2. just count them, not group wise.

as simple as

select COUNT(distinct col_01) from Table_01

answered 4 years ago Lauri Lubi #10

If you want to find duplicate data (by one or several criterias) and select the actual rows.

with MYCTE as (
    SELECT DuplicateKey1
        ,DuplicateKey2 --optional
        ,count(*) X
    FROM MyTable
    group by DuplicateKey1, DuplicateKey2
    having count(*) > 1
) 
SELECT E.*
FROM MyTable E
JOIN MYCTE cte
ON E.DuplicateKey1=cte.DuplicateKey1
    AND E.DuplicateKey2=cte.DuplicateKey2
ORDER BY E.DuplicateKey1, E.DuplicateKey2, CreatedAt

http://developer.azurewebsites.net/2014/09/better-sql-group-by-find-duplicate-data/

answered 4 years ago Anil #11

SELECT
  FirstName, LastName, MobileNo, COUNT(1) as CNT 
FROM        
  CUSTOMER
GROUP BY
  FirstName, LastName, MobileNo 
HAVING
  COUNT(1) > 1;

answered 3 years ago user4877838 #12

SELECT id, COUNT(id) FROM table1 GROUP BY id HAVING COUNT(id)>1;

I think this will work properly to search repeated values in a particular column.

answered 3 years ago PRADEEPTA VIRLLEY #13

 SELECT name, email 
    FROM users
    WHERE email in
    (SELECT email FROM users
    GROUP BY email 
    HAVING COUNT(*)>1)

answered 3 years ago Indivision Dev #14

A little late to the party but I found a really cool workaround to finding all duplicate IDs:

SELECT GROUP_CONCAT( id )
FROM users
GROUP BY email
HAVING ( COUNT(email) > 1 )

answered 3 years ago AncAinu #15

If you want to delete the duplicates, here's a much simpler way to do it than having to find even/odd rows into a triple sub-select:

SELECT id, name, email 
FROM users u, users u2
WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id

And so to delete:

DELETE FROM users
WHERE id IN (
    SELECT id/*, name, email*/
    FROM users u, users u2
    WHERE u.name = u2.name AND u.email = u2.email AND u.id > u2.id
)

Much more easier to read and understand IMHO

Note: The only issue is that you have to execute the request until there is no rows deleted, since you delete only 1 of each duplicate each time

answered 2 years ago veritaS #16

This should also work, maybe give it try.

  Select * from Users a
            where EXISTS (Select * from Users b 
                where (     a.name = b.name 
                        OR  a.email = b.email)
                     and a.ID != b.id)

Especially good in your case If you search for duplicates who have some kind of prefix or general change like e.g. new domain in mail. then you can use replace() at these columns

answered 2 years ago Darrel Lee #17

This is the easy thing I've come up with. It uses a common table expression (CTE) and a partition window (I think these features are in SQL 2008 and later).

This example finds all students with duplicate name and dob. The fields you want to check for duplication go in the OVER clause. You can include any other fields you want in the projection.

with cte (StudentId, Fname, LName, DOB, RowCnt)
as (
SELECT StudentId, FirstName, LastName, DateOfBirth as DOB, SUM(1) OVER (Partition By FirstName, LastName, DateOfBirth) as RowCnt
FROM tblStudent
)
SELECT * from CTE where RowCnt > 1
ORDER BY DOB, LName

answered 2 years ago Panky031 #18

SELECT * FROM users u where rowid = (select max(rowid) from users u1 where
u.email=u1.email);

answered 2 years ago Narendra #19

select name, email
, case 
when ROW_NUMBER () over (partition by name, email order by name) > 1 then 'Yes'
else 'No'
end "duplicated ?"
from users

answered 2 years ago Debendra Dash #20

select id,name,COUNT(*) from India group by Id,Name having COUNT(*)>1

answered 2 years ago Debendra Dash #21

By Using CTE also we can find duplicate value like this

with MyCTE
as
(
select Name,EmailId,ROW_NUMBER() over(PARTITION BY EmailId order by id) as Duplicate from [Employees]

)
select * from MyCTE where Duplicate>1

answered 2 years ago Martin Silovsk√Ĺ #22

This selects/deletes all duplicate records except one record from each group of duplicates. So, the delete leaves all unique records + one record from each group of the duplicates.

Select duplicates:

SELECT *
FROM table
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY column1, column2
);

Delete duplicates:

DELETE FROM table
WHERE
    id NOT IN (
        SELECT MIN(id)
        FROM table
        GROUP BY column1, column2
);

Be aware of larger amounts of records, it can cause performance problems.

answered 10 months ago rahul kumar #23

SELECT column_name,COUNT(*) FROM TABLE_NAME GROUP BY column1, HAVING COUNT(*) > 1;

comments powered by Disqus