« My first computer | Main | REST architecture »

Deleting duplicated rows in SQL

I came across the problem of duplicate keys in a PostgreSQL database today. There was (unfortunately) no UNIQUE constraint on the email column where there were duplicates. I wanted to manually effect uniqueness on the email column. The solution took awhile, and strained my limited knowledge of SQL...

DELETE FROM table WHERE uid NOT IN
(SELECT min(uid) FROM table GROUP BY email);

Predictably, this took a long time to execute. Is there a more efficient (or graceful) way of doing this? There's no index on the nested SELECT, so this method requires a linear search through the results of the nested SELECT for each row of the table.

TrackBack

TrackBack URL for this entry:
http://www.exothermia.net/comcafcon/mt-tb.cgi/10

Post a comment

(If you haven't left a comment here before, you may need to be approved by the site owner before your comment will appear. Until then, it won't appear on the entry. Thanks for waiting.)