Removing duplicate records from a bulk upload

From: Daniel Begin
Subject: Removing duplicate records from a bulk upload
Date:
Msg-id: COL129-DS2247F5657A1B6114A3819194640@phx.gbl
Responses: Re: Removing duplicate records from a bulk upload  (Andy Colson <andy@squeakycode.net>)
List: pgsql-general

I have just completed the bulk upload of a large database. Some tables have billions of records, and no constraints or indexes have been applied yet. About 0.1% of these records may have been duplicated during the upload, and I need to remove them before applying constraints.

I understand there are (at least) two approaches to get a table without duplicate records…

- Delete duplicate records from the table, based on an appropriate SELECT clause;
- Create a new table from the results of a SELECT DISTINCT query, and then drop the original table (a minimal sketch of this approach follows below).
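For concreteness, here is a minimal sketch of the second approach, assuming a hypothetical table named big_table whose duplicates are exact copies across all columns (the table name is illustrative, not from my actual schema):

    -- Rewrite approach: a single sequential scan plus a sort/hash to
    -- deduplicate, so no indexes on the original table are needed.
    -- "big_table" is a placeholder name for one of the bulk-loaded tables.
    BEGIN;
    CREATE TABLE big_table_dedup AS
        SELECT DISTINCT * FROM big_table;
    DROP TABLE big_table;
    ALTER TABLE big_table_dedup RENAME TO big_table;
    COMMIT;

While both copies exist, this needs free disk space roughly equal to the table's size, which fits the disk-space situation described below.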


What would be the most efficient procedure in PostgreSQL to do the job, considering…

- I do not know which records were duplicated;
- There are no indexes on the tables yet;
- There are no OIDs on the tables yet (the ctid-based sketch after this list needs neither OIDs nor indexes);
- The database is currently 1TB, but I have plenty of disk space.
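For the delete-in-place variant, the system column ctid (a row's physical locator, present on every PostgreSQL table) can stand in for the missing OIDs. A sketch, again with hypothetical names, where col1 and col2 stand for the columns that define a duplicate and are assumed NOT NULL:

    -- Keep one physical copy of each duplicated row and delete the rest.
    -- ctid exists on every table, so neither OIDs nor indexes are required.
    -- Without an index this self-join will be slow on billions of rows,
    -- which is why the rewrite sketched above is usually the cheaper option.
    DELETE FROM big_table a
    USING big_table b
    WHERE a.ctid < b.ctid          -- the copy with the largest ctid survives
      AND a.col1 = b.col1          -- equality never matches NULLs, hence the
      AND a.col2 = b.col2;         -- NOT NULL assumption on these columns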

Daniel

