Removing duplicate records from a bulk upload

From	Daniel Begin
Subject	Removing duplicate records from a bulk upload
Date	8 December 2014 06:31:49
Msg-id	COL129-DS2247F5657A1B6114A3819194640@phx.gbl
Replies	Re: Removing duplicate records from a bulk upload (Andy Colson <andy@squeakycode.net>)
List	pgsql-general

I have just completed the bulk upload of a large database. Some tables have billions of records and no constraints or indexes have been applied yet. About 0.1% of these records may have been duplicated during the upload and I need to remove them before applying constraints.

I understand there are (at least) two approaches to obtaining a table without duplicate records:

- Delete duplicate records from the table based on an appropriate select clause;

- Create a new table with the results from a select distinct clause, and then drop the original table.
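A minimal, self-contained sketch of the two approaches follows. It assumes Python's standard `sqlite3` module with an in-memory SQLite database as a stand-in for PostgreSQL, so it can run anywhere; the table `readings` and its columns are hypothetical. The in-place delete keeps the copy with the lowest SQLite `rowid` in each group of identical rows. In PostgreSQL the system column `ctid` (the row's physical location) can play a similar role when a table has no OIDs or key, but the exact query and its cost on a billion-row table will differ.

```python
import sqlite3

# Hypothetical sample data: 6 rows, of which 3 are surplus exact copies.
SAMPLE_ROWS = [
    (1, "a", 1.0),
    (2, "b", 2.0), (2, "b", 2.0),
    (3, "c", 3.0), (3, "c", 3.0), (3, "c", 3.0),
]


def make_sample_db() -> sqlite3.Connection:
    """Create an in-memory table with no constraints or indexes, like the raw upload."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER, label TEXT, value REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", SAMPLE_ROWS)
    conn.commit()
    return conn


def dedupe_in_place(conn: sqlite3.Connection) -> None:
    """Approach 1: delete every row that is not the lowest-rowid copy in its group.
    rowid is SQLite's physical row identifier (used here as an analogy to ctid)."""
    conn.execute(
        """
        DELETE FROM readings
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM readings GROUP BY id, label, value
        )
        """
    )
    conn.commit()


def dedupe_by_copy(conn: sqlite3.Connection) -> None:
    """Approach 2: copy the distinct rows to a new table, drop the original, rename."""
    conn.execute("CREATE TABLE readings_dedup AS SELECT DISTINCT * FROM readings")
    conn.execute("DROP TABLE readings")
    conn.execute("ALTER TABLE readings_dedup RENAME TO readings")
    conn.commit()


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]


for approach in (dedupe_in_place, dedupe_by_copy):
    db = make_sample_db()
    approach(db)
    print(approach.__name__, row_count(db))  # prints 3 for both approaches
```

The copy-based variant writes every surviving row once and needs roughly the table's size in extra disk space. The in-place delete only removes the duplicates, but it still has to group the whole table to find them, and in PostgreSQL the deleted rows remain as dead tuples until VACUUM reclaims them.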

What would be the most efficient procedure in PostgreSQL for this job, considering that:

- I do not know which records were duplicated;

- There are no indexes on the tables yet;

- There are no OIDs on the tables yet;

- The database is currently 1TB but I have plenty of disk space.
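Because the duplicated records are not known in advance, one way to find them (and to check the 0.1% estimate before deleting anything) is to group on every column and keep the groups that occur more than once. The sketch below again uses Python's `sqlite3` as a stand-in; the helper names `find_duplicate_groups` and `excess_row_count` and the sample table are hypothetical. The same `GROUP BY ... HAVING COUNT(*) > 1` query is valid in PostgreSQL, but on a table of billions of rows it has to read and sort or hash every row.

```python
import sqlite3
from typing import List, Sequence, Tuple


def find_duplicate_groups(
    conn: sqlite3.Connection, table: str, columns: Sequence[str]
) -> List[Tuple]:
    """Return one row per group of identical records that occurs more than once,
    with the number of copies appended as the last column.
    Table and column names are pasted into the SQL, so they must be trusted."""
    cols = ", ".join(columns)
    sql = (
        f"SELECT {cols}, COUNT(*) AS copies FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()


def excess_row_count(conn: sqlite3.Connection, table: str, columns: Sequence[str]) -> int:
    """Rows a deduplication would remove: (copies - 1) summed over duplicate groups."""
    return sum(group[-1] - 1 for group in find_duplicate_groups(conn, table, columns))


# Hypothetical sample table with no constraints or indexes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (id INTEGER, label TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(1, "a", 1.0), (2, "b", 2.0), (2, "b", 2.0), (3, "c", 3.0), (3, "c", 3.0), (3, "c", 3.0)],
)
cols = ["id", "label", "value"]
total = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
print(sorted(find_duplicate_groups(conn, "readings", cols)))
# [(2, 'b', 2.0, 2), (3, 'c', 3.0, 3)]
print(f"{excess_row_count(conn, 'readings', cols)} of {total} rows are surplus copies")
# 3 of 6 rows are surplus copies
```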

Daniel
