Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)

From: Jonathan Vanasco
Subject: Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)
Date:
Msg-id: A8038B1A-B4FD-4D61-A1B8-DB80BE3AB002@2xlp.com
In reply to: Re: Removing duplicate records from a bulk upload (rationale behind selecting a method)  (Daniel Begin <jfd553@hotmail.com>)
List: pgsql-general
On Dec 8, 2014, at 9:35 PM, Scott Marlowe wrote:

> select a,b,c into newtable from oldtable group by a,b,c;
>
> One pass, done.

This is a bit naive, but couldn't this approach potentially be faster (depending on the system)?

    SELECT a, b, c
    INTO duplicate_records
    FROM (
        SELECT a, b, c, count(*) AS counted
        FROM source_table
        GROUP BY a, b, c
    ) q_inner
    WHERE q_inner.counted > 1;

    DELETE FROM source_table
    USING duplicate_records
    WHERE source_table.a = duplicate_records.a
      AND source_table.b = duplicate_records.b
      AND source_table.c = duplicate_records.c;
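
Since duplicate_records keeps one copy of each duplicated (a, b, c) combination and the DELETE removes every copy from source_table, I'm assuming a final step would put a single copy of each back, something like:

    INSERT INTO source_table (a, b, c)
    SELECT a, b, c
    FROM duplicate_records;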

It would require multiple full table scans, but it would minimize the writing to disk -- and isn't a 'read' operation usually much more efficient than a 'write' operation?  If the duplicate checking is only done on a small subset of columns, indexes could speed things up too.
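
For example (just a sketch, assuming the duplicate check really is limited to columns a, b, c on the tables above):

    CREATE INDEX idx_source_table_abc
        ON source_table (a, b, c);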



