Re: Optimizing bulk update performance

From: Tom Lane
Subject: Re: Optimizing bulk update performance
Date:
Msg-id: 28751.1367037697@sss.pgh.pa.us
In reply to: Optimizing bulk update performance  (Yang Zhang <yanghatespam@gmail.com>)
Responses: Re: Optimizing bulk update performance
List: pgsql-general
Yang Zhang <yanghatespam@gmail.com> writes:
> It currently takes up to 24h for us to run a large set of UPDATE
> statements on a database, which are of the form:

>     UPDATE table SET field1 = constant1, field2 = constant2, ...  WHERE
>     id = constid

> (We're just overwriting fields of objects identified by ID.)

Forgive the obvious question, but you do have an index on "id", right?
Have you checked it's being used (ie EXPLAIN ANALYZE on one of these)?
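For instance, a minimal way to check (table and column names here are
placeholders; the BEGIN/ROLLBACK keeps the test update from actually
being committed, since EXPLAIN ANALYZE really executes the statement):

    BEGIN;
    EXPLAIN ANALYZE
        UPDATE mytable SET field1 = 'x' WHERE id = 12345;
    ROLLBACK;

If the plan shows a sequential scan on the table rather than an index
scan on "id", each of your UPDATEs is reading the whole table.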

> The tables have handfuls of indices each and no foreign key constraints.

How much is a "handful"?

> It takes 2h to import a `pg_dump` of the entire DB.  This seems like a
> baseline we should reasonably target.

Well, maybe.  You didn't say what percentage of the DB you're updating.

But the thing that comes to mind here is that you're probably incurring
a network round trip for each row, and maybe a query-planning round as
well, so you really can't expect that this is going to be anywhere near
as efficient as a bulk load operation.  You could presumably get rid of
the planner overhead by using a prepared statement.  Cutting the network
overhead is going to require a bit more ingenuity --- could you move
some logic into a stored procedure, perhaps, so that one command from
the client is sufficient to update multiple rows?
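For instance, a prepared statement might look like this (the statement,
table, and column names are placeholders):

    PREPARE upd(int, text, text) AS
        UPDATE mytable SET field1 = $2, field2 = $3 WHERE id = $1;

    EXECUTE upd(1, 'a', 'b');
    EXECUTE upd(2, 'c', 'd');

And as one alternative to a stored procedure, a single UPDATE joined
against a VALUES list lets one command from the client carry many rows'
worth of changes (again, names are placeholders):

    UPDATE mytable AS t
       SET field1 = v.f1, field2 = v.f2
      FROM (VALUES (1, 'a', 'b'),
                   (2, 'c', 'd')) AS v(id, f1, f2)
     WHERE t.id = v.id;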

            regards, tom lane

