Performance on large data transformations

Поиск
Список
Период
Сортировка
От Josh Berkus
Тема Performance on large data transformations
Дата
Msg-id 200303170938.38365.josh@agliodbs.com
обсуждение исходный текст
Ответы Re: Performance on large data transformations  ("Ron Mayer" <ron@intervideo.com>)
Список pgsql-performance
Folks,

On one database, I have an overnight data transformation procedure that goes
like:

TableA has about 125,000 records.

Begin Transaction:
1) Update 80% of records in TableA
2) Update 10% of records in TableA
3) Update 65% of records in TableA
4) Update 55% of records in TableA
5) Update 15% or records in TableA with references to other records in TableA
6) Flag what hasn't been updated.
Commit

I've found that, no matter what my FSM settings (I've gone as high as
1,000,000) by the time I reach step 4 execution has slowed down considerably,
and for step 5 it takes the server more than 10 minutes to complete the
update statement. During this period, CPU, RAM and disk I/O are almost idle
... the system seems to spend all of its time doing lengthy seeks.  There is,
for that matter, no kernel swap activity, but I'm not sure how to measure
Postgres temp file activity.

(FYI:  Dual Athalon 1600mhz/1gb/Hardware Raid 1 with xlog on seperate SCSI
drive/Red Hat Linux 8.0/PostgreSQL 7.2.4)

The only way around this I've found is to break up the above into seperate
transactions with VACUUMs in between, and "simulate" a transaction by making
a back-up copy of the table and restoring from it if something goes wrong.
I've tried enough different methods to be reasonably certain that there is no
way around this in 7.2.4.

The reason I bring this up is that PostgreSQL's dramatic plunge in performance
in large serial updates is really problematic for us in the OLAP database
market, where large data transformations, as well as extensive use of
calculated temporary tables, is common.    I was particularly distressed when
I had to tell a client considering switching from MSSQL to Postgres for an
OLAP database that they might just be trading one set of problems for
another.

Is there any way we can improve on this kind of operation in future versions
of PostgreSQL?

--
Josh Berkus
Aglio Database Solutions
San Francisco

В списке pgsql-performance по дате отправления:

Предыдущее
От: Chris Sutton
Дата:
Сообщение: Re: postgresql meltdown on PlanetMath.org
Следующее
От: Max Baker
Дата:
Сообщение: Re: postgresql meltdown on PlanetMath.org