Обсуждение: tune postgres for UPDATE
Hi, I have a table with a lot of columns (text and integer). It currently has 3Mio Rows. Updating a column in all rows (integer) takes endless (days). The column I update is not indexed. How can I tune postgres to do this much more quickly? VMstat looks like this: r b swpd free buff cache si so bi bo in cs us sy id wa 0 1 188 14160 16080 867064 0 0 880 888 168 479 1 2 0 97 1 1 188 15288 16080 865980 0 0 832 512 152 474 7 2 0 91 0 1 188 15464 16080 865348 0 0 872 592 144 461 2 1 0 97 so mostly iowait. iostat shows about 10000 block writes per second. My systems is debian-lenny (postgresql 8.3.5) I already increased checkpoint_segments to 32, shared_buffers to 200MB I also tried do disable autovacuum here is a sample statement: update users set price = (select price from prices where type = 'normal_price' and currency = users.currency) (the table price only has 30 rows) thank you very much! sebastian
try rewriting it to something like: update users set price = p.price from prices p where p.type = 'normal_price' and p.currency = users.currency;
2008/12/8 Grzegorz Jaśkiewicz <gryzman@gmail.com>
also avoid "fake" updates:
update users set price = p.price from prices p where p.type =
'normal_price' and p.currency = users.currency
and users.price is distinct from p.price;
try rewriting it to something like:
update users set price = p.price from prices p where p.type =
'normal_price' and p.currency = users.currency;
also avoid "fake" updates:
update users set price = p.price from prices p where p.type =
'normal_price' and p.currency = users.currency
and users.price is distinct from p.price;
If price change is a frequent operation, rethink the design
- maybe you could keep a pointer to "pricing group" instead of keeping separate price for every user.
--
Filip Rembiałkowski
On Mon, Dec 8, 2008 at 3:28 AM, Sebastian Böhm <seb@exse.net> wrote: > Hi, > > I have a table with a lot of columns (text and integer). > > It currently has 3Mio Rows. > > Updating a column in all rows (integer) takes endless (days). I'm afraid you may not understand how postgresql's MVCC implementation works here. Updating a row creates a new copy of the row and leaves the old copy in place. Running such an update several times in a row can result in a table that is mostly dead space and very slow to access, both for reads and writes. What does vacuum verbose tablename say about your table? Is there a valid reason you're updating every row? Do they all really need to change? > How can I tune postgres to do this much more quickly? Get a faster hard drive. > VMstat looks like this: > r b swpd free buff cache si so bi bo in cs us sy id > wa > 0 1 188 14160 16080 867064 0 0 880 888 168 479 1 2 0 > 97 > 1 1 188 15288 16080 865980 0 0 832 512 152 474 7 2 0 > 91 > 0 1 188 15464 16080 865348 0 0 872 592 144 461 2 1 0 > 97 Wow, that's a REALLY REALLY slow drive subsystem. Here's the numbers from my laptop while updating a similar table, with 1.2 million rows (update table xxx set y=y+1 kinda query): procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- 2 0 43124 30344 79804 2709708 0 0 16 21156 913 2456 22 4 38 36 0 3 43124 26472 79808 2713384 0 0 80 20232 725 2163 22 2 44 32 0 2 43124 25656 79508 2714084 0 0 148 24200 706 2187 31 4 36 29 0 2 43124 29336 79400 2710700 0 0 0 23616 788 2577 36 5 33 26 Note that I'm writing out at 20+megs a second, you're not even hitting 1Meg. I've got pretty slow USB memory sticks that hit 8 to 10 megs a second. > so mostly iowait. > > iostat shows about 10000 block writes per second. Then either iostat or vmstat are lying to you. 10000 1k blocks per second is about 10 times as fast as we're seeing in vmstat. > > My systems is debian-lenny (postgresql 8.3.5) > > I already increased checkpoint_segments to 32, shared_buffers to 200MB > > I also tried do disable autovacuum Probably not your best move. it's there for a good reason. You can tune it to make it more or less aggresive, but this kind of update is likely causing plenty of bloating and turning off autovacuum is likely counterproductive. > > > here is a sample statement: > > update users set price = (select price from prices where type = > 'normal_price' and currency = users.currency) Any way to make that selective so it only updates the prices that need to be updated? > (the table price only has 30 rows) Then why don't you just FK to point to it instead of this?
Scott Marlowe wrote: > Sebastian Böhm wrote: > > I have a table with a lot of columns (text and integer). > > > > It currently has 3Mio Rows. > > > > Updating a column in all rows (integer) takes endless (days). > > I'm afraid you may not understand how postgresql's MVCC implementation > works here. Updating a row creates a new copy of the row and leaves > the old copy in place. Running such an update several times in a row > can result in a table that is mostly dead space and very slow to > access, both for reads and writes. As far as I know, that problem can be alleviated to some extent by using PostgreSQL 8.3 and creating the table with a fillfactor substantially less than 100. Then free space is left in database blocks during insert which can be used for later updates. This reduces the number of blocks accessed per update and also the number of index updates if the changed column is not indexed. Moreover, row pruning can kick in if the row is updated more than once, reducing the amount of dead space. Yours, Laurenz Albe
On Tue, Dec 9, 2008 at 3:17 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > Scott Marlowe wrote: >> Sebastian Böhm wrote: >> > I have a table with a lot of columns (text and integer). >> > >> > It currently has 3Mio Rows. >> > >> > Updating a column in all rows (integer) takes endless (days). >> >> I'm afraid you may not understand how postgresql's MVCC implementation >> works here. Updating a row creates a new copy of the row and leaves >> the old copy in place. Running such an update several times in a row >> can result in a table that is mostly dead space and very slow to >> access, both for reads and writes. > > As far as I know, that problem can be alleviated to some extent by using > PostgreSQL 8.3 and creating the table with a fillfactor substantially less > than 100. > > Then free space is left in database blocks during insert which can be > used for later updates. This reduces the number of blocks accessed per > update and also the number of index updates if the changed column is not > indexed. > > Moreover, row pruning can kick in if the row is updated more than once, > reducing the amount of dead space. I'm pretty sure you'd have to vacuum still in between runs or the extra fill factor space would only get used the first time. I.e.: create table fill factor 50% load data into table update whole table -- 50% free space gets used. (should vacuum here but didn't) update whole table -- boom, new tuples are added onto the end of the table. What I don't know is if the new tuples added at the end of the table will have a fill factor of 50%. I'd expect so, in which case it might help a bit.
On Tue, Dec 9, 2008 at 9:13 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
I'm pretty sure you'd have to vacuum still in between runs or the
extra fill factor space would only get used the first time. I.e.:
create table fill factor 50%
load data into table
update whole table -- 50% free space gets used.
(should vacuum here but didn't)
update whole table -- boom, new tuples are added onto the end of the table.
Not really with 8.3. Assuming there are no long running transactions covering both the updates, HOT will free up the space used by the old (now DEAD) tuples and the second update will reuse that, at least for for the most common cases.
Thanks,
Pavan
--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com