Обсуждение: BUG #6275: Horrible performance regression
The following bug has been logged online: Bug reference: 6275 Logged by: Finlay Thompson Email address: finlay@dragonfly.co.nz PostgreSQL version: 8.4.9 Operating system: Ubuntu 11.04 Description: Horrible performance regression Details: After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~ 3 hours to not finishing after 24 hours. Context: We have a continuous build script, that loads, grooms, matches data every few hours. The script has been run over 100 times in the last few weeks, and has steadily been taking around 3 hours. It is really a group of scripts, something like 15000 lines of sql, and some python scripts to run it together. After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9, the script suddenly stopped working, and consuming all the ram (16GB) on the computer (i7). I spent about a day trying to diagnose the problem, with some help from #postgresql, and ran through the script step by step, but was unable to find any specific issue. Today I reinstalled the 8.4.8 version, and it built the database in 3:02 hours, as before. From this, I reckon something is broken in the 8.4.9 release. I would love to isolate the problem, and am happy to try anything people suggest. Sorry I can't be more specific about the problem, but I thought I should report the issue. btw: The database uses plpython and postgis. Regards, Finlay Thompson
"Finlay Thompson" <finlay@dragonfly.co.nz> writes: > After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~ > 3 hours to not finishing after 24 hours. > Context: We have a continuous build script, that loads, grooms, matches data > every few hours. The script has been run over 100 times in the last few > weeks, and has steadily been taking around 3 hours. It is really a group of > scripts, something like 15000 lines of sql, and some python scripts to run > it together. > After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9, > the script suddenly stopped working, and consuming all the ram (16GB) on the > computer (i7). Well, the most likely theory seems to be that one of the planner changes in 8.4.9 is resulting in a much less optimal plan choice for one or more queries, but it's impossible to diagnose it on the basis of this much information. > btw: The database uses plpython and postgis. Hmm, did you change postgis versions at the same time? If so, which upgrade caused the problem, postgres or postgis? regards, tom lane
Hi Tom Sorry for the delay. The version of postgis didn't change, and when I rolled back to postgresql 8.4.9 the expected performance returned. I am very confident the regression was caused by the 8.4.9 upgrade. Any suggestions on how to diagnose a long run script like this ? I can easily turn on more logging, in both cases and compare ? Can I log the query plans ? Finlay On Fri, 2011-10-28 at 00:53 -0400, Tom Lane wrote: > "Finlay Thompson" <finlay@dragonfly.co.nz> writes: > > After an upgrade from 8.4.8 to 8.4.9 performance of load script went from ~ > > 3 hours to not finishing after 24 hours. > > > Context: We have a continuous build script, that loads, grooms, matches data > > every few hours. The script has been run over 100 times in the last few > > weeks, and has steadily been taking around 3 hours. It is really a group of > > scripts, something like 15000 lines of sql, and some python scripts to run > > it together. > > > After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9, > > the script suddenly stopped working, and consuming all the ram (16GB) on the > > computer (i7). > > Well, the most likely theory seems to be that one of the planner changes > in 8.4.9 is resulting in a much less optimal plan choice for one or more > queries, but it's impossible to diagnose it on the basis of this much > information. > > > btw: The database uses plpython and postgis. > > Hmm, did you change postgis versions at the same time? If so, which > upgrade caused the problem, postgres or postgis? > > regards, tom lane -- Finlay Thompson www.dragonfly.co.nz Dragonfly, PO Box 27535, Wellington 6141, New Zealand Level 5, 158 Victoria Street, Te Aro, Wellington M: +64 21 426 878 T: +64 4 385 9285
On Sun, Oct 30, 2011 at 10:30 PM, Finlay Thompson <finlay@dragonfly.co.nz> wrote: > Hi Tom > > Sorry for the delay. > > The version of postgis didn't change, and when I rolled back to > postgresql 8.4.9 the expected performance returned. I am very confident > the regression was caused by the 8.4.9 upgrade. > > Any suggestions on how to diagnose a long run script like this ? I can > easily turn on more logging, in both cases and compare ? Can I log the > query plans ? The easiest thing to do is probably EXPLAIN ANALYZE each query and post the output to this thread. If you can't do that for some reason, there's always auto_explain, but that has some overhead and is a bit more work to set up. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Oct 31, 2011 at 1:50 PM, Robert Haas <robertmhaas@gmail.com> wrote: > If you can't do that for some reason, there's always auto_explain, but > that has some overhead and is a bit more work to set up. If it's a massive load of data with a lot of queries, I think his best bet is to log the queries with log_min_duration_statement = <something sensible> (I'd try to start with 30ms) then analyze the log results with pgFouine or any other log analyzer for both versions. -- Guillaume
On 28/10/11 15:42, Finlay Thompson wrote: > After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9, > the script suddenly stopped working, and consuming all the ram (16GB) on the > computer (i7). > > If there is one query exhausting all ram, then it could be tricky to catch it in the logs (log duration and friends log the query time on completion... which might be never). If you have vm.overcommit_memory set to 2 then you may get the offending statement in the log when the postgres backend errors out with ENOMEM. If you know the approx time at which memory starts filling up then getting (most of) the query text from pg_stat_activity would enable you to track down the offending sql in your load script. regards Mark
Hi Mark Thanks for your suggestions (and Robert and Guillaume). I will try the overcommit_memory=2 idea out in a couple of weeks. At the moment I am running everything on 8.4.8, which is working great, and have pressing deadlines looming (next Wednesday). I'll let you know when I do. I want to find a test case that exhibits a real postgresql bug, or informs me as to what I have done wrong. Regards, Finaly On Tue, 2011-11-01 at 17:26 +1300, Mark Kirkwood wrote: > On 28/10/11 15:42, Finlay Thompson wrote: > > After upgrading the postgresql*-8.4 packages on ubuntu, to version 8.4.9, > > the script suddenly stopped working, and consuming all the ram (16GB) on the > > computer (i7). > > > > > > If there is one query exhausting all ram, then it could be tricky to > catch it in the logs (log duration and friends log the query time on > completion... which might be never). If you have vm.overcommit_memory > set to 2 then you may get the offending statement in the log when the > postgres backend errors out with ENOMEM. > > If you know the approx time at which memory starts filling up then > getting (most of) the query text from pg_stat_activity would enable you > to track down the offending sql in your load script. > > regards > > Mark -- Finlay Thompson www.dragonfly.co.nz Dragonfly, PO Box 27535, Wellington 6141, New Zealand Level 5, 158 Victoria Street, Te Aro, Wellington M: +64 21 426 878 T: +64 4 385 9285