Обсуждение: trouble with (lack of) indexing
Hello I am currently involved in a rather large project relying heavily on the use of postgresql[1], but we have run into a rather annoying snag. We currently have two databases set up. One for testing, and one for production. Both run on FreeBSD, and perform very well since the last upgrade. However, here is the problem. When executing a certain select statement (shown below) on the production database, we get a sequential table scan (of a rather large table), which causes the machine on which it runs to max out all possible disk I/O. However, when the excact same query is executed on the test database, we get an index scan instead of a sequential one. Obviously, something differes between the two databases, but we simply cannot track down what it is. I would greatly appreciate any input you might be able to offer. Here are the queries and their accompanying query plans: On the development/testing database: --------------8<----------- => explain SELECT ht.id, -> hq.ip, -> hq.id -> FROM hostsqueue as hq, -> hoststests as ht -> WHERE ht.hostsqueue_id=hq.id -> ; NOTICE: QUERY PLAN: Merge Join (cost=0.00..121.50 rows=1000 width=44) -> Index Scan using hostsqueue_pkey on hostsqueue hq (cost=0.00..52.00 rows=1000 width=36) -> Index Scan using idx_htsts_hq on hoststests ht (cost=0.00..52.00 rows=1000 width=8) EXPLAIN => --------------8<----------- That looks fine and dandy, and performance is ok. This however is on the production database: --------------8<----------- => explain SELECT ht.id, -> hq.ip, -> hq.id -> FROM hostsqueue as hq, -> hoststests as ht -> WHERE ht.hostsqueue_id=hq.id; NOTICE: QUERY PLAN: Merge Join (cost=134514.31..136541.15 rows=129756 width=44) -> Sort (cost=76196.94..76196.94 rows=32200 width=36) -> Seq Scan on hostsqueue hq (cost=0.00..73786.00 rows=32200 width=36) -> Sort (cost=58317.37..58317.37 rows=129756 width=8) -> Seq Scan on hoststests ht (cost=0.00..47297.56 rows=129756 width=8) EXPLAIN => --------------8<----------- Obviously something differs. The table definitions are rather large, so I have made them available at: http://tb.tolkien.dk/~boll/devel-definition.txt and http://treebeard.tolkien.dk/~boll/production-definition.txt Any input, be it suggestions or otherwise, as to how we may fix this, would be greatly appreciated. Since I am a relatively new subscriber to the list, could you possibly cc me with any suggestions, since I am not entirely sure if I actually receive mail sent to the list yet. Thanks. [1] We choose postgresql over others because we liked the many features it offers beyond its closest competitors. -- Søren O. ,''`. : :' : public key: finger boll <at> db.debian.org `. `' `-
On Thu, 9 May 2002, Søren Boll Overgaard wrote: > Hello > > I am currently involved in a rather large project relying heavily on the use of > postgresql[1], but we have run into a rather annoying snag. > We currently have two databases set up. One for testing, and one for production. > Both run on FreeBSD, and perform very well since the last upgrade. > However, here is the problem. When executing a certain select statement (shown > below) on the production database, we get a sequential table scan (of a rather > large table), which causes the machine on which it runs to max out all possible > disk I/O. However, when the excact same query is executed on the test > database, we get an index scan instead of a sequential one. Obviously, > something differes between the two databases, but we simply cannot track down > what it is. I would greatly appreciate any input you might be able to offer. > Here are the queries and their accompanying query plans: Couple of quick questions... Have you run vacuum analyze on both databases? What are your settings in the postgresql.conf for cpu_tuple_cost and such? You can check them with 'show all' from psql in 7.2.x and later.
On Thu, May 09, 2002 at 03:38:20PM -0600, Scott Marlowe wrote: > On Thu, 9 May 2002, Søren Boll Overgaard wrote: > > > Hello > > > > I am currently involved in a rather large project relying heavily on the use of > > postgresql[1], but we have run into a rather annoying snag. > > We currently have two databases set up. One for testing, and one for production. > > Both run on FreeBSD, and perform very well since the last upgrade. > > However, here is the problem. When executing a certain select statement (shown > > below) on the production database, we get a sequential table scan (of a rather > > large table), which causes the machine on which it runs to max out all possible > > disk I/O. However, when the excact same query is executed on the test > > database, we get an index scan instead of a sequential one. Obviously, > > something differes between the two databases, but we simply cannot track down > > what it is. I would greatly appreciate any input you might be able to offer. > > Here are the queries and their accompanying query plans: > > Couple of quick questions... > > Have you run vacuum analyze on both databases? Just did. Since I am not the one maintaining the databases on a day to day basis I wasn't sure when it had last been done. I am pretty much the dimwit who was convinced to go and find a solution to the problem :) Oddly enough, after vacuum analyzing the two tables involved on the development database, it now performs sequential scans on bot tables, in excactly the same way as the production database: explain SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; NOTICE: QUERY PLAN: Hash Join (cost=776.05..2904.36 rows=84 width=24) -> Seq Scan on hoststests ht (cost=0.00..2126.84 rows=84 width=8) -> Hash (cost=776.04..776.04 rows=4 width=16) -> Seq Scan on hostsqueue hq (cost=0.00..776.04 rows=4 width=16) EXPLAIN > What are your settings in the postgresql.conf for cpu_tuple_cost and such? On both databases: NOTICE: cpu_tuple_cost is 0.01 NOTICE: cpu_index_tuple_cost is 0.001 Again, any input would be greatly appreciated. Perhaps we are missing an index somewhere? -- Søren O. ,''`. : :' : public key: finger boll <at> db.debian.org `. `' `-
=?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes: > explain SELECT ht.id,hq.ip,hq.id FROM > hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; > NOTICE: QUERY PLAN: > Hash Join (cost=776.05..2904.36 rows=84 width=24) > -> Seq Scan on hoststests ht (cost=0.00..2126.84 rows=84 width=8) > -> Hash (cost=776.04..776.04 rows=4 width=16) > -> Seq Scan on hostsqueue hq (cost=0.00..776.04 rows=4 width=16) If there's only 84 rows it's hardly going to matter which plan we choose ;-). Please show us the results from the production database, not the toy-sized tables. If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than plain EXPLAIN. Also, you could try doing "SET enable_seqscan = off" and see how the plan changes. regards, tom lane
On Thu, May 09, 2002 at 07:38:06PM -0400, Tom Lane wrote: > =?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes: > > explain SELECT ht.id,hq.ip,hq.id FROM > > hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; > > NOTICE: QUERY PLAN: > > > Hash Join (cost=776.05..2904.36 rows=84 width=24) > > -> Seq Scan on hoststests ht (cost=0.00..2126.84 rows=84 width=8) > > -> Hash (cost=776.04..776.04 rows=4 width=16) > > -> Seq Scan on hostsqueue hq (cost=0.00..776.04 rows=4 width=16) > > > If there's only 84 rows it's hardly going to matter which plan we choose > ;-). Please show us the results from the production database, not the > toy-sized tables. Sorry, my bad. Here is the real stuff: explain SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; NOTICE: QUERY PLAN: Hash Join (cost=75331.89..127823.94 rows=160293 width=24) -> Seq Scan on hoststests ht (cost=0.00..49686.93 rows=160293 width=8) -> Hash (cost=75243.51..75243.51 rows=35351 width=16) -> Seq Scan on hostsqueue hq (cost=0.00..75243.51 rows=35351 width=16) EXPLAIN > > If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than > plain EXPLAIN. Also, you could try doing "SET enable_seqscan = off" > and see how the plan changes. After changing the enable_seqscan, the plan looks like this: explain SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; NOTICE: QUERY PLAN: Merge Join (cost=0.00..772324.63 rows=160293 width=24) -> Index Scan using hostsqueue_pkey on hostsqueue hq (cost=0.00..137603.50 rows=35351 width=16) -> Index Scan using idx_htsts_hq on hoststests ht (cost=0.00..632228.35 rows=160293 width=8) EXPLAIN As far as I can tell, that is excactly what we want. I will let the guys who usually mess with the database have a go at it. Thanks for the input so far, I really appreciate it. -- Søren O. ,''`. : :' : public key: finger boll <at> db.debian.org `. `' `-
On Thu, May 09, 2002 at 07:38:06PM -0400, Tom Lane wrote: > =?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes: > > explain SELECT ht.id,hq.ip,hq.id FROM > > hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; > > NOTICE: QUERY PLAN: > > > Hash Join (cost=776.05..2904.36 rows=84 width=24) > > -> Seq Scan on hoststests ht (cost=0.00..2126.84 rows=84 width=8) > > -> Hash (cost=776.04..776.04 rows=4 width=16) > > -> Seq Scan on hostsqueue hq (cost=0.00..776.04 rows=4 width=16) > > > If there's only 84 rows it's hardly going to matter which plan we choose > ;-). Please show us the results from the production database, not the > toy-sized tables. > > If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than > plain EXPLAIN. Also, you could try doing "SET enable_seqscan = off" > and see how the plan changes. I missed the outputs of the explain analyze queries in the previous mail: explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; NOTICE: QUERY PLAN: Hash Join (cost=75331.89..127823.94 rows=160293 width=24) (actual time=122572.70..146683.59 rows=142807 loops=1) -> Seq Scan on hoststests ht (cost=0.00..49686.93 rows=160293 width=8) (actual time=170.79..21255.68 rows=142807 loops=1) -> Hash (cost=75243.51..75243.51 rows=35351 width=16) (actual time=122401.10..122401.10 rows=0 loops=1) -> Seq Scan on hostsqueue hq (cost=0.00..75243.51 rows=35351 width=16) (actual time=518.88..122024.50 rows=28610loops=1) Total runtime: 147598.79 msec EXPLAIN And after setting the enable_seqscan to off, I got this: => SET enable_seqscan = off; SET VARIABLE => explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; NOTICE: QUERY PLAN: Merge Join (cost=0.00..772324.63 rows=160293 width=24) (actual time=8319.78..116061.18 rows=141645 loops=1) -> Index Scan using hostsqueue_pkey on hostsqueue hq (cost=0.00..137603.50 rows=35351 width=16) (actual time=34.90..90831.97rows=28623 loops=1) -> Index Scan using idx_htsts_hq on hoststests ht (cost=0.00..632228.35 rows=160293 width=8) (actual time=18.41..23343.75rows=141645 loops=1) Total runtime: 116562.03 msec EXPLAIN Actually, I was rather hoping for a large gain in performance, but maybe the "total runtimes" are not actally representative of performance of the actual selects? -- Søren O. ,''`. : :' : public key: finger boll <at> db.debian.org `. `' `-
On Thu, 9 May 2002, Tom Lane wrote: > =?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes: > > explain SELECT ht.id,hq.ip,hq.id FROM > > hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; > > NOTICE: QUERY PLAN: > > > Hash Join (cost=776.05..2904.36 rows=84 width=24) > > -> Seq Scan on hoststests ht (cost=0.00..2126.84 rows=84 width=8) > > -> Hash (cost=776.04..776.04 rows=4 width=16) > > -> Seq Scan on hostsqueue hq (cost=0.00..776.04 rows=4 width=16) > > > If there's only 84 rows it's hardly going to matter which plan we choose > ;-). Please show us the results from the production database, not the > toy-sized tables. The original post had it, but presumably not from the most recent analyze: [quote] This however is on the production database: --------------8<----------- => explain SELECT ht.id, -> hq.ip, -> hq.id -> FROM hostsqueue as hq, -> hoststests as ht -> WHERE ht.hostsqueue_id=hq.id; NOTICE: QUERY PLAN: Merge Join (cost=134514.31..136541.15 rows=129756 width=44) -> Sort (cost=76196.94..76196.94 rows=32200 width=36) -> Seq Scan on hostsqueue hq (cost=0.00..73786.00 rows=32200 width=36) -> Sort (cost=58317.37..58317.37 rows=129756 width=8) -> Seq Scan on hoststests ht (cost=0.00..47297.56 rows=129756 width=8) EXPLAIN => [/end quote] It's quote a result set. I wouldn't want to be paging/scrolling through that lot. > If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than > plain EXPLAIN. Also, you could try doing "SET enable_seqscan = off" > and see how the plan changes. Would it be worth changing the explain output to include the stats., or a summary of them, used by the plan? Does the 10% rule still apply? If so it might be a nice addition to show the total number of rows expected in the table so that a quick 'eyeball' can confirm that is why a seqscan is being performed. Or am I just barking up a dead horse here since I'm now thinking it might be nice to include information about potential indexes and at what row count from them it's decided a seqscan is better. -- Nigel J. Andrews Director --- Logictree Systems Limited Computer Consultants
On Fri, 10 May 2002, Søren Boll Overgaard wrote: > [snip] > I missed the outputs of the explain analyze queries in the previous mail: > explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; > NOTICE: QUERY PLAN: > > Hash Join (cost=75331.89..127823.94 rows=160293 width=24) (actual time=122572.70..146683.59 rows=142807 loops=1) > -> Seq Scan on hoststests ht (cost=0.00..49686.93 rows=160293 width=8) (actual time=170.79..21255.68 rows=142807 loops=1) > -> Hash (cost=75243.51..75243.51 rows=35351 width=16) (actual time=122401.10..122401.10 rows=0 loops=1) > -> Seq Scan on hostsqueue hq (cost=0.00..75243.51 rows=35351 width=16) (actual time=518.88..122024.50 rows=28610loops=1) > Total runtime: 147598.79 msec > > EXPLAIN > > And after setting the enable_seqscan to off, I got this: > > => SET enable_seqscan = off; > SET VARIABLE > => explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE ht.hostsqueue_id=hq.id; > NOTICE: QUERY PLAN: > > Merge Join (cost=0.00..772324.63 rows=160293 width=24) (actual time=8319.78..116061.18 rows=141645 loops=1) > -> Index Scan using hostsqueue_pkey on hostsqueue hq (cost=0.00..137603.50 rows=35351 width=16) (actual time=34.90..90831.97rows=28623 loops=1) > -> Index Scan using idx_htsts_hq on hoststests ht (cost=0.00..632228.35 rows=160293 width=8) (actual time=18.41..23343.75rows=141645 loops=1) > Total runtime: 116562.03 msec > > EXPLAIN > > Actually, I was rather hoping for a large gain in performance, but maybe the > "total runtimes" are not actally representative of performance of the actual > selects? Yes they are. That is the time that is taken to produce the results since that is what explain analyze is doing. It doesn't cover the time taken to get those results somewhere where they can be used, which for such large result set could be significant (network transmission plus buffering in the client all before the application can use any of it). BTW, are you aware that those two things seem to be returning different numbers of rows from each table and consequently from the whole query? --- Nigel J. Andrews Director --- Logictree Systems Limited Computer Consultants
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > Or am I just barking up a dead horse here since I'm now thinking it > might be nice to include information about potential indexes and at > what row count from them it's decided a seqscan is better. There is no "row count at which it's decided a seqscan is better". The planner does cost estimates and picks the plan with the lowest estimated cost, so all the behavior is emergent from the cost estimation equations. I have not been able to think of any output format that would show rejected plan estimates without being so verbose as to be unreadable --- except in the most trivial cases, there are a *lot* of rejected plans, and the planner has no idea which of them might be interesting to a human. (You can try defining OPTIMIZER_DEBUG if you want to see it in action, but I don't think the output is very useful in practice.) What's worse is that (a) the only reason why you'd care is if the estimates are wrong, but (b) there is no way to tell anything about the actual cost that might be associated with a rejected plan fragment. To compare estimates to reality you really have to arrange for the plan to be the selected one, so that you can run it. regards, tom lane
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > BTW, are you aware that those two things seem to be returning > different numbers of rows from each table and consequently from the > whole query? What's even stranger is that the whole-table seqscan on hostsqueue is apparently running slower than the whole-table indexscan: > -> Seq Scan on hostsqueue hq (cost=0.00..75243.51 rows=35351 width=16) (actual time=518.88..122024.50 rows=28610loops=1) > -> Index Scan using hostsqueue_pkey on hostsqueue hq (cost=0.00..137603.50 rows=35351 width=16) (actual time=34.90..90831.97rows=28623 loops=1) Now that just ain't physically possible, given comparable starting conditions: an indexscan has to do more work, both CPU and I/O. I suspect that the indexscan test was run second and benefited from most of the table having been brought into RAM during the seqscan. If so, a repeat test will show different results. In any case, it'd seem that the planner is not making any major errors here: its estimates were not that far out of line from reality. It did pick the slightly slower plan, but the difference is well within what I'd call reasonable estimation error. regards, tom lane
On Thu, 9 May 2002, Tom Lane wrote: > "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > > Or am I just barking up a dead horse here since I'm now thinking it > > might be nice to include information about potential indexes and at > > what row count from them it's decided a seqscan is better. > > There is no "row count at which it's decided a seqscan is better". > The planner does cost estimates and picks the plan with the lowest > estimated cost, so all the behavior is emergent from the cost > estimation equations. But those cost estimates can be extrapolated and intersect points determined can't they? > I have not been able to think of any output format that would show > rejected plan estimates without being so verbose as to be unreadable > --- except in the most trivial cases, there are a *lot* of rejected > plans, and the planner has no idea which of them might be interesting > to a human. (You can try defining OPTIMIZER_DEBUG if you want to see > it in action, but I don't think the output is very useful in practice.) That's why I started thinking I was barking. > What's worse is that (a) the only reason why you'd care is if the > estimates are wrong, but (b) there is no way to tell anything about the > actual cost that might be associated with a rejected plan fragment. > To compare estimates to reality you really have to arrange for the plan > to be the selected one, so that you can run it. Good point. I was thinking that it might be nice for explain to show some justification for the choice of plan. However, I'm quite prepared to agree that this isn't a good idea to try and do. I haven't looked to see what explanation of the statistics and how the planner uses them there is in the documentation. It might be a good idea if there was some detail about it there, I mean detail such that someone can go to the documentation, query the tables and say 'Ah, so that's why it chose that plan.' (Well at least for reasonably simple cases). The trouble is I'm no where near qualified enough to it and so can't add it if it's not there already. -- Nigel J. Andrews Director --- Logictree Systems Limited Computer Consultants
"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes: > On Thu, 9 May 2002, Tom Lane wrote: >> There is no "row count at which it's decided a seqscan is better". >> The planner does cost estimates and picks the plan with the lowest >> estimated cost, so all the behavior is emergent from the cost >> estimation equations. > But those cost estimates can be extrapolated and intersect points determined > can't they? Well, many of the equations are nonlinear, and they all have multiple input values, not to mention various tunable parameters. Solving for the intersect points is theoretically possible but doesn't strike me as likely to be informative. > I haven't looked to see what explanation of the statistics and how the > planner uses them there is in the documentation. It might be a good > idea if there was some detail about it there, I mean detail such that > someone can go to the documentation, query the tables and say 'Ah, so > that's why it chose that plan.' The nitty-gritty documentation is in the source code. This is an open-source project, remember; we operate on the assumption that people who want to know how things work will look at the sources. Most of the interesting details for this purpose are in src/backend/optimizer/path/costsize.c with some tidbits in src/backend/utils/adt/selfuncs.c regards, tom lane
On Fri, 10 May 2002, Søren Boll Overgaard wrote: > > What are your settings in the postgresql.conf for cpu_tuple_cost and such? > > On both databases: > NOTICE: cpu_tuple_cost is 0.01 > NOTICE: cpu_index_tuple_cost is 0.001 Here's a quick and dirty explanation of what these settings mean to the query planner, as listed in src/backend/optimizer/path/costsize.c: random_page_cost Cost of a non-sequential page fetch cpu_tuple_cost Cost of typical CPU time to process a tuple cpu_index_tuple_cost Cost of typical CPU time to process an index tuple Since the initial numbers are just SWAGs (scientific wild ass guesses) it would behoove you to figure out where the break points really are in your setup. You can change the setting of enable_seqscan and enable_indexscan to force your planner to ignore one or the other, and time the output. It's pretty easy to do a quick timing test like so: time psql -c 'set enable_seqscan=off;select * from tellers' >/dev/null When you find a query that is just a little faster with seq scan than index or vice versa, then you can tune the three listed parameters above so that the planner just barely makes the right choices for your data. I've found that lower cpu_index_tuple_cost from the default 0.001 to 0.0005 or so made a big difference in my database, and setting random_page_cost = 1.5 or so was a good help too. Your mileage will certainly vary...
Scott Marlowe <scott.marlowe@ihs.com> writes: > You can change the setting of enable_seqscan and enable_indexscan to force > your planner to ignore one or the other, and time the output. It's pretty > easy to do a quick timing test like so: > time psql -c 'set enable_seqscan=off;select * from tellers' >/dev/null It's difficult to get accurate measurements that way. I'd recommend using 7.2's EXPLAIN ANALYZE instead, and looking at the total runtime it displays. Even with EXPLAIN ANALYZE, I'd recommend repeating the command a couple of times to see how much the runtime varies before you trust the results very far. Disk caching can cause the second execution to take far less time than the first, even with the identical plan. regards, tom lane