Обсуждение: trouble with (lack of) indexing

Поиск

Список

Период

Сортировка

trouble with (lack of) indexing

От

Søren Boll Overgaard

Дата:

09 мая 2002 г., 20:25:35

Hello

I am currently involved in a rather large project relying heavily on the use of
postgresql[1], but we have run into a rather annoying snag.
We currently have two databases set up. One for testing, and one for production.
Both run on FreeBSD, and perform very well since the last upgrade.
However, here is the problem. When executing a certain select statement (shown
below) on the production database, we get a sequential table scan (of a rather
large table), which causes the machine on which it runs to max out all possible
disk I/O. However, when the excact same query is executed on the test
database, we get an index scan instead of a sequential one. Obviously,
something differes between the two databases, but we simply cannot track down
what it is. I would greatly appreciate any input you might be able to offer.
Here are the queries and their accompanying query plans:

On the development/testing database:


--------------8<-----------
=> explain SELECT ht.id,
->                      hq.ip,
->                      hq.id
->               FROM   hostsqueue as hq,
->                      hoststests as ht
->               WHERE  ht.hostsqueue_id=hq.id
-> ;
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..121.50 rows=1000 width=44)
  ->  Index Scan using hostsqueue_pkey on hostsqueue hq  (cost=0.00..52.00 rows=1000 width=36)
  ->  Index Scan using idx_htsts_hq on hoststests ht  (cost=0.00..52.00 rows=1000 width=8)

EXPLAIN
=>
--------------8<-----------
That looks fine and dandy, and performance is ok.

This however is on the production database:
--------------8<-----------

=> explain SELECT ht.id,
->                      hq.ip,
->                      hq.id
->               FROM   hostsqueue as hq,
->                      hoststests as ht
->               WHERE  ht.hostsqueue_id=hq.id;
NOTICE:  QUERY PLAN:

Merge Join  (cost=134514.31..136541.15 rows=129756 width=44)
  ->  Sort  (cost=76196.94..76196.94 rows=32200 width=36)
        ->  Seq Scan on hostsqueue hq  (cost=0.00..73786.00 rows=32200 width=36)
  ->  Sort  (cost=58317.37..58317.37 rows=129756 width=8)
        ->  Seq Scan on hoststests ht  (cost=0.00..47297.56 rows=129756 width=8)

EXPLAIN
=>

--------------8<-----------

Obviously something differs.
The table definitions are rather large, so I have made them available at:
http://tb.tolkien.dk/~boll/devel-definition.txt
and
http://treebeard.tolkien.dk/~boll/production-definition.txt

Any input, be it suggestions or otherwise, as to how we may fix this, would be
greatly appreciated.

Since I am a relatively new subscriber to the list, could you possibly cc me
with any suggestions, since I am not entirely sure if I actually receive mail
sent to the list yet.

Thanks.

[1] We choose postgresql over others because we liked the many features it
offers beyond its closest competitors.
--
Søren O.                                       ,''`.
                                              : :' :
public key: finger boll <at> db.debian.org    `. `'
                                                `-

Re: trouble with (lack of) indexing

От

Scott Marlowe

Дата:

09 мая 2002 г., 20:43:54

On Thu, 9 May 2002, Søren Boll Overgaard wrote:

> Hello
>
> I am currently involved in a rather large project relying heavily on the use of
> postgresql[1], but we have run into a rather annoying snag.
> We currently have two databases set up. One for testing, and one for production.
> Both run on FreeBSD, and perform very well since the last upgrade.
> However, here is the problem. When executing a certain select statement (shown
> below) on the production database, we get a sequential table scan (of a rather
> large table), which causes the machine on which it runs to max out all possible
> disk I/O. However, when the excact same query is executed on the test
> database, we get an index scan instead of a sequential one. Obviously,
> something differes between the two databases, but we simply cannot track down
> what it is. I would greatly appreciate any input you might be able to offer.
> Here are the queries and their accompanying query plans:

Couple of quick questions...

Have you run vacuum analyze on both databases?
What are your settings in the postgresql.conf for cpu_tuple_cost and such?
You can check them with 'show all' from psql in 7.2.x and later.

Re: trouble with (lack of) indexing

От

Søren Boll Overgaard

Дата:

09 мая 2002 г., 22:00:26

On Thu, May 09, 2002 at 03:38:20PM -0600, Scott Marlowe wrote:
> On Thu, 9 May 2002, Søren Boll Overgaard wrote:
>
> > Hello
> >
> > I am currently involved in a rather large project relying heavily on the use of
> > postgresql[1], but we have run into a rather annoying snag.
> > We currently have two databases set up. One for testing, and one for production.
> > Both run on FreeBSD, and perform very well since the last upgrade.
> > However, here is the problem. When executing a certain select statement (shown
> > below) on the production database, we get a sequential table scan (of a rather
> > large table), which causes the machine on which it runs to max out all possible
> > disk I/O. However, when the excact same query is executed on the test
> > database, we get an index scan instead of a sequential one. Obviously,
> > something differes between the two databases, but we simply cannot track down
> > what it is. I would greatly appreciate any input you might be able to offer.
> > Here are the queries and their accompanying query plans:
>
> Couple of quick questions...
>
> Have you run vacuum analyze on both databases?

Just did. Since I am not the one maintaining the databases on a day to day
basis I wasn't sure when it had last been done. I am pretty much the dimwit who
was convinced to go and find a solution to the problem :)
Oddly enough, after vacuum analyzing the two tables involved on the development
database, it now performs sequential scans on bot tables, in excactly the same
way as the production database:

 explain SELECT ht.id,hq.ip,hq.id FROM
 hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
NOTICE:  QUERY PLAN:

Hash Join  (cost=776.05..2904.36 rows=84 width=24)
  ->  Seq Scan on hoststests ht  (cost=0.00..2126.84 rows=84 width=8)
  ->  Hash  (cost=776.04..776.04 rows=4 width=16)
        ->  Seq Scan on hostsqueue hq  (cost=0.00..776.04 rows=4 width=16)

EXPLAIN

> What are your settings in the postgresql.conf for cpu_tuple_cost and such?

On both databases:
NOTICE:  cpu_tuple_cost is 0.01
NOTICE:  cpu_index_tuple_cost is 0.001

Again, any input would be greatly appreciated.
Perhaps we are missing an index somewhere?

--
Søren O.                                       ,''`.
                                              : :' :
public key: finger boll <at> db.debian.org    `. `'
                                                `-

Re: trouble with (lack of) indexing

От

Tom Lane

Дата:

09 мая 2002 г., 22:38:48

=?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes:
>  explain SELECT ht.id,hq.ip,hq.id FROM
>  hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
> NOTICE:  QUERY PLAN:

> Hash Join  (cost=776.05..2904.36 rows=84 width=24)
>   ->  Seq Scan on hoststests ht  (cost=0.00..2126.84 rows=84 width=8)
>   ->  Hash  (cost=776.04..776.04 rows=4 width=16)
>         ->  Seq Scan on hostsqueue hq  (cost=0.00..776.04 rows=4 width=16)

If there's only 84 rows it's hardly going to matter which plan we choose
;-).  Please show us the results from the production database, not the
toy-sized tables.

If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than
plain EXPLAIN.  Also, you could try doing "SET enable_seqscan = off"
and see how the plan changes.

            regards, tom lane

Re: trouble with (lack of) indexing

От

Søren Boll Overgaard

Дата:

09 мая 2002 г., 22:52:44

On Thu, May 09, 2002 at 07:38:06PM -0400, Tom Lane wrote:
> =?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes:
> >  explain SELECT ht.id,hq.ip,hq.id FROM
> >  hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
> > NOTICE:  QUERY PLAN:
>
> > Hash Join  (cost=776.05..2904.36 rows=84 width=24)
> >   ->  Seq Scan on hoststests ht  (cost=0.00..2126.84 rows=84 width=8)
> >   ->  Hash  (cost=776.04..776.04 rows=4 width=16)
> >         ->  Seq Scan on hostsqueue hq  (cost=0.00..776.04 rows=4 width=16)
>
>
> If there's only 84 rows it's hardly going to matter which plan we choose
> ;-).  Please show us the results from the production database, not the
> toy-sized tables.

Sorry, my bad.
Here is the real stuff:

explain SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
NOTICE:  QUERY PLAN:

Hash Join  (cost=75331.89..127823.94 rows=160293 width=24)
  ->  Seq Scan on hoststests ht  (cost=0.00..49686.93 rows=160293 width=8)
  ->  Hash  (cost=75243.51..75243.51 rows=35351 width=16)
        ->  Seq Scan on hostsqueue hq  (cost=0.00..75243.51 rows=35351 width=16)

EXPLAIN




>
> If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than
> plain EXPLAIN.  Also, you could try doing "SET enable_seqscan = off"
> and see how the plan changes.

After changing the enable_seqscan, the plan looks like this:

explain SELECT ht.id,hq.ip,hq.id FROM   hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..772324.63 rows=160293 width=24)
  ->  Index Scan using hostsqueue_pkey on hostsqueue hq  (cost=0.00..137603.50 rows=35351 width=16)
  ->  Index Scan using idx_htsts_hq on hoststests ht  (cost=0.00..632228.35 rows=160293 width=8)

EXPLAIN


As far as I can tell, that is excactly what we want. I will let the guys who
usually mess with the database have a go at it.

Thanks for the input so far, I really appreciate it.

--
Søren O.                                       ,''`.
                                              : :' :
public key: finger boll <at> db.debian.org    `. `'
                                                `-

Re: trouble with (lack of) indexing

От

Søren Boll Overgaard

Дата:

09 мая 2002 г., 23:00:49

On Thu, May 09, 2002 at 07:38:06PM -0400, Tom Lane wrote:
> =?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes:
> >  explain SELECT ht.id,hq.ip,hq.id FROM
> >  hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
> > NOTICE:  QUERY PLAN:
>
> > Hash Join  (cost=776.05..2904.36 rows=84 width=24)
> >   ->  Seq Scan on hoststests ht  (cost=0.00..2126.84 rows=84 width=8)
> >   ->  Hash  (cost=776.04..776.04 rows=4 width=16)
> >         ->  Seq Scan on hostsqueue hq  (cost=0.00..776.04 rows=4 width=16)
>
>
> If there's only 84 rows it's hardly going to matter which plan we choose
> ;-).  Please show us the results from the production database, not the
> toy-sized tables.
>
> If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than
> plain EXPLAIN.  Also, you could try doing "SET enable_seqscan = off"
> and see how the plan changes.

I missed the outputs of the explain analyze queries in the previous mail:
explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
NOTICE:  QUERY PLAN:

Hash Join  (cost=75331.89..127823.94 rows=160293 width=24) (actual time=122572.70..146683.59 rows=142807 loops=1)
  ->  Seq Scan on hoststests ht  (cost=0.00..49686.93 rows=160293 width=8) (actual time=170.79..21255.68 rows=142807
loops=1)
  ->  Hash  (cost=75243.51..75243.51 rows=35351 width=16) (actual time=122401.10..122401.10 rows=0 loops=1)
        ->  Seq Scan on hostsqueue hq  (cost=0.00..75243.51 rows=35351 width=16) (actual time=518.88..122024.50
rows=28610loops=1) 
Total runtime: 147598.79 msec

EXPLAIN

And after setting the enable_seqscan to off, I got this:

=> SET enable_seqscan = off;
SET VARIABLE
=> explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
NOTICE:  QUERY PLAN:

Merge Join  (cost=0.00..772324.63 rows=160293 width=24) (actual time=8319.78..116061.18 rows=141645 loops=1)
  ->  Index Scan using hostsqueue_pkey on hostsqueue hq  (cost=0.00..137603.50 rows=35351 width=16) (actual
time=34.90..90831.97rows=28623 loops=1) 
  ->  Index Scan using idx_htsts_hq on hoststests ht  (cost=0.00..632228.35 rows=160293 width=8) (actual
time=18.41..23343.75rows=141645 loops=1) 
Total runtime: 116562.03 msec

EXPLAIN

Actually, I was rather hoping for a large gain in performance, but maybe the
"total runtimes" are not actally representative of performance of the actual
selects?

--
Søren O.                                       ,''`.
                                              : :' :
public key: finger boll <at> db.debian.org    `. `'
                                                `-

Re: trouble with (lack of) indexing

От

"Nigel J. Andrews"

Дата:

09 мая 2002 г., 23:07:18

On Thu, 9 May 2002, Tom Lane wrote:

> =?iso-8859-1?Q?S=F8ren?= Boll Overgaard <postgres@fork.dk> writes:
> >  explain SELECT ht.id,hq.ip,hq.id FROM
> >  hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
> > NOTICE:  QUERY PLAN:
>
> > Hash Join  (cost=776.05..2904.36 rows=84 width=24)
> >   ->  Seq Scan on hoststests ht  (cost=0.00..2126.84 rows=84 width=8)
> >   ->  Hash  (cost=776.04..776.04 rows=4 width=16)
> >         ->  Seq Scan on hostsqueue hq  (cost=0.00..776.04 rows=4 width=16)
>
>
> If there's only 84 rows it's hardly going to matter which plan we choose
> ;-).  Please show us the results from the production database, not the
> toy-sized tables.

The original post had it, but presumably not from the most recent analyze:

[quote]

This however is on the production database:
--------------8<-----------

=> explain SELECT ht.id,
->                      hq.ip,
->                      hq.id
->               FROM   hostsqueue as hq,
->                      hoststests as ht
->               WHERE  ht.hostsqueue_id=hq.id;
NOTICE:  QUERY PLAN:

Merge Join  (cost=134514.31..136541.15 rows=129756 width=44)
  ->  Sort  (cost=76196.94..76196.94 rows=32200 width=36)
        ->  Seq Scan on hostsqueue hq  (cost=0.00..73786.00 rows=32200 width=36)
  ->  Sort  (cost=58317.37..58317.37 rows=129756 width=8)
        ->  Seq Scan on hoststests ht  (cost=0.00..47297.56 rows=129756 width=8)

EXPLAIN
=>
[/end quote]

It's quote a result set. I wouldn't want to be paging/scrolling through that
lot.

> If you have 7.2 then EXPLAIN ANALYZE is much more useful to show than
> plain EXPLAIN.  Also, you could try doing "SET enable_seqscan = off"
> and see how the plan changes.

Would it be worth changing the explain output to include the stats., or a
summary of them, used by the plan? Does the 10% rule still apply? If so it
might be a nice addition to show the total number of rows expected in the table
so that a quick 'eyeball' can confirm that is why a seqscan is being
performed. Or am I just barking up a dead horse here since I'm now thinking it
might be nice to include information about potential indexes and at what row
count from them it's decided a seqscan is better.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: trouble with (lack of) indexing

От

"Nigel J. Andrews"

Дата:

09 мая 2002 г., 23:15:11

On Fri, 10 May 2002, Søren Boll Overgaard wrote:
>  [snip]
> I missed the outputs of the explain analyze queries in the previous mail:
> explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
> NOTICE:  QUERY PLAN:
>
> Hash Join  (cost=75331.89..127823.94 rows=160293 width=24) (actual time=122572.70..146683.59 rows=142807 loops=1)
>   ->  Seq Scan on hoststests ht  (cost=0.00..49686.93 rows=160293 width=8) (actual time=170.79..21255.68 rows=142807
loops=1)
>   ->  Hash  (cost=75243.51..75243.51 rows=35351 width=16) (actual time=122401.10..122401.10 rows=0 loops=1)
>         ->  Seq Scan on hostsqueue hq  (cost=0.00..75243.51 rows=35351 width=16) (actual time=518.88..122024.50
rows=28610loops=1) 
> Total runtime: 147598.79 msec
>
> EXPLAIN
>
> And after setting the enable_seqscan to off, I got this:
>
> => SET enable_seqscan = off;
> SET VARIABLE
> => explain analyze SELECT ht.id,hq.ip,hq.id FROM hostsqueue as hq,hoststests as ht WHERE  ht.hostsqueue_id=hq.id;
> NOTICE:  QUERY PLAN:
>
> Merge Join  (cost=0.00..772324.63 rows=160293 width=24) (actual time=8319.78..116061.18 rows=141645 loops=1)
>   ->  Index Scan using hostsqueue_pkey on hostsqueue hq  (cost=0.00..137603.50 rows=35351 width=16) (actual
time=34.90..90831.97rows=28623 loops=1) 
>   ->  Index Scan using idx_htsts_hq on hoststests ht  (cost=0.00..632228.35 rows=160293 width=8) (actual
time=18.41..23343.75rows=141645 loops=1) 
> Total runtime: 116562.03 msec
>
> EXPLAIN
>
> Actually, I was rather hoping for a large gain in performance, but maybe the
> "total runtimes" are not actally representative of performance of the actual
> selects?


Yes they are. That is the time that is taken to produce the results since that
is what explain analyze is doing. It doesn't cover the time taken to get those
results somewhere where they can be used, which for such large result set could
be significant (network transmission plus buffering in the client all before
the application can use any of it).

BTW, are you aware that those two things seem to be returning different numbers
of rows from each table and consequently from the whole query?


---
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: trouble with (lack of) indexing

От

Tom Lane

Дата:

09 мая 2002 г., 23:30:10

"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> Or am I just barking up a dead horse here since I'm now thinking it
> might be nice to include information about potential indexes and at
> what row count from them it's decided a seqscan is better.

There is no "row count at which it's decided a seqscan is better".
The planner does cost estimates and picks the plan with the lowest
estimated cost, so all the behavior is emergent from the cost
estimation equations.

I have not been able to think of any output format that would show
rejected plan estimates without being so verbose as to be unreadable
--- except in the most trivial cases, there are a *lot* of rejected
plans, and the planner has no idea which of them might be interesting
to a human.  (You can try defining OPTIMIZER_DEBUG if you want to see
it in action, but I don't think the output is very useful in practice.)
What's worse is that (a) the only reason why you'd care is if the
estimates are wrong, but (b) there is no way to tell anything about the
actual cost that might be associated with a rejected plan fragment.
To compare estimates to reality you really have to arrange for the plan
to be the selected one, so that you can run it.

            regards, tom lane

Re: trouble with (lack of) indexing

От

Tom Lane

Дата:

09 мая 2002 г., 23:44:14

"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> BTW, are you aware that those two things seem to be returning
> different numbers of rows from each table and consequently from the
> whole query?

What's even stranger is that the whole-table seqscan on hostsqueue is
apparently running slower than the whole-table indexscan:

>         ->  Seq Scan on hostsqueue hq  (cost=0.00..75243.51 rows=35351 width=16) (actual time=518.88..122024.50
rows=28610loops=1) 

>   ->  Index Scan using hostsqueue_pkey on hostsqueue hq  (cost=0.00..137603.50 rows=35351 width=16) (actual
time=34.90..90831.97rows=28623 loops=1) 

Now that just ain't physically possible, given comparable starting
conditions: an indexscan has to do more work, both CPU and I/O.
I suspect that the indexscan test was run second and benefited from most
of the table having been brought into RAM during the seqscan.  If so, a
repeat test will show different results.

In any case, it'd seem that the planner is not making any major errors
here: its estimates were not that far out of line from reality.
It did pick the slightly slower plan, but the difference is well within
what I'd call reasonable estimation error.

            regards, tom lane

Re: trouble with (lack of) indexing

От

"Nigel J. Andrews"

Дата:

09 мая 2002 г., 23:52:46

On Thu, 9 May 2002, Tom Lane wrote:
> "Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> > Or am I just barking up a dead horse here since I'm now thinking it
> > might be nice to include information about potential indexes and at
> > what row count from them it's decided a seqscan is better.
>
> There is no "row count at which it's decided a seqscan is better".
> The planner does cost estimates and picks the plan with the lowest
> estimated cost, so all the behavior is emergent from the cost
> estimation equations.

But those cost estimates can be extrapolated and intersect points determined
can't they?

> I have not been able to think of any output format that would show
> rejected plan estimates without being so verbose as to be unreadable
> --- except in the most trivial cases, there are a *lot* of rejected
> plans, and the planner has no idea which of them might be interesting
> to a human.  (You can try defining OPTIMIZER_DEBUG if you want to see
> it in action, but I don't think the output is very useful in practice.)

That's why I started thinking I was barking.

> What's worse is that (a) the only reason why you'd care is if the
> estimates are wrong, but (b) there is no way to tell anything about the
> actual cost that might be associated with a rejected plan fragment.
> To compare estimates to reality you really have to arrange for the plan
> to be the selected one, so that you can run it.

Good point.

I was thinking that it might be nice for explain to show some justification for
the choice of plan. However, I'm quite prepared to agree that this isn't a good
idea to try and do. I haven't looked to see what explanation of the statistics
and how the planner uses them there is in the documentation. It might be a good
idea if there was some detail about it there, I mean detail such that someone
can go to the documentation, query the tables and say 'Ah, so that's why it
chose that plan.' (Well at least for reasonably simple cases). The trouble is
I'm no where near qualified enough to it and so can't add it if it's not there
already.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: trouble with (lack of) indexing

От

Tom Lane

Дата:

10 мая 2002 г., 00:04:41

"Nigel J. Andrews" <nandrews@investsystems.co.uk> writes:
> On Thu, 9 May 2002, Tom Lane wrote:
>> There is no "row count at which it's decided a seqscan is better".
>> The planner does cost estimates and picks the plan with the lowest
>> estimated cost, so all the behavior is emergent from the cost
>> estimation equations.

> But those cost estimates can be extrapolated and intersect points determined
> can't they?

Well, many of the equations are nonlinear, and they all have multiple
input values, not to mention various tunable parameters.  Solving for
the intersect points is theoretically possible but doesn't strike me
as likely to be informative.

> I haven't looked to see what explanation of the statistics and how the
> planner uses them there is in the documentation. It might be a good
> idea if there was some detail about it there, I mean detail such that
> someone can go to the documentation, query the tables and say 'Ah, so
> that's why it chose that plan.'

The nitty-gritty documentation is in the source code.  This is an
open-source project, remember; we operate on the assumption that people
who want to know how things work will look at the sources.  Most of the
interesting details for this purpose are in
src/backend/optimizer/path/costsize.c with some tidbits in
src/backend/utils/adt/selfuncs.c

            regards, tom lane

Re: trouble with (lack of) indexing

От

Scott Marlowe

Дата:

10 мая 2002 г., 15:31:19

On Fri, 10 May 2002, Søren Boll Overgaard wrote:

> > What are your settings in the postgresql.conf for cpu_tuple_cost and such?
>
> On both databases:
> NOTICE:  cpu_tuple_cost is 0.01
> NOTICE:  cpu_index_tuple_cost is 0.001

Here's a quick and dirty explanation of what these settings mean to the
query planner, as listed in src/backend/optimizer/path/costsize.c:

random_page_cost        Cost of a non-sequential page fetch
cpu_tuple_cost          Cost of typical CPU time to process a tuple
cpu_index_tuple_cost    Cost of typical CPU time to process an index tuple

Since the initial numbers are just SWAGs (scientific wild ass guesses) it
would behoove you to figure out where the break points really are in your
setup.

You can change the setting of enable_seqscan and enable_indexscan to force
your planner to ignore one or the other, and time the output.  It's pretty
easy to do a quick timing test like so:

time psql -c 'set enable_seqscan=off;select * from tellers' >/dev/null

When you find a query that is just a little faster with seq scan than
index or vice versa, then you can tune the three listed parameters above
so that the planner just barely makes the right choices for your data.

I've found that lower cpu_index_tuple_cost from the default 0.001 to
0.0005 or so made a big difference in my database, and setting
random_page_cost = 1.5 or so was a good help too.  Your mileage will
certainly vary...

Re: trouble with (lack of) indexing

От

Tom Lane

Дата:

10 мая 2002 г., 16:04:02

Scott Marlowe <scott.marlowe@ihs.com> writes:
> You can change the setting of enable_seqscan and enable_indexscan to force
> your planner to ignore one or the other, and time the output.  It's pretty
> easy to do a quick timing test like so:

> time psql -c 'set enable_seqscan=off;select * from tellers' >/dev/null

It's difficult to get accurate measurements that way.  I'd recommend
using 7.2's EXPLAIN ANALYZE instead, and looking at the total runtime
it displays.

Even with EXPLAIN ANALYZE, I'd recommend repeating the command a couple
of times to see how much the runtime varies before you trust the results
very far.  Disk caching can cause the second execution to take far less
time than the first, even with the identical plan.

            regards, tom lane

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: trouble with (lack of) indexing