Обсуждение: Why no performance boost although I added an index?

Поиск
Список
Период
Сортировка

Why no performance boost although I added an index?

От
Holger Marzen
Дата:
Hi *,

I have an accounting table on postgres 7.2.4, and my favourite select
gets no performance boost if I add an index on the date column. But it
should be faster.

Without index:

|db1=# explain select date,
|db1-#        to_char(sum(in_local),'9 999 999 999')                      as in,
|db1-#        to_char(sum(out_local),'9 999 999 999')                     as out,
|db1-#        to_char(sum(in_forward),'9 999 999 999')                    as in_f,
|db1-#        to_char(sum(out_forward),'9 999 999 999')                   as out_f,
|db1-#        to_char(sum(out_local + out_forward + in_local + in_forward),'9999 999 999 99 9') as total
|db1-#        from netacct
|db1-#        where date > date 'today' - interval '10 days'
|db1-#          and interface = 'ppp0'
|db1-#        group by date;
|NOTICE:  QUERY PLAN:
|
|Aggregate  (cost=214.29..236.19 rows=146 width=20)
| -> Group  (cost=214.29..217.94 rows=1460 width=20)
|   -> Sort  (cost=214.29..214.29 rows=1460 width=20)
|     ->  Seq Scan on netacct  (cost=0.00..137.55 rows=1460 width=20)
|
|Aggregate  (cost=215.13..237.13 rows=147 width=20)
|           (actual time=3152.03..3161.54 rows=11 loops=1)
|->  Group  (cost=215.13..218.80 rows=1467 width=20)
|           (actual time=3150.96..3154.93 rows=265 loops=1)
|  ->  Sort  (cost=215.13..215.13 rows=1467 width=20)
|            (actual time=3150.93..3151.46 rows=265 loops=1)
|    ->  Seq Scan on netacct  (cost=0.00..138.00 rows=1467 width=20)
|                             (actual time=2950.10..3147.15 rows=265 loops=1)
|Total runtime: 3162.27 msec


And now after a "create index netacct_ix1 on netacct(date)" and vacuum analyze:

|Aggregate  (cost=0.00..176.40 rows=147 width=20)
| ->  Group  (cost=0.00..158.07 rows=1467 width=20)
|   ->  Index Scan using netacct_ix1 on netacct  (cost=0.00..154.40 rows=1467 width=20

|Aggregate  (cost=0.00..176.40 rows=147 width=20)
|           (actual time=3128.57..3337.59 rows=11 loops=1)
|->  Group  (cost=0.00..158.07 rows=1467 width=20)
|           (actual time=3108.24..3327.61 rows=265 loops=1)
|  ->  Index Scan using netacct_ix1 on netacct (cost=0.00..154.40 rows=1467 width=20)
|                                              (actual time=3108.21..3322.22 rows=265 loops=1)
|Total runtime: 3338.37 msec

So the index is used, but the execution time is greater than without
index. Why that? Is the overhead using an index the biggest factor
because there are only a few thousand rows in the table?

--
PGP/GPG Key-ID:
http://blackhole.pca.dfn.de:11371/pks/lookup?op=get&search=0xB5A1AFE1


Re: Why no performance boost although I added an index?

От
Stephan Szabo
Дата:
On Mon, 7 Apr 2003, Holger Marzen wrote:

> I have an accounting table on postgres 7.2.4, and my favourite select
> gets no performance boost if I add an index on the date column. But it
> should be faster.
>

Not necessarily (see below)

> So the index is used, but the execution time is greater than without
> index. Why that? Is the overhead using an index the biggest factor
> because there are only a few thousand rows in the table?

If the index scan is reading a large enough percentage of the rows (and
depending on the clustering of values), it may be reading enough pages
that there's no advantage (or even a disadvantage) to using the index.
This is due to both the reads of the index itself and the fact that it'll
often be reading the values in the main table (it still needs to get the
commit info from the table data) in random order rather than sequential
order which can lose some optimizations the OS often gives to sequential
reads.


Re: Why no performance boost although I added an index?

От
Jonathan Bartlett
Дата:
> If the index scan is reading a large enough percentage of the rows (and
> depending on the clustering of values), it may be reading enough pages
> that there's no advantage (or even a disadvantage) to using the index.
> This is due to both the reads of the index itself and the fact that it'll
> often be reading the values in the main table (it still needs to get the
> commit info from the table data) in random order rather than sequential
> order which can lose some optimizations the OS often gives to sequential
> reads.

This is a really big lose if your index and table are on the same disk.

Jon

>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>


Re: Why no performance boost although I added an index?

От
"Dan Langille"
Дата:
On 7 Apr 2003 at 15:05, Jonathan Bartlett wrote:

> This is a really big lose if your index and table are on the same
> disk.

Can you quantity "really big" for us please?  Not all of us have more
than one disk per box.  I haven't looked into the ability for PG to
put tables and indexes on specific disks.
--
Dan Langille : http://www.langille.org/


Re: Why no performance boost although I added an index?

От
nolan@celery.tssi.com
Дата:
> This is a really big lose if your index and table are on the same disk.

That implies there is an easy way to ensure that the index and table
aren't on the same disk.  That's not obvious from the documentation.

Maybe I just don't understand which subdirectories the data and index are
kept in?
--
Mike Nolan


Re: Why no performance boost although I added an index?

От
Jonathan Bartlett
Дата:
The problem is that if you are iterating over a large portion of your
dataset using an index, your disk has to do this:

Seek to the next index entry

Seek to the corresponding table entry

Seek to the next index entry

Seek to the corresponding table entry

...

If you get a lot of that, your disk will spend more time seeking
back-and-forth than actually reading data.  Imagine if you used a CD
player and were constantly switching back between track 5 and track 9 of
the CD every 2 seconds - you would spend more time seeking than listening.

With two disks, you have one read-write head on the index, and one on the
table, so although they may be jumping around a little in the file, they
aren't moving nearly as much as if they are on the same disk.  Buffering
may take the jumps out completely.

For small index scans, it makes little difference.  For large index scans,
having the index on the same drive can really kill you.

To put them on different disks, you have to do the following:

Find the OID of the entity you wish to move.
Stop the database.
Move the entity to wherever you want it.
Symlink it to the original location.

The problems with this method are that:

 a) The DB has to be down to do it

 b) The symlink gets destroyed if you rebuild the index or cluster the
table.

Having said all that, I must say that I don't normally do this to my
databases, I just keep myself aware of the potential problems.

Jon


 On Mon, 7 Apr 2003, Dan Langille wrote:

> On 7 Apr 2003 at 15:05, Jonathan Bartlett wrote:
>
> > This is a really big lose if your index and table are on the same
> > disk.
>
> Can you quantity "really big" for us please?  Not all of us have more
> than one disk per box.  I haven't looked into the ability for PG to
> put tables and indexes on specific disks.
> --
> Dan Langille : http://www.langille.org/
>


Re: Why no performance boost although I added an index?

От
Bruce Momjian
Дата:
We have an FAQ item about this, 4.8.

---------------------------------------------------------------------------

Jonathan Bartlett wrote:
> The problem is that if you are iterating over a large portion of your
> dataset using an index, your disk has to do this:
>
> Seek to the next index entry
>
> Seek to the corresponding table entry
>
> Seek to the next index entry
>
> Seek to the corresponding table entry
>
> ...
>
> If you get a lot of that, your disk will spend more time seeking
> back-and-forth than actually reading data.  Imagine if you used a CD
> player and were constantly switching back between track 5 and track 9 of
> the CD every 2 seconds - you would spend more time seeking than listening.
>
> With two disks, you have one read-write head on the index, and one on the
> table, so although they may be jumping around a little in the file, they
> aren't moving nearly as much as if they are on the same disk.  Buffering
> may take the jumps out completely.
>
> For small index scans, it makes little difference.  For large index scans,
> having the index on the same drive can really kill you.
>
> To put them on different disks, you have to do the following:
>
> Find the OID of the entity you wish to move.
> Stop the database.
> Move the entity to wherever you want it.
> Symlink it to the original location.
>
> The problems with this method are that:
>
>  a) The DB has to be down to do it
>
>  b) The symlink gets destroyed if you rebuild the index or cluster the
> table.
>
> Having said all that, I must say that I don't normally do this to my
> databases, I just keep myself aware of the potential problems.
>
> Jon
>
>
>  On Mon, 7 Apr 2003, Dan Langille wrote:
>
> > On 7 Apr 2003 at 15:05, Jonathan Bartlett wrote:
> >
> > > This is a really big lose if your index and table are on the same
> > > disk.
> >
> > Can you quantity "really big" for us please?  Not all of us have more
> > than one disk per box.  I haven't looked into the ability for PG to
> > put tables and indexes on specific disks.
> > --
> > Dan Langille : http://www.langille.org/
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073