Обсуждение: table full scan or index full scan?

Поиск
Список
Период
Сортировка

table full scan or index full scan?

От
旭斌 裴
Дата:
I have a 30,000,000 records table, counts the record number to need for 40 seconds.
The table has a primary key on column id;

perf=# explain select count(*) from test;
...
-----------------------------------------
Aggregate (cost=603702.80..603702.81 rows=1 width=0)
  -> Seq scan on test (cost=0.00..527681.04 rows=30408704 width=0)
...
perf=# select count(*) from test;
count
------------
30408704

perf=#


The postgresql database uses the table full scan.but in oracle, the similar SQL uses the index full scanning,speed quickly many than postgresql.  

postgresql's optimizer whether to have the necessity to make the adjustment?
 
postgresql version:8.3.7
OS : ubuntu 9
kernel:2.6.28-15-generic x86_64






好玩贺卡等你发,邮箱贺卡全新上线!

Re: table full scan or index full scan?

От
Greg Smith
Дата:
On Mon, 12 Oct 2009, ?? ? wrote:

> perf=# select count(*) from test;

In PostgreSQL, if you're selecting every record from the table for a count
of them, you have to visit them all no matter what.  The most efficient
way to do that is with a full table scan.  Using an index instead requires
more disk I/O, because you have to read both the index blocks and the disk
blocks.

> The postgresql database uses the table full scan.but in oracle, the similar SQL uses the index full
> scanning,speed quickly many than postgresql.  

Some other database systems can do just an index scan instead to compute
aggregates like count, but even there the rules are pretty complicated;
http://www.jlcomp.demon.co.uk/faq/count_rows.html covers a lot of the
material there for Oracle's implementation.  Unfortunately this particular
optimization isn't available in Postgres yet, and you'll only switch to an
index scan if you're running a query that only selects a small number of
records where an index on the condition you're checking for exists.

There's some information about alternative ways to solve this problem at
http://wiki.postgresql.org/wiki/Slow_Counting

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: table full scan or index full scan?

От
Scott Marlowe
Дата:
Real quick, plain text is preferred on these lists over html.  I don't
care myself so much.

On Sun, Oct 11, 2009 at 7:17 PM, 旭斌 裴 <peixubin@yahoo.com.cn> wrote:
>
> I have a 30,000,000 records table, counts the record number to need for 40 seconds.
> The table has a primary key on column id;
>
> perf=# explain select count(*) from test;
> ...
> -----------------------------------------
> Aggregate (cost=603702.80..603702.81 rows=1 width=0)
>   -> Seq scan on test (cost=0.00..527681.04 rows=30408704 width=0)
> ...
> perf=# select count(*) from test;
> count
> ------------
> 30408704
>
> perf=#
>
>
> The postgresql database uses the table full scan.but in oracle, the similar SQL uses the index full scanning,speed
quicklymany than postgresql. 

Yep, PostgreSQL isn't Oracle.  It's a trade off.  In pgsql indexes
don't contain visibility info, so all index lookups have to eventually
hit the table itself.  So you either do indexlookup -> table lookup,
repeat as many times as you have index lookups or you just hit the
table since you gotta go there anyway.

On the  bright side, this makes updates faster since you don't have to
lock both table and index and write to both at the same time anymore.

> postgresql's optimizer whether to have the necessity to make the adjustment?

Sorry, it's an architectural difference.  Are you testing in a
realistic scenario including both reads and writes to the database to
see if postgresql is faster overall and identify problem areas that
pop up there?

Re: table full scan or index full scan?

От
Peter Hunsberger
Дата:
2009/10/11 Scott Marlowe <scott.marlowe@gmail.com>:
>> The postgresql database uses the table full scan.but in oracle, the similar SQL uses the index full scanning,speed
quicklymany than postgresql. 
>
> Yep, PostgreSQL isn't Oracle.  It's a trade off.  In pgsql indexes
> don't contain visibility info, so all index lookups have to eventually
> hit the table itself.  So you either do indexlookup -> table lookup,
> repeat as many times as you have index lookups or you just hit the
> table since you gotta go there anyway.
>
> On the  bright side, this makes updates faster since you don't have to
> lock both table and index and write to both at the same time anymore.
>
>> postgresql's optimizer whether to have the necessity to make the adjustment?
>
> Sorry, it's an architectural difference.  Are you testing in a
> realistic scenario including both reads and writes to the database to
> see if postgresql is faster overall and identify problem areas that
> pop up there?
>

This is interesting, I just ran a similar issue the other day.
Clearly there is a wide range of read / write scenarios that Postgres
should be able to cover.  These days, I have a lot of designs leaning
more toward the data warehouse side of the operational spectrum as
opposed to the high transaction scenario and I specifically design DB
management strategies with the knowledge that writes will happen far
less than reads in our applications.  Is this an area where
optimizations are considered hard in Postrgres or hopefully, just
something that is on the todo list but just no one has gotten around
to yet?  Similarly, are accurate table summary stats possible someday
or are they considered close to impossible in order to eliminate race
conditions and lock contention scenarios?

--
Peter Hunsberger

Re: table full scan or index full scan?

От
Martijn van Oosterhout
Дата:
On Sun, Oct 11, 2009 at 10:01:52PM -0500, Peter Hunsberger wrote:
> This is interesting, I just ran a similar issue the other day.
> Clearly there is a wide range of read / write scenarios that Postgres
> should be able to cover.  These days, I have a lot of designs leaning
> more toward the data warehouse side of the operational spectrum as
> opposed to the high transaction scenario and I specifically design DB
> management strategies with the knowledge that writes will happen far
> less than reads in our applications.  Is this an area where
> optimizations are considered hard in Postrgres or hopefully, just
> something that is on the todo list but just no one has gotten around
> to yet?

We consider any optimisation that is feasible. Unfortunatly, "the
number of rows in a table" is a fairly hard number to get in the
general case because it depends on who is asking (different
transactions may get different answers).

>Similarly, are accurate table summary stats possible someday
> or are they considered close to impossible in order to eliminate race
> conditions and lock contention scenarios?

It is possible, it's just not cheap in the general case. The usual
approach is to keep a table that tracks the number of rows. By using
deltas you can make it lockfree. These are however costs most
applications do't need. If you know in your case that the data never
changes, just cache the result somewhere. That will be infinitly more
efficient than any other method.

If you are happy with estimates, they are there and are kept reasonably
uptodate.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Вложения

Re: table full scan or index full scan?

От
peixubin
Дата:
 I understood , thanks.
--- 09年10月12日,周一, Martijn van Oosterhout <kleptog@svana.org> 写道:

发件人: Martijn van Oosterhout <kleptog@svana.org>
主题: Re: [GENERAL] table full scan or index full scan?
收件人: "Peter Hunsberger" <peter.hunsberger@gmail.com>
抄送: "Scott Marlowe" <scott.marlowe@gmail.com>, "???? ??" <peixubin@yahoo.com.cn>, pgsql-general@postgresql.org
日期: 2009年10月12日,周一,下午2:25

On Sun, Oct 11, 2009 at 10:01:52PM -0500, Peter Hunsberger wrote:
> This is interesting, I just ran a similar issue the other day.
> Clearly there is a wide range of read / write scenarios that Postgres
> should be able to cover.  These days, I have a lot of designs leaning
> more toward the data warehouse side of the operational spectrum as
> opposed to the high transaction scenario and I specifically design DB
> management strategies with the knowledge that writes will happen far
> less than reads in our applications.  Is this an area where
> optimizations are considered hard in Postrgres or hopefully, just
> something that is on the todo list but just no one has gotten around
> to yet? 

We consider any optimisation that is feasible. Unfortunatly, "the
number of rows in a table" is a fairly hard number to get in the
general case because it depends on who is asking (different
transactions may get different answers).

>Similarly, are accurate table summary stats possible someday
> or are they considered close to impossible in order to eliminate race
> conditions and lock contention scenarios?

It is possible, it's just not cheap in the general case. The usual
approach is to keep a table that tracks the number of rows. By using
deltas you can make it lockfree. These are however costs most
applications do't need. If you know in your case that the data never
changes, just cache the result somewhere. That will be infinitly more
efficient than any other method.

If you are happy with estimates, they are there and are kept reasonably
uptodate.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.



      ___________________________________________________________
  好玩贺卡等你发,邮箱贺卡全新上线!
http://card.mail.cn.yahoo.com/