Обсуждение: Seq Scan

Поиск
Список
Период
Сортировка

Seq Scan

От
"Tyler Durden"
Дата:
Hi,
I'm having some problems in performance in a simple select count(id)
from.... I have 700 000 records in one table, and when I do:

# explain select (id) from table_name;
-[ RECORD 1 ]----------------------------------------------------------------
QUERY PLAN | Seq Scan on table_name  (cost=0.00..8601.30 rows=266730 width=4)

I had created an index for id(btree), but still shows "Seq Scan".
What I'm doing wrong?

Thanks,
Tyler

Re: Seq Scan

От
"John D. Burger"
Дата:
Tyler Durden wrote:

> I'm having some problems in performance in a simple select count(id)
> from.... I have 700 000 records in one table, and when I do:
>
> # explain select (id) from table_name;
> -[ RECORD
> 1 ]----------------------------------------------------------------
> QUERY PLAN | Seq Scan on table_name  (cost=0.00..8601.30
> rows=266730 width=4)
>
> I had created an index for id(btree), but still shows "Seq Scan".
> What I'm doing wrong?

You mention SELECT COUNT(ID), but your example shows SELECT ID.  In
either case, the planner is choosing the correct plan.  Indexes exist
to save the engine from visiting every row in the table, but both of
these queries require every row to be visited anyway.

Perhaps you think that these queries can be satisfied without
visiting the actual table rows at all, using only the index.  This is
incorrect - PG doesn't work that way.

- John Burger
   MITRE


Re: Seq Scan

От
"Joshua D. Drake"
Дата:
Tyler Durden wrote:
> Hi,
> I'm having some problems in performance in a simple select count(id)
> from.... I have 700 000 records in one table, and when I do:
>
> # explain select (id) from table_name;
> -[ RECORD 1
> ]----------------------------------------------------------------
> QUERY PLAN | Seq Scan on table_name  (cost=0.00..8601.30 rows=266730
> width=4)
>
> I had created an index for id(btree), but still shows "Seq Scan".
> What I'm doing wrong?

Nothing. You have to scan the table because you aren't giving postgresql
anything to use the index by.

Joshua D. Drake

>
> Thanks,
> Tyler
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>


--

       === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


Re: Seq Scan

От
"Tyler Durden"
Дата:
Yes,  either case happens the same.
I'm come recently from MySQL and it works in a different way.
I find strange that a simple SELECT COUNT(...) is so slow with only
700 000 records.
Has been a nightmare optimizing this tables/queries.
Sorry about this silly question, but I'm new to Posgresql.

Thanks,
Tyler

On 6/1/07, John D. Burger <john@mitre.org> wrote:
> You mention SELECT COUNT(ID), but your example shows SELECT ID.  In
> either case, the planner is choosing the correct plan.  Indexes exist
> to save the engine from visiting every row in the table, but both of
> these queries require every row to be visited anyway.
>
> Perhaps you think that these queries can be satisfied without
> visiting the actual table rows at all, using only the index.  This is
> incorrect - PG doesn't work that way.
>
> - John Burger
>    MITRE
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: Seq Scan

От
"Tyler Durden"
Дата:
Ok, my bad.
But why this happens:
# explain ANALYZE select id from table_name where id>200000;
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using table_name_pkey on table_name  (cost=0.00..2618.96
rows=68960 width=4) (actual time=220.543..1479.495 rows=66730 loops=1)
   Index Cond: (id > 200000)
 Total runtime: 1504.839 ms
(3 rows)

dun=# explain ANALYZE select id from table_name where id>10;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Seq Scan on table_name  (cost=0.00..9268.12 rows=266703 width=4)
(actual time=107.935..2733.592 rows=266720 loops=1)
   Filter: (id > 10)
 Total runtime: 2833.744 ms
(3 rows)


It uses Index Scan for id>200000 and Seq Scan for id>10?!

On 6/1/07, Joshua D. Drake <jd@commandprompt.com> wrote:

> Nothing. You have to scan the table because you aren't giving postgresql
> anything to use the index by.
>
> Joshua D. Drake
>

Re: Seq Scan

От
Michael Glaesemann
Дата:
On Jun 1, 2007, at 12:24 , Tyler Durden wrote:


> On 6/1/07, Joshua D. Drake <jd@commandprompt.com> wrote:
>
>> Nothing. You have to scan the table because you aren't giving
>> postgresql
>> anything to use the index by.

> # explain ANALYZE select id from table_name where id>200000;
>
> QUERY PLAN
> ----------------------------------------------------------------------
> ----------------------------------------------------------------------
> Index Scan using table_name_pkey on table_name  (cost=0.00..2618.96
> rows=68960 width=4) (actual time=220.543..1479.495 rows=66730 loops=1)
>   Index Cond: (id > 200000)
> Total runtime: 1504.839 ms
> (3 rows)
>
> dun=# explain ANALYZE select id from table_name where id>10;
>                                                      QUERY PLAN
> ----------------------------------------------------------------------
> ------------------------------------------------
> Seq Scan on table_name  (cost=0.00..9268.12 rows=266703 width=4)
> (actual time=107.935..2733.592 rows=266720 loops=1)
>   Filter: (id > 10)
> Total runtime: 2833.744 ms
> (3 rows)
>
>
> It uses Index Scan for id>200000 and Seq Scan for id>10?!

[Please don't top-post. It makes discussions harder to follow]

Because the planner estimates that it will be faster for it to scan
the entire table than to use the index in the latter case. Note that
only about 70,000 rows need to be visited for id > 200000, while
nearly 270,000 rows need to be visited when id > 10.

Michael Glaesemann
grzm seespotcode net



Re: Seq Scan

От
Reece Hart
Дата:
On Fri, 2007-06-01 at 18:24 +0100, Tyler Durden wrote:
It uses Index Scan for id>200000 and Seq Scan for id>10?!

Based on the statistics pg has for your table, and on the cost of using the index, the cost based optimizer decided that it's more efficient to seq scan all of the rows than to incur the index overhead. These decisions are not always correct, but they're usually quite good.

If you don't believe it, try:
# set enable_seqscan=off;
# explain analyze ...
# set enable_seqscan=on;
# explain analyze ...

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Re: Seq Scan

От
Kevin Hunter
Дата:
of the conversation.
breaks the flow
because it
toppost
Please don't

At 1:17p -0400 on 01 Jun 2007, Tyler Durden wrote:
> Yes,  either case happens the same.
> I'm come recently from MySQL and it works in a different way.
> I find strange that a simple SELECT COUNT(...) is so slow with only
> 700 000 records.
> Has been a nightmare optimizing this tables/queries.
> Sorry about this silly question, but I'm new to Posgresql.

The much more knowledgable will correct me, but the abbr. version is
that it is for data integrity and correctness reasons and the
inherent way in which the MVCC model works.  If it's any condolence,
I believe that Oracle also suffers with this particular query, but I
haven't seen any benchmarks to prove that (I think because Oracle
expressly forbids benchmarks and comparisons in their license, but
don't quote me on that).

For a complete discussion, go this page and look for the thread with
the subject 'Performance of count(*)'

http://archives.postgresql.org/pgsql-performance/2007-03/threads.php

I believe that may help.

Kevin

Re: Seq Scan

От
Tom Lane
Дата:
Kevin Hunter <hunteke@earlham.edu> writes:
> At 1:17p -0400 on 01 Jun 2007, Tyler Durden wrote:
>> I find strange that a simple SELECT COUNT(...) is so slow with only
>> 700 000 records.

> The much more knowledgable will correct me, but the abbr. version is
> that it is for data integrity and correctness reasons and the
> inherent way in which the MVCC model works.

The bottom line is that a "correct" implementation (ie, one that fully
respects MVCC behavior) would create enormous overhead, as well as bad
contention bottlenecks for concurrent updates.  It doesn't seem worth it.

If you want a cheap approximate answer, there are a couple of ways to
get one, but SELECT COUNT(*) is not that.

            regards, tom lane

Re: Seq Scan

От
"Tyler Durden"
Дата:
Ok, Thank all for the clarification.