Question about double table scans for a table

Поиск
Список
Период
Сортировка
От Ba Jinsheng
Тема Question about double table scans for a table
Дата
Msg-id SEZPR06MB64949D07C54EA9053425A80E8A01A@SEZPR06MB6494.apcprd06.prod.outlook.com
обсуждение исходный текст
Ответы Re: Question about double table scans for a table  (David Rowley <dgrowleyml@gmail.com>)
Re: Question about double table scans for a table  (jian he <jian.universality@gmail.com>)
Re: Question about double table scans for a table  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-bugs

Hi everyone,

 

Consider the query 11 in the TPC-H benchmark:

select

    ps_partkey,

    sum(ps_supplycost * ps_availqty) as value

from

    PARTSUPP,

    SUPPLIER,

    NATION

where

    ps_suppkey = s_suppkey

    and s_nationkey = n_nationkey

    and n_name = 'MOZAMBIQUE'

group by

    ps_partkey

having

    sum(ps_supplycost * ps_availqty) > (

        select

            sum(ps_supplycost * ps_availqty) * 0.0001000000

        from

            PARTSUPP,

            SUPPLIER,

            NATION

        where

            ps_suppkey = s_suppkey

            and s_nationkey = n_nationkey

            and n_name = 'MOZAMBIQUE'

    )

order by

    value desc;

 

 

PostgreSQL generates the following query plan:

 Sort  (cost=1798.52..1799.32 rows=320 width=36)

   Sort Key: (sum((partsupp.ps_supplycost * (partsupp.ps_availqty)::numeric))) DESC

   InitPlan 1 (returns $0)

     ->  Aggregate  (cost=884.20..884.21 rows=1 width=32)

           ->  Hash Join  (cost=12.40..877.00 rows=960 width=10)

                 Hash Cond: (partsupp_1.ps_suppkey = supplier_1.s_suppkey)

                 ->  Seq Scan on partsupp partsupp_1  (cost=0.00..765.00 rows=24000 width=14)

                 ->  Hash  (cost=12.25..12.25 rows=12 width=4)

                       ->  Hash Join  (cost=1.32..12.25 rows=12 width=4)

                             Hash Cond: (supplier_1.s_nationkey = nation_1.n_nationkey)

                             ->  Seq Scan on supplier supplier_1  (cost=0.00..10.00 rows=300 width=8)

                             ->  Hash  (cost=1.31..1.31 rows=1 width=4)

                                   ->  Seq Scan on nation nation_1  (cost=0.00..1.31 rows=1 width=4)

                                         Filter: (n_name = 'MOZAMBIQUE'::bpchar)

   ->  HashAggregate  (cost=886.60..901.00 rows=320 width=36)

         Group Key: partsupp.ps_partkey

         Filter: (sum((partsupp.ps_supplycost * (partsupp.ps_availqty)::numeric)) > $0)

         ->  Hash Join  (cost=12.40..877.00 rows=960 width=14)

               Hash Cond: (partsupp.ps_suppkey = supplier.s_suppkey)

               ->  Seq Scan on partsupp  (cost=0.00..765.00 rows=24000 width=18)

               ->  Hash  (cost=12.25..12.25 rows=12 width=4)

                     ->  Hash Join  (cost=1.32..12.25 rows=12 width=4)

                           Hash Cond: (supplier.s_nationkey = nation.n_nationkey)

                           ->  Seq Scan on supplier  (cost=0.00..10.00 rows=300 width=8)

                           ->  Hash  (cost=1.31..1.31 rows=1 width=4)

                                 ->  Seq Scan on nation  (cost=0.00..1.31 rows=1 width=4)

                                       Filter: (n_name = 'MOZAMBIQUE'::bpchar)

 

While TiDB has the following query plan:

 

Projection_63                             

 └─Sort_64                                 

   └─Selection_66                          

     └─HashAgg_67                          

       └─Projection_94                     

         └─HashJoin_71                     

           ├─HashJoin_84(Build)            

            ├─TableReader_89(Build)       

            └─Selection_88              

               └─TableFullScan_87        

            └─TableReader_86(Probe)       

              └─TableFullScan_85          

           └─TableReader_91(Probe)         

             └─TableFullScan_90            

...

 

Both query plans include different numbers of table scans, as highlighted in red color. PostgreSQL uses six table scans, while TiDB has only three. I understand that the table scanning operation is expensive and query plans are typically more efficient with fewer table scans. My question is why PostgreSQL uses six table scans to scan each table twice? Is it a more efficient query plan, or does this indicate an optimization that is not performed by PostgreSQL?

 

 

 

Best regards,

Jinsheng Ba

 

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18038: Aliases removed from view definitions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #18038: Aliases removed from view definitions