CTE with JOIN of two tables is much faster than a regular query

Поиск
Список
Период
Сортировка
От
Тема CTE with JOIN of two tables is much faster than a regular query
Дата
Msg-id 004b01d436d6$ff96c840$fec458c0$@gmail.com
обсуждение исходный текст
Ответы Re: CTE with JOIN of two tables is much faster than a regular query  (Andreas Kretschmer <andreas@a-kretschmer.de>)
Re: CTE with JOIN of two tables is much faster than a regular query  (Stephen Frost <sfrost@snowman.net>)
Re: CTE with JOIN of two tables is much faster than a regular query  (Ravi Krishna <sravikrishna@aol.com>)
(was: CTE with JOIN of two tables is much faster than a regularquery)  (Albrecht Dreß <albrecht.dress@arcor.de>)
Список pgsql-general

Running PostgreSQL 9.5 on Windows.

 

The CTE mentioned below completes the query in 4.5 seconds while the regular query takes 66 seconds. I read from EXPLAIN ANALYSE that the regular query starts with a full table scan over “Doc” while the CTE joins the two tables first and applies the filter condition in the 2nd step.

I believe that some rows in “Doc” which are not referenced by “F” contain a large amount of data in the field “szText” and this will slow down the ILIKE operator.

 

What can I do to improve the performance of the regular query without using a CTE?

 

This is a much simplified extract from a larger application:

 

CREATE TABLE Doc (

  oID UUID NOT NULL PRIMARY KEY,

  uDocID UUID NOT NULL UNIQUE,

  szText TEXT

  );

 

CREATE TABLE F (

  oID UUID NOT NULL PRIMARY KEY,

  uDocRef UUID,

  CONSTRAINT F_fkey1 FOREIGN KEY (uDocRef) REFERENCES Doc (uDocID)

  ); 

 

-- just in case …

ALTER TABLE Doc ALTER uDocID SET STATISTICS 10000;

ALTER TABLE Doc ALTER szText SET STATISTICS 10000;

VACUUM ANALYSE Doc;

 

SELECT COUNT(*) FROM Doc;

=> 125946 records

 

ALTER TABLE F ALTER uDocRef SET STATISTICS 10000;

VACUUM ANALYSE F;

 

SELECT COUNT(*) FROM F;

=> 32605 records

 

Result with CTE:

 

EXPLAIN ANALYSE

  WITH a AS (

  SELECT F.oID, Doc.szText

  FROM F

  JOIN Doc ON F.uDocRef = Doc.udocid

)

SELECT *

  FROM a

  WHERE szText ILIKE '%480GB%';

 

"CTE Scan on a  (cost=9463.42..10197.03 rows=52 width=48) (actual time=478.770..4551.613 rows=10 loops=1)"

"  Filter: (sztext ~~* '%480GB%'::text)"

"  Rows Removed by Filter: 32595"

"  CTE a"

"    ->  Hash Join  (cost=973.61..9463.42 rows=32605 width=359) (actual time=36.998..100.337 rows=32605 loops=1)"

"          Hash Cond: (doc.udocid = f.udocref)"

"          ->  Seq Scan on doc  (cost=0.00..7691.46 rows=125946 width=359) (actual time=0.008..18.269 rows=125946 loops=1)"

"          ->  Hash  (cost=566.05..566.05 rows=32605 width=32) (actual time=35.825..35.825 rows=32605 loops=1)"

"                Buckets: 32768  Batches: 1  Memory Usage: 2294kB"

"                ->  Seq Scan on f  (cost=0.00..566.05 rows=32605 width=32) (actual time=0.005..14.677 rows=32605 loops=1)"

"Planning time: 4.689 ms"

"Execution time: 4554.893 ms"

 

Result with regular query:

 

EXPLAIN ANALYSE

SELECT F.oID, Doc.szText

FROM F

JOIN Doc ON F.uDocRef = Doc.udocid

WHERE szText ILIKE '%480GB%';

 

"Hash Join  (cost=8006.56..8694.93 rows=5 width=359) (actual time=66500.415..66506.978 rows=10 loops=1)"

"  Hash Cond: (f.udocref = doc.udocid)"

"  ->  Seq Scan on f  (cost=0.00..566.05 rows=32605 width=32) (actual time=0.002..3.143 rows=32605 loops=1)"

"  ->  Hash  (cost=8006.32..8006.32 rows=19 width=359) (actual time=66500.023..66500.023 rows=16 loops=1)"

"        Buckets: 1024  Batches: 1  Memory Usage: 19kB"

"        ->  Seq Scan on doc  (cost=0.00..8006.32 rows=19 width=359) (actual time=8864.720..66499.991 rows=16 loops=1)"

"              Filter: (sztext ~~* '%480GB%'::text)"

"              Rows Removed by Filter: 125930"

"Planning time: 263.542 ms"

"Execution time: 66507.003 ms"

 

 

 

 

В списке pgsql-general по дате отправления:

Предыдущее
От: 김세훈
Дата:
Сообщение: Re: using graph model with PostgreSQL
Следующее
От: Andreas Kretschmer
Дата:
Сообщение: Re: CTE with JOIN of two tables is much faster than a regular query