Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated

Поиск
Список
Период
Сортировка
От Sasa Vilic
Тема Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated
Дата
Msg-id 557A2555.7090807@gmail.com
обсуждение исходный текст
Ответы Re: Slow query: Postgres chooses nested loop over hash join, whery by hash join is much faster, wrong number of rows estimated  (Merlin Moncure <mmoncure@gmail.com>)
Список pgsql-performance
Hi,

I have a query that takes ridiculously long to complete (over 500ms) but
if I disable nested loop it does it really fast (24.5ms)

Here are links for
* first request (everything enabled): http://explain.depesz.com/s/Q1M
* second request (nested loop disabled): http://explain.depesz.com/s/9ZY

I have also noticed, that setting

set join_collapse_limit = 1;

produces similar results as when nested loops are disabled.

Autovacuumm is running, and I did manually performed both: analyze and
vacuumm analyze. No effect.

I tried increasing statistics for columns (slot, path_id, key) to 5000
for table data. No effect.

I tried increasing statistics for columns (id, parent, key) to 5000 for
table path. No effect.

I can see, that postgres is doing wrong estimation on request count, but
I can't figure it out why.

Table path is used to represent tree-like structure.

== QUERY ==

SELECT p1.value as request_type, p2.value as app_id, p3.value as app_ip,
p3.id as id, data.*, server.name
FROM data
INNER JOIN path p3 ON data.path_id = p3.id
INNER JOIN server on data.server_id = server.id
INNER JOIN path p2 on p2.id = p3.parent
INNER JOIN path p1 on p1.id = p2.parent
WHERE data.slot between '2015-02-18 00:00:00' and '2015-02-19 00:00:00'
     AND p1.key = 'request_type' AND p2.key = 'app_id' AND p3.key = 'app_ip'
;

== TABLES ==
                                              Table "public.path"
  Column |         Type          | Modifiers                     |
Storage  | Description
--------+-----------------------+---------------------------------------------------+----------+-------------
  id     | integer               | not null default
nextval('path_id_seq'::regclass) | plain    |
  parent | integer |                                                   |
plain    |
  key    | character varying(25) | not
null                                          | extended |
  value  | character varying(50) | not
null                                          | extended |
Indexes:
     "path_pkey" PRIMARY KEY, btree (id)
     "path_unique" UNIQUE CONSTRAINT, btree (parent, key, value)
Foreign-key constraints:
     "path.fg.parent->path(id)" FOREIGN KEY (parent) REFERENCES path(id)
Referenced by:
     TABLE "data" CONSTRAINT "data_fkey_path" FOREIGN KEY (path_id)
REFERENCES path(id)
     TABLE "path" CONSTRAINT "path.fg.parent->path(id)" FOREIGN KEY
(parent) REFERENCES path(id)
Has OIDs: no

                                Table "public.data"
   Column   |              Type              | Modifiers | Storage  |
Description
-----------+--------------------------------+-----------+----------+-------------
  slot      | timestamp(0) without time zone | not null  | plain    |
  server_id | integer                        | not null  | plain    |
  path_id   | integer                        | not null  | plain    |
  key       | character varying(50)          | not null  | extended |
  value     | real                           | not null  | plain    |
Indexes:
     "data_pkey" PRIMARY KEY, btree (slot, server_id, path_id, key)
Foreign-key constraints:
     "data_fkey_path" FOREIGN KEY (path_id) REFERENCES path(id)
Has OIDs: no

svilic=> select count(*) from path;
  count
-------
    603

svilic=> select count(*) from path p1 inner join path p2 on p1.id =
p2.parent inner join path p3 on p2.id = p3.parent where p1.parent is null;
  count
-------
    463

svilic=> select count(*) from server;
  count
-------
     37

svilic=> select count(*) from data;
   count
----------
  23495552


svilic=> select version();
version
-------------------------------------------------------------------------------------------------------------
  PostgreSQL 9.1.17 on x86_64-unknown-linux-gnu, compiled by gcc
(Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3, 64-bit

== SERVER CONFIGURATION ==

shared_buffers = 512MB
work_mem = 8MB (I have tried changing it to 32, 128 and 512, no effect)
maintenance_work_mem = 64MB
checkpoint_segments = 100
random_page_cost = 4.0
effective_cache_size = 3072MB

== HARDWARE CONFIGURATION ==

cpu: Intel(R) Core(TM) i3-2100 CPU @ 3.10GHz (4 cores)
mem: 8GB
system is using regular disks, (no raid and no ssd)





В списке pgsql-performance по дате отправления:

Предыдущее
От: Claudio Freire
Дата:
Сообщение: Re: How to reduce writing on disk ? (90 gb on pgsql_tmp)
Следующее
От: Johann Spies
Дата:
Сообщение: Re: Slow query - lots of temporary files.