[PERFORM] Re: Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each

From: Karl Czajkowski
Subject: [PERFORM] Re: Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each
Msg-id: 20170624020116.GA27236@moraine.isi.edu
In response to: [PERFORM] Fwd: Slow query from ~7M rows, joined to two tables of ~100 rows each  (Chris Wilson <chris+postgresql@qwirx.com>)
List: pgsql-performance
On Jun 23, Chris Wilson modulated:
> ...
>     create table metric_pos (id serial primary key, pos integer);
>     create index idx_metric_pos_id_pos on metric_pos (id, pos);
> ...
>     create table asset_pos (id serial primary key, pos integer);
> ...

Did you omit the CREATE INDEX statement on asset_pos (id, pos) only
from your problem statement, or from your actual tests as well?
Without any index there, you are forcing the query planner to do that
join the hard way.
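
If it really is missing, a minimal version (the index name is just a
suggestion) would be:

    create index idx_asset_pos_id_pos on asset_pos (id, pos);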


>     CREATE TABLE metric_value
>     (
>       id_asset integer NOT NULL,
>       id_metric integer NOT NULL,
>       value double precision NOT NULL,
>       date date NOT NULL,
>       timerange_transaction tstzrange NOT NULL,
>       id bigserial NOT NULL,
>       CONSTRAINT cons_metric_value_pk PRIMARY KEY (id)
>     )
>     WITH (
>       OIDS=FALSE
>     );
>
> ...
>     CREATE INDEX idx_metric_value_id_metric_id_asset_date ON
>     metric_value (id_metric, id_asset, date, timerange_transaction,
>     value);
> ...

Have you tried adding a foreign key constraint on the id_asset and
id_metric columns?  I wonder if you'd get a better query plan if the
DB knew that the inner join would not change the number of result
rows.  I think it's doing the join inside the filter step because
it assumes that the inner join may drop rows.
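
For example, assuming id_asset and id_metric really do reference those
two dimension tables (the constraint names here are only illustrative):

    ALTER TABLE metric_value
      ADD CONSTRAINT fk_metric_value_id_asset
        FOREIGN KEY (id_asset) REFERENCES asset_pos (id),
      ADD CONSTRAINT fk_metric_value_id_metric
        FOREIGN KEY (id_metric) REFERENCES metric_pos (id);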

Also, did you include an ANALYZE step between your table creation
statements and your query benchmarks?  Since you are dropping and
recreating test data, you have no stats on anything.
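
Even something as simple as this, run after loading the test data,
should help:

    ANALYZE metric_value;
    ANALYZE metric_pos;
    ANALYZE asset_pos;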


> This is an example of the kind of query we would like to speed up:
>
>
>     SELECT metric_pos.pos AS pos_metric, asset_pos.pos AS pos_asset,
>     date, value
>     FROM metric_value
>     INNER JOIN asset_pos ON asset_pos.id = metric_value.id_asset
>     INNER JOIN metric_pos ON metric_pos.id = metric_value.id_metric
>     WHERE
>     date >= '2016-01-01' and date < '2016-06-01'
>     AND timerange_transaction @> current_timestamp
>     ORDER BY metric_value.id_metric, metric_value.id_asset, date
>

How sparse is the typical result set selected by these date and
timerange predicates?  If it is sparse, I'd think you want your
compound index to start with those two columns.
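
For example, a variant of your compound index led by the date and
range columns (the name and exact column order are only a sketch,
worth verifying with EXPLAIN ANALYZE) might be:

    CREATE INDEX idx_metric_value_date_range_metric_asset ON
    metric_value (date, timerange_transaction, id_metric, id_asset,
    value);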

Finally, your subject line said you were joining hundreds of rows to
millions.  In queries where we used a similarly small dimension table
in the WHERE clause, we saw a massive speedup by pre-evaluating that
dimension query to produce an array of keys, then in-lining the actual
key constants in the WHERE clause of a main fact-table query that no
longer had the join in it.
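
As a hypothetical sketch (the pos filter and the key constants below
are invented for illustration, since your example query does not
restrict the dimension tables):

    -- step 1: evaluate the small dimension query once
    SELECT array_agg(id) FROM metric_pos WHERE pos < 10;

    -- step 2: in-line the resulting key constants into the fact query
    SELECT id_metric, id_asset, date, value
    FROM metric_value
    WHERE id_metric IN (1, 2, 3)  -- constants from step 1
      AND date >= '2016-01-01' AND date < '2016-06-01'
      AND timerange_transaction @> current_timestamp
    ORDER BY id_metric, id_asset, date;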

In your case, the equivalent hack would be to compile the small
dimension tables into big CASE statements, I suppose...
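
Something like this, with one CASE arm per dimension row (the pos
values here are made up):

    SELECT
      CASE metric_value.id_metric
        WHEN 1 THEN 10  -- pos copied from metric_pos
        WHEN 2 THEN 20
        -- ...one arm per metric_pos row
      END AS pos_metric,
      CASE metric_value.id_asset
        WHEN 1 THEN 5   -- pos copied from asset_pos
        WHEN 2 THEN 6
        -- ...one arm per asset_pos row
      END AS pos_asset,
      date, value
    FROM metric_value
    WHERE date >= '2016-01-01' AND date < '2016-06-01'
      AND timerange_transaction @> current_timestamp
    ORDER BY id_metric, id_asset, date;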


Karl

