planner chooses unoptimal plan on joins with complex key

Поиск

Список

Период

Сортировка

От	Dmitry Potapov
Тема	planner chooses unoptimal plan on joins with complex key
Дата	23 января 2008 г. 12:51:56
Msg-id	878c83960801230548i6bee9353m19bcee7335220c9b@mail.gmail.com обсуждение исходный текст
Ответы	Re: planner chooses unoptimal plan on joins with complex key ("Guillaume Smet" <guillaume.smet@gmail.com>)
Список	pgsql-performance

Дерево обсуждения

I've got two huge tables with one-to-many relationship with complex
key. There's also a view, which JOINs the tables, and planner chooses
unoptimal plan on SELECTs from this view.

The db schema is declared as:  (from on now, I skip the unsignificant
columns for the sake of simplicity)

CREATE TABLE t1 (
   id integer NOT NULL,
   m1 integer NOT NULL DEFAULT 0,
   m2 bigint NOT NULL DEFAULT 0,
   m3 bigint NOT NULL DEFAULT 0,
   time_stamp timestamp without time zone DEFAULT now() NOT NULL,
[...skipped...]
);

CREATE TABLE t2 (
   id integer NOT NULL,
   m1 integer NOT NULL DEFAULT 0,
   m2 bigint NOT NULL DEFAULT 0,
   m3 bigint NOT NULL DEFAULT 0,
   time_stamp timestamp without time zone DEFAULT now() NOT NULL,
[...skipped...]
);

CREATE VIEW t1t2_view AS SELECT ..., t1.m1, t1.m2, t1.m3,
t1.time_stamp  FROM t1 JOIN t2 on ( (t1.m1=t2.m1) AND (t1.m2=t2.m2)
AND (t1.m3=t2.m3));

CREATE UNIQUE INDEX i_t1_ms ON t1(m1,m2,m3);
CREATE INDEX i_t1_ts ON t1(time_stamp);
CREATE INDEX i_t2_ms ON t2(m1,m2,m3);

Table t1 contains ~20M rows, t2 contains ~30M rows. The complex key
that ties one table to another is implied, i.e. (m1,m2,m3) isn't
declared as foreign key. There's a reason for that: an app needs to
push lots of INSERTs to these tables pretty quickly, and additional
foreign key constraint check will kill the performance.

So, here's the query in question:

SELECT * FROM t1t2_view ORDER BY time_stamp ASC LIMIT 100;

EXPLAIN ANALYZE SELECT * FROM t1t2_view ORDER BY time_stamp ASC LIMIT 100:

Limit  (cost=13403340.40..13403340.40 rows=1 width=152)
  ->  Sort  (cost=13403340.40..13403340.40 rows=1 width=152)
        Sort Key: t1.time_stamp
        ->  Merge Join  (cost=6663466.28..13403340.39 rows=1 width=152)
              Merge Cond: ((t1.m1 = t2.m1) AND (t1.m2 = t2.m2) AND
(t1.m3 = t2.m3))
              ->  Index Scan using i_t1_ms on t1
(cost=0.00..6272009.52 rows=21639880 width=121)
              ->  Sort  (cost=6663466.28..6739884.33 rows=30567222 width=51)
                    Sort Key: t2.m1, t2.m2, t2.m3
                    ->  Seq Scan on t2  (cost=0.00..922814.22
rows=30567222 width=51)

When I set enable_sort and enable_mergejoin to off, the planner
chooses  better plan:

EXPLAIN ANALYZE SELECT * FROM t1t2_view ORDER BY time_stamp ASC LIMIT 100

 Limit  (cost=0.00..175299576.86 rows=1 width=152)
  ->  Nested Loop  (cost=0.00..175299576.86 rows=1 width=152)
        ->  Index Scan using i_t1_ts on t1  (cost=0.00..1106505.70
rows=21642342 width=121)
        ->  Index Scan using i_t2_ms on t2  (cost=0.00..8.03 rows=1 width=51)
              Index Cond: ((t1.m1 = t2.m1) AND (t1.m2 = t2.m2) AND
(t1.m3 = t2.m3))

The problem here is, as far as I understand, is the wrong estimate of
row count in join result table.

Postgresql version is 8.2.5. The tables are ANALYZEd, Changing
default_statistics_target from 10 to 100, and even 300 doesn't affect
planner's behaviour.

Is there any possibility to make the planner to choose an optimal plan
without turning off enable_sort and enable_mergejoin?

Thanks in advance.


--
Regards,
            Dmitry

В списке pgsql-performance по дате отправления:

Предыдущее

От: "Guillaume Smet"
Дата: 23 января 2008 г., 11:42:17
Сообщение: *_cost recommendation with 8.3 and a fully cached db

Следующее

От: "Guillaume Smet"
Дата: 23 января 2008 г., 13:04:48
Сообщение: Re: planner chooses unoptimal plan on joins with complex key

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

planner chooses unoptimal plan on joins with complex key

Предыдущее

Следующее