Обсуждение: Unusual slowdown using subselects

Поиск
Список
Период
Сортировка

Unusual slowdown using subselects

От
John Aughey
Дата:
I'm stress testing my application by creating large data sets.  This
particular query selects rows from the schedule table that have a specific
owner_id.  (I'll show you the results of explain)

calendar=# explain select * from schedule where schedule.owner_id=101 or
schedule.owner_id=102;
Index Scan using schedule_id_index, schedule_id_index on schedule
(cost=0.00..78.64 rows=20 width=40)

Looks great and executes very fast.

calendar=# explain select group_id from groups where
user_id=101;
NOTICE:  QUERY PLAN:
Index Scan using groups_id_index on groups  (cost=0.00..2.02 rows=1
width=4)

Again, very fast.  The groups table maps users to groups.

However, this next one is slow.

calendar=# explain select * from schedule where schedule.owner_id in
(select group_id from groups where user_id=101);
NOTICE:  QUERY PLAN:
Seq Scan on schedule  (cost=0.00..2039895.00 rows=1000000 width=40)
  SubPlan
    ->  Materialize  (cost=2.02..2.02 rows=1 width=4)
          ->  Index Scan using groups_id_index on groups  (cost=0.00..2.02
rows=1 width=4)

You'll see in this one, where the first example did a index scan, this one
with a very similar query does a seq scan.  The two queries should be
nearly identical, but this one runs very slowly.

Can anyone explain why this happens and/or how I can do a sub-select like
this and get fast results?

Thank you
John Aughey



Re: Unusual slowdown using subselects

От
"Steve Wolfe"
Дата:
> calendar=# explain select * from schedule where schedule.owner_id in
> (select group_id from groups where user_id=101);

> calendar=# explain select * from schedule where schedule.owner_id=101 or
> schedule.owner_id=102;

  A sub-select is different than a join, and is tretaed differently by the
database, and they are much, much slower than simply doing a join.  So, we
use subselects only when absolutely, positively necessary, otherwise, we
just do a join.

steve



Re: Unusual slowdown using subselects

От
Stephan Szabo
Дата:
> calendar=# explain select * from schedule where schedule.owner_id in
> (select group_id from groups where user_id=101);
> NOTICE:  QUERY PLAN:
> Seq Scan on schedule  (cost=0.00..2039895.00 rows=1000000 width=40)
>   SubPlan
>     ->  Materialize  (cost=2.02..2.02 rows=1 width=4)
>           ->  Index Scan using groups_id_index on groups  (cost=0.00..2.02
> rows=1 width=4)
>
> You'll see in this one, where the first example did a index scan, this one
> with a very similar query does a seq scan.  The two queries should be
> nearly identical, but this one runs very slowly.
>
> Can anyone explain why this happens and/or how I can do a sub-select like
> this and get fast results?

Try using EXISTS rather than IN (see the FAQ for more information).


Re: Unusual slowdown using subselects

От
Alexander Dederer
Дата:
John Aughey wrote:

> I'm stress testing my application by creating large data sets.  This
> particular query selects rows from the schedule table that have a specific
> owner_id.  (I'll show you the results of explain)
>
> calendar=# explain select * from schedule where schedule.owner_id=101 or
> schedule.owner_id=102;
> Index Scan using schedule_id_index, schedule_id_index on schedule
> (cost=0.00..78.64 rows=20 width=40)
>
> Looks great and executes very fast.
>
> calendar=# explain select group_id from groups where
> user_id=101;
> NOTICE:  QUERY PLAN:
> Index Scan using groups_id_index on groups  (cost=0.00..2.02 rows=1
> width=4)
>
> Again, very fast.  The groups table maps users to groups.
>
> However, this next one is slow.
>
> calendar=# explain select * from schedule where schedule.owner_id in
> (select group_id from groups where user_id=101);
> NOTICE:  QUERY PLAN:
> Seq Scan on schedule  (cost=0.00..2039895.00 rows=1000000 width=40)
>   SubPlan
>     ->  Materialize  (cost=2.02..2.02 rows=1 width=4)
>           ->  Index Scan using groups_id_index on groups  (cost=0.00..2.02
> rows=1 width=4)
>

In my DB:
# explain SELECT * FROM grls WHERE grls.ag_id  = 24;
NOTICE:  QUERY PLAN:
Index Scan using grls_ag_id on grls  (cost=0.00..597.87 rows=849 width=122)

# explain SELECT ag_id FROM agncs WHERE ag_id = 24;
NOTICE:  QUERY PLAN:
Seq Scan on agncs  (cost=0.00..1.31 rows=1 width=4)

And together:
# explain select * from grls where grls.ag_id in (select ag_id from agncs
where ag_id = 24);
NOTICE:  QUERY PLAN:
Seq Scan on grls  (cost=0.00..40623.38 rows=30195 width=122)
  SubPlan
    ->  Materialize  (cost=1.31..1.31 rows=1 width=4)
          ->  Seq Scan on agncs  (cost=0.00..1.31 rows=1 width=4)
--------------------------------------
# select count(*) from grls;
 30195

Summarize - with subselect indices ignores  and search look all DB rows.

IT'S BUG.

P.S.
Sorry my English.

Re: Re: Unusual slowdown using subselects

От
Stephan Szabo
Дата:

> In my DB:
> # explain SELECT * FROM grls WHERE grls.ag_id  = 24;
> NOTICE:  QUERY PLAN:
> Index Scan using grls_ag_id on grls  (cost=0.00..597.87 rows=849 width=122)
>
> # explain SELECT ag_id FROM agncs WHERE ag_id = 24;
> NOTICE:  QUERY PLAN:
> Seq Scan on agncs  (cost=0.00..1.31 rows=1 width=4)
>
> And together:
> # explain select * from grls where grls.ag_id in (select ag_id from agncs
> where ag_id = 24);
> NOTICE:  QUERY PLAN:
> Seq Scan on grls  (cost=0.00..40623.38 rows=30195 width=122)
>   SubPlan
>     ->  Materialize  (cost=1.31..1.31 rows=1 width=4)
>           ->  Seq Scan on agncs  (cost=0.00..1.31 rows=1 width=4)
> --------------------------------------
> # select count(*) from grls;
>  30195
>
> Summarize - with subselect indices ignores  and search look all DB rows.

Not quite.  in(<subselect>) doesn't use indexes (postgres doesn't realize
that this is effectively a join), but exists(<subselect>) often will,
and is the work around for the problem until it gets fixed (see FAQ
for more information - I forget the number, the title of the question
is something like "why are my subselects using in so slow."