Обсуждение: nested query vs left join: query planner very confused

Поиск
Список
Период
Сортировка

nested query vs left join: query planner very confused

От
David Rysdam
Дата:
I've got two tables, sigs and mags. It's a one-to-one relationship, mags
is just split out because we store a big, less-often-used field
there. "signum" is the key field.

Sometimes I want to know if I have any orphans in mags, so I do a query
like this:

    select signum from lp.Mags where signum is not null and signum not
    in (select lp.Sigs.signum from lp.Sigs)

(I do this as a subquery because we originally had a old Sybase DB where
outer joins were a non-standard pain and this way works the same and is
DBMS-agnostic.)

At my location, this query runs very fast (~50ms on a ~100k row table)
and 'explain' shows a plan with this structure:

    Seq scan on mags
        Filter:
        SubPlan 1
            Seq scan on sigs

At my client's location, the query is very slow (same table size,
similar hardware/config, although they are running 9.0.x and I'm on
9.2.x). By "slow" I mean like an *hour*. 'explain' has this structure:

    Seq scan on mags
        Filter:
        SubPlan 1
            Materialize
                Seq scan on sigs

I'd never heard of Materialize before, so I looked into it. Seems to
make a virtual table of the subquery so repetitions of the parent query
don't have to re-do the work. Sounds like it should only help, right?

The client's 'explain analyze' shows this:

   Seq Scan on mags  (cost=0.00..187700750.56  rows=47476  width=4) (actual time=3004851.889..3004851.889  rows=0
loops=1)
      Filter:  ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
      SubPlan 1
        -> Materialize  (cost=0.00..3713.93  rows=95862  width=4) (actual time=0.011..16.145  rows=48139  loops=94951)
               -> Seq Scan on sigs (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201  rows=95862
loops=1)
   Total runtime: 3004852.005 ms

If I'm reading this correctly, the Materialize is running ~95k times,
taking the majority of the time. Why? The only thing I can think of is
this scenario:

    1) server thinks it has a LOT of RAM
    2) decides to Materialize subquery to take advantage
    3) machine does not actually have that RAM, so it gets swapped
    4) server notices it was swapped and decides to re-run rather than
    unswap
    5) goto 2

I don't know if that's a realistic scenario, but it's all I got. I'm
already well into unknown territory, performance-tuning-wise.

I also decided to try doing the query a different way:

     select lp.mags.signum from lp.mags left join lp.sigs on
     lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
     and lp.sigs.signum is null;

This one runs fast for both of us. So I guess my second question is: why
can't the query planner tell these are the same query?


Re: nested query vs left join: query planner very confused

От
bricklen
Дата:

On Wed, Nov 27, 2013 at 7:56 AM, David Rysdam <drysdam@ll.mit.edu> wrote:


At my client's location, the query is very slow (same table size,
similar hardware/config, although they are running 9.0.x and I'm on
9.2.x). By "slow" I mean like an *hour*. 'explain' has this structure:

    Seq scan on mags
        Filter:
        SubPlan 1
            Materialize
                Seq scan on sigs

I'd never heard of Materialize before, so I looked into it. Seems to
make a virtual table of the subquery so repetitions of the parent query
don't have to re-do the work. Sounds like it should only help, right?

The client's 'explain analyze' shows this:

   Seq Scan on mags  (cost=0.00..187700750.56  rows=47476  width=4) (actual time=3004851.889..3004851.889  rows=0  loops=1)
          Filter:  ((signum IS NOT NULL) AND (NOT (SubPlan 1)))
          SubPlan 1
                -> Materialize  (cost=0.00..3713.93  rows=95862  width=4) (actual time=0.011..16.145  rows=48139  loops=94951)
                           -> Seq Scan on sigs (cost=0.00..2906.62 rows=95862 width=4) (actual time=0.010..674.201  rows=95862  loops=1)
   Total runtime: 3004852.005 ms


Has the client ANALYZEd recently? What happens if the client issues the following commands before executing the query?
VACUUM ANALYZE lp.sigs;
VACUUM ANALYZE lp.mags;

If that doesn't change the plan, could you post the values for effective_cache_size, shared_buffers, random_page_cost, cpu_tuple_cost, work_mem and how much RAM is in the client machine?

Re: nested query vs left join: query planner very confused

От
Vik Fearing
Дата:
On 11/27/2013 04:56 PM, David Rysdam wrote:
> I've got two tables, sigs and mags. It's a one-to-one relationship, mags
> is just split out because we store a big, less-often-used field
> there. "signum" is the key field.
>
> Sometimes I want to know if I have any orphans in mags, so I do a query
> like this:
>
>     select signum from lp.Mags where signum is not null and signum not
>     in (select lp.Sigs.signum from lp.Sigs)

[...]

> I also decided to try doing the query a different way:
>
>      select lp.mags.signum from lp.mags left join lp.sigs on
>      lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
>      and lp.sigs.signum is null;
>
> This one runs fast for both of us. So I guess my second question is: why
> can't the query planner tell these are the same query?

Because they're not the same query.  NOT IN has a bunch of semantics
issues regarding nulls which the anti-join in the second query does not
have.

--
Vik



Re: nested query vs left join: query planner very confused

От
Tom Lane
Дата:
David Rysdam <drysdam@ll.mit.edu> writes:
> Sometimes I want to know if I have any orphans in mags, so I do a query
> like this:

>     select signum from lp.Mags where signum is not null and signum not
>     in (select lp.Sigs.signum from lp.Sigs)

> (I do this as a subquery because we originally had a old Sybase DB where
> outer joins were a non-standard pain and this way works the same and is
> DBMS-agnostic.)

DBMS-agnostic frequently means "sucks equally badly on all DBMSes" :-(
This query is hard to optimize because of the weird behavior of NOT IN
when nulls are involved.  Since you aren't complaining that the query
fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
but the planner doesn't know that.  If you can transform it to a NOT
EXISTS, you'll likely get a much better plan:

    select signum from lp.Mags where signum is not null and
    not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)

What you want is an "anti join" plan, or at least a plan that mentions
a "hashed subplan".  Plain subplans are death performance-wise, because
they amount to being nestloop joins rather than anything smarter.  (In
this case it's likely not choosing a hashed subplan because work_mem is
too small to allow that.)

> I also decided to try doing the query a different way:

>      select lp.mags.signum from lp.mags left join lp.sigs on
>      lp.mags.signum = lp.sigs.signum where lp.mags.signum is not null
>      and lp.sigs.signum is null;

That's another way to get an anti-join (at least on recent PGs, I forget
if 9.0 recognizes it).

> This one runs fast for both of us. So I guess my second question is: why
> can't the query planner tell these are the same query?

They aren't.  See comment about behavior with NULLs.

            regards, tom lane


Re: nested query vs left join: query planner very confused

От
David Johnston
Дата:
David Rysdam wrote
> I'd never heard of Materialize before, so I looked into it. Seems to
> make a virtual table of the subquery so repetitions of the parent query
> don't have to re-do the work. Sounds like it should only help, right?

Forgive any inaccuracies but I'm pretty sure about the following:

Materialize is this sense means what you need doesn't fit in memory (likely
work-mem setting) and needs to be saved to disk and streamed from there.
Since IO is expensive this kills.  The virtual table concept is mostly
implemented by hash (tables) and not materialize.

The materialize is only running once and creating a 95k record table, then
scanning that table 95k times to locate a potential match for each input
row.  Since materialize does not index it has to sequential scan which takes
forever.


The other question, why the difference, is that IN has to accomodate NULLs
in the lookup table; join does not.  neither does EXISTS.  If you can
replace the NOT IN with NOT EXISTS and write a correlated sub-query you
should get the same plan as the LEFT JOIN version, IIRC.

David J.





--
View this message in context:
http://postgresql.1045698.n5.nabble.com/nested-query-vs-left-join-query-planner-very-confused-tp5780585p5780596.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: nested query vs left join: query planner very confused

От
David Rysdam
Дата:
On Wed, 27 Nov 2013 11:21:09 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> DBMS-agnostic frequently means "sucks equally badly on all DBMSes" :-(

We've generally been OK (cf the ~50ms runtime for the same query at our
site), but we also notice problems sooner than our client sometimes does
and can make algorithm improvements where we don't know how to make DB
ones.

> This query is hard to optimize because of the weird behavior of NOT IN
> when nulls are involved.  Since you aren't complaining that the query
> fails entirely, I'm supposing that lp.Sigs.signum contains no nulls,
> but the planner doesn't know that.  If you can transform it to a NOT
> EXISTS, you'll likely get a much better plan:
>
>     select signum from lp.Mags where signum is not null and
>     not exists (select 1 from lp.Sigs where lp.Sigs.signum = lp.Mags.signum)

We've already shipped to the client, but I'm looking at how extensive a
patch would have to be. Very surprising we haven't hit this issue
before.

> What you want is an "anti join" plan, or at least a plan that mentions
> a "hashed subplan".  Plain subplans are death performance-wise, because
> they amount to being nestloop joins rather than anything smarter.  (In
> this case it's likely not choosing a hashed subplan because work_mem is
> too small to allow that.)

I've got the client looking for this and other values already. We'll
soon know...

Вложения

Re: nested query vs left join: query planner very confused

От
David Rysdam
Дата:
On Wed, 27 Nov 2013 11:06:51 -0500, bricklen <bricklen@gmail.com> wrote:
> Has the client ANALYZEd recently? What happens if the client issues
> the following commands before executing the query?
> VACUUM ANALYZE lp.sigs;
> VACUUM ANALYZE lp.mags;
>
> If that doesn't change the plan, could you post the values for
> effective_cache_size, shared_buffers, random_page_cost,
> cpu_tuple_cost, work_mem and how much RAM is in the client machine?

Yes, I did have them do a vacuum analyze with no result. Here's their
reply on configuration:

           It is 24 Gig.

           effective_cache_size - 12000MB
           shared_buffers - 1024MB
           random_page_cost - is commented out
           cpu_tuple_cost -  commented out
           work_mem - commented out

I assume you guys already know the default values for those last 3 on a
9.0.x server...

Вложения

Re: nested query vs left join: query planner very confused

От
Tom Lane
Дата:
David Rysdam <drysdam@ll.mit.edu> writes:
>            effective_cache_size - 12000MB
>            shared_buffers - 1024MB
>            random_page_cost - is commented out
>            cpu_tuple_cost -  commented out
>            work_mem - commented out

> I assume you guys already know the default values for those last 3 on a
> 9.0.x server...

Default work_mem is only 1MB, so that probably explains why you're not
getting a hashed subplan here.  Have them knock it up some, say on the
order of 10MB.  (If none of your queries are any more complicated than
this one, you could go higher.  But keep in mind that a backend can use
work_mem per sort/hash/materialize step, not per query --- so complex
queries can use many times work_mem.  Multiply that by the number of
backends, and you can end up in swap hell pretty quickly with an over
optimistic value.)

            regards, tom lane


Re: nested query vs left join: query planner very confused

От
David Rysdam
Дата:
On Wed, 27 Nov 2013 13:02:20 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> David Rysdam <drysdam@ll.mit.edu> writes:
> >            effective_cache_size - 12000MB
> >            shared_buffers - 1024MB
> >            random_page_cost - is commented out
> >            cpu_tuple_cost -  commented out
> >            work_mem - commented out
>
> > I assume you guys already know the default values for those last 3 on a
> > 9.0.x server...
>
> Default work_mem is only 1MB, so that probably explains why you're not
> getting a hashed subplan here.  Have them knock it up some, say on the
> order of 10MB.  (If none of your queries are any more complicated than
> this one, you could go higher.  But keep in mind that a backend can use
> work_mem per sort/hash/materialize step, not per query --- so complex
> queries can use many times work_mem.  Multiply that by the number of
> backends, and you can end up in swap hell pretty quickly with an over
> optimistic value.)

We deliberately try to keep our queries fairly simple for several
reasons. This isn't the most complicated, but they don't get much more
than this. I'll have them start with 10MB and see what they get.

Вложения

Re: nested query vs left join: query planner very confused

От
David Rysdam
Дата:
On Wed, 27 Nov 2013 13:04:54 -0500, David Rysdam <drysdam@ll.mit.edu> wrote:
> We deliberately try to keep our queries fairly simple for several
> reasons. This isn't the most complicated, but they don't get much more
> than this. I'll have them start with 10MB and see what they get.

10MB was enough to get that query to come back instantly. The same query
on some larger tables were still slow so we ended up bumping up to 50MB
to get the entire job done. That probably sounds like a lot to you guys,
but now that we know the behavior and what kind of queries we have I
think we're OK.

(And if you want a shocker, when I looked at our own DB, we've had
work_mem set to 250MB on a lot of our servers and 1GB on our main
production machine. Heh. I've got some PG tuning books here next to me
now...)

Thanks!

Вложения