Обсуждение: problem with OR'ed AND queriess

Поиск
Список
Период
Сортировка

problem with OR'ed AND queriess

От
Michael McCarthy
Дата:
Using PQexec from libpq in postgresql 6.5.3, I submit a query of the
following form, which has 13 OR'ed AND expressions:
   DECLARE my_cursor CURSOR FOR SELECT col1 FROM testCNF where   ( ( col1=0  and col2=1  ) OR ( col1=1  and col2=2  )
OR    ( col1=2  and col2=3  ) OR ( col1=3  and col2=4  ) OR     ( col1=4  and col2=5  ) OR ( col1=5  and col2=6  ) OR
 ( col1=6  and col2=7  ) OR ( col1=7  and col2=8  ) OR     ( col1=8  and col2=9  ) OR ( col1=9  and col2=10 ) OR     (
col1=10and col2=11 ) OR ( col1=11 and col2=12 ) OR     ( col1=12 and col2=13 ) )
 

After 265 seconds, my test client gets back a NULL response from PQexec.
During the 265 seconds, the backend server machine (Sparc Ultra 2) slows
to a crawl. In the postmaster log, I see the following:
   FATAL 1:  Memory exhausted in AllocSetAlloc()

A similar query with 12 OR'ed AND expresions is successful, but only after
123 seconds. Queries with fewer OR'ed AND expresions get faster; 6 OR'ed
ANDS takes around one second. With other query types, I encounter no such
limitation; AND'ed ORs, all ANDs and all ORs can be as large a query as
the internal buffer can support (around 16k), with no problem.

I have traced the backend server in a debugger; a stack trace is attached
below. What I see in examining the code is a recursive normalization of
the query; postgres is running out of memory trying to convert the OR'ed
ANDs query to conjunctive normal form (CNF).

So, some questions for all you postgres gurus:

1. Has anyone else encountered this problem?

2. Has anyone patched the query optimizer to get around this problem, and  if so, where can I find the patch?

3. If I am truly the first to encounter this (which I doubt), how would I  go about altering the query optimizer to not
failon this valid query? 
 

Thanks,

//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\
Michael McCarthy                             TCSI Corporation
michael@tcsi.com                             1080 Marina Village Parkway
(510) 749-8739                               Alameda, CA 94501

(/opt/packages/SUNWspro/bin/dbx) where        [1] AllocSetReset(0x56d488, 0x40aaf0, 0x1, 0x9c, 0x0, 0x0), at 0x285f80
[2]EndPortalAllocMode(0x502a70, 0x6fd0495c, 0x0, 0x0, 0x0, 0x0), at 0x28a398 [3] PortalResetHeapMemory(0x502a40, 0x0,
0x0,0x0, 0x0, 0x0), at 0x289fcc [4] AtAbort_Memory(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xb68bc [5] AbortTransaction(0x0,
0x0,0x0, 0x0, 0x0, 0x0), at 0xb6d8c [6] AbortOutOfAnyTransaction(0x0, 0x6feaa484, 0x4d88c8, 0x5015a7, 0x6fea2ca4, 0x0),
at0xb740c [7] remove_all_temp_relations(0x0, 0x0, 0x0, 0x0, 0x6fea2ca4, 0x0), at 0x27c5dc [8] shmem_exit(0x0, 0x409c98,
0x0,0x0, 0x0, 0x0), at 0x1df424 [9] proc_exit(0x0, 0x6feaa484, 0x2e, 0x7efefeff, 0x6fea2ca4, 0x27d294), at 0x1df214
[10]elog(0x1, 0x411a00, 0x0, 0x0, 0x0, 0x0), at 0x27d58c [11] AllocSetAlloc(0x56d488, 0xc, 0x0, 0x0, 0x0, 0x0), at
0x2866cc[12] PortalHeapMemoryAlloc(0x502a70, 0xc, 0x0, 0x0, 0x0, 0x0), at 0x287f34 [13] MemoryContextAlloc(0x502a70,
0xc,0x0, 0x0, 0x0, 0x0), at 0x286fdc [14] newNode(0xc, 0x1f5, 0x0, 0x0, 0x0, 0x0), at 0x168928 [15] lcons(0x1f66e5c8,
0x0,0x0, 0x0, 0x0, 0x0), at 0x168b6c [16] copyObject(0x1f662a90, 0x66, 0x0, 0x0, 0x0, 0x0), at 0x16ecf4 [17]
_copyExpr(0x1f662998,0x1f5, 0x0, 0x0, 0x0, 0x0), at 0x16b3e8 [18] copyObject(0x1f662998, 0x0, 0x0, 0x0, 0x0, 0x0), at
0x16e7c4[19] copyObject(0x1f662980, 0x66, 0x0, 0x0, 0x0, 0x0), at 0x16ecdc [20] _copyExpr(0x1f661e50, 0x14, 0x0, 0x0,
0x0,0x0), at 0x16b3e8 [21] copyObject(0x1f661e50, 0x14, 0x0, 0x0, 0x0, 0x0), at 0x16e7c4 [22] copyObject(0x1f6634b8,
0x66,0x0, 0x0, 0x0, 0x0), at 0x16ec94 [23] _copyExpr(0x1f661e28, 0x1f5, 0x0, 0x0, 0x0, 0x0), at 0x16b3e8 [24]
copyObject(0x1f661e28,0x0, 0x0, 0x0, 0x0, 0x0), at 0x16e7c4 [25] copyObject(0x1f661e10, 0x66, 0x0, 0x0, 0x0, 0x0), at
0x16ecdc[26] _copyExpr(0x1f660740, 0x14, 0x0, 0x0, 0x0, 0x0), at 0x16b3e8 [27] copyObject(0x1f660740, 0x14, 0x0, 0x0,
0x0,0x0), at 0x16e7c4 [28] copyObject(0x1f6634e8, 0x66, 0x0, 0x0, 0x0, 0x0), at 0x16ec94 [29] _copyExpr(0x1f660718,
0x1f5,0x0, 0x0, 0x0, 0x0), at 0x16b3e8 [30] copyObject(0x1f660718, 0x0, 0x0, 0x0, 0x0, 0x0), at 0x16e7c4 [31]
copyObject(0x1f660700,0x66, 0x0, 0x0, 0x0, 0x0), at 0x16ecdc [32] _copyExpr(0x1f65d8f0, 0xc, 0x0, 0x0, 0x0, 0x0), at
0x16b3e8[33] copyObject(0x1f65d8f0, 0xc, 0x0, 0x0, 0x0, 0x0), at 0x16e7c4 [34] copyObject(0x1f663518, 0xc, 0x0, 0x0,
0x0,0x0), at 0x16ec94 [35] pull_ors(0x1f669210, 0x1f5, 0x0, 0x0, 0x0, 0x0), at 0x19d584 [36] pull_ors(0x1f669228,
0x1f669210,0x0, 0x0, 0x0, 0x0), at 0x19d5f8 [37] distribute_args(0x69ace48, 0x1f663530, 0x0, 0x0, 0x0, 0x0), at
0x19dd64[38] or_normalize(0x1f6691f8, 0x1f65d870, 0x0, 0x0, 0x0, 0x0), at 0x19dc34 [39] distribute_args(0x69ace48,
0x1f651e90,0x0, 0x0, 0x0, 0x0), at 0x19dd74 [40] or_normalize(0x1f65d858, 0x1f6464d0, 0x0, 0x0, 0x0, 0x0), at 0x19dc34
[41]distribute_args(0x69ace48, 0x1f62f0f0, 0x0, 0x0, 0x0, 0x0), at 0x19dd74 [42] or_normalize(0x1f6464b8, 0x1f617d30,
0x0,0x0, 0x0, 0x0), at 0x19dc34 [43] distribute_args(0x69ace48, 0x1f49fb58, 0x0, 0x0, 0x0, 0x0), at 0x19dd74 [44]
or_normalize(0x1f4ce320,0x1f471398, 0x0, 0x0, 0x0, 0x0), at 0x19dc34 [45] distribute_args(0x69ace48, 0x1f123fd8, 0x0,
0x0,0x0, 0x0), at 0x19dd74 [46] or_normalize(0x1f180fa0, 0x1f0c7018, 0x0, 0x0, 0x0, 0x0), at 0x19dc34 [47]
distribute_args(0x69ace48,0x1e972820, 0x0, 0x0, 0x0, 0x0), at 0x19dd74 [48] or_normalize(0x1ea2c7e8, 0x1e8b8860, 0x0,
0x0,0x0, 0x0), at 0x19dc34 [49] distribute_args(0x69ace48, 0x1e744880, 0x0, 0x0, 0x0, 0x0), at 0x19dd74 [50]
or_normalize(0x1e8b8848,0x1e5d08c0, 0x0, 0x0, 0x0, 0x0), at 0x19dc34 [51] distribute_args(0x69ace48, 0x1c2ae828, 0x0,
0x0,0x0, 0x0), at 0x19dd74 [52] or_normalize(0x1c596810, 0x1bfc6868, 0x0, 0x0, 0x0, 0x0), at 0x19dc34 [53]
distribute_args(0x69ace48,0xf7d2280, 0x0, 0x0, 0x0, 0x0), at 0x19dd74 [54] or_normalize(0x1bfc6850, 0x1bfc6808, 0x0,
0x0,0x0, 0x0), at 0x19dc34 [55] distribute_args(0x1333e468, 0x69ace00, 0x0, 0x0, 0x0, 0x0), at 0x19dd74 [56]
or_normalize(0x1333e490,0x6982cd0, 0x0, 0x0, 0x0, 0x0), at 0x19dc34 [57] or_normalize(0xbc660a0, 0x6982cd0, 0x0, 0x0,
0x0,0x0), at 0x19dc64 [58] or_normalize(0x8ae9e58, 0x6982cd0, 0x0, 0x0, 0x0, 0x0), at 0x19dc64 [59]
or_normalize(0x76afca8,0x6982cd0, 0x0, 0x0, 0x0, 0x0), at 0x19dc64 [60] or_normalize(0x6e9ab00, 0x6982cd0, 0x0, 0x0,
0x0,0x0), at 0x19dc64 [61] or_normalize(0x6b77150, 0x6982cd0, 0x0, 0x0, 0x0, 0x0), at 0x19dc64 [62]
or_normalize(0x6a4a3d0,0x6982cd0, 0x0, 0x0, 0x0, 0x0), at 0x19dc64 [63] or_normalize(0x69df050, 0x6982cd0, 0x0, 0x0,
0x0,0x0), at 0x19dc64 [64] or_normalize(0x69bb5d0, 0x6982cd0, 0x0, 0x0, 0x0, 0x0), at 0x19dc64 [65]
or_normalize(0x69b0ad0,0x6982cd0, 0x69ada90, 0xefffa9e4, 0x3, 0x0), at 0x19dc64 [66] or_normalize(0x69adf90, 0x6982cd0,
0x0,0xefff660b, 0x5734c8, 0x0), at 0x19dc64 [67] or_normalize(0x6982bb0, 0x69adae8, 0x0, 0x5a5ea8, 0x0, 0x0), at
0x19dc64[68] normalize(0x6982a80, 0x0, 0x0, 0x0, 0x0, 0x5a5850), at 0x19cbdc [69] cnfify(0x5a5f78, 0x1, 0x1, 0x0, 0x0,
0x5a5965),at 0x19c368 [70] query_planner(0x5a5a18, 0x1, 0x627458, 0x5a5f78, 0x54a55a, 0xf), at 0x194dfc [71]
union_planner(0x5a5a18,0x0, 0xefff6098, 0xefff66d0, 0x70b, 0x70a), at 0x1958e0 [72] planner(0x5a5a18, 0x627410, 0x0,
0x5015ac,0x51, 0xefffa9db), at 0x195380 [73] pg_parse_and_plan(0xefffaad3, 0x0, 0x0, 0xefffa9e4, 0x3, 0x0), at 0x1fc368
[74]pg_exec_query_dest(0xefffaad3, 0x3, 0x0, 0x5015ac, 0x6fea2ca4, 0x0), at 0x1fc628 [75] pg_exec_query(0xefffaad3,
0x40bdf8,0x20202900, 0x7efefeff, 0x81010100, 0xff00), at 0x1fc568 [76] PostgresMain(0x4, 0xeffff0a4, 0x5, 0xeffff824,
0x0,0xefffeba0), at 0x1febf8
 
=>[77] DoBackend(port = 0x4f5800), line 1628 in "postmaster.c" [78] BackendStartup(port = 0x4f5800), line 1373 in
"postmaster.c"[79] ServerLoop(), line 823 in "postmaster.c" [80] PostmasterMain(argc = 5, argv = 0xeffff824), line 616
in"postmaster.c" [81] main(argc = 5, argv = 0xeffff824), line 93 in "main.c"
 



Re: [SQL] problem with OR'ed AND queriess

От
David C Hartwig Jr
Дата:
Michael McCarthy wrote:

> Using PQexec from libpq in postgresql 6.5.3, I submit a query of the
> following form, which has 13 OR'ed AND expressions:
>
>     DECLARE my_cursor CURSOR FOR SELECT col1 FROM testCNF where
>     ( ( col1=0  and col2=1  ) OR ( col1=1  and col2=2  ) OR
>       ( col1=2  and col2=3  ) OR ( col1=3  and col2=4  ) OR
>       ( col1=4  and col2=5  ) OR ( col1=5  and col2=6  ) OR
>       ( col1=6  and col2=7  ) OR ( col1=7  and col2=8  ) OR
>       ( col1=8  and col2=9  ) OR ( col1=9  and col2=10 ) OR
>       ( col1=10 and col2=11 ) OR ( col1=11 and col2=12 ) OR
>       ( col1=12 and col2=13 ) )
>
> After 265 seconds, my test client gets back a NULL response from PQexec.
> During the 265 seconds, the backend server machine (Sparc Ultra 2) slows
> to a crawl. In the postmaster log, I see the following:
>
>     FATAL 1:  Memory exhausted in AllocSetAlloc()
>
> A similar query with 12 OR'ed AND expresions is successful, but only after
> 123 seconds. Queries with fewer OR'ed AND expresions get faster; 6 OR'ed
> ANDS takes around one second. With other query types, I encounter no such
> limitation; AND'ed ORs, all ANDs and all ORs can be as large a query as
> the internal buffer can support (around 16k), with no problem.
>
> I have traced the backend server in a debugger; a stack trace is attached
> below. What I see in examining the code is a recursive normalization of
> the query; postgres is running out of memory trying to convert the OR'ed
> ANDs query to conjunctive normal form (CNF).
>
> So, some questions for all you postgres gurus:
>
> 1. Has anyone else encountered this problem?

Yes, its on the todo list.   I do not know if it is being actively persued.

>
>
> 2. Has anyone patched the query optimizer to get around this problem, and
>    if so, where can I find the patch?
>

There is a work around feature that works for queries that gererate a parse tree similar to yours.
ODBC tools generate these kinds of queries all the time.    Keyset queries.   To acivate the
feature:    SET ksqo TO 'on';    It rewrites the parse tree into a series of UNIONs.   Not optimal but
it works for rectangular where clauses.   (n ANDs) x  (m ORs)

--   Here is an example using a 3 part key    select ... from foo where        (v1 = "?" AND v2 = "?" AND v3 ="?") OR
    -- line 1        (v1 = "?" AND v2 = "?" AND v3 ="?") OR        -- line 2                ...        (v1 = "?" AND v2
="?" AND v3 ="?") OR        --  line 9        (v1 = "?" AND v2 = "?" AND v3 ="?");              --  line 10    --  The
questionmarks are replaced with the constant key values
 


>
> 3. If I am truly the first to encounter this (which I doubt), how would I
>    go about altering the query optimizer to not fail on this valid query?
>
> Thanks,



Re: [SQL] problem with OR'ed AND queriess

От
Michael McCarthy
Дата:
Hello David:

Thanks for the quick reply; it means a lot to me.

I tried "SET ksqo TO 'on'", and it works fine for my problem query; it
allowed 570 key sets to be resolved in 2 seconds, which is a big step
forward.

I am writing a layer which makes postgres one RDBMS (of several) which can
support a generic object persistence framework. The use of "SET ksqo TO
'on'" is a postgres implementation detail which applications using the
framework will not be aware of. I am faced with the problem of whether or
not to use "SET ksqo TO 'on'" for all queries, or find some criteria for
turning it on and off.

A quick examination of backend/optimizer/prep/prepkeyset.c shows me that
there is some qualification of a query for keyset optimization; is there
any reason why I should not always set ksqo on?

Thanks for your help!

//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\//\\
Michael McCarthy                             TCSI Corporation
michael@tcsi.com                             1080 Marina Village Parkway
(510) 749-8739                               Alameda, CA 94501

On Tue, 21 Dec 1999, David C Hartwig Jr wrote:

> There is a work around feature that works for queries that gererate a parse tree similar to yours.
> ODBC tools generate these kinds of queries all the time.    Keyset queries.   To acivate the
> feature:    SET ksqo TO 'on';    It rewrites the parse tree into a series of UNIONs.   Not optimal but
> it works for rectangular where clauses.   (n ANDs) x  (m ORs)
> 
> --   Here is an example using a 3 part key
>      select ... from foo where
>          (v1 = "?" AND v2 = "?" AND v3 ="?") OR        -- line 1
>          (v1 = "?" AND v2 = "?" AND v3 ="?") OR        -- line 2
>                  ...
>          (v1 = "?" AND v2 = "?" AND v3 ="?") OR        --  line 9
>          (v1 = "?" AND v2 = "?" AND v3 ="?");              --  line 10
>      --  The question marks are replaced with the constant key values



Re: [SQL] problem with OR'ed AND queriess

От
Tom Lane
Дата:
Michael McCarthy <michael@tcsi.com> writes:
> I have traced the backend server in a debugger; a stack trace is attached
> below. What I see in examining the code is a recursive normalization of
> the query; postgres is running out of memory trying to convert the OR'ed
> ANDs query to conjunctive normal form (CNF).

Yes, the CNF-conversion code is pretty awful.  I have reduced its
badness somewhat in current sources, but really we need to rethink the
whole problem --- for queries that are naturally expressed as
OR-of-ANDs, forcing the condition into AND-of-ORs is a loser.

You could try folding prepqual.c from current sources (get the nightly
snapshot or use CVS) into 6.5.  I'm not sure it would merge cleanly,
but it'd beat reinventing the fixes I made.
        regards, tom lane


Re: [SQL] problem with OR'ed AND queriess

От
David C Hartwig Jr
Дата:
Michael McCarthy wrote:

> Hello David:
>
> Thanks for the quick reply; it means a lot to me.
>
> I tried "SET ksqo TO 'on'", and it works fine for my problem query; it
> allowed 570 key sets to be resolved in 2 seconds, which is a big step
> forward.
>
> I am writing a layer which makes postgres one RDBMS (of several) which can
> support a generic object persistence framework. The use of "SET ksqo TO
> 'on'" is a postgres implementation detail which applications using the
> framework will not be aware of. I am faced with the problem of whether or
> not to use "SET ksqo TO 'on'" for all queries, or find some criteria for
> turning it on and off.
>
> A quick examination of backend/optimizer/prep/prepkeyset.c shows me that
> there is some qualification of a query for keyset optimization; is there
> any reason why I should not always set ksqo on?

The qualification is strict so that it can be left on without breaking other queries.   Most MS ODBC users
use it all the time; I have yet to hear of it causing any problems.

>
>
>
> > There is a work around feature that works for queries that gererate a parse tree similar to yours.
> > ODBC tools generate these kinds of queries all the time.    Keyset queries.   To acivate the
> > feature:    SET ksqo TO 'on';    It rewrites the parse tree into a series of UNIONs.   Not optimal but
> > it works for rectangular where clauses.   (n ANDs) x  (m ORs)
> >
> > --   Here is an example using a 3 part key
> >      select ... from foo where
> >          (v1 = "?" AND v2 = "?" AND v3 ="?") OR        -- line 1
> >          (v1 = "?" AND v2 = "?" AND v3 ="?") OR        -- line 2
> >                  ...
> >          (v1 = "?" AND v2 = "?" AND v3 ="?") OR        --  line 9
> >          (v1 = "?" AND v2 = "?" AND v3 ="?");              --  line 10
> >      --  The question marks are replaced with the constant key values
>
> ************



Re: [SQL] problem with OR'ed AND queriess

От
Bruce Momjian
Дата:
> Using PQexec from libpq in postgresql 6.5.3, I submit a query of the
> following form, which has 13 OR'ed AND expressions:
> 
>     DECLARE my_cursor CURSOR FOR SELECT col1 FROM testCNF where
>     ( ( col1=0  and col2=1  ) OR ( col1=1  and col2=2  ) OR
>       ( col1=2  and col2=3  ) OR ( col1=3  and col2=4  ) OR
>       ( col1=4  and col2=5  ) OR ( col1=5  and col2=6  ) OR
>       ( col1=6  and col2=7  ) OR ( col1=7  and col2=8  ) OR
>       ( col1=8  and col2=9  ) OR ( col1=9  and col2=10 ) OR
>       ( col1=10 and col2=11 ) OR ( col1=11 and col2=12 ) OR
>       ( col1=12 and col2=13 ) )

I have this in the cvs log file, and it will appear in 7.0.  You can
check the current cvs snapshot to see how much better it is:

---------------------------------------------------------------------------

revision 1.18
date: 1999/09/07 03:47:06;  author: tgl;  state: Exp;  lines: +236 -249
Performance improvements in cnfify(): get rid of exponential
space consumption in pull_args, and avoid doing the full CNF transform on
operands of operator clauses, where it's really not particularly helpful.
This answers the TODO item about large numbers of OR clauses, at least
partially.  I was able to do a ten-thousand-OR-clause query with about
20Mb memory consumption ... it took an obscenely long time, but it worked...


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] problem with OR'ed AND queriess

От
Tom Lane
Дата:
Michael McCarthy <michael@tcsi.com> writes:
> A quick examination of backend/optimizer/prep/prepkeyset.c shows me that
> there is some qualification of a query for keyset optimization; is there
> any reason why I should not always set ksqo on?

If it seems to win more often than not for your query mix, go for it.

ksqo is a hack that I'm hoping to eliminate in the next release or
three.  But right now, it's a mighty useful hack if you do a lot of
OR-of-AND queries...
        regards, tom lane