Обсуждение: slow view

Поиск

Список

Период

Сортировка

slow view

От

"Stuart"

Дата:

12 октября 2006 г., 02:13:34

I am having a problem with the performance of a 
view that will be a critical part of a database system 
I am working on, and would appreciate some advice.
Apologies for the length of this posting!

I have a parent table P and to child tables A and B,:

CREATE TABLE p (   id INT NOT NULL PRIMARY KEY);
CREATE TABLE a (   id INT NOT NULL PRIMARY KEY,   p INT REFERENCES p(id));   CREATE INDEX ON a(p);
CREATE TABLE b (   id INT NOT NULL PRIMARY KEY,   p INT REFERENCES p(id));   CREATE INDEX ON b(p);

Each "p" row has between 1 and 5 (or so) 
child rows in the "a" table, and between 0 and 4
rows in the "b" table.

Now for most p's the a's and the b's are independent,
and all combinations of a's and b's for that p are ok.  
But for a small percentage of p's (<5%) there are some 
combinations of a's and b's are distinguished (I will call
them "invalid").  
So I created a table to record these "invalid" combinations:

CREATE TABLE x (   a INT NOT NULL REFERENCES a(id),   b INT NOT NULL REFERENCES b(id),   PRIMARY KEY (a,b));   CREATE
INDEXON x(a);   CREATE INDEX ON x(b);
 

Here is some sample data with a single p:

# Create one parent item...
INSERT INTO p VALUES(1)

# Create 4 a-items for that parent...
INSERT INTO a VALUES(1,1)
INSERT INTO a VALUES(2,1)
INSERT INTO a VALUES(3,1)
INSERT INTO a VALUES(4,1)

# Create 3 b-items for that parent...
INSERT INTO b VALUES(11,1)
INSERT INTO b VALUES(12,1)
INSERT INTO b VALUES(13,1)

So for parent p=1, there are 12 combinations
of a and b items (each of the 4 a items can be
paired with any of the 3 b items).
Now, make some combinations of a items 
and b items "invalid"...
 # For a=2, make b=13 invalid, i.e only b=11 and b=12 are valid.
INSERT INTO x VALUES(2,13)  # For a=3, only b=11 is valid.
INSERT INTO x VALUES(3,12)  
INSERT INTO x VALUES(3,13) # For a=4, no b's are valid.
INSERT INTO x VALUES(4,11) 
INSERT INTO x VALUES(4,12)
INSERT INTO p VALUES(4,13)

Now I need a view that will display, for each p, its
a's and for each a, only the valid b's, that is, the 
combinations of a and b that are *not* in table x.
OK, no problem...

(#1)
SELECT p.id AS pid, a.id AS aid, b.id AS bid   FROM p       JOIN a ON a.p=p.id       LEFT JOIN b ON b.p=a.p       LEFT
JOINx ON x.a=a.id AND x.b=b.id   WHERE x.a IS NULL       AND p.id=1;
 

Here is the result on the data given above:pid | aid | bid
-----+-----+-----  1 |   1 |  11  1 |   1 |  12  1 |   1 |  13  1 |   2 |  11  1 |   2 |  13  1 |   3 |  11  1 |   3 |
12

Ok, but I want all a's in the output, even when they
have no valid b's.  aid=4 is not in the output.
So I did what I thought was the obvious answer, a 
left join between a, and the above query...

(#2)
SELECT p.id AS pid, a.id AS aid, sub.bid AS bid   FROM p       JOIN a ON a.p=p.id       LEFT JOIN (           SELECT
a.idAS aid, b.id as bid               FROM a                    LEFT JOIN b ON b.p=a.p                   LEFT JOIN x ON
x.a=a.idAND x.b=b.id               WHERE x.a IS NULL           ) AS sub ON sub.aid=a.id   WHERE p.id=1;
 

Results:pid | aid | bid
-----+-----+-----  1 |   1 |  11  1 |   1 |  12  1 |   1 |  13  1 |   2 |  11  1 |   2 |  13  1 |   3 |  11  1 |   3 |
12 1 |   4 |
 

Exactly what I want.

The problem is that when there are ~100K parent entries
the above query (#2) takes ~10 seconds to run but the first 
query (#1) runs in a few tens of milliseconds.
Is there any way I can get postgresql to better optimize
query #2, or rewrite it to that is is more "postgresql friendly"?


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it helps, here is a python script I used to generate enough
pseudo-data to show the time difference...

#!/usr/bin/python
import psycopg2, random

def main():       cn = psycopg2.connect (database="test",                          user="postgres",password="")       c
=cn.cursor()       pkp = 1;  pka = 1;  pkb = 1;       while pkp < 30000:           c.execute ("INSERT INTO p
VALUES(%s)",(pkp,))           na = random.randint(1,5)           for a in range(na):               c.execute ("INSERT
INTOa VALUES(%s,%s)", (pka+a,pkp))           nb = random.randint(0,4)           for b in range(nb):
c.execute("INSERT INTO b VALUES(%s,%s)", (pkb+b,pkp))           if na*nb > 1 and random.randint (0,99) < 10:
  zlst = [(a,b) for a in range(pka,pka+na)                                for b in range(pkb,pkb+nb)]               for
zin random.sample (zlst, random.randint(1,na*nb-1)):                   c.execute ("INSERT INTO x VALUES(%s,%s)", z)
     pkp += 1; pka += na;  pkb += nb       cn.commit()
 

if __name__ == '__main__': main ()

Re: slow view

От

"Stuart McGraw"

Дата:

12 октября 2006 г., 04:16:09

On 2006/10/11 Stuart wrote:
> [..]

Apologies for following up my own post, but after
struggling with that query for over a day, I figured 
out the answer within thirty minutes of posting.
(Sigh)

My slow query was:
> SELECT p.id AS pid, a.id AS aid, sub.bid AS bid
>     FROM p
>         JOIN a ON a.p=p.id
>         LEFT JOIN (
>             SELECT a.id AS aid, b.id as bid
>                 FROM a 
>                     LEFT JOIN b ON b.p=a.p
>                     LEFT JOIN x ON x.a=a.id AND x.b=b.id
>                 WHERE x.a IS NULL
>             ) AS sub ON sub.aid=a.id
>     WHERE p.id=1;

Including the parent table p in the inner select was 
all it took.

SELECT p.id AS pid, a.id AS aid, sub.bid AS bid   FROM p       JOIN a ON a.p=p.id       LEFT JOIN (           SELECT
p.idAS pid, a.id AS aid, b.id as bid               FROM p                  JOIN a ON a.p=p.id                  LEFT
JOINb ON b.p=a.p                  LEFT JOIN x ON x.a=a.id AND x.b=b.id              WHERE x.a IS NULL       ) AS sub ON
sub.aid=a.idAND sub.pid=p.id   WHERE p.id=1;

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: slow view

slow view

Re: slow view