slow view

Поиск
Список
Период
Сортировка
От Stuart
Тема slow view
Дата
Msg-id egk560$p53$1@sea.gmane.org
обсуждение исходный текст
Ответы Re: slow view  ("Stuart McGraw" <smcg2297@frii.com>)
Список pgsql-sql
I am having a problem with the performance of a 
view that will be a critical part of a database system 
I am working on, and would appreciate some advice.
Apologies for the length of this posting!

I have a parent table P and to child tables A and B,:

CREATE TABLE p (   id INT NOT NULL PRIMARY KEY);
CREATE TABLE a (   id INT NOT NULL PRIMARY KEY,   p INT REFERENCES p(id));   CREATE INDEX ON a(p);
CREATE TABLE b (   id INT NOT NULL PRIMARY KEY,   p INT REFERENCES p(id));   CREATE INDEX ON b(p);

Each "p" row has between 1 and 5 (or so) 
child rows in the "a" table, and between 0 and 4
rows in the "b" table.

Now for most p's the a's and the b's are independent,
and all combinations of a's and b's for that p are ok.  
But for a small percentage of p's (<5%) there are some 
combinations of a's and b's are distinguished (I will call
them "invalid").  
So I created a table to record these "invalid" combinations:

CREATE TABLE x (   a INT NOT NULL REFERENCES a(id),   b INT NOT NULL REFERENCES b(id),   PRIMARY KEY (a,b));   CREATE
INDEXON x(a);   CREATE INDEX ON x(b);
 

Here is some sample data with a single p:

# Create one parent item...
INSERT INTO p VALUES(1)

# Create 4 a-items for that parent...
INSERT INTO a VALUES(1,1)
INSERT INTO a VALUES(2,1)
INSERT INTO a VALUES(3,1)
INSERT INTO a VALUES(4,1)

# Create 3 b-items for that parent...
INSERT INTO b VALUES(11,1)
INSERT INTO b VALUES(12,1)
INSERT INTO b VALUES(13,1)

So for parent p=1, there are 12 combinations
of a and b items (each of the 4 a items can be
paired with any of the 3 b items).
Now, make some combinations of a items 
and b items "invalid"...
 # For a=2, make b=13 invalid, i.e only b=11 and b=12 are valid.
INSERT INTO x VALUES(2,13)  # For a=3, only b=11 is valid.
INSERT INTO x VALUES(3,12)  
INSERT INTO x VALUES(3,13) # For a=4, no b's are valid.
INSERT INTO x VALUES(4,11) 
INSERT INTO x VALUES(4,12)
INSERT INTO p VALUES(4,13)

Now I need a view that will display, for each p, its
a's and for each a, only the valid b's, that is, the 
combinations of a and b that are *not* in table x.
OK, no problem...

(#1)
SELECT p.id AS pid, a.id AS aid, b.id AS bid   FROM p       JOIN a ON a.p=p.id       LEFT JOIN b ON b.p=a.p       LEFT
JOINx ON x.a=a.id AND x.b=b.id   WHERE x.a IS NULL       AND p.id=1;
 

Here is the result on the data given above:pid | aid | bid
-----+-----+-----  1 |   1 |  11  1 |   1 |  12  1 |   1 |  13  1 |   2 |  11  1 |   2 |  13  1 |   3 |  11  1 |   3 |
12

Ok, but I want all a's in the output, even when they
have no valid b's.  aid=4 is not in the output.
So I did what I thought was the obvious answer, a 
left join between a, and the above query...

(#2)
SELECT p.id AS pid, a.id AS aid, sub.bid AS bid   FROM p       JOIN a ON a.p=p.id       LEFT JOIN (           SELECT
a.idAS aid, b.id as bid               FROM a                    LEFT JOIN b ON b.p=a.p                   LEFT JOIN x ON
x.a=a.idAND x.b=b.id               WHERE x.a IS NULL           ) AS sub ON sub.aid=a.id   WHERE p.id=1;
 

Results:pid | aid | bid
-----+-----+-----  1 |   1 |  11  1 |   1 |  12  1 |   1 |  13  1 |   2 |  11  1 |   2 |  13  1 |   3 |  11  1 |   3 |
12 1 |   4 |
 

Exactly what I want.

The problem is that when there are ~100K parent entries
the above query (#2) takes ~10 seconds to run but the first 
query (#1) runs in a few tens of milliseconds.
Is there any way I can get postgresql to better optimize
query #2, or rewrite it to that is is more "postgresql friendly"?


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If it helps, here is a python script I used to generate enough
pseudo-data to show the time difference...

#!/usr/bin/python
import psycopg2, random

def main():       cn = psycopg2.connect (database="test",                          user="postgres",password="")       c
=cn.cursor()       pkp = 1;  pka = 1;  pkb = 1;       while pkp < 30000:           c.execute ("INSERT INTO p
VALUES(%s)",(pkp,))           na = random.randint(1,5)           for a in range(na):               c.execute ("INSERT
INTOa VALUES(%s,%s)", (pka+a,pkp))           nb = random.randint(0,4)           for b in range(nb):
c.execute("INSERT INTO b VALUES(%s,%s)", (pkb+b,pkp))           if na*nb > 1 and random.randint (0,99) < 10:
  zlst = [(a,b) for a in range(pka,pka+na)                                for b in range(pkb,pkb+nb)]               for
zin random.sample (zlst, random.randint(1,na*nb-1)):                   c.execute ("INSERT INTO x VALUES(%s,%s)", z)
     pkp += 1; pka += na;  pkb += nb       cn.commit()
 

if __name__ == '__main__': main ()




В списке pgsql-sql по дате отправления:

Предыдущее
От: Daniel Drotos
Дата:
Сообщение: Re: deleting rows in specific order
Следующее
От: "Stuart McGraw"
Дата:
Сообщение: Re: slow view