Обсуждение: SELECT all the rows where id is children of other node.

Поиск
Список
Период
Сортировка

SELECT all the rows where id is children of other node.

От
pabloa98
Дата:
Hello,

I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:

CREATE TABLE relations (
    pid INTEGER NOT NULL,
    cid INTEGER NOT NULL,
)

This table has parent-child relations references between nodes by id. Like:

pid -> cid
n1 -> n2
n1 -> n3
n1 -> n4
n2 -> n21
n2 -> n22
n2 -> n23
n22 -> n221
n22 -> n222

I would like to get a list of all the nodes being children (direct or indirect) of any other node.

Example. The children of:

1) n3: []  (n3 has not children)
2) n22: [n221, n222]  (n22 has 2 children: n221 and n222)
3) n1: [n2, n21, n22, n23, n221, n222]  (n1 has 6 children including indirect children).

this pseudo SQL: 

SELECT *
FROM relations
WHERE has_parent(myId)

It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?

It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.


Pablo

Re: SELECT all the rows where id is children of other node.

От
Rob Sargent
Дата:



On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:

Hello,

I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:

CREATE TABLE relations (
    pid INTEGER NOT NULL,
    cid INTEGER NOT NULL,
)

This table has parent-child relations references between nodes by id. Like:

pid -> cid
n1 -> n2
n1 -> n3
n1 -> n4
n2 -> n21
n2 -> n22
n2 -> n23
n22 -> n221
n22 -> n222

I would like to get a list of all the nodes being children (direct or indirect) of any other node.

Example. The children of:

1) n3: []  (n3 has not children)
2) n22: [n221, n222]  (n22 has 2 children: n221 and n222)
3) n1: [n2, n21, n22, n23, n221, n222]  (n1 has 6 children including indirect children).

this pseudo SQL: 

SELECT *
FROM relations
WHERE has_parent(myId)

It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?

It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.


Pablo

Are you asking for just the function (always with a seed Id) or the complete transformation in a single select? Would you like the descendants on one line (eg n22,[n221,n222])?
I wonder if you might add  n3 -> null to explicitly terminate the hierarchy ?

Re: SELECT all the rows where id is children of other node.

От
Rob Sargent
Дата:


On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:

Hello,

I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:

CREATE TABLE relations (
    pid INTEGER NOT NULL,
    cid INTEGER NOT NULL,
)

This table has parent-child relations references between nodes by id. Like:

pid -> cid
n1 -> n2
n1 -> n3
n1 -> n4
n2 -> n21
n2 -> n22
n2 -> n23
n22 -> n221
n22 -> n222

I would like to get a list of all the nodes being children (direct or indirect) of any other node.

Example. The children of:

1) n3: []  (n3 has not children)
2) n22: [n221, n222]  (n22 has 2 children: n221 and n222)
3) n1: [n2, n21, n22, n23, n221, n222]  (n1 has 6 children including indirect children).

this pseudo SQL: 

SELECT *
FROM relations
WHERE has_parent(myId)

It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?

It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.


Pablo

Back at my desk now, to show the possibilities.

with recursive descendants(parent, child) as 
(select p.p, p.c from kids p where not exists (select 1 from kids c where c.c = p.p) group by p.p, p.c
 union all
 select k.* from kids k, descendants d where k.p = d.child)
 select * from descendants;

 parent | child 
--------+-------
      1 |     3
      1 |     2
      1 |     4
      2 |    21
      2 |    22
      2 |    23
     22 |   221
     22 |   222
(8 rows)

with recursive descendants(parent, child) as 
(select p.p, p.c from kids p where not exists (select 1 from kids c where c.c = p.p) group by p.p, p.c
 union all
 select k.* from kids k, descendants d where k.p = d.child)
 select d.parent, array_agg(d.child) from descendants d group by d.parent;

 parent | array_agg  
--------+------------
      1 | {3,2,4}
     22 | {221,222}
      2 | {21,22,23}
(3 rows)


Re: SELECT all the rows where id is children of other node.

От
Rob Sargent
Дата:


On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:

Hello,

I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:

CREATE TABLE relations (
    pid INTEGER NOT NULL,
    cid INTEGER NOT NULL,
)

This table has parent-child relations references between nodes by id. Like:

pid -> cid
n1 -> n2
n1 -> n3
n1 -> n4
n2 -> n21
n2 -> n22
n2 -> n23
n22 -> n221
n22 -> n222

I would like to get a list of all the nodes being children (direct or indirect) of any other node.

Example. The children of:

1) n3: []  (n3 has not children)
2) n22: [n221, n222]  (n22 has 2 children: n221 and n222)
3) n1: [n2, n21, n22, n23, n221, n222]  (n1 has 6 children including indirect children).

this pseudo SQL: 

SELECT *
FROM relations
WHERE has_parent(myId)

It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?

It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.


Pablo

ooops. didn’t get all generations. sorry

Re: SELECT all the rows where id is children of other node.

От
Rob Sargent
Дата:


On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:

Hello,

I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:

CREATE TABLE relations (
    pid INTEGER NOT NULL,
    cid INTEGER NOT NULL,
)

This table has parent-child relations references between nodes by id. Like:

pid -> cid
n1 -> n2
n1 -> n3
n1 -> n4
n2 -> n21
n2 -> n22
n2 -> n23
n22 -> n221
n22 -> n222

I would like to get a list of all the nodes being children (direct or indirect) of any other node.

Example. The children of:

1) n3: []  (n3 has not children)
2) n22: [n221, n222]  (n22 has 2 children: n221 and n222)
3) n1: [n2, n21, n22, n23, n221, n222]  (n1 has 6 children including indirect children).

this pseudo SQL: 

SELECT *
FROM relations
WHERE has_parent(myId)

It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?

It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.


Pablo
I could not find away to handle the branches but this is more complete.
with recursive descendants (last, children) as 
(select c.c, array[null::int] from kids c where not exists (select 1 from kids p where c.c = p.p)
 union all
 select k.p, array[k.c] || l.children from kids k, descendants l where k.c = l.last)
 select last, children from descendants where children[1] is not null order by last
 last |    children     
------+-----------------
    1 | {2,22,222,NULL}
    1 | {4,NULL}
    1 | {2,21,NULL}
    1 | {2,23,NULL}
    1 | {2,22,221,NULL}
    1 | {3,NULL}
    2 | {22,221,NULL}
    2 | {22,222,NULL}
    2 | {21,NULL}
    2 | {23,NULL}
   22 | {221,NULL}
   22 | {222,NULL}
(12 rows)

I’ll throw in the towel now

Re: SELECT all the rows where id is children of other node.

От
pabloa98
Дата:
Thank you for your responses Rob. Appreciated. The problem with recursive queries is that they are executed several times and it has and impact in performance.

I need a subset of those rows and I want them in one pass.

I discovered that ltree extension could be useful. I will play with it today. I am sure there's a way to find al the nodes in O(n) time with n = size of the resulset ...

On Tue, Aug 20, 2019, 6:10 AM Rob Sargent <robjsargent@gmail.com> wrote:


On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:

Hello,

I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:

CREATE TABLE relations (
    pid INTEGER NOT NULL,
    cid INTEGER NOT NULL,
)

This table has parent-child relations references between nodes by id. Like:

pid -> cid
n1 -> n2
n1 -> n3
n1 -> n4
n2 -> n21
n2 -> n22
n2 -> n23
n22 -> n221
n22 -> n222

I would like to get a list of all the nodes being children (direct or indirect) of any other node.

Example. The children of:

1) n3: []  (n3 has not children)
2) n22: [n221, n222]  (n22 has 2 children: n221 and n222)
3) n1: [n2, n21, n22, n23, n221, n222]  (n1 has 6 children including indirect children).

this pseudo SQL: 

SELECT *
FROM relations
WHERE has_parent(myId)

It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?

It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.


Pablo
I could not find away to handle the branches but this is more complete.
with recursive descendants (last, children) as 
(select c.c, array[null::int] from kids c where not exists (select 1 from kids p where c.c = p.p)
 union all
 select k.p, array[k.c] || l.children from kids k, descendants l where k.c = l.last)
 select last, children from descendants where children[1] is not null order by last
 last |    children     
------+-----------------
    1 | {2,22,222,NULL}
    1 | {4,NULL}
    1 | {2,21,NULL}
    1 | {2,23,NULL}
    1 | {2,22,221,NULL}
    1 | {3,NULL}
    2 | {22,221,NULL}
    2 | {22,222,NULL}
    2 | {21,NULL}
    2 | {23,NULL}
   22 | {221,NULL}
   22 | {222,NULL}
(12 rows)

I’ll throw in the towel now

Re: SELECT all the rows where id is children of other node.

От
Francisco Olarte
Дата:
Pablo:

On Tue, Aug 20, 2019 at 6:49 PM pabloa98 <pabloa98@gmail.com> wrote:
> Thank you for your responses Rob. Appreciated. The problem with recursive queries is that they are executed several
timesand it has and impact in performance.
 
> I need a subset of those rows and I want them in one pass.
> I discovered that ltree extension could be useful. I will play with it today. I am sure there's a way to find al the
nodesin O(n) time with n = size of the resulset ...
 

Unless you have some extra conditions in a table ( like
"autoincremented immutable primary key and parents are always created
before childs" ) I think your problem of "get all the descendant ( i
do not like to call them children ) nodes of a given id" can not be
solved in one pass.

I mean, if you are getting descendants of the node N1, you need to
read the last node, NL, of the table to know if it is a child of N1.
But then you have to read the table again to find childs of NL.

Of course, if you have something like "hierarchical ids" you can
traverse ordering by it and know NL MUST be childless, and build the
tree rooted on node N1 as you go, but without some of this conditions
I do not think it can be done in an "ideal" db ( which lets you scan
in any order you can define from just a row without cost ) in one scan
( storing and prunning the whole tree as you go is cheating ).

Also, if your node ids come from a serial and are immutables, or you
take a little care when mutating them, you can do it traversing by id,
but you need a full scan, a recursive query with several index scans
may easily be faster in wide trees.


Francisco Olarte.



Re: SELECT all the rows where id is children of other node.

От
Rob Sargent
Дата:


On Aug 21, 2019, at 3:35 AM, Francisco Olarte <folarte@peoplecall.com> wrote:

Pablo:

On Tue, Aug 20, 2019 at 6:49 PM pabloa98 <pabloa98@gmail.com> wrote:
Thank you for your responses Rob. Appreciated. The problem with recursive queries is that they are executed several times and it has and impact in performance.
I need a subset of those rows and I want them in one pass.
I discovered that ltree extension could be useful. I will play with it today. I am sure there's a way to find al the nodes in O(n) time with n = size of the resulset ...

Unless you have some extra conditions in a table ( like
"autoincremented immutable primary key and parents are always created
before childs" ) I think your problem of "get all the descendant ( i
do not like to call them children ) nodes of a given id" can not be
solved in one pass.

I mean, if you are getting descendants of the node N1, you need to
read the last node, NL, of the table to know if it is a child of N1.
But then you have to read the table again to find childs of NL.

Of course, if you have something like "hierarchical ids" you can
traverse ordering by it and know NL MUST be childless, and build the
tree rooted on node N1 as you go, but without some of this conditions
I do not think it can be done in an "ideal" db ( which lets you scan
in any order you can define from just a row without cost ) in one scan
( storing and prunning the whole tree as you go is cheating ).

Also, if your node ids come from a serial and are immutables, or you
take a little care when mutating them, you can do it traversing by id,
but you need a full scan, a recursive query with several index scans
may easily be faster in wide trees.


Francisco Olarte.
If you accept Francisco’s thesis then you may be interested in this
with recursive descendants (last, children) as 
(select c.c, array[null::int] from kids c where not exists (select 1 from kids p where c.c = p.p)
 union all
 select k.p, array[k.c] || l.children from kids k, descendants l where k.c = l.last)
 select a.last, array_agg(distinct(a.kids))as clan from (select last, unnest(array_remove(children, null)) as kids from descendants where children[1] is not null) as a group by last order by last
 last |           clan           
------+--------------------------
    1 | {2,3,4,21,22,23,221,222}
    2 | {21,22,23,221,222}
   22 | {221,222}
(3 rows)
No comment on performance other than to say that if you are interested in the result for a given seed parent then performance would likely correlate with the average depth of your lineages.

I believe the ascending order of the members of each clan is completely fortuitous unless it’s a consequence of distinct?