Обсуждение: SELECT all the rows where id is children of other node.
Hello,
I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:
CREATE TABLE relations (
pid INTEGER NOT NULL,
cid INTEGER NOT NULL,
)This table has parent-child relations references between nodes by id. Like:
pid -> cid
n1 -> n2
n1 -> n3
n1 -> n4
n2 -> n21
n2 -> n22
n2 -> n23
n22 -> n221
n22 -> n222
I would like to get a list of all the nodes being children (direct or indirect) of any other node.
Example. The children of:
2) n22: [n221, n222] (n22 has 2 children: n221 and n222)
3) n1: [n2, n21, n22, n23, n221, n222] (n1 has 6 children including indirect children).
this pseudo SQL:
SELECT *
FROM relations
WHERE has_parent(myId)
It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?
It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.
Pablo
Hello,I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:CREATE TABLE relations (pid INTEGER NOT NULL,cid INTEGER NOT NULL,)This table has parent-child relations references between nodes by id. Like:pid -> cidn1 -> n2n1 -> n3n1 -> n4n2 -> n21n2 -> n22n2 -> n23n22 -> n221n22 -> n222I would like to get a list of all the nodes being children (direct or indirect) of any other node.Example. The children of:1) n3: [] (n3 has not children)
2) n22: [n221, n222] (n22 has 2 children: n221 and n222)3) n1: [n2, n21, n22, n23, n221, n222] (n1 has 6 children including indirect children).this pseudo SQL:SELECT *FROM relationsWHERE has_parent(myId)It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.Pablo
Are you asking for just the function (always with a seed Id) or the complete transformation in a single select? Would you like the descendants on one line (eg n22,[n221,n222])?
I wonder if you might add n3 -> null to explicitly terminate the hierarchy ?
On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:Hello,I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:CREATE TABLE relations (pid INTEGER NOT NULL,cid INTEGER NOT NULL,)This table has parent-child relations references between nodes by id. Like:pid -> cidn1 -> n2n1 -> n3n1 -> n4n2 -> n21n2 -> n22n2 -> n23n22 -> n221n22 -> n222I would like to get a list of all the nodes being children (direct or indirect) of any other node.Example. The children of:1) n3: [] (n3 has not children)
2) n22: [n221, n222] (n22 has 2 children: n221 and n222)3) n1: [n2, n21, n22, n23, n221, n222] (n1 has 6 children including indirect children).this pseudo SQL:SELECT *FROM relationsWHERE has_parent(myId)It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.Pablo
with recursive descendants(parent, child) as
(select p.p, p.c from kids p where not exists (select 1 from kids c where c.c = p.p) group by p.p, p.c
union all
select k.* from kids k, descendants d where k.p = d.child)
select * from descendants;
parent | child
--------+-------
1 | 3
1 | 2
1 | 4
2 | 21
2 | 22
2 | 23
22 | 221
22 | 222
(8 rows)
with recursive descendants(parent, child) as
(select p.p, p.c from kids p where not exists (select 1 from kids c where c.c = p.p) group by p.p, p.c
union all
select k.* from kids k, descendants d where k.p = d.child)
select d.parent, array_agg(d.child) from descendants d group by d.parent;
parent | array_agg
--------+------------
1 | {3,2,4}
22 | {221,222}
2 | {21,22,23}
(3 rows)
On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:Hello,I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:CREATE TABLE relations (pid INTEGER NOT NULL,cid INTEGER NOT NULL,)This table has parent-child relations references between nodes by id. Like:pid -> cidn1 -> n2n1 -> n3n1 -> n4n2 -> n21n2 -> n22n2 -> n23n22 -> n221n22 -> n222I would like to get a list of all the nodes being children (direct or indirect) of any other node.Example. The children of:1) n3: [] (n3 has not children)
2) n22: [n221, n222] (n22 has 2 children: n221 and n222)3) n1: [n2, n21, n22, n23, n221, n222] (n1 has 6 children including indirect children).this pseudo SQL:SELECT *FROM relationsWHERE has_parent(myId)It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.Pablo
ooops. didn’t get all generations. sorry
On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:Hello,I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:CREATE TABLE relations (pid INTEGER NOT NULL,cid INTEGER NOT NULL,)This table has parent-child relations references between nodes by id. Like:pid -> cidn1 -> n2n1 -> n3n1 -> n4n2 -> n21n2 -> n22n2 -> n23n22 -> n221n22 -> n222I would like to get a list of all the nodes being children (direct or indirect) of any other node.Example. The children of:1) n3: [] (n3 has not children)
2) n22: [n221, n222] (n22 has 2 children: n221 and n222)3) n1: [n2, n21, n22, n23, n221, n222] (n1 has 6 children including indirect children).this pseudo SQL:SELECT *FROM relationsWHERE has_parent(myId)It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.Pablo
with recursive descendants (last, children) as
(select c.c, array[null::int] from kids c where not exists (select 1 from kids p where c.c = p.p)
union all
select k.p, array[k.c] || l.children from kids k, descendants l where k.c = l.last)
select last, children from descendants where children[1] is not null order by last
last | children
------+-----------------
1 | {2,22,222,NULL}
1 | {4,NULL}
1 | {2,21,NULL}
1 | {2,23,NULL}
1 | {2,22,221,NULL}
1 | {3,NULL}
2 | {22,221,NULL}
2 | {22,222,NULL}
2 | {21,NULL}
2 | {23,NULL}
22 | {221,NULL}
22 | {222,NULL}
(12 rows)
I’ll throw in the towel now
Thank you for your responses Rob. Appreciated. The problem with recursive queries is that they are executed several times and it has and impact in performance.
I need a subset of those rows and I want them in one pass.
I discovered that ltree extension could be useful. I will play with it today. I am sure there's a way to find al the nodes in O(n) time with n = size of the resulset ...
On Tue, Aug 20, 2019, 6:10 AM Rob Sargent <robjsargent@gmail.com> wrote:
I could not find away to handle the branches but this is more complete.On Aug 19, 2019, at 7:42 PM, pabloa98 <pabloa98@gmail.com> wrote:Hello,I have a huge table (100 million rows) of relations between nodes by id in a Postgresql 11 server. Like this:CREATE TABLE relations (pid INTEGER NOT NULL,cid INTEGER NOT NULL,)This table has parent-child relations references between nodes by id. Like:pid -> cidn1 -> n2n1 -> n3n1 -> n4n2 -> n21n2 -> n22n2 -> n23n22 -> n221n22 -> n222I would like to get a list of all the nodes being children (direct or indirect) of any other node.Example. The children of:1) n3: [] (n3 has not children)
2) n22: [n221, n222] (n22 has 2 children: n221 and n222)3) n1: [n2, n21, n22, n23, n221, n222] (n1 has 6 children including indirect children).this pseudo SQL:SELECT *FROM relationsWHERE has_parent(myId)It can be solved with a recursive function or stored procedure. But that requires several passes. Is it possible to solve it in one pass? Perhaps using some low-level function or join or some index expression or auxiliary columns?It is OK to create an index or similar using recursive expressions. However, the SELECT expressions should be solved in one pass because of speed.Pablowith recursive descendants (last, children) as(select c.c, array[null::int] from kids c where not exists (select 1 from kids p where c.c = p.p)union allselect k.p, array[k.c] || l.children from kids k, descendants l where k.c = l.last)select last, children from descendants where children[1] is not null order by lastlast | children------+-----------------1 | {2,22,222,NULL}1 | {4,NULL}1 | {2,21,NULL}1 | {2,23,NULL}1 | {2,22,221,NULL}1 | {3,NULL}2 | {22,221,NULL}2 | {22,222,NULL}2 | {21,NULL}2 | {23,NULL}22 | {221,NULL}22 | {222,NULL}(12 rows)I’ll throw in the towel now
Pablo: On Tue, Aug 20, 2019 at 6:49 PM pabloa98 <pabloa98@gmail.com> wrote: > Thank you for your responses Rob. Appreciated. The problem with recursive queries is that they are executed several timesand it has and impact in performance. > I need a subset of those rows and I want them in one pass. > I discovered that ltree extension could be useful. I will play with it today. I am sure there's a way to find al the nodesin O(n) time with n = size of the resulset ... Unless you have some extra conditions in a table ( like "autoincremented immutable primary key and parents are always created before childs" ) I think your problem of "get all the descendant ( i do not like to call them children ) nodes of a given id" can not be solved in one pass. I mean, if you are getting descendants of the node N1, you need to read the last node, NL, of the table to know if it is a child of N1. But then you have to read the table again to find childs of NL. Of course, if you have something like "hierarchical ids" you can traverse ordering by it and know NL MUST be childless, and build the tree rooted on node N1 as you go, but without some of this conditions I do not think it can be done in an "ideal" db ( which lets you scan in any order you can define from just a row without cost ) in one scan ( storing and prunning the whole tree as you go is cheating ). Also, if your node ids come from a serial and are immutables, or you take a little care when mutating them, you can do it traversing by id, but you need a full scan, a recursive query with several index scans may easily be faster in wide trees. Francisco Olarte.
On Aug 21, 2019, at 3:35 AM, Francisco Olarte <folarte@peoplecall.com> wrote:Pablo:
On Tue, Aug 20, 2019 at 6:49 PM pabloa98 <pabloa98@gmail.com> wrote:Thank you for your responses Rob. Appreciated. The problem with recursive queries is that they are executed several times and it has and impact in performance.
I need a subset of those rows and I want them in one pass.
I discovered that ltree extension could be useful. I will play with it today. I am sure there's a way to find al the nodes in O(n) time with n = size of the resulset ...
Unless you have some extra conditions in a table ( like
"autoincremented immutable primary key and parents are always created
before childs" ) I think your problem of "get all the descendant ( i
do not like to call them children ) nodes of a given id" can not be
solved in one pass.
I mean, if you are getting descendants of the node N1, you need to
read the last node, NL, of the table to know if it is a child of N1.
But then you have to read the table again to find childs of NL.
Of course, if you have something like "hierarchical ids" you can
traverse ordering by it and know NL MUST be childless, and build the
tree rooted on node N1 as you go, but without some of this conditions
I do not think it can be done in an "ideal" db ( which lets you scan
in any order you can define from just a row without cost ) in one scan
( storing and prunning the whole tree as you go is cheating ).
Also, if your node ids come from a serial and are immutables, or you
take a little care when mutating them, you can do it traversing by id,
but you need a full scan, a recursive query with several index scans
may easily be faster in wide trees.
Francisco Olarte.
No comment on performance other than to say that if you are interested in the result for a given seed parent then performance would likely correlate with the average depth of your lineages.with recursive descendants (last, children) as(select c.c, array[null::int] from kids c where not exists (select 1 from kids p where c.c = p.p)union allselect k.p, array[k.c] || l.children from kids k, descendants l where k.c = l.last)select a.last, array_agg(distinct(a.kids))as clan from (select last, unnest(array_remove(children, null)) as kids from descendants where children[1] is not null) as a group by last order by lastlast | clan------+--------------------------1 | {2,3,4,21,22,23,221,222}2 | {21,22,23,221,222}22 | {221,222}(3 rows)
I believe the ascending order of the members of each clan is completely fortuitous unless it’s a consequence of distinct?