Обсуждение: Complex Recursive Query

Поиск
Список
Период
Сортировка

Complex Recursive Query

От
Jim Garrison
Дата:
I have a collection of relationship rows of the form

Table: graph
    key1 varchar
    key2 varchar

A row of the form ('a','b') indicates that 'a' and 'b' are related.
The table contains many relationships between keys, forming several
disjoint sets. All relationships are bi-directional, and both
directions are present.  I.e. the table contains a set of disjoint
graphs specified as node pairs.

For example the set of values

    key1    key2
    -----   -----
      a       x
      a       y
      b       w
      c       t
      x       a
      y       a
      y       z
      z       y
      t       c
      w       b
      w       d
      d       w

defines three disjoint groups of connected keys:

      a x y z
      c t
      b w d

What I would like to achieve is a single SQL query that returns

      group key
      ----- ---
        1    a
        1    x
        1    y
        1    z
        2    c
        2    t
        3    b
        3    w
        3    d

I don't care about preserving the node-to-node relationships, only
the group membership for each node.

I've been playing with "WITH RECURSIVE" CTEs but haven't had any
success.  I'm not really sure how to express what I want in SQL, and
it's not completely clear to me that recursive CTEs will help here.
Also I'm not sure how to generate the sequence numbers for the groups


Re: Complex Recursive Query

От
matt@byrney.com
Дата:
I wouldn't do this with recursion; plain old iteration is your friend
(yes, WITH RECURSIVE is actually iterative, not recursive...)

The algorithm goes like this:

1. Extend your graph relation to be symmetric and transitive.
2. Assign a integer group id to each node.
3. Repeatedly join the node list to the (extended) relation, updating each
node's group id to be the minimum of the group ids of every node it
touches.
4. Stop when the group ids stop changing.

Here's some example code, using your data:

DROP SCHEMA IF EXISTS test CASCADE;
CREATE SCHEMA test;
SET SEARCH_PATH TO test;

CREATE TABLE graph(key1 TEXT, key2 TEXT);

INSERT INTO graph VALUES
('a', 'x'),
('a', 'y'),
('b', 'w'),
('c', 't'),
('x', 'a'),
('y', 'a'),
('y', 'z'),
('z', 'y'),
('t', 'c'),
('w', 'b'),
('w', 'd'),
('d', 'w');

DO
$$
DECLARE
  prev INT = 0;
  curr INT;
BEGIN
  CREATE TABLE rel AS
  SELECT key1, key2 FROM graph
  UNION
  SELECT key2, key1 FROM graph
  UNION
  SELECT key1, key1 FROM graph
  UNION
  SELECT key2, key2 FROM graph;

  CREATE TABLE group_ids AS
  SELECT
    key,
    ROW_NUMBER() OVER (ORDER BY key) AS group_id
  FROM
    (
      SELECT key1 AS key FROM graph
      UNION
      SELECT key2 FROM graph
    ) _;

  SELECT SUM(group_id) INTO curr FROM group_ids;
  WHILE prev != curr LOOP
    prev = curr;
    DROP TABLE IF EXISTS min_ids;
    CREATE TABLE min_ids AS
    SELECT
      a.key,
      MIN(c.group_id) AS group_id
    FROM
      group_ids a
    INNER JOIN
      rel b
    ON
      a.key = b.key1
    INNER JOIN
      group_ids c
    ON
      b.key2 = c.key
    GROUP BY
      a.key;

    UPDATE
      group_ids
    SET
      group_id = min_ids.group_id
    FROM
      min_ids
    WHERE
      group_ids.key = min_ids.key;

    SELECT SUM(group_id) INTO curr FROM group_ids;
  END LOOP;

  DROP TABLE IF EXISTS rel;
  DROP TABLE IF EXISTS min_ids;
END
$$;

SELECT * FROM group_ids;


Hope it helps,

Matthew




> I have a collection of relationship rows of the form
>
> Table: graph
>     key1 varchar
>     key2 varchar
>
> A row of the form ('a','b') indicates that 'a' and 'b' are related.
> The table contains many relationships between keys, forming several
> disjoint sets. All relationships are bi-directional, and both
> directions are present.  I.e. the table contains a set of disjoint
> graphs specified as node pairs.
>
> For example the set of values
>
>     key1    key2
>     -----   -----
>       a       x
>       a       y
>       b       w
>       c       t
>       x       a
>       y       a
>       y       z
>       z       y
>       t       c
>       w       b
>       w       d
>       d       w
>
> defines three disjoint groups of connected keys:
>
>       a x y z
>       c t
>       b w d
>
> What I would like to achieve is a single SQL query that returns
>
>       group key
>       ----- ---
>         1    a
>         1    x
>         1    y
>         1    z
>         2    c
>         2    t
>         3    b
>         3    w
>         3    d
>
> I don't care about preserving the node-to-node relationships, only
> the group membership for each node.
>
> I've been playing with "WITH RECURSIVE" CTEs but haven't had any
> success.  I'm not really sure how to express what I want in SQL, and
> it's not completely clear to me that recursive CTEs will help here.
> Also I'm not sure how to generate the sequence numbers for the groups
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>



Re: Complex Recursive Query

От
John W Higgins
Дата:

Sorry for the 3rd party site - just easier to get the layout correct.....

A CTE and dense_rank is all it takes. I am always amazed at what one can now pack into such small amounts of code.


On Wed, Jul 23, 2014 at 4:00 PM, Jim Garrison <jim.garrison@nwea.org> wrote:
I have a collection of relationship rows of the form

Table: graph
    key1 varchar
    key2 varchar

A row of the form ('a','b') indicates that 'a' and 'b' are related.
The table contains many relationships between keys, forming several
disjoint sets. All relationships are bi-directional, and both
directions are present.  I.e. the table contains a set of disjoint
graphs specified as node pairs.

For example the set of values

    key1    key2
    -----   -----
      a       x
      a       y
      b       w
      c       t
      x       a
      y       a
      y       z
      z       y
      t       c
      w       b
      w       d
      d       w

defines three disjoint groups of connected keys:

      a x y z
      c t
      b w d

What I would like to achieve is a single SQL query that returns

      group key
      ----- ---
        1    a
        1    x
        1    y
        1    z
        2    c
        2    t
        3    b
        3    w
        3    d

I don't care about preserving the node-to-node relationships, only
the group membership for each node.

I've been playing with "WITH RECURSIVE" CTEs but haven't had any
success.  I'm not really sure how to express what I want in SQL, and
it's not completely clear to me that recursive CTEs will help here.
Also I'm not sure how to generate the sequence numbers for the groups


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general