Обсуждение: Avoiding cycles in a directed graph

Поиск
Список
Период
Сортировка

Avoiding cycles in a directed graph

От
Tony Cebzanov
Дата:
I'm using the following table to represent an acyclic directed graph:
   CREATE TABLE edge(       id SERIAL PRIMARY KEY,       child INTEGER NOT NULL,       parent INTEGER,       UNIQUE
(child,parent)   );
 

I see there is an example in the online docs for detecting cycles in
recursive queries, and I've adapted the example query to the table above:
   WITH RECURSIVE search_graph(parent, child, id, depth, path, cycle)   AS (           SELECT e.parent, e.child, e.id,
1,            ARRAY[e.id],             false           FROM edge e         UNION ALL           SELECT e.parent,
e.child,e.id, sg.depth + 1,             path || e.id,             e.id = ANY(path)           FROM edge e, search_graph
sg          WHERE e.parent = sg.child AND NOT cycle   )   SELECT * FROM search_graph;
 

That's great to avoid looping forever on queries, but what about
preventing anyone from inserting edges that would create cycles in the
first place?  I reckon I'll need a trigger of some sort, but nothing
I've tried has enabled me to do the cycle check as part of the trigger
to avoid inserting an edge that would create a cycle.  I tried having
the non-recursive SELECT use NEW.parent, NEW.child, etc. but that isn't
working.  Is there any way to do this, or do I have to just insert the
edge, check if it cycles, and delete it if it does?

Thanks.
-Tony


Re: Avoiding cycles in a directed graph

От
Richard Huxton
Дата:
On 16/03/10 19:45, Tony Cebzanov wrote:
> I'm using the following table to represent an acyclic directed graph:
[snip]
> I see there is an example in the online docs for detecting cycles in
> recursive queries, and I've adapted the example query to the table above:
[snip]
> That's great to avoid looping forever on queries, but what about
> preventing anyone from inserting edges that would create cycles in the
> first place?  I reckon I'll need a trigger of some sort, but nothing
> I've tried has enabled me to do the cycle check as part of the trigger
> to avoid inserting an edge that would create a cycle.

You've got the right idea.

If you know that the table doesn't contain any cycles at the moment, 
then all the trigger has to do is:
1. Check "parent" <> "child"
2. Build the set of parents of "parent".
3. Check "child" isn't in that set.
4. If there is a cycle, raise an error (which will abort the insert or 
update)

If you have a "before" trigger, then step 4 could return NULL rather 
than raise an error. That would just skip the insert.

Also, step #1 could be done with a CHECK constraint which would be 
checked before your trigger gets run.

--   Richard Huxton  Archonet Ltd


Re: Avoiding cycles in a directed graph

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
> On 16/03/10 19:45, Tony Cebzanov wrote:
>> That's great to avoid looping forever on queries, but what about
>> preventing anyone from inserting edges that would create cycles in the
>> first place?  I reckon I'll need a trigger of some sort, but nothing
>> I've tried has enabled me to do the cycle check as part of the trigger
>> to avoid inserting an edge that would create a cycle.

> You've got the right idea.

> If you know that the table doesn't contain any cycles at the moment, 
> then all the trigger has to do is:
> 1. Check "parent" <> "child"
> 2. Build the set of parents of "parent".
> 3. Check "child" isn't in that set.
> 4. If there is a cycle, raise an error (which will abort the insert or 
> update)

The problem with this approach is that it's not safe against concurrent
insertions.  If concurrent transactions T1 and T2 each insert a row
that, taken together (and in combination with existing entries), create
a cycle, then neither of their triggers will see the other's row so
they'll both think they can commit.

The same kind of problem exists for unique and foreign key constraints,
both of which use low-level locking mechanisms to catch such cases.
There's no way that I can see to express the "no cycle" constraint as a
uniqueness constraint unfortunately.  You could solve limited forms of
it using the exclusion-constraint mechanism that's new in 9.0, but
AFAICS not the general case :-(
        regards, tom lane


Re: Avoiding cycles in a directed graph

От
Tony Cebzanov
Дата:
On 3/16/10 4:34 PM, Tom Lane wrote:
> The same kind of problem exists for unique and foreign key constraints,
> both of which use low-level locking mechanisms to catch such cases.
> There's no way that I can see to express the "no cycle" constraint as a
> uniqueness constraint unfortunately.  You could solve limited forms of
> it using the exclusion-constraint mechanism that's new in 9.0, but
> AFAICS not the general case :-(

Are you saying there's no way to avoid cycles at all, or just no way to
do it using uniqueness constraints?

I'm okay with running the big, fat WITH RECURSIVE query in my insert
trigger if I have to -- it won't be great for performance, but I don't
expect this to be a frequent operation, so I'll accept the performance
hit if it works.

Unfortunately I can't even get that working.  Here's the (not at all
functional) trigger I've got right now, which only detects the cycle
*after* it's been inserted, which is of no help at all.  Any way I can
modify this to do the right thing?


CREATE OR REPLACE FUNCTION check_edge_cycle() RETURNS trigger AS $$
DECLARE   cycles INTEGER;
BEGIN   WITH RECURSIVE search_graph(parent, child, id, depth, path, cycle) AS (        SELECT NEW.parent, NEW.child,
NEW.id,1,         ARRAY[NEW.id], false         UNION ALL           SELECT g.parent, g.child, g.id, sg.depth + 1,
path || g.id,         g.id = ANY(path)           FROM catalog_edge g, search_graph sg           WHERE g.parent =
sg.childAND NOT cycle   )   SELECT INTO cycles COUNT(*) FROM search_graph WHERE cycle=true;   RAISE NOTICE 'cycles: %',
cycles;  IF cycles > 0 THEN      RAISE EXCEPTION 'cycle';   END IF;   RETURN NEW;
 
END
$$ LANGUAGE plpgsql;


Re: Avoiding cycles in a directed graph

От
Tom Lane
Дата:
Tony Cebzanov <tonyceb@andrew.cmu.edu> writes:
> I'm okay with running the big, fat WITH RECURSIVE query in my insert
> trigger if I have to -- it won't be great for performance, but I don't
> expect this to be a frequent operation, so I'll accept the performance
> hit if it works.

> Unfortunately I can't even get that working.  Here's the (not at all
> functional) trigger I've got right now, which only detects the cycle
> *after* it's been inserted, which is of no help at all.  Any way I can
> modify this to do the right thing?

Run it in an AFTER trigger?

If you don't expect this to be common, maybe you could fix the
concurrency issue by taking a table-wide lock that locks out
other writers.
        regards, tom lane


Re: Avoiding cycles in a directed graph

От
Richard Huxton
Дата:
On 16/03/10 21:09, Tom Lane wrote:
> Tony Cebzanov<tonyceb@andrew.cmu.edu>  writes:
>> I'm okay with running the big, fat WITH RECURSIVE query in my insert
>> trigger if I have to -- it won't be great for performance, but I don't
>> expect this to be a frequent operation, so I'll accept the performance
>> hit if it works.
>
>> Unfortunately I can't even get that working.  Here's the (not at all
>> functional) trigger I've got right now, which only detects the cycle
>> *after* it's been inserted, which is of no help at all.  Any way I can
>> modify this to do the right thing?
>
> Run it in an AFTER trigger?
>
> If you don't expect this to be common, maybe you could fix the
> concurrency issue by taking a table-wide lock that locks out
> other writers.

Surely SELECT FOR UPDATE on the parents would be sufficient? If there's 
no overlap between (currently non-cyclic) graphs being altered then 
there can't be any conflict.

--   Richard Huxton  Archonet Ltd


Re: Avoiding cycles in a directed graph

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
> On 16/03/10 21:09, Tom Lane wrote:
>> If you don't expect this to be common, maybe you could fix the
>> concurrency issue by taking a table-wide lock that locks out
>> other writers.

> Surely SELECT FOR UPDATE on the parents would be sufficient? If there's 
> no overlap between (currently non-cyclic) graphs being altered then 
> there can't be any conflict.

Um, what if the cycle is being formed from whole cloth?  For instance
T1 inserts an edge A->B while T2 is inserting B->A.  There are no
pre-existing rows to lock, but there will still be a cycle after they
both commit.

Also it seems pretty deadlock-prone if there are multiple existing rows
to try to lock.  Perhaps you could work around the risk by locking those
rows one at a time in an application-defined ordering ... but I'm afraid
the performance would be poor, unless the connected graphs are always
very small.

On the whole I think Tony's better off with a KISS approach, ie just
lock the whole table against other writers.
        regards, tom lane


Re: Avoiding cycles in a directed graph

От
Tony Cebzanov
Дата:
On 3/16/10 6:22 PM, Tom Lane wrote:
> Richard Huxton <dev@archonet.com> writes:
> 
> 
> Um, what if the cycle is being formed from whole cloth?  For instance
> T1 inserts an edge A->B while T2 is inserting B->A.  There are no
> pre-existing rows to lock, but there will still be a cycle after they
> both commit.

For what it's worth, when I did give the "FOR UPDATE" strategy a try,
but with the recursive query rather than the simpler approach Richard
suggested, I got the following error (v8.4.2):

ERROR:  FOR UPDATE/SHARE in a recursive query is not implemented

I'm sticking with the recursive query, because it seems to me the only
way to ensure there are no cycles is to check the whole graph for
cycles, and the only way I know how to do that is the recursive
approach.  Since "FOR UPDATE" isn't implemented for recursive queries,
I'll just lock the entire table for now.

Thanks, all!
-Tony