Обсуждение: Avoiding cycles in a directed graph
I'm using the following table to represent an acyclic directed graph: CREATE TABLE edge( id SERIAL PRIMARY KEY, child INTEGER NOT NULL, parent INTEGER, UNIQUE (child,parent) ); I see there is an example in the online docs for detecting cycles in recursive queries, and I've adapted the example query to the table above: WITH RECURSIVE search_graph(parent, child, id, depth, path, cycle) AS ( SELECT e.parent, e.child, e.id, 1, ARRAY[e.id], false FROM edge e UNION ALL SELECT e.parent, e.child,e.id, sg.depth + 1, path || e.id, e.id = ANY(path) FROM edge e, search_graph sg WHERE e.parent = sg.child AND NOT cycle ) SELECT * FROM search_graph; That's great to avoid looping forever on queries, but what about preventing anyone from inserting edges that would create cycles in the first place? I reckon I'll need a trigger of some sort, but nothing I've tried has enabled me to do the cycle check as part of the trigger to avoid inserting an edge that would create a cycle. I tried having the non-recursive SELECT use NEW.parent, NEW.child, etc. but that isn't working. Is there any way to do this, or do I have to just insert the edge, check if it cycles, and delete it if it does? Thanks. -Tony
On 16/03/10 19:45, Tony Cebzanov wrote: > I'm using the following table to represent an acyclic directed graph: [snip] > I see there is an example in the online docs for detecting cycles in > recursive queries, and I've adapted the example query to the table above: [snip] > That's great to avoid looping forever on queries, but what about > preventing anyone from inserting edges that would create cycles in the > first place? I reckon I'll need a trigger of some sort, but nothing > I've tried has enabled me to do the cycle check as part of the trigger > to avoid inserting an edge that would create a cycle. You've got the right idea. If you know that the table doesn't contain any cycles at the moment, then all the trigger has to do is: 1. Check "parent" <> "child" 2. Build the set of parents of "parent". 3. Check "child" isn't in that set. 4. If there is a cycle, raise an error (which will abort the insert or update) If you have a "before" trigger, then step 4 could return NULL rather than raise an error. That would just skip the insert. Also, step #1 could be done with a CHECK constraint which would be checked before your trigger gets run. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > On 16/03/10 19:45, Tony Cebzanov wrote: >> That's great to avoid looping forever on queries, but what about >> preventing anyone from inserting edges that would create cycles in the >> first place? I reckon I'll need a trigger of some sort, but nothing >> I've tried has enabled me to do the cycle check as part of the trigger >> to avoid inserting an edge that would create a cycle. > You've got the right idea. > If you know that the table doesn't contain any cycles at the moment, > then all the trigger has to do is: > 1. Check "parent" <> "child" > 2. Build the set of parents of "parent". > 3. Check "child" isn't in that set. > 4. If there is a cycle, raise an error (which will abort the insert or > update) The problem with this approach is that it's not safe against concurrent insertions. If concurrent transactions T1 and T2 each insert a row that, taken together (and in combination with existing entries), create a cycle, then neither of their triggers will see the other's row so they'll both think they can commit. The same kind of problem exists for unique and foreign key constraints, both of which use low-level locking mechanisms to catch such cases. There's no way that I can see to express the "no cycle" constraint as a uniqueness constraint unfortunately. You could solve limited forms of it using the exclusion-constraint mechanism that's new in 9.0, but AFAICS not the general case :-( regards, tom lane
On 3/16/10 4:34 PM, Tom Lane wrote: > The same kind of problem exists for unique and foreign key constraints, > both of which use low-level locking mechanisms to catch such cases. > There's no way that I can see to express the "no cycle" constraint as a > uniqueness constraint unfortunately. You could solve limited forms of > it using the exclusion-constraint mechanism that's new in 9.0, but > AFAICS not the general case :-( Are you saying there's no way to avoid cycles at all, or just no way to do it using uniqueness constraints? I'm okay with running the big, fat WITH RECURSIVE query in my insert trigger if I have to -- it won't be great for performance, but I don't expect this to be a frequent operation, so I'll accept the performance hit if it works. Unfortunately I can't even get that working. Here's the (not at all functional) trigger I've got right now, which only detects the cycle *after* it's been inserted, which is of no help at all. Any way I can modify this to do the right thing? CREATE OR REPLACE FUNCTION check_edge_cycle() RETURNS trigger AS $$ DECLARE cycles INTEGER; BEGIN WITH RECURSIVE search_graph(parent, child, id, depth, path, cycle) AS ( SELECT NEW.parent, NEW.child, NEW.id,1, ARRAY[NEW.id], false UNION ALL SELECT g.parent, g.child, g.id, sg.depth + 1, path || g.id, g.id = ANY(path) FROM catalog_edge g, search_graph sg WHERE g.parent = sg.childAND NOT cycle ) SELECT INTO cycles COUNT(*) FROM search_graph WHERE cycle=true; RAISE NOTICE 'cycles: %', cycles; IF cycles > 0 THEN RAISE EXCEPTION 'cycle'; END IF; RETURN NEW; END $$ LANGUAGE plpgsql;
Tony Cebzanov <tonyceb@andrew.cmu.edu> writes: > I'm okay with running the big, fat WITH RECURSIVE query in my insert > trigger if I have to -- it won't be great for performance, but I don't > expect this to be a frequent operation, so I'll accept the performance > hit if it works. > Unfortunately I can't even get that working. Here's the (not at all > functional) trigger I've got right now, which only detects the cycle > *after* it's been inserted, which is of no help at all. Any way I can > modify this to do the right thing? Run it in an AFTER trigger? If you don't expect this to be common, maybe you could fix the concurrency issue by taking a table-wide lock that locks out other writers. regards, tom lane
On 16/03/10 21:09, Tom Lane wrote: > Tony Cebzanov<tonyceb@andrew.cmu.edu> writes: >> I'm okay with running the big, fat WITH RECURSIVE query in my insert >> trigger if I have to -- it won't be great for performance, but I don't >> expect this to be a frequent operation, so I'll accept the performance >> hit if it works. > >> Unfortunately I can't even get that working. Here's the (not at all >> functional) trigger I've got right now, which only detects the cycle >> *after* it's been inserted, which is of no help at all. Any way I can >> modify this to do the right thing? > > Run it in an AFTER trigger? > > If you don't expect this to be common, maybe you could fix the > concurrency issue by taking a table-wide lock that locks out > other writers. Surely SELECT FOR UPDATE on the parents would be sufficient? If there's no overlap between (currently non-cyclic) graphs being altered then there can't be any conflict. -- Richard Huxton Archonet Ltd
Richard Huxton <dev@archonet.com> writes: > On 16/03/10 21:09, Tom Lane wrote: >> If you don't expect this to be common, maybe you could fix the >> concurrency issue by taking a table-wide lock that locks out >> other writers. > Surely SELECT FOR UPDATE on the parents would be sufficient? If there's > no overlap between (currently non-cyclic) graphs being altered then > there can't be any conflict. Um, what if the cycle is being formed from whole cloth? For instance T1 inserts an edge A->B while T2 is inserting B->A. There are no pre-existing rows to lock, but there will still be a cycle after they both commit. Also it seems pretty deadlock-prone if there are multiple existing rows to try to lock. Perhaps you could work around the risk by locking those rows one at a time in an application-defined ordering ... but I'm afraid the performance would be poor, unless the connected graphs are always very small. On the whole I think Tony's better off with a KISS approach, ie just lock the whole table against other writers. regards, tom lane
On 3/16/10 6:22 PM, Tom Lane wrote: > Richard Huxton <dev@archonet.com> writes: > > > Um, what if the cycle is being formed from whole cloth? For instance > T1 inserts an edge A->B while T2 is inserting B->A. There are no > pre-existing rows to lock, but there will still be a cycle after they > both commit. For what it's worth, when I did give the "FOR UPDATE" strategy a try, but with the recursive query rather than the simpler approach Richard suggested, I got the following error (v8.4.2): ERROR: FOR UPDATE/SHARE in a recursive query is not implemented I'm sticking with the recursive query, because it seems to me the only way to ensure there are no cycles is to check the whole graph for cycles, and the only way I know how to do that is the recursive approach. Since "FOR UPDATE" isn't implemented for recursive queries, I'll just lock the entire table for now. Thanks, all! -Tony