Обсуждение: [PATCH] Support for Array ELEMENT Foreign Keys
Hi, please find attached version 1 of the patch introducing "Array ELEMENT Foreign Keys" support. This new thread and related patch substitute any previous discussion about "Support for foreign keys with arrays", as anticipated in http://archives.postgresql.org/pgsql-hackers/2012-07/msg01098.php This patch adds: * support for ELEMENT REFERENCES column constraint on array types - e.g. c1 INT[] ELEMENT REFERENCES t1 * support for array ELEMENT foreign key table constraints - e.g. FOREIGN KEY (ELEMENT c1) REFERENCES t1 * support for array ELEMENT foreign keys in multi-column foreign key table constraints - e.g. FOREIGN KEY (c1, ELEMENT c2) REFERENCES t1 (u1, u2) Array ELEMENT foreign keys are a special kind of foreign key constraint requiring the referencing column to be an array of elements of the same type as (or a compatible type to) the referenced column in the referenced table. Array ELEMENT foreign keys are an extension of PostgreSQL and are not included in the SQL standard. An usage example for this feature is the following: CREATE TABLE drivers ( driver_id integer PRIMARY KEY, first_name text, last_name text, ... ); CREATE TABLE races ( race_id integer PRIMARY KEY, title text, race_day DATE, ... final_positions integer[] ELEMENT REFERENCES drivers ); This initial patch present the following limitations: * Only one "ELEMENT" column allowed in a multi-column key - e.g. FOREIGN KEY (c1, ELEMENT c2, ELEMENT c3) REFERENCES t1 (u1, u2, u3) will throw an error * Supported actions: - NO ACTION - RESTRICT As noted in the last 9.2 commitfest, we feel it is important to consolidate the "array ELEMENT foreign key" syntax and to postpone decisions about referential integrity actions, allowing the community to have a clearer understanding of the feature goals and requirements. However, having array_replace() and array_remove() functions already being committed and using our previous patch as a basis, we are confident that a generally accepted syntax will come out in the next months through community collaborative dynamics. The patch includes documentation and an extensive coverage of tests (element_foreign_key.sql regression test file). Co-authors of this patch are Gabriele and Gianni from our Italian team at 2ndQuadrant. Thank you. Cheers, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Вложения
Hi, please find attached the refreshed v1 patch. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Вложения
On Tue, Sep 18, 2012 at 05:52:51PM +0200, Marco Nenciarini wrote: > please find attached the refreshed v1 patch. I perused this version in comparison to the last version I reviewed, finding minor problems. First, a warning: tablecmds.c: In function `ATExecAddConstraint': tablecmds.c:5898: warning: `fk_element_type' may be used uninitialized in this function tablecmds.c:5898: note: `fk_element_type' was declared here I don't see an actual bug; add a dead store to silence the compiler. > *** a/src/test/regress/parallel_schedule > --- b/src/test/regress/parallel_schedule > *************** test: event_trigger > *** 94,100 **** > # ---------- > # Another group of parallel tests > # ---------- > ! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data windowxmlmap functional_deps advisory_lock json > > # ---------- > # Another group of parallel tests > --- 94,100 ---- > # ---------- > # Another group of parallel tests > # ---------- > ! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data windowxmlmap functional_deps advisory_lock element_foreign_key Keep that json test. > + errmsg("array ELEMENT foreign keys only support NO ACTION " > + "and RESTRICT actions"))); Project style is not to break message literals; instead, let the line run long. There are a few more examples of this in your patch. Those problems are isolated and do not impugn design, so committer time would be just as well-spent on the latest version. As such, I'm marking the patch Ready for Committer. Thanks to Rafal Pietrak for his helpful review. nm
Thanks for your review. Please find the attached refreshed patch (v2) which fixes the loose ends you found. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciarini@2ndQuadrant.it | www.2ndQuadrant.it
Вложения
Marco Nenciarini <marco.nenciarini@2ndquadrant.it> writes: > Please find the attached refreshed patch (v2) which fixes the loose ends > you found. I've started looking at this patch, and the first thing I notice is that the syntax doesn't work. It's ambiguous, and this: %left JOIN CROSS LEFT FULL RIGHT INNER_P NATURAL /* kluge to keep xml_whitespace_option from causing shift/reduceconflicts */ %right PRESERVE STRIP_P + %nonassoc ELEMENT %% is not in any way an acceptable fix. All that that will do is cause an arbitrary choice to be made when it's not clear what to do. Half the time the arbitrary choice will be wrong. Consider for example regression=# create table t1 (f1 int[] default 4! element references t2); ERROR: column "element" does not exist The parser has resolved the ambiguity about whether "!" is an infix or postfix operator by assuming it's infix. (Yeah, I realize we've "fixed" some similar cases with precedence hacks, but they are cases that were forced on us by brain-dead syntax choices in the SQL standard. We don't need to go there for syntax we're making up ourselves.) We could get around that by making ELEMENT a fully reserved word, but I don't think that's a really acceptable solution. ELEMENT is reserved according to more recent versions of the SQL standard, but only as a built-in function name, and in any case reserving it is very likely to break people's existing applications. Another possibility is to forget about the column constraint ELEMENT REFERENCES syntax, and only support the table-constraint syntax with ELEMENT inside the column list --- I've not checked, but I think that syntax doesn't have any ambiguity problems. Or we could go back to using ARRAY here --- that should be safe since ARRAY is already fully reserved. Or we could choose some other syntax. I'm wondering about dropping the use of a keyword entirely, and instead using '[]' decoration. This wouldn't work too badly in the table constraint case: FOREIGN KEY (foo, bar[]) REFERENCES t (x,y) but I'm less sure where to put the decoration for the column constraint case. Thoughts? regards, tom lane
I wrote: > Or we could go back to using ARRAY here --- that should be safe since > ARRAY is already fully reserved. Ugh ... no, that doesn't work, because ARRAY[...] is allowed in c_expr and hence b_expr. So the ambiguity would still be there. We'd need a different fully-reserved keyword to go this way. regards, tom lane
On 10/18/2012 10:26 PM, Tom Lane wrote: > Another possibility is to forget about the column constraint ELEMENT > REFERENCES syntax, and only support the table-constraint syntax with > ELEMENT inside the column list --- I've not checked, but I think that > syntax doesn't have any ambiguity problems. > > Or we could go back to using ARRAY here --- that should be safe since > ARRAY is already fully reserved. > > Or we could choose some other syntax. I'm wondering about dropping the > use of a keyword entirely, and instead using '[]' decoration. This > wouldn't work too badly in the table constraint case: > > FOREIGN KEY (foo, bar[]) REFERENCES t (x,y) > > but I'm less sure where to put the decoration for the column constraint > case. > > Thoughts? > > I'm late to this party, so I apologize in advance if this has already been considered, but do we actually need a special syntax? Can't we just infer that we have one of these when the referring column is an array and the referenced column is of the base type of the array? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > I'm late to this party, so I apologize in advance if this has already > been considered, but do we actually need a special syntax? Can't we just > infer that we have one of these when the referring column is an array > and the referenced column is of the base type of the array? Yeah, that was suggested before. I for one think it's a seriously bad idea. It takes away, or at least weakens, a fundamental sanity check on foreign-key references. Another point (which is not well handled by my []-syntax idea, I guess) is that it's not clear that there is one and only one sensible semantics for the case of an array referencing a scalar. We debated about "all elements of array must have a match" versus "at least one element of array must have a match". If we have some special syntax in there then there's room to change the syntax to select a different semantics, whereas if we just automatically do something when the column types are inconsistent, we're not gonna have a lot of wiggle room to support other behaviors. This thought also crystallizes something else that had been bothering me, which is that "ELEMENT" alone is a pretty bad choice of syntax because it entirely fails to make clear which of these semantics is meant. I'm tempted to propose that we use FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES ... which is certainly more verbose than just "ELEMENT" but I think it makes it clearer that each array element is required to have a match separately. If we ever implemented the other behavior it could be written as "ANY ELEMENT OF". That doesn't get us any closer to having a working column-constraint syntax unfortunately, because EACH is not a reserved word either so "EACH ELEMENT REFERENCES" still isn't gonna work. I'm getting more willing to give up on having a column-constraint form of this. regards, tom lane
On 10/19/2012 03:55 PM, Tom Lane wrote: > This thought also crystallizes something else that had been bothering me, > which is that "ELEMENT" alone is a pretty bad choice of syntax because > it entirely fails to make clear which of these semantics is meant. > I'm tempted to propose that we use > > FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES ... > > which is certainly more verbose than just "ELEMENT" but I think it > makes it clearer that each array element is required to have a match > separately. If we ever implemented the other behavior it could be > written as "ANY ELEMENT OF". > > That doesn't get us any closer to having a working column-constraint > syntax unfortunately, because EACH is not a reserved word either > so "EACH ELEMENT REFERENCES" still isn't gonna work. I'm getting > more willing to give up on having a column-constraint form of this. > > "ALL" is a fully reserved keyword. Could we do something like "ALL ELEMENTS"? cheers andrew
On Fri, Oct 19, 2012 at 3:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > That doesn't get us any closer to having a working column-constraint > syntax unfortunately, because EACH is not a reserved word either > so "EACH ELEMENT REFERENCES" still isn't gonna work. I'm getting > more willing to give up on having a column-constraint form of this. This is a little sneaky, but I presume you only get the grammar conflict if you try to sneak the "each" or "element" or "each element" or whatever-you-call-it designator in BEFORE the column name. So what about just putting it afterwards? Something like this: FOREIGN KEY (a, b BY ELEMENT) REFERENCES ... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Andrew Dunstan <andrew@dunslane.net> writes: > On 10/19/2012 03:55 PM, Tom Lane wrote: >> That doesn't get us any closer to having a working column-constraint >> syntax unfortunately, because EACH is not a reserved word either >> so "EACH ELEMENT REFERENCES" still isn't gonna work. I'm getting >> more willing to give up on having a column-constraint form of this. > "ALL" is a fully reserved keyword. Could we do something like "ALL > ELEMENTS"? [ experiments... ] bison is happy with "ALL ELEMENTS REFERENCES ..." as a column constraint, but from the standpoint of English grammar it's kinda sucky. "ANY ELEMENT REFERENCES ..." would be fine but that's not the case we're implementing now. regards, tom lane
Robert Haas <robertmhaas@gmail.com> writes: > This is a little sneaky, but I presume you only get the grammar > conflict if you try to sneak the "each" or "element" or "each element" > or whatever-you-call-it designator in BEFORE the column name. So what > about just putting it afterwards? Something like this: > FOREIGN KEY (a, b BY ELEMENT) REFERENCES ... That's not the syntax we're having problems with, it's the column constraint syntax; that is CREATE TABLE t1 (c int[] REFERENCES t2); It looks like we could support CREATE TABLE t1 (c int[] REFERENCES BY ELEMENT t2); but (1) this doesn't seem terribly intelligible to me, and (2) I don't see how we modify that if we want to provide at-least-one-match semantics later. regards, tom lane
On Fri, Oct 19, 2012 at 5:48 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It looks like we could support > > CREATE TABLE t1 (c int[] REFERENCES BY ELEMENT t2); > > but (1) this doesn't seem terribly intelligible to me, and > (2) I don't see how we modify that if we want to provide > at-least-one-match semantics later. What about something more generic? CREATE TABLE <tname> ( <cname> <type> [(<expr>)] REFERENCES <t2name> [(<t2expr>)] ) Meaning, if <expr> is missing, it's taken <expr> = <cname>, if not, it's the result of that expression the one that references the target table. Sounds crazy, but with ALL() and ANY() it ought to support lots of subcases.
On Friday, October 19, 2012 09:55:10 PM Tom Lane wrote: > FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES ... > > which is certainly more verbose than just "ELEMENT" but I think it > makes it clearer that each array element is required to have a match > separately. If we ever implemented the other behavior it could be > written as "ANY ELEMENT OF". > > That doesn't get us any closer to having a working column-constraint > syntax unfortunately, because EACH is not a reserved word either > so "EACH ELEMENT REFERENCES" still isn't gonna work. I'm getting > more willing to give up on having a column-constraint form of this. What about sticking a WHERE in there? I.e. FOREIGN KEY (foo, WHERE EACH ELEMENT OF bar) ... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Claudio Freire <klaussfreire@gmail.com> writes: > What about something more generic? > CREATE TABLE <tname> ( <cname> <type> [(<expr>)] REFERENCES <t2name> > [(<t2expr>)] ) > Meaning, if <expr> is missing, it's taken <expr> = <cname>, if not, > it's the result of that expression the one that references the target > table. Doesn't seem terribly sensible as a column constraint: a column constraint ought to just be on the current column. If you want something more generic, the table-constraint syntax would be the place for it ... but that's not where we have a syntax problem. regards, tom lane
Andres Freund <andres@2ndquadrant.com> writes: > What about sticking a WHERE in there? I.e. FOREIGN KEY (foo, WHERE EACH > ELEMENT OF bar) ... Well, we don't really need it in the table-constraint case. The column-constraint case is the sticking point. I tested, and indeed this seems to work: CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); and it's perfectly sensible from an English-grammar standpoint too. If we take that, how would we spell the table-constraint case exactly? Grammatically I'd prefer FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES but this seems a bit far afield from the column-constraint syntax. OTOH, that's a pretty minor quibble. These work according to bison, and they wouldn't make a grammarian run away screaming, so maybe we should just be happy with that. regards, tom lane
On 10/19/2012 04:40 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> On 10/19/2012 03:55 PM, Tom Lane wrote: >>> That doesn't get us any closer to having a working column-constraint >>> syntax unfortunately, because EACH is not a reserved word either >>> so "EACH ELEMENT REFERENCES" still isn't gonna work. I'm getting >>> more willing to give up on having a column-constraint form of this. >> "ALL" is a fully reserved keyword. Could we do something like "ALL >> ELEMENTS"? > [ experiments... ] bison is happy with "ALL ELEMENTS REFERENCES ..." > as a column constraint, but from the standpoint of English grammar > it's kinda sucky. "ANY ELEMENT REFERENCES ..." would be fine but > that's not the case we're implementing now. > > Well, we could add "REFERENCE" as a non-reserved keyword. I agree it's not ideal. cheers andrew
I wrote: > I tested, and indeed this seems to work: > CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); > and it's perfectly sensible from an English-grammar standpoint too. > If we take that, how would we spell the table-constraint case exactly? > Grammatically I'd prefer > FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES Are people happy with these syntax proposals, or do we need some other color for the bikeshed? regards, tom lane
2012/10/22 Tom Lane <tgl@sss.pgh.pa.us>: > I wrote: >> I tested, and indeed this seems to work: >> CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); >> and it's perfectly sensible from an English-grammar standpoint too. >> If we take that, how would we spell the table-constraint case exactly? >> Grammatically I'd prefer >> FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES > > Are people happy with these syntax proposals, or do we need some other > color for the bikeshed? I am ok Pavel > > regards, tom lane > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers
On 10/22/2012 12:08 PM, Tom Lane wrote: > I wrote: >> I tested, and indeed this seems to work: >> CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); >> and it's perfectly sensible from an English-grammar standpoint too. >> If we take that, how would we spell the table-constraint case exactly? >> Grammatically I'd prefer >> FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES > Are people happy with these syntax proposals, or do we need some other > color for the bikeshed? I can live with it, although the different spelling is slightly jarring. cheers andrew
On Monday, October 22, 2012 06:08:32 PM Tom Lane wrote: > I wrote: > > I tested, and indeed this seems to work: > > CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); > > > > and it's perfectly sensible from an English-grammar standpoint too. > > If we take that, how would we spell the table-constraint case exactly? > > Grammatically I'd prefer > > > > FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES > > Are people happy with these syntax proposals, or do we need some other > color for the bikeshed? Except that I'd prefer a WHERE in the table-constraint case as well for consistencies sake I am unsurprisingly happy with the proposal. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 10/22/2012 12:13 PM, Andres Freund wrote: > On Monday, October 22, 2012 06:08:32 PM Tom Lane wrote: >> I wrote: >>> I tested, and indeed this seems to work: >>> CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); >>> >>> and it's perfectly sensible from an English-grammar standpoint too. >>> If we take that, how would we spell the table-constraint case exactly? >>> Grammatically I'd prefer >>> >>> FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES >> Are people happy with these syntax proposals, or do we need some other >> color for the bikeshed? > Except that I'd prefer a WHERE in the table-constraint case as well for > consistencies sake I am unsurprisingly happy with the proposal. That would look odd too, especially if the array isn't the last element in the FK: FOREIGN KEY (foo, WHERE EACH ELEMENT OF bar, baz) REFERENCES cheers andrew
On Mon, Oct 22, 2012 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> I tested, and indeed this seems to work: >> CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); >> and it's perfectly sensible from an English-grammar standpoint too. >> If we take that, how would we spell the table-constraint case exactly? >> Grammatically I'd prefer >> FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES > > Are people happy with these syntax proposals, or do we need some other > color for the bikeshed? Well, I can't say I'm very happy with the discrepancy between the two syntaxes, but I guess I'm in the minority. Still, I can't help but think it's going to be confusing and hard to remember. If we don't get complaints about it, I'll take that as evidence that the feature isn't being used, rather than evidence that the syntax is satisfactory. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Oct 22, 2012 at 12:08 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I wrote: >>> I tested, and indeed this seems to work: >>> CREATE TABLE t1 (c int[] WHERE EACH ELEMENT REFERENCES t2); >>> and it's perfectly sensible from an English-grammar standpoint too. >>> If we take that, how would we spell the table-constraint case exactly? >>> Grammatically I'd prefer >>> FOREIGN KEY (foo, EACH ELEMENT OF bar) REFERENCES >> Are people happy with these syntax proposals, or do we need some other >> color for the bikeshed? > Well, I can't say I'm very happy with the discrepancy between the two > syntaxes, but I guess I'm in the minority. Still, I can't help but > think it's going to be confusing and hard to remember. If we don't > get complaints about it, I'll take that as evidence that the feature > isn't being used, rather than evidence that the syntax is > satisfactory. I'm not thrilled with the inconsistency either, but given the constraints we're under, it seems like the best we can do. (I feel, as Andrew does, that shoving WHERE into the table-constraint syntax would not be an improvement; but the column-constraint syntax really needs to start with a fully-reserved word). Have you got a better proposal? regards, tom lane
On Mon, Oct 22, 2012 at 2:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I'm not thrilled with the inconsistency either, but given the > constraints we're under, it seems like the best we can do. (I feel, > as Andrew does, that shoving WHERE into the table-constraint syntax > would not be an improvement; but the column-constraint syntax really > needs to start with a fully-reserved word). Have you got a better > proposal? Well, I think if that's the best we can do, you original proposal of ditching the column constraint syntax altogether might be for the best. I wasn't too excited about that before, but I think having two different syntaxes is going to be even worse. In some ways, it's actually sort of sensible, because the referring side isn't really the column itself; it's some value extracted therefrom. You can imagine other variants of that as well, such as the recently-suggested FOREIGN KEY ((somecol).member_name) REFERENCES othertab (doohicky) Now, what would the column-constraint version of that look like? Is it even sensible to think that there SHOULD be a column-constraint version of that? I'm not convinced it is sensible, so maybe decreeing that the table constraint version must be used to handle all non-trivial cases is more sensible than I initially thought. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Well, I think if that's the best we can do, you original proposal of > ditching the column constraint syntax altogether might be for the > best. I wasn't too excited about that before, but I think having two > different syntaxes is going to be even worse. In some ways, it's > actually sort of sensible, because the referring side isn't really the > column itself; it's some value extracted therefrom. You can imagine > other variants of that as well, such as the recently-suggested > FOREIGN KEY ((somecol).member_name) REFERENCES othertab (doohicky) > Now, what would the column-constraint version of that look like? Is > it even sensible to think that there SHOULD be a column-constraint > version of that? I'm not convinced it is sensible, so maybe decreeing > that the table constraint version must be used to handle all > non-trivial cases is more sensible than I initially thought. I could easily go with that ... regards, tom lane
On 10/22/12 4:22 PM, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> Well, I think if that's the best we can do, you original proposal of >> ditching the column constraint syntax altogether might be for the >> best. I wasn't too excited about that before, but I think having two >> different syntaxes is going to be even worse. In some ways, it's >> actually sort of sensible, because the referring side isn't really the >> column itself; it's some value extracted therefrom. You can imagine >> other variants of that as well, such as the recently-suggested > >> FOREIGN KEY ((somecol).member_name) REFERENCES othertab (doohicky) > >> Now, what would the column-constraint version of that look like? Is >> it even sensible to think that there SHOULD be a column-constraint >> version of that? I'm not convinced it is sensible, so maybe decreeing >> that the table constraint version must be used to handle all >> non-trivial cases is more sensible than I initially thought. > > I could easily go with that ... I'm getting around to that conclusion as well.
Now that it seems like we've got consensus on syntax, let's talk about some implementation issues. Ordinarily, the query executed during an insert or update on the referencing table looks like, for example, SELECT 1 FROM ONLY "public"."pk" x WHERE "f1" OPERATOR(pg_catalog.=) $1 FOR SHARE OF x where $1 is a parameter representing the referencing column's value. This will find and lock the referenced row if there is one. (There can't be more than one, because the equality constraint corresponds to the unique index on the referenced column pk.f1.) The proposed patch uses this if the referencing column is an array: SELECT 1 WHERE (SELECT pg_catalog.count(DISTINCT y) FROM pg_catalog.unnest($1) y) OPERATOR(pg_catalog.=) (SELECT pg_catalog.count(*)FROM (SELECT 1 FROM ONLY "public"."pk" x WHERE "f1" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OFx) z) In English, the idea is to find and lock all PK rows matching any element of the array referencing value, and count them, and then see if that's equal to the number of distinct non-null elements in the array value. I find this pretty grotty. Quite aside from any aesthetic concerns, it's broken because it presupposes that count(distinct y) has exactly the same notion of equality that the PK unique index has. In reality, count(distinct) will fall back to the default btree opclass for the array element type. There might not be such an opclass, or it might not be compatible with the PK unique index. This is not just an academic concern: for instance, there are distinct values of the numeric type that will compare equal to the same float8 PK value, because of the limited precision of float comparison. In my working copy of the patch, I've dealt with this by inserting a creation-time restriction that the array element type has to have a default btree opclass that is part of the PK index's opfamily. This is not very desirable, because it means that some cases that are allowed in plain FKs are disallowed in array FKs. Example: regression=# create table ff (f1 float8 primary key); CREATE TABLE regression=# create table cc (f1 numeric references ff); CREATE TABLE regression=# create table cc2 (f1 numeric[], foreign key(each element of f1) references ff); ERROR: foreign key constraint "cc2_f1_fkey" cannot be implemented DETAIL: Key column "f1" has element type numeric which does not have a default btree operator class that's compatible withclass "float8_ops". So I'm looking for a better answer. One somewhat brute-force answer is to not try to use = ANY at all in the RI test query, but instead deconstruct the array value within the RI trigger and execute the standard scalar locking query for each array element. One attraction that would have is making it easier to produce a decent error message. Right now, if you insert an array value that has an invalid element, you get something like this: regression=# create table pk (f1 int primary key); CREATE TABLE regression=# create table ref1 (f1 int[], foreign key(each element of f1) references pk); CREATE TABLE regression=# insert into pk values (1),(2); INSERT 0 2 regression=# insert into ref1 values(array[1,2,5]); ERROR: insert or update on table "ref1" violates foreign key constraint "ref1_f1_fkey" DETAIL: Key (f1)=({1,2,5}) is not present in table "pk". I don't find that too helpful even with three elements, and it would be very much not helpful with hundreds. I would rather it told me that "5" is the problem. So that's the direction I was thinking of going in, but I wonder if anyone has a better idea. BTW, there is a second undesirable dependency on default opclass semantics in the patch, which is that it supposes it can use array_eq() to detect whether or not the referencing column has changed. But I think that can be fixed without undue pain by providing a refactored version of array_eq() that can be told which element-comparison function to use. regards, tom lane
On Wed, Oct 24, 2012 at 12:36:31AM -0400, Tom Lane wrote: > The proposed patch uses this if the referencing column is an array: > > SELECT 1 WHERE > (SELECT pg_catalog.count(DISTINCT y) FROM pg_catalog.unnest($1) y) > OPERATOR(pg_catalog.=) > (SELECT pg_catalog.count(*) FROM > (SELECT 1 FROM ONLY "public"."pk" x > WHERE "f1" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OF x) z) > > In English, the idea is to find and lock all PK rows matching any > element of the array referencing value, and count them, and then see if > that's equal to the number of distinct non-null elements in the array > value. I find this pretty grotty. Quite aside from any aesthetic > concerns, it's broken because it presupposes that count(distinct y) > has exactly the same notion of equality that the PK unique index has. > In reality, count(distinct) will fall back to the default btree opclass > for the array element type. There might not be such an opclass, or it > might not be compatible with the PK unique index. This is not just an > academic concern: for instance, there are distinct values of the numeric > type that will compare equal to the same float8 PK value, because of the > limited precision of float comparison. Good point. > One somewhat brute-force answer > is to not try to use = ANY at all in the RI test query, but instead > deconstruct the array value within the RI trigger and execute the > standard scalar locking query for each array element. One attraction > that would have is making it easier to produce a decent error message. > Right now, if you insert an array value that has an invalid element, > you get something like this: > > regression=# create table pk (f1 int primary key); > CREATE TABLE > regression=# create table ref1 (f1 int[], foreign key(each element of f1) references pk); > CREATE TABLE > regression=# insert into pk values (1),(2); > INSERT 0 2 > regression=# insert into ref1 values(array[1,2,5]); > ERROR: insert or update on table "ref1" violates foreign key constraint "ref1_f1_fkey" > DETAIL: Key (f1)=({1,2,5}) is not present in table "pk". > > I don't find that too helpful even with three elements, and it would be > very much not helpful with hundreds. I would rather it told me that > "5" is the problem. > > So that's the direction I was thinking of going in, but I wonder if > anyone has a better idea. The error message improvement would be nice. I'd be wary of the performance of firing up hundreds or thousands of SPI queries to validate a single long array, though. We can always use a slow path to prepare a nice error message. For FKs, we currently document that "The referenced columns must be the columns of a non-deferrable unique or primary key constraint in the referenced table." Taking that literally, one might imagine that bare UNIQUE indexes do not qualify. However, transformFkeyCheckAttrs() does accept them, including indexes with non-default operator classes: create table parent (c text); create unique index on parent(c text_pattern_ops); create table child (c text references parent(c)); That's a bit suspect already; in particular, given multiple indexes with different operator classes, I see no good way for the user to choose the one characterizing his foreign key constraint. Suppose, for this new feature, we enforce the letter of the documentation and accept only real UNIQUE/PK constraints. (Perhaps also allow UNIQUE indexes eligible to be converted to UNIQUE constraints?) With that restriction, we'll know that the PK side of the constraint uses the default b-tree operator class of the PK-side type. At that point, casting the element to the PK type in the count(DISTINCT) expression should fix the problem, correct? > BTW, there is a second undesirable dependency on default opclass > semantics in the patch, which is that it supposes it can use array_eq() > to detect whether or not the referencing column has changed. But I > think that can be fixed without undue pain by providing a refactored > version of array_eq() that can be told which element-comparison function > to use. Does the bit near this comment not cover that? + /* + * In case of an array ELEMENT FK, make sure TYPECACHE_EQ_OPR exists + * for the FK element_type and it is compatible with pfeqop + */ Thanks, nm
Noah Misch <noah@leadboat.com> writes: > For FKs, we currently document that "The referenced columns must be the > columns of a non-deferrable unique or primary key constraint in the referenced > table." Taking that literally, one might imagine that bare UNIQUE indexes do > not qualify. However, transformFkeyCheckAttrs() does accept them, including > indexes with non-default operator classes: Indeed, and considerable sweat was spilled to make that happen. I'm pretty unimpressed with any proposal that we should just blow that off for array keys. Now, I concede that cross-type FKs are a corner case to begin with, and we may well end up concluding that it's just too much work to handle it for arrays because of the lack of infrastructure for applying non-default comparison operators to arrays. But I don't want that to happen just because we failed to even think about it. However, I'm about to bounce this patch back for rework anyway, because I've just noticed that it has fatal performance issues. If you issue any UPDATE or DELETE against the PK table, you get a query like this (shorn of some uninteresting syntactic details) for checking to see if the RI constraint would be violated: SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x; It is impossible to implement this query except with a full-table seqscan on the FK table. You can put a GIN index on the array fkcol, but that won't help, because "something = ANY (indexedcol)" isn't an indexable condition. I don't think we can ship a feature that's unusable for anything except toy-sized tables, and that's what this is right now. One way we could consider making this GIN-indexable is to change it to SELECT 1 FROM ONLY fktable x WHERE ARRAY[$1] <@ fkcol FOR SHARE OF x; However, that puts us right back into the problem that we have no control over the specific comparison semantics that <@ uses. Or we could try to teach PG to make "something = ANY (indexedcol)" indexable. That would likely be a pretty substantial amount of work though. In particular, matching such a query to a GIN index would require knowing whether the "=" operator corresponds to the idea of equality embodied in the GIN opclass's key compare() method, and that isn't information that's available from the current opclass API. regards, tom lane
Marco Nenciarini <marco.nenciarini@2ndquadrant.it> writes: > Please find the attached refreshed patch (v2) which fixes the loose ends > you found. Attached is a v3 patch that updates the syntax per discussion, uses what seems to me to be a saner (more extensible) catalog representation, and contains assorted other code cleanup. I have not touched the documentation at all except for catalogs.sgml, so it still explains the old syntax. I have to stop working on this now, because I've already expended more time on it than I should, and it still has the serious problems mentioned in http://archives.postgresql.org/message-id/16787.1351053391@sss.pgh.pa.us and http://archives.postgresql.org/message-id/28389.1351094795@sss.pgh.pa.us I'm going to mark this Returned With Feedback for the current CF. regards, tom lane
Вложения
On Wed, Oct 24, 2012 at 12:06:35PM -0400, Tom Lane wrote: > Noah Misch <noah@leadboat.com> writes: > > For FKs, we currently document that "The referenced columns must be the > > columns of a non-deferrable unique or primary key constraint in the referenced > > table." Taking that literally, one might imagine that bare UNIQUE indexes do > > not qualify. However, transformFkeyCheckAttrs() does accept them, including > > indexes with non-default operator classes: > > Indeed, and considerable sweat was spilled to make that happen. I'm > pretty unimpressed with any proposal that we should just blow that off > for array keys. Now, I concede that cross-type FKs are a corner case to > begin with, and we may well end up concluding that it's just too much > work to handle it for arrays because of the lack of infrastructure for > applying non-default comparison operators to arrays. But I don't want > that to happen just because we failed to even think about it. Sure, it's important to raise for discussion. The way I see it, it's the existing functions and operators for arrays that blew off non-default element operator classes. This patch is just dealing with those building blocks on their own terms. I would hesitate to give up cross-type support, but support for a non-default operator class on the PK side seems expendable. Given the limitations that I mentioned for the corresponding feature of ordinary foreign keys, I'm skeptical of its importance for ELEMENT foreign keys. On further reflection, we could stop short of preemptively forbidding non-default operator classes and just teach transformFkeyCheckAttrs() to select an affected index only as a last resort. The equality_ops_are_compatible() check in ATAddForeignKeyConstraint() may proceed to reject the index. A text_pattern_ops UNIQUE index uses the same equality operator as a UNIQUE constraint, and it would continue to be rightly accepted. > However, I'm about to bounce this patch back for rework anyway, because > I've just noticed that it has fatal performance issues. If you issue > any UPDATE or DELETE against the PK table, you get a query like this > (shorn of some uninteresting syntactic details) for checking to see > if the RI constraint would be violated: > > SELECT 1 FROM ONLY fktable x WHERE $1 = ANY (fkcol) FOR SHARE OF x; > > It is impossible to implement this query except with a full-table > seqscan on the FK table. You can put a GIN index on the array fkcol, > but that won't help, because "something = ANY (indexedcol)" isn't an > indexable condition. I don't think we can ship a feature that's > unusable for anything except toy-sized tables, and that's what this is > right now. > > One way we could consider making this GIN-indexable is to change it to > > SELECT 1 FROM ONLY fktable x WHERE ARRAY[$1] <@ fkcol FOR SHARE OF x; > > However, that puts us right back into the problem that we have no > control over the specific comparison semantics that <@ uses. > > Or we could try to teach PG to make "something = ANY (indexedcol)" > indexable. That would likely be a pretty substantial amount of work > though. In particular, matching such a query to a GIN index would > require knowing whether the "=" operator corresponds to the idea of > equality embodied in the GIN opclass's key compare() method, and that > isn't information that's available from the current opclass API. Perhaps, then, excluding all cross-type ELEMENT FKs is the right start. With that and the operator compatibility check already appearing in the patch, we can prove that <@ has the right semantics. Doing better adds either large subprojects or seemingly-ad-hoc limitations. It's ugly, but only in a manner comparable to "ARRAY[0.0] < ARRAY[0]" finding no operator. Thanks, nm
On 24 October 2012 18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
-- Marco Nenciarini <marco.nenciarini@2ndquadrant.it> writes:
> Please find the attached refreshed patch (v2) which fixes the loose ends
> you found.
Attached is a v3 patch that updates the syntax per discussion, uses what
seems to me to be a saner (more extensible) catalog representation, and
contains assorted other code cleanup. I have not touched the
documentation at all except for catalogs.sgml, so it still explains the
old syntax. I have to stop working on this now, because I've already
expended more time on it than I should, and it still has the serious
problems mentioned in
http://archives.postgresql.org/message-id/16787.1351053391@sss.pgh.pa.us
and
http://archives.postgresql.org/message-id/28389.1351094795@sss.pgh.pa.us
I'm going to mark this Returned With Feedback for the current CF.
Does anyone have any intention of resurrecting this at this stage?
Thom
Thom Brown wrote: > On 24 October 2012 18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Marco Nenciarini <marco.nenciarini@2ndquadrant.it> writes: > > > Please find the attached refreshed patch (v2) which fixes the loose ends > > > you found. > > > > Attached is a v3 patch that updates the syntax per discussion, uses what > > seems to me to be a saner (more extensible) catalog representation, and > > contains assorted other code cleanup. I have not touched the > > documentation at all except for catalogs.sgml, so it still explains the > > old syntax. I have to stop working on this now, because I've already > > expended more time on it than I should, and it still has the serious > > problems mentioned in > > http://archives.postgresql.org/message-id/16787.1351053391@sss.pgh.pa.us > > and > > http://archives.postgresql.org/message-id/28389.1351094795@sss.pgh.pa.us > > > > I'm going to mark this Returned With Feedback for the current CF. > > > > Does anyone have any intention of resurrecting this at this stage? Not in this room. Do you? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Alvaro Herrera wrote: > Thom Brown wrote: > > Does anyone have any intention of resurrecting this at this stage? > > Not in this room. Do you? I should have added some more context so that people realizes that "this room" contains the 2ndQuadrant people involved in writing this patch. Also I wanted to say that I find the remaining problems as outlined by Tom very interesting and I would consider attacking them, except that I have a few other time commitments at the moment. But if there's anyone out there with an inclination towards interesting problems, it might be possible to get them to lend a hand here. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 25 October 2014 13:28, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
-- Not in this room. Do you?Thom Brown wrote:
> On 24 October 2012 18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Marco Nenciarini <marco.nenciarini@2ndquadrant.it> writes:
> > > Please find the attached refreshed patch (v2) which fixes the loose ends
> > > you found.
> >
> > Attached is a v3 patch that updates the syntax per discussion, uses what
> > seems to me to be a saner (more extensible) catalog representation, and
> > contains assorted other code cleanup. I have not touched the
> > documentation at all except for catalogs.sgml, so it still explains the
> > old syntax. I have to stop working on this now, because I've already
> > expended more time on it than I should, and it still has the serious
> > problems mentioned in
> > http://archives.postgresql.org/message-id/16787.1351053391@sss.pgh.pa.us
> > and
> > http://archives.postgresql.org/message-id/28389.1351094795@sss.pgh.pa.us
> >
> > I'm going to mark this Returned With Feedback for the current CF.
> >
>
> Does anyone have any intention of resurrecting this at this stage?
I'm not qualified to, but I'm happy to make time to test it when it next gets picked up. My email was really just bumping the topic.
Thom
On 25 October 2014 19:19, Thom Brown <thom@linux.com> wrote:
-- On 25 October 2014 13:28, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:Not in this room. Do you?Thom Brown wrote:
> On 24 October 2012 18:17, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > Marco Nenciarini <marco.nenciarini@2ndquadrant.it> writes:
> > > Please find the attached refreshed patch (v2) which fixes the loose ends
> > > you found.
> >
> > Attached is a v3 patch that updates the syntax per discussion, uses what
> > seems to me to be a saner (more extensible) catalog representation, and
> > contains assorted other code cleanup. I have not touched the
> > documentation at all except for catalogs.sgml, so it still explains the
> > old syntax. I have to stop working on this now, because I've already
> > expended more time on it than I should, and it still has the serious
> > problems mentioned in
> > http://archives.postgresql.org/message-id/16787.1351053391@sss.pgh.pa.us
> > and
> > http://archives.postgresql.org/message-id/28389.1351094795@sss.pgh.pa.us
> >
> > I'm going to mark this Returned With Feedback for the current CF.
> >
>
> Does anyone have any intention of resurrecting this at this stage?I'm not qualified to, but I'm happy to make time to test it when it next gets picked up. My email was really just bumping the topic.
I should mention that the latest patch no longer applies against git master, so I can't test it in its current form.
Thom