Обсуждение: FK v.s unique indexes
Hi, For some time now, I'm withholding new features in my DB application as I wasn't able to have unique constraints on partitioned tables. PG-v11 now has it and I've given it a try, but to my surprise it does not give it fully to the application. Those indexes don't support FK! At this point I've also checked partial indexes to see if they could support a sort of "FK duality" I have in my datasets, but they don't either (see below EXPLAINING). I'd like to understand why. I'd appreciate it if somebody could shred some light on the technical reasons/background behind those restrictions. EXPLAINING: ---------------------------------- psql (11beta2 (Debian 11~beta2-1)) Type "help" for help. tst=# create table test1(load bigint, a int, b int, c bool) partition by list (c); CREATE TABLE tst=# create table test1_true partition of test1 for values in (true); CREATE TABLE tst=# create table test1_false partition of test1 for values in (false); CREATE TABLE tst=# create unique index load ON test1 (load,a,b,c); CREATE INDEX tst=# create table info_text1 (load text, a int, b int, c bool, info text, foreign key (load,a,b,c) references test1(load,a,b,c)) ; ERROR: cannot reference partitioned table "test1" ---------------------------------- Why is this forbidden? For my application I could live without partitioning. Just using partial indexes would be enough. Still, this does not work either: ------------------------------- tst=# create table test2(load bigint, a int, b int, c bool) ; CREATE TABLE tst=# create unique index test2_true ON test2 (load,a) where c is true ; CREATE INDEX tst=# create unique index test2_false ON test2 (load,b) where c is false; CREATE INDEX tst=# create table info_text2 (load text, a int, info text, more_info text, foreign key (load,a) references test2(load,a)) ; ERROR: there is no unique constraint matching given keys for referenced table "test2" ---------------------------- I cannot see any reasons why this functionality is blocked. In particular, contrary to what the ERROR says, the target table *does have* a "unique constraint matching given keys", admittedly only partial. Yet, why should that matter at all? A unique index, partial or not, always yield a single row, and that's all what matters for FK. Right? I would very much like to understand the reasoning behind the above restrictions (on the use of indexes for FK targets), as this would probably help me avoid poor decisions in my database schema design. So I would appreciate it very very much if somebody could point me to reasoning behind such implementation. Regards, -R
On 3 July 2018 at 19:30, Rafal Pietrak <rafal@ztk-rp.eu> wrote: > tst=# create table test1(load bigint, a int, b int, c bool) partition by > list (c); > CREATE TABLE > tst=# create table test1_true partition of test1 for values in (true); > CREATE TABLE > tst=# create table test1_false partition of test1 for values in (false); > CREATE TABLE > tst=# create unique index load ON test1 (load,a,b,c); > CREATE INDEX > tst=# create table info_text1 (load text, a int, b int, c bool, info > text, foreign key (load,a,b,c) references test1(load,a,b,c)) ; > ERROR: cannot reference partitioned table "test1" > ---------------------------------- > > Why is this forbidden? I don't think there were any actual roadblocks, it was more of just not enough time in the cycle to make it work due to a few technical details that required extra effort to make work. Alvaro managed to simplify the problem and allow foreign keys to be defined on partitioned tables and get that into PG11. So it was a case of 50% is better than 0%, which I very agree with. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Rafal Pietrak <rafal@ztk-rp.eu> writes: > > In particular, contrary to what the ERROR says, the target table *does > have* a "unique constraint matching given keys", admittedly only > partial. Yet, why should that matter at all? A unique index, partial or > not, always yield a single row, and that's all what matters for FK. Right? > Is that correct? I would have thought that if you have a multi-key unique index and you only provide values for some of the keys in the index, you would have no guarantee of a single row being returned. If this was true, then the additional keys are superfluous. Have you tried doing the same thing where the fk keys and remote unique index keys are equal in number? -- Tim Cross
On Tuesday, July 3, 2018, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
ERROR: there is no unique constraint matching given keys for referenced
table "test2"
----------------------------
I cannot see any reasons why this functionality is blocked.
In particular, contrary to what the ERROR says, the target table *does
have* a "unique constraint matching given keys", admittedly only
partial.
You are making the common error of confusing the distinct concepts of constraints and indexs. Table constraints cannot be partial by definition, and are a logical concept constraining the data model. Indexes are physical objects that only aid in the execution of queries. The only crossover is that the implementation of a unique table constraint uses a full unique index as an implementation detail.
The error says 'constraint' and indeed you have nit defined a relevant constraint in your schema, just indexes.
David J.
W dniu 04.07.2018 o 00:55, David G. Johnston pisze: > On Tuesday, July 3, 2018, Rafal Pietrak <rafal@ztk-rp.eu > <mailto:rafal@ztk-rp.eu>> wrote: > > > ERROR: there is no unique constraint matching given keys for referenced > table "test2" > ---------------------------- > > I cannot see any reasons why this functionality is blocked. > > In particular, contrary to what the ERROR says, the target table *does > have* a "unique constraint matching given keys", admittedly only > partial. > > > You are making the common error of confusing the distinct concepts of > constraints and indexs. Table constraints cannot be partial by > definition, and are a logical concept constraining the data model. Hmmm.. This does not match "my reality". Naturally I may be wrong, but the example I've posted reflects my actual data I'm putting into the RDBMS. That is: 1. the data has unique constraint on (load,a,b,c) 2. and the data have additional unique constraints on (load,a), provided c is true, and (load,b) whenever c is false. Pls consider in real life: load (a person), can have either a (a kind of brest cancer); or b (a kind of prostrate) - this is only a cooked example attemping to illustrate, that one may need to put additional constraints on the entire dataset. I'm creating partial indexes *meaning* enforcing constraints to: 1. avoid accumulation of dataset errors due to illegal data entry, and... 2. to allow for other tables to reference only parts of the main table. Those other tables contain data relevant only to the parts of main table they are referring. As of now, to have it FK to main table, I have to unnecessarily keep in those other tables constant data, which is "obvious" to them (like b & c values, which are irrelevant for unique <load,a> cases). But. IMHO, my usage of indexes in this case is actually setting up a partial constraint. As I cannot use them for FK, there is no real use for them as query execution support. And if I drop them, I'm actually allowing data in my table, which is inconsistent with reality .... so they actually do play a role of constraints. So I do have data which require partial constraints. I could have make them as sort of "checks". But it is much easier for me to just have a partial unique index. And it's very naturally meaningful to anybody "reading the schema" in the future. I don't think there is any harm in it. > Indexes are physical objects that only aid in the execution of queries. > The only crossover is that the implementation of a unique table > constraint uses a full unique index as an implementation detail. > > The error says 'constraint' and indeed you have nit defined a relevant > constraint in your schema, just indexes. Yes, but my goal isn't just enforcing constraints. My goal is to make other tables refer to the main datatable with as short FK as possible. To make those references (in postgresql) I have to have a unique index. Still, I cannot do that on partials..... My question is: Why? Why postgresql have introduced such restriction. It serves no purpose. Without it, everything would just work (I think :). -R
W dniu 03.07.2018 o 11:23, David Rowley pisze: > On 3 July 2018 at 19:30, Rafal Pietrak <rafal@ztk-rp.eu> wrote: [---------------] >> >> Why is this forbidden? > > I don't think there were any actual roadblocks, it was more of just > not enough time in the cycle to make it work due to a few technical > details that required extra effort to make work. > > Alvaro managed to simplify the problem and allow foreign keys to be > defined on partitioned tables and get that into PG11. > > So it was a case of 50% is better than 0%, which I very agree with. > I see. It's great and usefull, yes. And I'm looking forward to see the other 50% :) Thenx!! -R
> On Jul 5, 2018, at 1:30 AM, Rafal Pietrak <rafal@ztk-rp.eu> wrote: > > > > W dniu 04.07.2018 o 00:55, David G. Johnston pisze: >> On Tuesday, July 3, 2018, Rafal Pietrak <rafal@ztk-rp.eu >> <mailto:rafal@ztk-rp.eu>> wrote: >> >> >> ERROR: there is no unique constraint matching given keys for referenced >> table "test2" >> ---------------------------- >> >> I cannot see any reasons why this functionality is blocked. >> >> In particular, contrary to what the ERROR says, the target table *does >> have* a "unique constraint matching given keys", admittedly only >> partial. >> >> >> You are making the common error of confusing the distinct concepts of >> constraints and indexs. Table constraints cannot be partial by >> definition, and are a logical concept constraining the data model. > > Hmmm.. > > This does not match "my reality". Naturally I may be wrong, but the > example I've posted reflects my actual data I'm putting into the RDBMS. > That is: > 1. the data has unique constraint on (load,a,b,c) > 2. and the data have additional unique constraints on (load,a), provided > c is true, and (load,b) whenever c is false. > > Pls consider in real life: load (a person), can have either a (a kind of > brest cancer); or b (a kind of prostrate) - this is only a cooked > example attemping to illustrate, that one may need to put additional > constraints on the entire dataset. > It’s difficult enough to define a unique person (without mother and father) and certainly this weeks definition of burdenis not likely to help matters. If you’re main worry is data consistency you might be better off normalizing your structure- either with separate tables per cancer type (person id, cancer specifics; unique on person) or in a single tableone per line (person id, cancer type, cancer description; unique on person). You can reconstitue person,breast,prostatefrom either of those. We won’t quibble on one person having both (though remotely possible, men doget breast cancer).
David Rowley-3 wrote > I don't think there were any actual roadblocks, it was more of just > not enough time in the cycle to make it work due to a few technical > details that required extra effort to make work. > > Alvaro managed to simplify the problem and allow foreign keys to be > defined on partitioned tables and get that into PG11. > > So it was a case of 50% is better than 0%, which I very agree with. That's a really great news. I was waiting for this feature for many years. Finally! Thank you guys! -- Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
On Thursday, July 5, 2018, Rafal Pietrak <rafal@ztk-rp.eu> wrote:
W dniu 04.07.2018 o 00:55, David G. Johnston pisze:
> On Tuesday, July 3, 2018, Rafal Pietrak <rafal@ztk-rp.eu
> <mailto:rafal@ztk-rp.eu>> wrote:
>
>
> ERROR: there is no unique constraint matching given keys for referenced
> table "test2"
> ----------------------------
>
> I cannot see any reasons why this functionality is blocked.
>
> In particular, contrary to what the ERROR says, the target table *does
> have* a "unique constraint matching given keys", admittedly only
> partial.
>
>
> You are making the common error of confusing the distinct concepts of
> constraints and indexs. Table constraints cannot be partial by
> definition, and are a logical concept constraining the data model.
Hmmm..
This does not match "my reality". Naturally I may be wrong, but the
example I've posted reflects my actual data I'm putting into the RDBMS.
That is:
1. the data has unique constraint on (load,a,b,c)
2. and the data have additional unique constraints on (load,a), provided
c is true, and (load,b) whenever c is false.
I probably generalized too much, especially since exclusion constrains can be partial, so I'll be specific here. Foreign Keys don't have where clauses and thus cannot target partial constraints. If you want to overcome that limitation you can write a custom trigger.
I'm sure at least some of this is simply due to desirability as opposed to some fundamental limitation, but in the end that is how the system works today. Integer-table FK relationships are defined over the entire PK table, not a subset.
David J.
W dniu 05.07.2018 o 15:18, David G. Johnston pisze: > On Thursday, July 5, 2018, Rafal Pietrak <rafal@ztk-rp.eu > <mailto:rafal@ztk-rp.eu>> wrote: > > W dniu 04.07.2018 o 00:55, David G. Johnston pisze: > > On Tuesday, July 3, 2018, Rafal Pietrak <rafal@ztk-rp.eu > <mailto:rafal@ztk-rp.eu> > > <mailto:rafal@ztk-rp.eu <mailto:rafal@ztk-rp.eu>>> wrote: > > > > > > ERROR: there is no unique constraint matching given keys for > referenced > > table "test2" > > ---------------------------- > > > > I cannot see any reasons why this functionality is blocked. > > > > In particular, contrary to what the ERROR says, the target > table *does > > have* a "unique constraint matching given keys", admittedly only > > partial. > > > > > > You are making the common error of confusing the distinct concepts of > > constraints and indexs. Table constraints cannot be partial by > > definition, and are a logical concept constraining the data model. > > Hmmm.. > > This does not match "my reality". Naturally I may be wrong, but the > example I've posted reflects my actual data I'm putting into the RDBMS. > That is: > 1. the data has unique constraint on (load,a,b,c) > 2. and the data have additional unique constraints on (load,a), provided > c is true, and (load,b) whenever c is false. > > > I probably generalized too much, especially since exclusion constrains > can be partial, so I'll be specific here. Foreign Keys don't have where > clauses and thus cannot target partial constraints. If you want to > overcome that limitation you can write a custom trigger. Actually. It looks like I'm a really really slow learner :( Only now I realized how should I code the scenario in question. For those interested, I'm currently implementing it like this: Instead of bool column "C", I'm putting there a column, which will keep a *copy* of value from column A or B depending on "traditional value" (true/false) of earlier column C. Now I can have a plain ordinary unique index over (load,C), and have it as FK target for other tables. win-win. I haven't realized it for years (this is how long the design stays with me to this day). > > I'm sure at least some of this is simply due to desirability as opposed > to some fundamental limitation, but in the end that is how the system > works today. Integer-table FK relationships are defined over the entire > PK table, not a subset. In my simplistic view of postgresql internal, I was thinking, that the engine (e.g. the set of internal triggers maintaining all FK consistency as layed down by application schema), having a row of data (freshly inserted or updated) just looks up an index it has associated with that particilar FK, and uses it to see if the other end "is comliant", or "has to change, too", or whatever else. So I was thinking, that when FK has an index to use, it shouldn't matter if it's complete or partial. I was thinking, that when "add constraint" cannot choose appropriate index, may be some explicit help (like ... using <index_name>;) would be due. But form the above explanation I fear that there is significantly more to the full picture than I though. I only king of hoped those "fundamental limitations" would be something I could grasp. Anyway, although indireclty, this thread brought me a solution. This is good. thenx, -R
W dniu 05.07.2018 o 10:11, Rob Sargent pisze: [---------------] >> >> Pls consider in real life: load (a person), can have either a (a kind of >> brest cancer); or b (a kind of prostrate) - this is only a cooked >> example attemping to illustrate, that one may need to put additional >> constraints on the entire dataset. >> > > It’s difficult enough to define a unique person (without mother and father) and certainly this weeks definition of burdenis not likely to help matters. If you’re main worry is data consistency you might be better off normalizing your structure- either with separate tables per cancer type (person id, cancer specifics; unique on person) or in a single tableone per line (person id, cancer type, cancer description; unique on person). You can reconstitue person,breast,prostatefrom either of those. We won’t quibble on one person having both (though remotely possible, men doget breast cancer). > No, no. This was just cooked example, my reality (messaged between parties, whose identity is stored in columns A and B, and column C keeping a flag indicating who was the sender). I've just used gender example to limit the discussion of whether A can be linked to something, or if B cannot be linked to something else. It cannot, the constraints I described are as real, as prostrate cancer never happening to a women. I tried to eliminate from the discussion all but mathematical relations between values of those columns. Which are: (load,a,b,c) is unique (load,a,true) is unique (load,b,false) is unique Nothing else matters. And I need FK to such dataset. Partial. There is one table with data only referencing (load,a, true)... and I'd prefere that table NOT NEED to have the spurous column (c) containing value "true" for every row in that table. Same goes for (load,b,false). Now it occured to me, I can do: create table (load text, a int, b int, c int); instead of... create table (load text, a int, b int, c bool); With the new layout I'll just do: insert (load,a,b,c) values (<load>, <a>, <b>, <a>); for the cases of "true", and... insert (load,a,b,c) values (<load>, <a>, <b>, <b>); for other cases (load,c) will be unique over the entire dataset. Now I can FK to (a,c) from whenever I want. A computed column (d bool) = (a = c) could be helpfull, but I don't think it'll be indispensable. Sorry to mislead you. -R
I was thinking, that when "add constraint" cannot choose appropriate
index, may be some explicit help (like ... using <index_name>;) would be
due.
Basically all the FK trigger does is:
SELECT EXISTS(SELECT 1 FROM pk_table WHERE pk_col1 = val1 AND pk_col2 = val2)
And fails if query returns false. The planner is still free to use the index or not to execute the query just as if you had written it by hand. For a small table fully in memory it likely would prefer a sequential scan and it would be perfectly within its rights to do so.
David J.
W dniu 05.07.2018 o 23:04, David G. Johnston pisze: > On Thu, Jul 5, 2018 at 1:45 PM, Rafal Pietrak <rafal@ztk-rp.eu > <mailto:rafal@ztk-rp.eu>>wrote: > > I was thinking, that when "add constraint" cannot choose appropriate > index, may be some explicit help (like ... using <index_name>;) would be > due. > > > Basically all the FK trigger does is: > > SELECT EXISTS(SELECT 1 FROM pk_table WHERE pk_col1 = val1 AND pk_col2 = > val2) > > And fails if query returns false. The planner is still free to use the > index or not to execute the query just as if you had written it by > hand. For a small table fully in memory it likely would prefer a > sequential scan and it would be perfectly within its rights to do so. > I must assume, that the above "WHERE pk_col1 - val1..." is not the same place as the one you've mentioned earlier, where "FK don't have where clausures". Thus, the bulk of code that maintains FK consistency "does not use where clausures", would break on partial indexes. But in the above SELECT, with partial indexed, that select will never fail. So it is not the blocker... on the other hand, I would rather rewrite it along the lines of: SELECT 1 = (SELECT count(1) FROM pk_table WHERE pk_col1 = val1 AND pk_col2 = val2) Naturally, if the planner choses to do a seq scan, everything would break down - inappropriate rows would get hit; but that's a different story: 1. one may think, doing a "SELECT * FROM pk_table WHERE... INTO current" and making all constraint trigger functions use that "current" instead of making more lookups, could solve the problem. 2. or with some syntax help during FK creation (aka: ... add constraint ... using <index_name>) one could force the planner to always use indicated index. Still, whatever way to go, it is well beyond my level. -R
W dniu 05.07.2018 o 23:04, David G. Johnston pisze:
> On Thu, Jul 5, 2018 at 1:45 PM, Rafal Pietrak <rafal@ztk-rp.eu
> <mailto:rafal@ztk-rp.eu>>wrote:
>
> I was thinking, that when "add constraint" cannot choose appropriate
> index, may be some explicit help (like ... using <index_name>;) would be
> due.
>
>
> Basically all the FK trigger does is:
>
> SELECT EXISTS(SELECT 1 FROM pk_table WHERE pk_col1 = val1 AND pk_col2 =
> val2)
>
> And fails if query returns false. The planner is still free to use the
> index or not to execute the query just as if you had written it by
> hand. For a small table fully in memory it likely would prefer a
> sequential scan and it would be perfectly within its rights to do so.
>
I must assume, that the above "WHERE pk_col1 - val1..." is not the same
place as the one you've mentioned earlier, where "FK don't have where
clausures".
The FK definition doesn't have a WHERE clause so the only (and all) columns used in the trigger are those defined by the constraint itself. So, if a partial index for the above was:
UNIQUE INDEX (pk_col1, pk_col2) WHERE pktblcol3 = false;
and
FK (col1, col2) REFERENCES pk_table (pk_col1, pk_col2)
There is no place on the FK to reference "pktblcol3" so that the effective trigger query would become:
WHERE pk_col1 = val2 AND pk_col2 = val2 AND pktblcol3 = ???
And without pktblcol3 more than one row could be returned (so, not really EXISTS...)
Thus, the bulk of code that maintains FK consistency "does not use where
clausures", would break on partial indexes. But in the above SELECT,
with partial indexed, that select will never fail. So it is not the
blocker... on the other hand, I would rather rewrite it along the lines of:
SELECT 1 = (SELECT count(1) FROM pk_table WHERE pk_col1 = val1 AND
pk_col2 = val2)
Yeah, that's closer to reality
Naturally, if the planner choses to do a seq scan, everything would
break down - inappropriate rows would get hit; but that's a different story:
1. one may think, doing a "SELECT * FROM pk_table WHERE... INTO current"
and making all constraint trigger functions use that "current" instead
of making more lookups, could solve the problem.
2. or with some syntax help during FK creation (aka: ... add constraint
... using <index_name>) one could force the planner to always use
indicated index.
This comes back to constraints don't directly target indexes even though indexes are used in implementation (it would be violation of scope). They target other constraints at a logical level. Whether the code is written exactly like that without evidence to the contrary it provides a sound mental model to operate from.
I think we might get away from this for INSERT ON CONFLICT but I'm not that well versed nor have time to look into it right now.
David J.