Re: SQL:2011 application time
От | Paul Jungwirth |
---|---|
Тема | Re: SQL:2011 application time |
Дата | |
Msg-id | 47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com обсуждение исходный текст |
Ответ на | Re: SQL:2011 application time (Peter Eisentraut <peter@eisentraut.org>) |
Ответы |
Re: SQL:2011 application time
(jian he <jian.universality@gmail.com>)
Re: SQL:2011 application time (Robert Haas <robertmhaas@gmail.com>) Re: SQL:2011 application time (Peter Eisentraut <peter@eisentraut.org>) |
Список | pgsql-hackers |
On 3/24/24 00:38, Peter Eisentraut wrote:> I have committed the patches > v33-0001-Add-temporal-FOREIGN-KEYs.patch and v33-0002-Support-multiranges-in-temporal-FKs.patch > (together). Hi Hackers, I found some problems with temporal primary keys and the idea of uniqueness, especially around the indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need but I'd like some feedback on. The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE. DO NOTHING fails because it doesn't expect a non-btree unique index. It's fine to make it accept a temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique and indisexclusion). This is no different than an exclusion constraint. So I skip BuildSpeculativeIndexInfo for WITHOUT OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only ii_UniqueProcs. Right?) We should still forbid temporally-unique indexes for ON CONFLICT DO UPDATE, since there may be more than one row that conflicts. Ideally we would find and update all the conflicting rows, but we don't now, and we don't want to update just one: postgres=# create table t (id int4range, valid_at daterange, name text, constraint tpk primary key (id, valid_at without overlaps)); CREATE TABLE postgres=# insert into t values ('[1,2)', '[2000-01-01,2001-01-01)', 'a'), ('[1,2)', '[2001-01-01,2002-01-01)', 'b'); INSERT 0 2 postgres=# insert into t values ('[1,2)', '[2000-01-01,2002-01-01)', 'c') on conflict (id, valid_at) do update set name = excluded.name; INSERT 0 1 postgres=# select * from t; id | valid_at | name -------+-------------------------+------ [1,2) | [2001-01-01,2002-01-01) | b [1,2) | [2000-01-01,2001-01-01) | c (2 rows) So I also added code to prevent that. This is just preserving the old behavior for exclusion constraints, which was bypassed because of indisunique. All this is in the first patch. That got me thinking about indisunique and where else it could cause problems. Perhaps there are other places that assume only b-trees are unique. I couldn't find anywhere that just gives an error like ON CONFLICT, but I can imagine more subtle problems. A temporal PRIMARY KEY or UNIQUE constraint is unique in at least three ways: It is *metaphorically* unique: the conceit is that the scalar part is unique at every moment in time. You may have id 5 in your table more than once, as long as the records' application times don't overlap. And it is *officially* unique: the standard calls these constraints unique. I think it is correct for us to report them as unique in pg_index. But is it *literally* unique? Well two identical keys, e.g. (5, '[Jan24,Mar24)') and (5, '[Jan24,Mar24)'), do have overlapping ranges, so the second is excluded. Normally a temporal unique index is *more* restrictive than a standard one, since it forbids other values too (e.g. (5, '[Jan24,Feb24)')). But sadly there is one exception: the ranges in these keys do not overlap: (5, 'empty'), (5, 'empty'). With ranges/multiranges, `'empty' && x` is false for all x. You can add that key as many times as you like, despite a PK/UQ constraint: postgres=# insert into t values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar'); INSERT 0 2 postgres=# select * from t; id | valid_at | name -------+----------+------ [1,2) | empty | foo [1,2) | empty | bar (2 rows) Cases like this shouldn't actually happen for temporal tables, since empty is not a meaningful value. An UPDATE/DELETE FOR PORTION OF would never cause an empty. But we should still make sure they don't cause problems. One place we should avoid temporally-unique indexes is REPLICA IDENTITY. Fortunately we already do that, but patch 2 adds a test to keep it that way. Uniqueness is an important property to the planner, too. We consider indisunique often for estimates, where it needn't be 100% true. Even if there are nullable columns or a non-indimmediate index, it still gives useful stats. Duplicates from 'empty' shouldn't cause any new problems there. In proof code we must be more careful. Patch 3 updates relation_has_unique_index_ext and rel_supports_distinctness to disqualify WITHOUT OVERLAPS indexes. Maybe that's more cautious than needed, but better safe than sorry. This patch has no new test though. I had trouble writing SQL that was wrong before its change. I'd be happy for help here! Another problem is GROUP BY and functional dependencies. This is wrong: postgres=# create table a (id int4range, valid_at daterange, name text, constraint apk primary key (id, valid_at without overlaps)); CREATE TABLE postgres=# insert into a values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar'); INSERT 0 2 postgres=# select * from a group by id, valid_at; id | valid_at | name -------+----------+------ [1,2) | empty | foo (1 row) One fix is to return false from check_functional_grouping for WITHOUT OVERLAPS primary keys. But I think there is a better fix that is less ad hoc. We should give temporal primary keys an internal CHECK constraint saying `NOT isempty(valid_at)`. The problem is analogous to NULLs in parts of a primary key. NULLs prevent two identical keys from ever comparing as equal. And just as a regular primary key cannot contain NULLs, so a temporal primary key should not contain empties. The standard effectively prevents this with PERIODs, because a PERIOD adds a constraint saying start < end. But our ranges enforce only start <= end. If you say `int4range(4,4)` you get `empty`. If we constrain primary keys as I'm suggesting, then they are literally unique, and indisunique seems safer. Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm inclined toward no, just as we don't forbid NULLs in parts of a UNIQUE key. We should try to pick what gives users more options, when possible. Even if it is questionably meaningful, I can see use cases for allowing empty ranges in a temporal table. For example it lets you "disable" a row, preserving its values but marking it as never true. Also it gives you a way to make a non-temporal foreign key reference to a temporal table. Normally temporal tables are "contagious", which is annoying. But if the referencing table had 'empty' for its temporal part, then references should succeed. For example this is true: 'empty'::daterange <@ '[2000-01-01,2001-01-01)'. (Technically this would require a small change to our FK SQL, because we do `pkperiod && fkperiod` as an optimization (to use the index more fully), and we would need to skip that when fkperiod is empty.) Finally, if we have a not-empty constraint on our primary keys, then the GROUP BY problem above goes away. And we can still use temporal primary keys in proofs (but maybe not other temporally-unique indexes). We can allow them in relation_has_unique_index_ext/rel_supports_distinctness. The drawback to putting a CHECK constraint on just PKs and not UNIQUEs is that indisunique may not be literally unique for them, if they have empty ranges. But even for traditional UNIQUE constraints, indisunique can be misleading: If they have nullable parts, identical keys are still "unique", so the code is already careful about them. Do note though the problems come from 'empty' values, not nullable values, so there might still be some planner rules we need to correct. Another drawback is that by using isempty we're limiting temporal PKs to just ranges and multiranges, whereas currently any type with appropriate operators is allowed. But since we decided to limit FKs already, I think this is okay. We can open it back up again later if we like (e.g. by adding a support function for the isempty concept). I'll start working on a patch for this too, but I'd be happy for early feedback/objections/etc. I guess an alternative would be to add a new operator, say &&&, that is the same as overlaps, except 'empty' overlaps everything instead of nothing. In a way that seems more consistent with <@. (How can a range contain something if it doesn't overlap it?) I don't love that a key like (5, 'empty') would conflict with every other 5, but you as I said it's not a meaningful value in a temporal table anyway. Or you could have 'empty' overlap nothing except itself. Maybe I prefer this solution to an internal CHECK constraint, but it feels like it has more unknown unknowns. Thoughts? Also I suspect there are still places where indisunique causes problems. I'll keep looking for them, but if others have thoughts please let me know. Patches here are generated against c627d944e6. Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Вложения
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Michael ZhilinДата:
Сообщение: Re: BUG: deadlock between autovacuum worker and client backend during removal of orphan temp tables with sequences