Re: SQL:2011 application time

Поиск
Список
Период
Сортировка
От Paul Jungwirth
Тема Re: SQL:2011 application time
Дата
Msg-id 47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com
обсуждение исходный текст
Ответ на Re: SQL:2011 application time  (Peter Eisentraut <peter@eisentraut.org>)
Ответы Re: SQL:2011 application time  (jian he <jian.universality@gmail.com>)
Re: SQL:2011 application time  (Robert Haas <robertmhaas@gmail.com>)
Re: SQL:2011 application time  (Peter Eisentraut <peter@eisentraut.org>)
Список pgsql-hackers
On 3/24/24 00:38, Peter Eisentraut wrote:> I have committed the patches
> v33-0001-Add-temporal-FOREIGN-KEYs.patch and v33-0002-Support-multiranges-in-temporal-FKs.patch 
> (together).

Hi Hackers,

I found some problems with temporal primary keys and the idea of uniqueness, especially around the 
indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need 
but I'd like some feedback on.

The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE.

DO NOTHING fails because it doesn't expect a non-btree unique index. It's fine to make it accept a 
temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique and indisexclusion).
This is no different than an exclusion constraint. So I skip BuildSpeculativeIndexInfo for WITHOUT 
OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only ii_UniqueProcs. Right?)

We should still forbid temporally-unique indexes for ON CONFLICT DO UPDATE, since there may be more 
than one row that conflicts. Ideally we would find and update all the conflicting rows, but we don't 
now, and we don't want to update just one:

     postgres=# create table t (id int4range, valid_at daterange, name text, constraint tpk primary 
key (id, valid_at without overlaps));
     CREATE TABLE
     postgres=# insert into t values ('[1,2)', '[2000-01-01,2001-01-01)', 'a'), ('[1,2)', 
'[2001-01-01,2002-01-01)', 'b');
     INSERT 0 2
     postgres=# insert into t values ('[1,2)', '[2000-01-01,2002-01-01)', 'c') on conflict (id, 
valid_at) do update set name = excluded.name;
     INSERT 0 1
     postgres=# select * from t;
       id   |        valid_at         | name
     -------+-------------------------+------
      [1,2) | [2001-01-01,2002-01-01) | b
      [1,2) | [2000-01-01,2001-01-01) | c
     (2 rows)

So I also added code to prevent that. This is just preserving the old behavior for exclusion 
constraints, which was bypassed because of indisunique. All this is in the first patch.

That got me thinking about indisunique and where else it could cause problems. Perhaps there are 
other places that assume only b-trees are unique. I couldn't find anywhere that just gives an error 
like ON CONFLICT, but I can imagine more subtle problems.

A temporal PRIMARY KEY or UNIQUE constraint is unique in at least three ways: It is *metaphorically* 
unique: the conceit is that the scalar part is unique at every moment in time. You may have id 5 in 
your table more than once, as long as the records' application times don't overlap.

And it is *officially* unique: the standard calls these constraints unique. I think it is correct 
for us to report them as unique in pg_index.

But is it *literally* unique? Well two identical keys, e.g. (5, '[Jan24,Mar24)') and (5, 
'[Jan24,Mar24)'), do have overlapping ranges, so the second is excluded. Normally a temporal unique 
index is *more* restrictive than a standard one, since it forbids other values too (e.g. (5, 
'[Jan24,Feb24)')). But sadly there is one exception: the ranges in these keys do not overlap: (5, 
'empty'), (5, 'empty'). With ranges/multiranges, `'empty' && x` is false for all x. You can add that 
key as many times as you like, despite a PK/UQ constraint:

     postgres=# insert into t values
     ('[1,2)', 'empty', 'foo'),
     ('[1,2)', 'empty', 'bar');
     INSERT 0 2
     postgres=# select * from t;
       id   | valid_at | name
     -------+----------+------
      [1,2) | empty    | foo
      [1,2) | empty    | bar
     (2 rows)

Cases like this shouldn't actually happen for temporal tables, since empty is not a meaningful 
value. An UPDATE/DELETE FOR PORTION OF would never cause an empty. But we should still make sure 
they don't cause problems.

One place we should avoid temporally-unique indexes is REPLICA IDENTITY. Fortunately we already do 
that, but patch 2 adds a test to keep it that way.

Uniqueness is an important property to the planner, too.

We consider indisunique often for estimates, where it needn't be 100% true. Even if there are 
nullable columns or a non-indimmediate index, it still gives useful stats. Duplicates from 'empty' 
shouldn't cause any new problems there.

In proof code we must be more careful. Patch 3 updates relation_has_unique_index_ext and 
rel_supports_distinctness to disqualify WITHOUT OVERLAPS indexes. Maybe that's more cautious than 
needed, but better safe than sorry. This patch has no new test though. I had trouble writing SQL 
that was wrong before its change. I'd be happy for help here!

Another problem is GROUP BY and functional dependencies. This is wrong:

     postgres=# create table a (id int4range, valid_at daterange, name text, constraint apk primary 
key (id, valid_at without overlaps));
     CREATE TABLE
     postgres=# insert into a values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar');
     INSERT 0 2
     postgres=# select * from a group by id, valid_at;
       id   | valid_at | name
     -------+----------+------
      [1,2) | empty    | foo
     (1 row)

One fix is to return false from check_functional_grouping for WITHOUT OVERLAPS primary keys. But I 
think there is a better fix that is less ad hoc.

We should give temporal primary keys an internal CHECK constraint saying `NOT isempty(valid_at)`. 
The problem is analogous to NULLs in parts of a primary key. NULLs prevent two identical keys from 
ever comparing as equal. And just as a regular primary key cannot contain NULLs, so a temporal 
primary key should not contain empties.

The standard effectively prevents this with PERIODs, because a PERIOD adds a constraint saying start 
< end. But our ranges enforce only start <= end. If you say `int4range(4,4)` you get `empty`. If we 
constrain primary keys as I'm suggesting, then they are literally unique, and indisunique seems safer.

Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm inclined toward no, just as 
we don't forbid NULLs in parts of a UNIQUE key. We should try to pick what gives users more options, 
when possible. Even if it is questionably meaningful, I can see use cases for allowing empty ranges 
in a temporal table. For example it lets you "disable" a row, preserving its values but marking it 
as never true.

Also it gives you a way to make a non-temporal foreign key reference to a temporal table. Normally 
temporal tables are "contagious", which is annoying. But if the referencing table had 'empty' for 
its temporal part, then references should succeed. For example this is true: 'empty'::daterange <@ 
'[2000-01-01,2001-01-01)'. (Technically this would require a small change to our FK SQL, because we 
do `pkperiod && fkperiod` as an optimization (to use the index more fully), and we would need to 
skip that when fkperiod is empty.)

Finally, if we have a not-empty constraint on our primary keys, then the GROUP BY problem above goes 
away. And we can still use temporal primary keys in proofs (but maybe not other temporally-unique 
indexes). We can allow them in relation_has_unique_index_ext/rel_supports_distinctness.

The drawback to putting a CHECK constraint on just PKs and not UNIQUEs is that indisunique may not 
be literally unique for them, if they have empty ranges. But even for traditional UNIQUE 
constraints, indisunique can be misleading: If they have nullable parts, identical keys are still 
"unique", so the code is already careful about them. Do note though the problems come from 'empty' 
values, not nullable values, so there might still be some planner rules we need to correct.

Another drawback is that by using isempty we're limiting temporal PKs to just ranges and 
multiranges, whereas currently any type with appropriate operators is allowed. But since we decided 
to limit FKs already, I think this is okay. We can open it back up again later if we like (e.g. by 
adding a support function for the isempty concept).

I'll start working on a patch for this too, but I'd be happy for early feedback/objections/etc.

I guess an alternative would be to add a new operator, say &&&, that is the same as overlaps, except
'empty' overlaps everything instead of nothing. In a way that seems more consistent with <@. (How 
can a range contain something if it doesn't overlap it?) I don't love that a key like (5, 'empty') 
would conflict with every other 5, but you as I said it's not a meaningful value in a temporal table 
anyway. Or you could have 'empty' overlap nothing except itself. Maybe I prefer this solution to an 
internal CHECK constraint, but it feels like it has more unknown unknowns. Thoughts?

Also I suspect there are still places where indisunique causes problems. I'll keep looking for them, 
but if others have thoughts please let me know.

Patches here are generated against c627d944e6.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Zhilin
Дата:
Сообщение: Re: BUG: deadlock between autovacuum worker and client backend during removal of orphan temp tables with sequences
Следующее
От: Amit Kapila
Дата:
Сообщение: Re: Synchronizing slots from primary to standby