Re: SQL:2011 application time

Поиск
Список
Период
Сортировка
От Paul Jungwirth
Тема Re: SQL:2011 application time
Дата
Msg-id a02414cb-3f52-d530-b853-79120568b957@illuminatedcomputing.com
обсуждение исходный текст
Ответ на Re: SQL:2011 application time  (Paul Jungwirth <pj@illuminatedcomputing.com>)
Список pgsql-hackers
Hi Peter et al,

On 9/1/23 12:56, Paul Jungwirth wrote:
>> On 9/1/23 11:30, Peter Eisentraut wrote:
>>> I think the WITHOUT OVERLAPS clause should be per-column, so that 
>>> something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) 
>>> would be possible.  Then the WITHOUT OVERLAPS clause would directly 
>>> correspond to the choice between equality or overlaps operator per 
>>> column.
> I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any 
> position) is a great recommendation that enables a lot of new 
> functionality.

I've been working on implementing this, but I've come to think it is the 
wrong way to go.

If we support this in primary key and unique constraints, then we must 
also support it for foreign keys and UPDATE/DELETE FOR PORTION OF. But 
implementing that logic is pretty tricky. For example take a foreign key 
on (id, PERIOD valid_at, PERIOD asserted_at). We need to ensure the 
referenced two-dimensional time space `contains` the referencing 
two-dimensional space. You can visualize a rectangle in two-dimensional 
space for each referencing record (which we validate one at a time). The 
referenced records must be aggregated and so form a polygon (of all 
right angles). For example the referencing record may be (1, [0,2), 
[0,2)) with referenced records of (1, [0,2), [0,1)) and (1, [0,1), 
[1,2)). (I'm using intranges since they're easier to read, but you could 
imagine these as dateranges like [2000-01-01,2002-01-01).) Now the 
range_agg of their valid_ats is [0,2) and of their asserted_ats is 
[0,2). But the referenced 2d space still doesn't contain the referencing 
space. It's got one corner missing. This is a well-known problem among 
game developers. We're lucky not to have arbitrary polygons, but it's 
still a tough issue.

Besides `contains` we also need to compute `overlaps` and `intersects` 
to support these temporal features. Implementing that for 2d, 3d, etc 
looks very complicated, for something that is far outside the normal use 
case and also not part of the standard. It will cost a little 
performance for the normal 1d use case too.

I think a better approach (which I want to attempt as an add-on patch, 
not in this main series) is to support not just range types, but any 
type with the necessary operators. Then you could have an mdrange 
(multi-dimensional range) or potentially even an arbitrary n-dimensional 
polygon. (PostGIS has something like this, but its `contains` operator 
compares (non-concave) *bounding boxes*, so it would not work for the 
example above. Still the similarity between temporal and spatial data is 
striking. I'm going to see if I can get some input from PostGIS folks 
about how useful any of this is to them.) This approach would also let 
us use multiranges: not for multiple dimensions, but for non-contiguous 
time spans stored in a single row. This puts the complexity in the types 
themselves (which seems more appropriate) and is ultimately more 
flexible (supporting not just mdrange but also multirange, and other 
things too).

This approach also means that instead of storing a mask/list of which 
columns use WITHOUT OVERLAPS, I can just store one attnum. Again, this 
saves the common use-case from paying a performance penalty to support a 
much rarer one.

I've still got my multi-WITHOUT OVERLAPS work, but I'm going to switch 
gears to what I've described here. Please let me know if you disagree!

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: stopgap fix for signal handling during restore_command
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: interval_ops shall stop using btequalimage (deduplication)