Обсуждение: Transition relations: correlating OLD TABLE and NEW TABLE

Поиск
Список
Период
Сортировка

Transition relations: correlating OLD TABLE and NEW TABLE

От
Brent Kerby
Дата:
In a situation where we're using transition relations to capture changes after UPDATE statements, i.e., using a trigger of the form

CREATE TRIGGER trigger_name AFTER UPDATE ON table_name
REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
FOR EACH STATEMENT EXECUTE PROCEDURE procedure();

there seems to be a challenge in how to join `old_table` and `new_table` so that the old and new version of each row can be matched up with each other. Of course if the table has a primary key, then this can be used, but I'm wondering how to handle this in the general case where a primary key might not exist.

According to this blog (http://databasedoings.blogspot.com/2017/07/cool-stuff-in-postgresql-10-transition.html) it is possible to use ROW_NUMBER() OVER () to create a key to join the old and new tables, but this depends on an implementation detail (or at least, isn't documented?) that the rows will be returned in the same order for the two tables. Is it correct that this will work under the existing implementation? 

If there's not a clean way of matching up the old and new versions, could the transition relation mechanism be extended in order to make this possible? Here's a couple ideas:

1) A special system column could be added to the two transition relations, OLD TABLE, and NEW TABLE, providing a common value that could be used to join corresponding rows; it could be a sequential value (like what ROW_NUMBER() would generate), or it could be some other unique identifier for the row that is convenient for implementation. But there's some awkwardness in the fact that this special column name could clash with the columns in the table (unless an existing reserved name is used). Also, exposing a unique row identifier might restrict potential future implementations. 

2) Maybe a cleaner way would be to add a third kind of transition table, say, a "CHANGE TABLE", which could be used as an alternative to "OLD TABLE" and "NEW TABLE". A change table could contain just two columns, say 'old_row' and 'new_row', each of which have the appropriate record type. In this way, the old table and new table are essentially "pre-joined" in the transition table.

Would this be workable? Or is there some other way of achieving this? 

- Brent Kerby

Re: Transition relations: correlating OLD TABLE and NEW TABLE

От
"David G. Johnston"
Дата:
On Friday, July 6, 2018, Brent Kerby <blkerby@gmail.com> wrote:
Of course if the table has a primary key, then this can be used, but I'm wondering how to handle this in the general case where a primary key might not exist.

Personally, I would consider the lack of a PK a rare and special case...I'd handle the proposed situation by adding a big serial column to the table.

David J.

Re: Transition relations: correlating OLD TABLE and NEW TABLE

От
Brent Kerby
Дата:
This is a possible workaround. But even if a table has a primary key, it seems like there's some inefficiency in doing things this way: the old and new row versions start out linked together (for instance this information is available in a FOR EACH ROW trigger), but we're throwing away that information by splitting them into two separate relations, forcing us to have to join them back up again. Wouldn't it make sense to expose a transition relation where the correspondence between old and new versions is never discarded in the first place? Or is there something I'm missing?

Also, there are cases where it may not be desired to have a primary key, as the index maintenance and constraint checking are not free and not always necessary. And if one wants to implement a general change data capture setup, it would nice to be able accomodate such tables without having to alter them.

I'd be happy to try to work out an implementation of REFERENCING CHANGE TABLE if there's support for the idea. Or is there some problem with this, or some better way of achieving the goal?

- Brent

On Fri, Jul 6, 2018 at 9:36 PM, David G. Johnston <david.g.johnston@gmail.com> wrote:
On Friday, July 6, 2018, Brent Kerby <blkerby@gmail.com> wrote:
Of course if the table has a primary key, then this can be used, but I'm wondering how to handle this in the general case where a primary key might not exist.

Personally, I would consider the lack of a PK a rare and special case...I'd handle the proposed situation by adding a big serial column to the table.

David J.


Re: Transition relations: correlating OLD TABLE and NEW TABLE

От
"David G. Johnston"
Дата:
On Saturday, July 7, 2018, Brent Kerby <blkerby@gmail.com> wrote:
Also, there are cases where it may not be desired to have a primary key, as the index maintenance and constraint checking are not free and not always necessary.

Btree uniqueness enforcement is worth the price.
 
I'd be happy to try to work out an implementation of REFERENCING CHANGE TABLE if there's support for the idea. Or is there some problem with this, or some better way of achieving the goal?

I do see the value in basically saying, "I have a unique index but I want to write a generic function that can handle being installed on any table and, without dynamic sql, can be presented with a full outer join relation of all inserts, updates, and deletes."  Natural join won't work.  We can limit the feature to just transition tables or create a new join type that would require left and right to be the same relation and the syste, would figure out and join on the PK columns (or planner error if there are none).

Keep in mind that part of this discussion involves deciding where we are OK with being non-standard.

David J.