Обсуждение: Virtual generated columns

Поиск
Список
Период
Сортировка

Virtual generated columns

От
Peter Eisentraut
Дата:
Here is a patch set to implement virtual generated columns.

Some history first:  The original development of generated columns was 
discussed in [0].  It started with virtual columns, then added stored 
columns.  Before the release of PG12, it was decided that only stored 
columns were ready, so I cut out virtual columns, and stored generated 
columns shipped with PG12, which is where we are today.

Virtual generated columns are occasionally requested still, and it's a 
bit of unfinished business for me, too, so I started to resurrect it. 
What I did here first was to basically reverse interdiff the patches 
where I cut out virtual generated columns above (this was between 
patches v8 and v9 in [0]) and clean that up and make it work again.

One thing that I needed to decide was how to organize the tests for 
this.  The original patch series had both stored and virtual tests in 
the same test file src/test/regress/sql/generated.sql.  As that file has 
grown, I think it would have been a mess to weave another whole set of 
tests into that.  So instead I figured I'd make two separate test files

     src/test/regress/sql/generated_stored.sql (renamed from current)
     src/test/regress/sql/generated_virtual.sql

and kind of try to keep them aligned, similar to how the various 
collate* tests are handled.  So I put that renaming in as preparatory 
patches.  And there are also some other preparatory cleanup patches that 
I'm including.

The main feature patch (0005 here) generally works but has a number of 
open corner cases that need to be thought about and/or fixed, many of 
which are marked in the code or the tests.  I'll continue working on 
that.  But I wanted to see if I can get some feedback on the test 
structure, so I don't have to keep changing it around later.


[0]: 
https://www.postgresql.org/message-id/flat/b151f851-4019-bdb1-699e-ebab07d2f40a@2ndquadrant.com
Вложения

Re: Virtual generated columns

От
Corey Huinker
Дата:
On Mon, Apr 29, 2024 at 4:24 AM Peter Eisentraut <peter@eisentraut.org> wrote:
Here is a patch set to implement virtual generated columns.

I'm very excited about this!



The main feature patch (0005 here) generally works but has a number of
open corner cases that need to be thought about and/or fixed, many of
which are marked in the code or the tests.  I'll continue working on
that.  But I wanted to see if I can get some feedback on the test
structure, so I don't have to keep changing it around later.

I'd be very interested to see virtual generated columns working, as one of my past customers had a need to reclassify data in a partitioned table, and the ability to detach a partition, alter the virtual generated columns, and re-attach would have been great. In case you care, it was basically an "expired" flag, but the rules for what data "expired" varied by country of customer and level of service.

+ * Stored generated columns cannot work: They are computed after
+ * BEFORE triggers, but partition routing is done before all
+ * triggers.  Maybe virtual generated columns could be made to
+ * work, but then they would need to be handled as an expression
+ * below.

I'd say you nailed it with the test structure. The stored/virtual copy/split is the ideal way to approach this, which makes the diff very easy to understand.

+1 for not handling domain types yet.

 -- generation expression must be immutable
-CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) STORED);
+CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision GENERATED ALWAYS AS (random()) VIRTUAL);

Does a VIRTUAL generated column have to be immutable? I can see where the STORED one has to be, but consider the following:

CREATE TABLE foo (
created_at timestamptz DEFAULT CURRENT_TIMESTAMP,
row_age interval GENERATED ALWAYS AS CURRENT_TIMESTAMP - created_at
);

 -- can't have generated column that is a child of normal column
 CREATE TABLE gtest_normal (a int, b int);
-CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) STORED) INHERITS (gtest_normal);  -- error
+CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS (a * 2) VIRTUAL) INHERITS (gtest_normal);  -- error

This is the barrier to the partitioning reorganization scheme I described above. Is there any hard rule why a child table couldn't have a generated column matching the parent's regular column? I can see where it might prevent indexing that column on the parent table, but is there some other dealbreaker or is this just a "it doesn't work yet" situation?

One last thing to keep in mind is that there are two special case expressions in the spec:

GENERATED ALWAYS AS ROW START
GENERATED ALWAYS AS ROW END

and we'll need to be able to fit those into the catalog. I'll start another thread for that unless you prefer I keep it here.

Re: Virtual generated columns

От
Peter Eisentraut
Дата:
On 29.04.24 10:23, Peter Eisentraut wrote:
> Here is a patch set to implement virtual generated columns.

> The main feature patch (0005 here) generally works but has a number of 
> open corner cases that need to be thought about and/or fixed, many of 
> which are marked in the code or the tests.  I'll continue working on 
> that.  But I wanted to see if I can get some feedback on the test 
> structure, so I don't have to keep changing it around later.

Here is an updated patch set.  It needed some rebasing, especially 
around the reverting of the catalogued not-null constraints.  I have 
also fixed up the various incomplete or "fixme" pieces of code mentioned 
above.  I have in most cases added "not supported yet" error messages 
for now, with the idea that some of these things can be added in later, 
as incremental features.

In particular, quoting from the commit message, the following are 
currently not supported (but could possibly be added as incremental 
features, some easier than others):

- index on virtual column
- expression index using a virtual column
- hence also no unique constraints on virtual columns
- not-null constraints on virtual columns
- (check constraints are supported)
- foreign key constraints on virtual columns
- extended statistics on virtual columns
- ALTER TABLE / SET EXPRESSION
- ALTER TABLE / DROP EXPRESSION
- virtual columns as trigger columns
- virtual column cannot have domain type

So, I think this basically works now, and the things that don't work 
should be appropriately prevented.  So if someone wants to test this and 
tell me what in fact doesn't work correctly, that would be helpful.

Вложения

Re: Virtual generated columns

От
Peter Eisentraut
Дата:
On 29.04.24 20:54, Corey Huinker wrote:
>       -- generation expression must be immutable
>     -CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision
>     GENERATED ALWAYS AS (random()) STORED);
>     +CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision
>     GENERATED ALWAYS AS (random()) VIRTUAL);
> 
> Does a VIRTUAL generated column have to be immutable? I can see where 
> the STORED one has to be, but consider the following:
> 
>     CREATE TABLE foo (
>     created_at timestamptz DEFAULT CURRENT_TIMESTAMP,
>     row_age interval GENERATED ALWAYS AS CURRENT_TIMESTAMP - created_at
>     );

I have been hesitant about this, but I'm now leaning toward that we 
could allow this.

>       -- can't have generated column that is a child of normal column
>       CREATE TABLE gtest_normal (a int, b int);
>     -CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS
>     (a * 2) STORED) INHERITS (gtest_normal);  -- error
>     +CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS
>     (a * 2) VIRTUAL) INHERITS (gtest_normal);  -- error
> 
> This is the barrier to the partitioning reorganization scheme I 
> described above. Is there any hard rule why a child table couldn't have 
> a generated column matching the parent's regular column? I can see where 
> it might prevent indexing that column on the parent table, but is there 
> some other dealbreaker or is this just a "it doesn't work yet" situation?

We had a quite a difficult time getting the inheritance business of 
stored generated columns working correctly.  I'm sticking to the 
well-trodden path here.  We can possibly expand this if someone wants to 
work out the details.

> One last thing to keep in mind is that there are two special case 
> expressions in the spec:
> 
>     GENERATED ALWAYS AS ROW START
>     GENERATED ALWAYS AS ROW END
> 
> and we'll need to be able to fit those into the catalog. I'll start 
> another thread for that unless you prefer I keep it here.

I think this is a separate feature.




Re: Virtual generated columns

От
Tomasz Rybak
Дата:
On Wed, 2024-05-22 at 19:22 +0200, Peter Eisentraut wrote:
> On 29.04.24 10:23, Peter Eisentraut wrote:
> > Here is a patch set to implement virtual generated columns.
>
> > The main feature patch (0005 here) generally works but has a number
> > of
> > open corner cases that need to be thought about and/or fixed, many
> > of
> > which are marked in the code or the tests.  I'll continue working
> > on
> > that.  But I wanted to see if I can get some feedback on the test
> > structure, so I don't have to keep changing it around later.
>
> Here is an updated patch set.  It needed some rebasing, especially
> around the reverting of the catalogued not-null constraints.  I have
> also fixed up the various incomplete or "fixme" pieces of code
> mentioned
> above.  I have in most cases added "not supported yet" error messages
> for now, with the idea that some of these things can be added in
> later,
> as incremental features.
>

This is not (yet) full review.

Patches applied cleanly on 76618097a6c027ec603a3dd143f61098e3fb9794
from 2024-06-14.
I've run
./configure && make world-bin && make check && make check-world
on 0001, then 0001+0002, then 0001+0002+0003, up to applying
all 5 patches. All cases passed on Debian unstable on aarch64 (ARM64)
on gcc (Debian 13.2.0-25) 13.2.0.

v1-0001-Rename-regress-test-generated-to-generated_stored.patch:
no objections here, makes sense as preparation for future changes

v1-0002-Put-generated_stored-test-objects-in-a-schema.patch:
also no objections.
OTOH other tests (like publication.out, rowsecurity.out, stats_ext.out,
tablespace.out) are creating schemas and later dropping them - so here
it might also make sense to drop schema at the end of testing.

v1-0003-Remove-useless-initializations.patch:
All other cases (I checked directory src/backend/utils/cache)
calling MemoryContextAllocZero only initialize fields when they
are non-zero, so removing partial initialization with false brings
consistency to the code.

v1-0004-Remove-useless-code.patch:
Patch removes filling in of constraints from function
BuildDescForRelation. This function is only called from file
view.c and tablecmds.c (twice). In none of those cases
result->constr is used, so proposed change makes sense.
While I do not know code well, so might be wrong here,
I would apply this patch.

I haven't looked at the most important (and biggest) file yet,
v1-0005-Virtual-generated-columns.patch; hope to look at it
this week.
At the same I believe 0001-0004 can be applied - even backported
if it'll make maintenance of future changes easier. But that should
be commiter's decision.


Best regards

--
Tomasz Rybak, Debian Developer <serpent@debian.org>
GPG: A565 CE64 F866 A258 4DDC F9C7 ECB7 3E37 E887 AA8C

Вложения

Re: Virtual generated columns

От
jian he
Дата:
On Thu, May 23, 2024 at 1:23 AM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 29.04.24 10:23, Peter Eisentraut wrote:
> > Here is a patch set to implement virtual generated columns.
>
> > The main feature patch (0005 here) generally works but has a number of
> > open corner cases that need to be thought about and/or fixed, many of
> > which are marked in the code or the tests.  I'll continue working on
> > that.  But I wanted to see if I can get some feedback on the test
> > structure, so I don't have to keep changing it around later.

the test structure you made ( generated_stored.sql,
generated_virtual.sq) looks ok to me.
but do we need to reset the search_path at the end of
generated_stored.sql, generated_virtual.sql?

most of the test tables didn't use much storage,
maybe not necessary to clean up (drop the test table) at the end of sql files.

>
> So, I think this basically works now, and the things that don't work
> should be appropriately prevented.  So if someone wants to test this and
> tell me what in fact doesn't work correctly, that would be helpful.


in https://www.postgresql.org/docs/current/catalog-pg-attrdef.html
>>>
The catalog pg_attrdef stores column default values. The main
information about columns is stored in pg_attribute. Only columns for
which a default value has been explicitly set will have an entry here.
>>
didn't mention generated columns related expressions.
Do we need to add something here? maybe a separate issue?


+ /*
+ * TODO: This could be done, but it would need a different implementation:
+ * no rewriting, but still need to recheck any constraints.
+ */
+ if (attTup->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("ALTER TABLE / SET EXPRESSION is not supported for virtual
generated columns"),
+ errdetail("Column \"%s\" of relation \"%s\" is a virtual generated column.",
+   colName, RelationGetRelationName(rel))));

minor typo, should be
+ errmsg("ALTER TABLE SET EXPRESSION is not supported for virtual
generated columns"),

insert/update/delete/merge returning have problems:
CREATE TABLE t2 (
a int ,
b int GENERATED ALWAYS AS (a * 2),
d int default 22);
insert into t2(a) select g from generate_series(1,10) g;

insert into t2 select 100 returning *, (select t2.b), t2.b = t2.a * 2;
update t2 set a = 12 returning *, (select t2.b), t2.b = t2.a * 2;
update t2 set a = 12 returning *, (select (select t2.b)), t2.b = t2.a * 2;
delete from t2 where t2.b = t2.a * 2 returning *, 1,((select t2.b));

currently all these query, error message is "unexpected virtual
generated column reference"
we expect above these query work?


issue with merge:
CREATE TABLE t0 (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) VIRTUAL);
insert into t0(a) select g from generate_series(1,10) g;
MERGE INTO t0 t USING t0 AS s ON 2 * t.a = s.b WHEN MATCHED THEN
DELETE returning *;

the above query returns zero rows, but for stored generated columns it
will return 10 rows.

in  transformMergeStmt(ParseState *pstate, MergeStmt *stmt)
add
`qry->hasGeneratedVirtual = pstate->p_hasGeneratedVirtual;`
before
`assign_query_collations(pstate, qry);`
solve the problem.