Обсуждение: Re: About #13489, array dimensions and CREATE TABLE ... LIKE

Поиск
Список
Период
Сортировка

Re: About #13489, array dimensions and CREATE TABLE ... LIKE

От
Bruce Momjian
Дата:
On Fri, Sep  8, 2023 at 05:10:51PM -0400, Bruce Momjian wrote:
> I knew we only considered the array dimension sizes to be documentation
> _in_ the query, but I thought we at least properly displayed the number
> of dimensions specified at creation when we described the table in psql,
> but it seems we don't do that either.
> 
> A big question is why we even bother to record the dimensions in
> pg_attribute if is not accurate for LIKE and not displayed to the user
> in a meaningful way by psql.
> 
> I think another big question is whether the structure we are using to
> supply the column information to BuildDescForRelation is optimal.  The
> typmod that has to be found for CREATE TABLE uses:
> 
>         typenameTypeIdAndMod(NULL, entry->typeName, &atttypid, &atttypmod);
> 
> which calls typenameTypeIdAndMod() -> typenameType() -> LookupTypeName()
> -> LookupTypeNameExtended() -> typenameTypeMod().  This seems very
> complicated because the ColumnDef, at least in the LIKE case,  already
> has the value.  Is there a need to revisit how we handle type such
> cases?

(Bug report moved to hackers, previous bug reporters added CCs.)

I looked at this some more and found more fundamental problems.  We have
pg_attribute.attndims which does record the array dimensionality:

    CREATE TABLE test (data integer, data_array integer[5][5]);

    SELECT attndims
    FROM pg_attribute
    WHERE attrelid = 'test'::regclass AND
          attname = 'data_array';
     attndims
    ----------
            2

The first new problem I found is that we don't dump the dimensionality:

    $ pg_dump test
    ...
    CREATE TABLE public.test (
        data integer,
-->        data_array integer[]
    );

and psql doesn't display the dimensionality:

    \d test
                       Table "public.test"
       Column   |   Type    | Collation | Nullable | Default
    ------------+-----------+-----------+----------+---------
     data       | integer   |           |          |
-->     data_array | integer[] |           |          |

A report from 2015 reports that CREATE TABLE ... LIKE and CREATE TABLE
... AS doesn't propagate the dimensionality:

    https://www.postgresql.org/message-id/flat/20150707072942.1186.98151%40wrigleys.postgresql.org

and this thread from 2018 supplied a fix:

    https://www.postgresql.org/message-id/flat/7862e882-8b9a-0c8e-4a38-40ad374d3634%40brandwatch.com

though in my testing it only fixes LIKE, not CREATE TABLE ... AS.  This
report from April of this year also complains about LIKE:

    https://www.postgresql.org/message-id/flat/ZD%2B14YZ4IUue8Rhi%40gendo.asyd.net

Here is the output from master for LIKE:

    CREATE TABLE test2 (LIKE test);

    SELECT attndims
    FROM pg_attribute
    WHERE attrelid = 'test2'::regclass AND
         attname = 'data_array';
     attndims
    ----------
-->            0

and this is the output for CREATE TABLE ... AS:

    CREATE TABLE test3 AS SELECT * FROM test;
    
    SELECT attndims
    FROM pg_attribute
    WHERE attrelid = 'test3'::regclass AND
          attname = 'data_array';
     attndims
    ----------
-->            0

The attached patch fixes pg_dump:

    $ pg_dump test
    ...
    CREATE TABLE public.test2 (
        data integer,
-->        data_array integer[][]
    );

It uses repeat() at the SQL level rather then modifying format_type() at
the SQL or C level.  It seems format_type() is mostly used to get the
type name, e.g. int4[], rather than the column definition so I added
brackets at the call site.  I used a similar fix for psql output:

    \d test
                        Table "public.test"
       Column   |    Type     | Collation | Nullable | Default
    ------------+-------------+-----------+----------+---------
     data       | integer     |           |          |
-->     data_array | integer[][] |           |          |


The 2018 patch from Alexey Bashtanov fixes the LIKE case:

    CREATE TABLE test2 (LIKE test);

    \d test2
               Table "public.test2"
       Column   |     Type      | Collation | Nullable | Default
    ------------+-------------+-----------+----------+---------
     data        | integer      |          |         |
-->     data_array | integer[][] |          |         |

It does not fix CREATE TABLE ... AS because it looks like fixing that
would require adding an ndims column to Var for WITH NO DATA and adding
ndims to TupleDesc for WITH DATA.  I am not sure if that overhead is
warrented to fix this item.  I have added C comments where they should
be added.

I would like to apply this patch to master because I think our current
deficiencies in this area are unacceptable.  An alternate approach would
be to remove pg_attribute.attndims so we don't even try to preserve 
dimensionality.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.

Вложения

Re: About #13489, array dimensions and CREATE TABLE ... LIKE

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> I would like to apply this patch to master because I think our current
> deficiencies in this area are unacceptable.

I do not think this is a particularly good idea, because it creates
the impression in a couple of places that we track this data, when
we do not really do so to any meaningful extent.

> An alternate approach would
> be to remove pg_attribute.attndims so we don't even try to preserve 
> dimensionality.

I could get behind that, perhaps.  It looks like we're not using the
field in any meaningful way, and we could simplify TupleDescInitEntry
and perhaps some other APIs.

            regards, tom lane



Re: About #13489, array dimensions and CREATE TABLE ... LIKE

От
Bruce Momjian
Дата:
On Mon, Nov 20, 2023 at 09:04:21PM -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I would like to apply this patch to master because I think our current
> > deficiencies in this area are unacceptable.
> 
> I do not think this is a particularly good idea, because it creates
> the impression in a couple of places that we track this data, when
> we do not really do so to any meaningful extent.

Okay, I thought we could get by without tracking the CREATE TABLE AS
case, but it is inconsistent.  My patch just makes it less
inconsistent.

> > An alternate approach would
> > be to remove pg_attribute.attndims so we don't even try to preserve 
> > dimensionality.
> 
> I could get behind that, perhaps.  It looks like we're not using the
> field in any meaningful way, and we could simplify TupleDescInitEntry
> and perhaps some other APIs.

So should I work on that patch or do you want to try?  I think we should
do something.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



Re: About #13489, array dimensions and CREATE TABLE ... LIKE

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> On Mon, Nov 20, 2023 at 09:04:21PM -0500, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> An alternate approach would
>>> be to remove pg_attribute.attndims so we don't even try to preserve 
>>> dimensionality.

>> I could get behind that, perhaps.  It looks like we're not using the
>> field in any meaningful way, and we could simplify TupleDescInitEntry
>> and perhaps some other APIs.

> So should I work on that patch or do you want to try?  I think we should
> do something.

Let's wait for some other opinions, first ...

            regards, tom lane



Re: About #13489, array dimensions and CREATE TABLE ... LIKE

От
Laurenz Albe
Дата:
On Mon, 2023-11-20 at 21:13 -0500, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > On Mon, Nov 20, 2023 at 09:04:21PM -0500, Tom Lane wrote:
> > > Bruce Momjian <bruce@momjian.us> writes:
> > > > An alternate approach would
> > > > be to remove pg_attribute.attndims so we don't even try to preserve
> > > > dimensionality.
>
> > > I could get behind that, perhaps.  It looks like we're not using the
> > > field in any meaningful way, and we could simplify TupleDescInitEntry
> > > and perhaps some other APIs.
>
> > So should I work on that patch or do you want to try?  I think we should
> > do something.
>
> Let's wait for some other opinions, first ...

Looking at the code, I get the impression that we wouldn't lose anything
without "pg_attribute.attndims", so +1 for removing it.

This would call for some documentation.  We should remove most of the
documentation about the non-existing difference between declaring a column
"integer[]", "integer[][]" or "integer[3][3]" and just describe the first
variant in detail, perhaps mentioning that the other notations are
accepted for backward compatibility.

I also think that it would be helpful to emphasize that while dimensionality
does not matter to a column definition, it matters for individual array values.
Perhaps it would make sense to recommend a check constraint if one wants
to make sure that an array column should contain only a certain kind of array.

Yours,
Laurenz Albe



Re: About #13489, array dimensions and CREATE TABLE ... LIKE

От
Bruce Momjian
Дата:
On Tue, Nov 21, 2023 at 09:33:18AM +0100, Laurenz Albe wrote:
> On Mon, 2023-11-20 at 21:13 -0500, Tom Lane wrote:
> > Bruce Momjian <bruce@momjian.us> writes:
> > > On Mon, Nov 20, 2023 at 09:04:21PM -0500, Tom Lane wrote:
> > > > Bruce Momjian <bruce@momjian.us> writes:
> > > > > An alternate approach would
> > > > > be to remove pg_attribute.attndims so we don't even try to preserve 
> > > > > dimensionality.
> > 
> > > > I could get behind that, perhaps.  It looks like we're not using the
> > > > field in any meaningful way, and we could simplify TupleDescInitEntry
> > > > and perhaps some other APIs.
> > 
> > > So should I work on that patch or do you want to try?  I think we should
> > > do something.
> > 
> > Let's wait for some other opinions, first ...
> 
> Looking at the code, I get the impression that we wouldn't lose anything
> without "pg_attribute.attndims", so +1 for removing it.
> 
> This would call for some documentation.  We should remove most of the
> documentation about the non-existing difference between declaring a column
> "integer[]", "integer[][]" or "integer[3][3]" and just describe the first
> variant in detail, perhaps mentioning that the other notations are
> accepted for backward compatibility.

Agreed, I see:

    https://www.postgresql.org/docs/current/arrays.html

    However, the current implementation ignores any supplied array
    size limits, i.e., the behavior is the same as for arrays of
    unspecified length.

    The current implementation does not enforce the declared number
    of dimensions either.

So both size limits and dimensions would be ignored.

> I also think that it would be helpful to emphasize that while dimensionality
> does not matter to a column definition, it matters for individual array values.
> Perhaps it would make sense to recommend a check constraint if one wants
> to make sure that an array column should contain only a certain kind of array.

The CHECK constraint idea is very good.

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.