Обсуждение: Arrays of domain returned to client as non-builtin oid describing thearray, not the base array type's oid

Поиск
Список
Период
Сортировка
I was surprised yesterday in a difference between querying domains as scalars versus domains as arrays. As we're all generally aware, when a domain is queried and projected as a scalar in a result set, it is described over-the-wire as that column having the oid of the domain's base type, NOT the oid of the domain itself. This helps out many clients and their applications, but confuses a few who want to use domains as 'tagged types' to register new client-side type mappings against. Changing that behavior seems to asked every now and then and rejected due to breaking more than it would help. And can be worked around through making a whole new type sharing much of the config as the base type.

But when arrays of the domain are returned to the client, the column is described on the wire with the oid of the domain's array type, instead of the oid of the base type's array type. This seems inconsistent to me, even though it can be worked around in SQL by a cast of either the element type when building the array, or casting the resulting array type.

Example SQL:

create database test;

\c test

create domain required_text text
check (trim(value) = value and length(value) > 0) not null;

create table people
(
name required_text
);

insert into people values ('Joe'), ('Mary'), ('Jane');

And then client-side interaction using python/psycopg2 (sorry, am ignorant of how to get psql itself to show the protocol-level oids):

import psycopg2
con = psycopg2.connect('dbname=test')
cur = con.cursor()

# Scalar behaviours first: a query of the domain or the base type return the base type's oid:
>>> cur.execute('select name from people')
>>> cur.description
(Column(name='name', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None),)
>>> cur.execute('select name::text from people')
>>> cur.description
(Column(name='name', type_code=25, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None),)

Arrays of the base type (forced through explicit cast of either the element or the array):
>>> cur.execute('select array_agg(name::text) from people')
>>> cur.description
(Column(name='array_agg', type_code=1009, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None),)
>>> cur.execute('select array_agg(name)::text[] from people')
>>> cur.description
(Column(name='array_agg', type_code=1009, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None),)

Arrays of the domain, showing the new array type:
cur.execute('select array_agg(name) from people')
>>> cur.description
(Column(name='array_agg', type_code=2392140, display_size=None, internal_size=-1, precision=None, scale=None, null_ok=None),)

Interesting bits from my pg_type -- 2392140 is indeed the oid of the array type for the domain.

test=# select oid, typname, typcategory, typelem from pg_type where typname in ( '_text', '_required_text');
   oid   |    typname     | typcategory | typelem 
---------+----------------+-------------+---------
    1009 | _text          | A           |      25
 2392140 | _required_text | A           | 2392141

So -- do others find this inconsistent, or is it just me and I should work on having psycopg2 be able to learn the type mapping itself if I don't want to do SQL-side casts? I'll argue that if scalar projections erase the domain's oid, then array projections ought to as well.

Thanks!
James

-----
James Robinson
james@jlr-photo.com
http://jlr-photo.com/



On 2019-Jan-02, James Robinson wrote:

> So -- do others find this inconsistent, or is it just me and I should
> work on having psycopg2 be able to learn the type mapping itself if I
> don't want to do SQL-side casts? I'll argue that if scalar projections
> erase the domain's oid, then array projections ought to as well.

Sounds reasonable.

Do you have a link to a previous discussion that rejected changing the
returned OID to that of the domain array?  I want to know what the argument
is, other than backwards compatibility.

Disregarding the size/shape of a patch to change this, I wonder what's
the cost of the change.  I mean, how many clients are going to be broken
if we change it?  And by contrast, how many apps are going to work
better with array-on-domain if we change it?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Jan-02, James Robinson wrote:
>> So -- do others find this inconsistent, or is it just me and I should
>> work on having psycopg2 be able to learn the type mapping itself if I
>> don't want to do SQL-side casts? I'll argue that if scalar projections
>> erase the domain's oid, then array projections ought to as well.

> Sounds reasonable.

> Do you have a link to a previous discussion that rejected changing the
> returned OID to that of the domain array?  I want to know what the argument
> is, other than backwards compatibility.

TBH I doubt it was ever discussed; I don't recall having thought about
doing that while working on c12d570fa.

> Disregarding the size/shape of a patch to change this, I wonder what's
> the cost of the change.

It could be kind of expensive.  The only way to find out whether an array
is over a domain type is to drill down to the element type and see.  Then
if it is, we'd have to drill down to the domain base type, after which we
could use its typarray field.  So that means at least one additional
syscache lookup each time we determine which type OID to report.

I think there are also corner cases to worry about, in particular what
if the base type lacks a typarray entry?  This would happen at least
for domains over arrays.  We don't have arrays of arrays according to
the type system, but arrays of domains over arrays allow you to kind
of fake that.  I don't see a way to report a valid description of the
data type while still abstracting out the domain in that case.

> I mean, how many clients are going to be broken
> if we change it?

This possibility only came in with v11, so probably there are few if any
use-cases of arrays-of-domains in the wild yet, and few or no clients
with intelligence about it.  I don't think that backwards compatibility
would be a show-stopper argument against changing it, if we could satisfy
ourselves about the above points.

Having said that: in the end, the business of flattening scalar domains
was mainly meant to help simple clients handle simple cases simply.
I'm not sure that array cases fall into that category at all, so I'm
not that excited about adding complexity/cycles for this.

            regards, tom lane