Обсуждение: Inaccurate description of UNION/CASE/etc type selection

Поиск
Список
Период
Сортировка

Inaccurate description of UNION/CASE/etc type selection

От
Tom Lane
Дата:
We had a question about why an ARRAY[] construct was resolving the
array's type strangely [1].  The documentation about this [2] says
that the relevant resolution rules are:

  5. Choose the first non-unknown input type which is a preferred type in
  that category, if there is one.

  6. Otherwise, choose the last non-unknown input type that allows all the
  preceding non-unknown inputs to be implicitly converted to it.  (There
  always is such a type, since at least the first type in the list must
  satisfy this condition.)

But what select_common_type() actually does is:

            else if (!pispreferred &&
                     can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
                     !can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))
            {
                /*
                 * take new type if can coerce to it implicitly but not the
                 * other way; but if we have a preferred type, stay on it.
                 */
                pexpr = nexpr;
                ptype = ntype;
                pcategory = ncategory;
                pispreferred = nispreferred;
            }

(ptype is the currently selected common type, ntype is the next
input type to consider, and we've already eliminated cases involving
UNKNOWN.)

In the reported case, we have ptype = "name", ntype = "text", and there
are implicit coercions in both directions so we stay with "name" even
though it's not preferred.

So, the step-5 claim that we always choose a preferred type over other
types is just wrong.  Step 6 is much short of truthful as well, since
it fails to describe the check about coercion in the other direction.
(Also, we're not really checking that *every* earlier argument can be
promoted to ntype, only the currently best one.  Typically, if there's
an implicit coercion from A to B and also one from B to C, there'd be
one from A to C too; but there are lots of counterexamples.)

Now, this code is old enough to vote, so I think changing its behavior
is probably a really bad idea.  I did experiment with giving preferred
types fractionally more preference, like this:

            else if (can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
                     (nispreferred > pispreferred ||
                      (!pispreferred &&
                       !can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))))

but this broke a couple of regression test cases, so I'm sure it'd
break real-world queries too.  So I think we need to leave the code
alone and fix the docs to describe it more accurately.

However, I'm having a hard time coming up with wording that describes
this accurately without being a verbatim statement of the algorithm.
(I see that I already made one attempt at improving the description,
back in 07daff63c, but it's clearly still not good enough.)

Any ideas?

            regards, tom lane

[1] https://www.postgresql.org/message-id/CAOwYNKYfKPfAL4rgP0AO_w0Mn7h8yiXd_Qi9swPdAc4CAUXeAQ%40mail.gmail.com
[2] https://www.postgresql.org/docs/current/typeconv-union-case.html



Re: Inaccurate description of UNION/CASE/etc type selection

От
"David G. Johnston"
Дата:
On Sun, Aug 16, 2020 at 2:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
We had a question about why an ARRAY[] construct was resolving the
array's type strangely [1]. 

In this specific complaint it strikes me as undesirable to have an lossy implicit cast from text to name - and if that wasn't present then text would have been chosen as expected.
 
The documentation about this [2] says
that the relevant resolution rules are:

  5. Choose the first non-unknown input type which is a preferred type in
  that category, if there is one.

  6. Otherwise, choose the last non-unknown input type that allows all the
  preceding non-unknown inputs to be implicitly converted to it.  (There
  always is such a type, since at least the first type in the list must
  satisfy this condition.)

But what select_common_type() actually does is:

            else if (!pispreferred &&
                     can_coerce_type(1, &ptype, &ntype, COERCION_IMPLICIT) &&
                     !can_coerce_type(1, &ntype, &ptype, COERCION_IMPLICIT))
            {
                /*
                 * take new type if can coerce to it implicitly but not the
                 * other way; but if we have a preferred type, stay on it.
                 */
                pexpr = nexpr;
                ptype = ntype;
                pcategory = ncategory;
                pispreferred = nispreferred;
            }

(ptype is the currently selected common type, ntype is the next
input type to consider, and we've already eliminated cases involving
UNKNOWN.)

Seems like 5 and 6 need to be merged - the chosen type is the first one that all subsequent types can be implicitly cast to.  We do not guarantee that previous types will be implicitly convertible to this type.

In pseudo-code:
else if (can_coerce(n->p)) continue /* matches when pispreferred */
else if (can_coerce(p->n)) "change chosen type"
else continue /* but now we expect a runtime implicit cast not found error */

To go along with the above simplification:

  /* move on to next one if no new information... */
if (ntype != UNKNOWNOID && ntype != ptype) {
should be written positively as
if (ntype == UNKNOWNOID || ntype == ptype)
    continue;

Random thought, once we get to the end of the loop why not conditionally go over the list a second time so the elements prior to the selected value's type are re-considered in lieu of the new choice?  That at least clears up the documentation to state that all encountered types are compared against the chosen type for implicit casting instead of just those appearing after the one we settled upon. Maybe on the second pass we don't actually change the chosen type but instead provide for a better error message in the case of a missing implicit type cast.

I do agree that while this doesn't feel like an ideal algorithm it seems undesirable to change the implementation - or at least its behavior in non-error situations.  Given the lack of complaints here error handling improvements don't seem like a win either.

David J.

Re: Inaccurate description of UNION/CASE/etc type selection

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Sun, Aug 16, 2020 at 2:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We had a question about why an ARRAY[] construct was resolving the
>> array's type strangely [1].

> In this specific complaint it strikes me as undesirable to have an lossy
> implicit cast from text to name - and if that wasn't present then text
> would have been chosen as expected.

Yeah, in an ideal world maybe, but I don't see us removing that
implicit cast either --- I expect that'd also break a lot of queries.
[ experiments ... actually it might not be that bad ]  But in any case,
that's not very relevant to the documentation problem.

> Seems like 5 and 6 need to be merged - the chosen type is the first one
> that all subsequent types can be implicitly cast to.  We do not guarantee
> that previous types will be implicitly convertible to this type.
> In pseudo-code:
> else if (can_coerce(n->p)) continue /* matches when pispreferred */
> else if (can_coerce(p->n)) "change chosen type"
> else continue /* but now we expect a runtime implicit cast not found error
> */

This formulation fails to account for the pispreferred check, though.
The pseudo-code would be correct if you made the first line be

else if (pispreferred || can_coerce(n->p)) continue

but then it doesn't map neatly to a description that fails to mention
preferred-ness.  (Note that this would be simpler if we could assume
that pispreferred implies that there is a cast from every other category
member type to p.  But there are counterexamples.)

            regards, tom lane



Re: Inaccurate description of UNION/CASE/etc type selection

От
"David G. Johnston"
Дата:
On Sun, Aug 16, 2020 at 5:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Sun, Aug 16, 2020 at 2:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Seems like 5 and 6 need to be merged - the chosen type is the first one
> that all subsequent types can be implicitly cast to.  We do not guarantee
> that previous types will be implicitly convertible to this type.
> In pseudo-code:
> else if (can_coerce(n->p)) continue /* matches when pispreferred */
> else if (can_coerce(p->n)) "change chosen type"
> else continue /* but now we expect a runtime implicit cast not found error
> */

This formulation fails to account for the pispreferred check, though.
The pseudo-code would be correct if you made the first line be

else if (pispreferred || can_coerce(n->p)) continue

but then it doesn't map neatly to a description that fails to mention
preferred-ness.  (Note that this would be simpler if we could assume
that pispreferred implies that there is a cast from every other category
member type to p.  But there are counterexamples.)

I was making that assumption.

In the pseudo-code:

else if (pispreferred) continue; /* never move away from a preferred type */
{the rest of the else ifs from above}

In the docs:

5. If the first non-unknown type is a preferred type it is chosen, otherwise it is made a candidate, and then,
6. each subsequent type is compared to the current candidate, with a new candidate being chosen only when there exists a non-mutal implicit cast to the new type.
6a. If at any point a preferred type is made a candidate then it will be chosen.

David J.

Re: Inaccurate description of UNION/CASE/etc type selection

От
"David G. Johnston"
Дата:
On Sun, Aug 16, 2020 at 5:58 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
5. If the first non-unknown type is a preferred type it is chosen, otherwise it is made a candidate, and then,
6. each subsequent type is compared to the current candidate, with a new candidate being chosen only when there exists a non-mutal implicit cast to the new type.
6a. If at any point a preferred type is made a candidate then it will be chosen.

More concisely:

Make the first input type a candidate type.  Each subsequent input type is compared to the current candidate, with a new candidate being chosen only when there exists a non-mutal implicit cast to the new type.  If at any point a preferred type is made a candidate then it will be chosen.

In my original the whole if/otherwise in 5 is pointless given the presence of 6a.

David J.

Re: Inaccurate description of UNION/CASE/etc type selection

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> More concisely:

> Make the first input type a candidate type.  Each subsequent input type is
> compared to the current candidate, with a new candidate being chosen only
> when there exists a non-mutal implicit cast to the new type.  If at any
> point a preferred type is made a candidate then it will be chosen.

So this is just a verbatim statement of the algorithm, which is what
I was hoping to avoid :-(.  But maybe there's no simpler way.

            regards, tom lane



Re: Inaccurate description of UNION/CASE/etc type selection

От
"David G. Johnston"
Дата:
On Mon, Aug 17, 2020 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> More concisely:

> Make the first input type a candidate type.  Each subsequent input type is
> compared to the current candidate, with a new candidate being chosen only
> when there exists a non-mutal implicit cast to the new type.  If at any
> point a preferred type is made a candidate then it will be chosen.

So this is just a verbatim statement of the algorithm, which is what
I was hoping to avoid :-(.  But maybe there's no simpler way.


I got nothin'.  The locking onto the preferred type is conditional on one being chosen and there doesn't seem to be any greater principle that emerges from the pairwise matching algorithm - at least given that implicit casts are essentially randomly distributed and the algorithm is order-dependent.

David J.


Re: Inaccurate description of UNION/CASE/etc type selection

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Aug 17, 2020 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So this is just a verbatim statement of the algorithm, which is what
>> I was hoping to avoid :-(.  But maybe there's no simpler way.

> I got nothin'.

Yeah, me either.  So here's a proposed patch, fixing a couple other
things:

* Re-reading this, I thought the use of "preferred" in the existing
footnote about domains could be read as meaning that we treat the
base type as a preferred type; so I changed that.

* Something that's been true for a very long time, but never documented,
is that CASE puts its ELSE clause at the front for this purpose.
I figured that if we're trying to tell the full truth we better mention
that.

            regards, tom lane

diff --git a/doc/src/sgml/typeconv.sgml b/doc/src/sgml/typeconv.sgml
index 81dba7dacf..8900d0eb38 100644
--- a/doc/src/sgml/typeconv.sgml
+++ b/doc/src/sgml/typeconv.sgml
@@ -1069,7 +1069,7 @@ domain's base type for all subsequent steps.
     functions, this behavior allows a domain type to be preserved through
     a <literal>UNION</literal> or similar construct, so long as the user is
     careful to ensure that all inputs are implicitly or explicitly of that
-    exact type.  Otherwise the domain's base type will be preferred.
+    exact type.  Otherwise the domain's base type will be used.
    </para>
   </footnote>
 </para>
@@ -1092,24 +1092,29 @@ If the non-unknown inputs are not all of the same type category, fail.

 <step performance="required">
 <para>
-Choose the first non-unknown input type which is a preferred type in
-that category, if there is one.
-</para>
-</step>
-
-<step performance="required">
-<para>
-Otherwise, choose the last non-unknown input type that allows all the
-preceding non-unknown inputs to be implicitly converted to it.  (There
-always is such a type, since at least the first type in the list must
-satisfy this condition.)
+Select the first non-unknown input type as the candidate type,
+then consider each other non-unknown input type, left to right.
+  <footnote>
+   <para>
+    For historical reasons, <literal>CASE</literal> treats
+    its <literal>ELSE</literal> clause (if any) as the <quote>first</quote>
+    input, with the <literal>THEN</literal> clauses(s) considered after
+    that.  In all other cases, <quote>left to right</quote> means the order
+    in which the expressions appear in the query text.
+   </para>
+  </footnote>
+If the candidate type can be implicitly converted to the other type,
+but not vice-versa, select the other type as the new candidate type.
+Then continue considering the remaining inputs.  If, at any stage of this
+process, a preferred type is selected, stop considering additional
+inputs.
 </para>
 </step>

 <step performance="required">
 <para>
-Convert all inputs to the selected type.  Fail if there is not a
-conversion from a given input to the selected type.
+Convert all inputs to the final candidate type.  Fail if there is not an
+implicit conversion from a given input type to the candidate type.
 </para>
 </step>
 </procedure>

Re: Inaccurate description of UNION/CASE/etc type selection

От
"David G. Johnston"
Дата:
On Mon, Aug 17, 2020 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Aug 17, 2020 at 8:31 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So this is just a verbatim statement of the algorithm, which is what
>> I was hoping to avoid :-(.  But maybe there's no simpler way.

> I got nothin'.

Yeah, me either.  So here's a proposed patch, fixing a couple other
things:


LGTM

David J.

Re: Inaccurate description of UNION/CASE/etc type selection

От
Tom Lane
Дата:
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Mon, Aug 17, 2020 at 10:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah, me either.  So here's a proposed patch, fixing a couple other
>> things:

> LGTM

Pushed, thanks for review.

            regards, tom lane