Обсуждение: [HACKERS] "cannot specify finite value after UNBOUNDED" ... uh, why?

Поиск
Список
Период
Сортировка

[HACKERS] "cannot specify finite value after UNBOUNDED" ... uh, why?

От
Tom Lane
Дата:
Would someone please defend the restrictions imposed by the
"seen_unbounded" checks in transformPartitionBound
(parse_utilcmd.c:3365..3396 in current HEAD)?  They sure look to me like
nothing but sloppy thinking, and/or protection of downstream sloppy
thinking.  Why should the boundedness of one partitioning column's
range matter to any other partitioning column's range?
        regards, tom lane



Re: [HACKERS] "cannot specify finite value after UNBOUNDED" ... uh,why?

От
Amit Langote
Дата:
On 2017/05/29 16:04, Tom Lane wrote:
> Would someone please defend the restrictions imposed by the
> "seen_unbounded" checks in transformPartitionBound
> (parse_utilcmd.c:3365..3396 in current HEAD)?  They sure look to me like
> nothing but sloppy thinking, and/or protection of downstream sloppy
> thinking.  Why should the boundedness of one partitioning column's
> range matter to any other partitioning column's range?

If an earlier column's value is unbounded per the bound specification of a
range partition, later columns of an input row would never be compared
against their respective partition bounds, because the row's partition key
would trivially have been determined to be greater or less than the
partition's lower or upper bound, respectively, based on the comparison
against aforementioned unbounded column's infinite value.  Note that it's
the row comparison logic at work here.

Having said that, having finite values after an unbounded value doesn't
really affect tuple-routing code per se (which is what the above paragraph
is meant to describe the actions of), but it does affect the code that
creates constraint expression from the relpartbound value of a range
partition, at least because of the way get_qual_for_range() implements it.

If we had allowed them, an incorrect constraint expression would result as
illustrated below; consider a range partition:

create table foo (a int, b int) partition by range (a, b);
create table foo1 partition of foo for values in (unbounded, -1) to
(unbounded, 1);

foo1, as defined above, basically accepts any row with non-null a, if
inserted through foo via tuple-routing, because any non-null value of a is
trivially >= -infinity and < +infinity.

But, get_qual_for_range() would end up returning an expression that looks
like:

(b >= -1) and (b < 1)

which means that (2, 1) would fail to be inserted directly into foo1 due
to the above constraint, even if inserting it through foo would work,
because column b would not be considered in the latter.

Maybe there is a way to rewrite the code that generates the constraint
expression to somehow not emit the above expression, but any new code to
accomplish that might be more complicated than it is now.

See the following message:

https://www.postgresql.org/message-id/CA%2BTgmoYWnV2GMnYLG-Czsix-E1WGAbo4D%2B0tx7t9NdfYBDMFsA%40mail.gmail.com

I tend to agree with what Robert said there in the first part; there is no
point in allowing finite values to be specified after the last UNBOUNDED
in FROM and TO lists, which might mislead the user to believe that they
are actually significant.

Thanks,
Amit




Re: [HACKERS] "cannot specify finite value after UNBOUNDED" ... uh, why?

От
Robert Haas
Дата:
On Mon, May 29, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Would someone please defend the restrictions imposed by the
> "seen_unbounded" checks in transformPartitionBound
> (parse_utilcmd.c:3365..3396 in current HEAD)?  They sure look to me like
> nothing but sloppy thinking, and/or protection of downstream sloppy
> thinking.  Why should the boundedness of one partitioning column's
> range matter to any other partitioning column's range?

Because this is supposed to work more or less like row-comparison --
the earlier columns are strictly more significant than the later ones.
That is, allowing (1, 2) through (3, 4) allows (2, whatever) but (1,
y) only if y >= 2 and (3, y) only if y < 4.

In case you're wondering, this is also how a certain large commercial
database system interprets composite bounds.  You could imagine in
theory a system where a bound from (1, 2) to (3, 4) allows only those
(x, y) where 1<=x<3 and 2<=y<4 but I know of no existing system that
does anything like that.  If you want that sort of thing, you can get
it anyway using two levels of partitioning, one on each column.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] "cannot specify finite value after UNBOUNDED" ... uh, why?

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, May 29, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Would someone please defend the restrictions imposed by the
>> "seen_unbounded" checks in transformPartitionBound
>> (parse_utilcmd.c:3365..3396 in current HEAD)?

> Because this is supposed to work more or less like row-comparison --
> the earlier columns are strictly more significant than the later ones.
> That is, allowing (1, 2) through (3, 4) allows (2, whatever) but (1,
> y) only if y >= 2 and (3, y) only if y < 4.

I see.  That makes the logic awfully complex though.  I was looking
at get_qual_for_range() yesterday --- it's mind-bendingly complicated
and I have next to no faith that it's 100% right.

> In case you're wondering, this is also how a certain large commercial
> database system interprets composite bounds.  You could imagine in
> theory a system where a bound from (1, 2) to (3, 4) allows only those
> (x, y) where 1<=x<3 and 2<=y<4 but I know of no existing system that
> does anything like that.  If you want that sort of thing, you can get
> it anyway using two levels of partitioning, one on each column.

Well, if we just treated each column independently, you could get
the row-comparison behavior by partitioning on a ROW() expression.
So that argument doesn't impress me.  I suppose compatibility with
other partitioning implementations is worth something, but I'm not
sure it's worth this much complication and risk of bugs.
        regards, tom lane



Re: [HACKERS] "cannot specify finite value after UNBOUNDED" ... uh, why?

От
Robert Haas
Дата:
On Tue, May 30, 2017 at 11:03 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, May 29, 2017 at 3:04 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Would someone please defend the restrictions imposed by the
>>> "seen_unbounded" checks in transformPartitionBound
>>> (parse_utilcmd.c:3365..3396 in current HEAD)?
>
>> Because this is supposed to work more or less like row-comparison --
>> the earlier columns are strictly more significant than the later ones.
>> That is, allowing (1, 2) through (3, 4) allows (2, whatever) but (1,
>> y) only if y >= 2 and (3, y) only if y < 4.
>
> I see.  That makes the logic awfully complex though.  I was looking
> at get_qual_for_range() yesterday --- it's mind-bendingly complicated
> and I have next to no faith that it's 100% right.

It might be useful if somebody would be willing to put together a
fuzz-tester for it.  Like, randomly generate partition bounds over
various different data types, and then verify that each tuple is
accepted by the partition constraint of only one partition and that
it's the same partition into which tuple routing tries to put it.

>> In case you're wondering, this is also how a certain large commercial
>> database system interprets composite bounds.  You could imagine in
>> theory a system where a bound from (1, 2) to (3, 4) allows only those
>> (x, y) where 1<=x<3 and 2<=y<4 but I know of no existing system that
>> does anything like that.  If you want that sort of thing, you can get
>> it anyway using two levels of partitioning, one on each column.
>
> Well, if we just treated each column independently, you could get
> the row-comparison behavior by partitioning on a ROW() expression.
> So that argument doesn't impress me.  I suppose compatibility with
> other partitioning implementations is worth something, but I'm not
> sure it's worth this much complication and risk of bugs.

I guess you won't be terribly surprised to hear that I think
compatibility with other implementations is quite desirable.  I also
think that Amit and others who have been involved in the discussions
have the same opinion, although of course they can speak for
themselves.  In terms of partitioning on a ROW() expression, I believe
that you would lose the ability to control which opclass and collation
is used for each column, which I think is something that people care
about.  Also, I bet it would be slower.  Also, I bet the syntax would
be less intuitive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company