Обсуждение: BUG #17584: SQL crashes PostgreSQL when using ICU collation

Поиск
Список
Период
Сортировка

BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17584
Logged by:          James Inform
Email address:      james.inform@pharmapp.de
PostgreSQL version: 14.5
Operating system:   Linux / Mac
Description:

This bug appears in PG up to 14.5.

I have attached a data file in 7z format. Please extract the "data..csv"
from it and run the following statements in a newly created database on
PostgreSQL 14.x.

-- 1. Create collation and implizit cast
create collation mycol from "de-DE-x-icu";
create cast (varchar as integer) with inout as implicit;

-- 2. Create table
create table testtab (
    a1  text not null collate "mycol",
    b2  text not null collate "mycol",
    c3  text not null collate "mycol",
    d4  varchar(6) null collate "mycol",
    e5  varchar(2) null collate "mycol",
    f6  varchar(3) null collate "mycol",
    g7  varchar(10) null collate "mycol",
    h8  varchar(10) null collate "mycol",
    i9  text null collate "mycol",
    j10  text null collate "mycol",
    k11  varchar(10) null collate "mycol",
    l12  text null collate "mycol",
    m13  varchar(10) null collate "mycol",
    n14  varchar(500) null collate "mycol",
    o15  varchar(80) null collate "mycol",
    p16  varchar(80) null collate "mycol",
    q17  varchar(10) null collate "mycol",
    r18  varchar(60) null collate "mycol",
    s19  varchar(10) null collate "mycol",
    t20  varchar(10) null collate "mycol",
    u21  text null collate "mycol",
    v22  varchar(1) null collate "mycol",
    w23  varchar(250) null collate "mycol",
    x24  varchar(1000) null collate "mycol",
    y25  varchar(1000) null collate "mycol",
    z26  varchar(1000) null collate "mycol",
    a27  varchar(1000) null collate "mycol",
    b28  varchar(1000) null collate "mycol",
    c29  varchar(1000) null collate "mycol",
    d30  varchar(1000) null collate "mycol",
    e31  varchar(1000) null collate "mycol",
    f32  varchar(1000) null collate "mycol",
    g33  int8 null,
    h34  varchar(2) null collate "mycol",
    i35  varchar(2) null collate "mycol",
    j36  varchar(2) null collate "mycol",
    k37  varchar(250) null collate "mycol",
    l38  varchar(250) null collate "mycol",
    m39  varchar(2) null collate "mycol",
    n40  timestamptz null,
    o41  timestamptz null,
    p42  int8 null,
    q43  int8 null,
    r44  int8 not null,
    constraint ak_1 unique (r44),
    constraint pk_1 primary key (a1)
)
;

-- 3. Load data into table
copy testtab from '{PATH_TO_YOUR_FILE}/data.csv' csv header;

-- 4. Create index and analyze table
create index on testtab using btree (i9);
analyze testtab;

-- ERROR
-- 5. This is where the error occurs
-- ERROR:  could not find block containing chunk 0x
select count(*) from
(
    select vou_d.* from
    (
            select
                r44,
                i9,
                j10,
                n14
            from
                testtab
    ) vou_d 
    order by i9 asc, j10 asc
) a
;


Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
"Euler Taveira"
Дата:
On Fri, Aug 12, 2022, at 11:18 AM, PG Bug reporting form wrote:
I have attached a data file in 7z format. Please extract the "data..csv"
from it and run the following statements in a newly created database on
PostgreSQL 14.x.
You forgot to attach the file. You should also provide additional information
about your setup.

* OS information ("Linux" without a distro name and version is too generic);
* Postgres information (pg_config --configure);
* database information (\l output);
* ICU package information (sudo dkpg -l | grep icu) or
  (sudo yum list installed | grep libicu).


--
Euler Taveira

Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
James Inform
Дата:
How can I add an attachment?
The bug reporting form doesn't have an upload function.

My attachemnt is 8MB.
12. August 2022 um 21:25
On Fri, Aug 12, 2022, at 11:18 AM, PG Bug reporting form wrote:
You forgot to attach the file. You should also provide additional information
about your setup.

* OS information ("Linux" without a distro name and version is too generic);
* Postgres information (pg_config --configure);
* database information (\l output);
* ICU package information (sudo dkpg -l | grep icu) or
  (sudo yum list installed | grep libicu).


--
Euler Taveira

12. August 2022 um 16:18
The following bug has been logged on the website:

Bug reference: 17584
Logged by: James Inform
Email address: james.inform@pharmapp.de
PostgreSQL version: 14.5
Operating system: Linux / Mac
Description:

This bug appears in PG up to 14.5.

I have attached a data file in 7z format. Please extract the "data..csv"
from it and run the following statements in a newly created database on
PostgreSQL 14.x.

-- 1. Create collation and implizit cast
create collation mycol from "de-DE-x-icu";
create cast (varchar as integer) with inout as implicit;

-- 2. Create table
create table testtab (
a1 text not null collate "mycol",
b2 text not null collate "mycol",
c3 text not null collate "mycol",
d4 varchar(6) null collate "mycol",
e5 varchar(2) null collate "mycol",
f6 varchar(3) null collate "mycol",
g7 varchar(10) null collate "mycol",
h8 varchar(10) null collate "mycol",
i9 text null collate "mycol",
j10 text null collate "mycol",
k11 varchar(10) null collate "mycol",
l12 text null collate "mycol",
m13 varchar(10) null collate "mycol",
n14 varchar(500) null collate "mycol",
o15 varchar(80) null collate "mycol",
p16 varchar(80) null collate "mycol",
q17 varchar(10) null collate "mycol",
r18 varchar(60) null collate "mycol",
s19 varchar(10) null collate "mycol",
t20 varchar(10) null collate "mycol",
u21 text null collate "mycol",
v22 varchar(1) null collate "mycol",
w23 varchar(250) null collate "mycol",
x24 varchar(1000) null collate "mycol",
y25 varchar(1000) null collate "mycol",
z26 varchar(1000) null collate "mycol",
a27 varchar(1000) null collate "mycol",
b28 varchar(1000) null collate "mycol",
c29 varchar(1000) null collate "mycol",
d30 varchar(1000) null collate "mycol",
e31 varchar(1000) null collate "mycol",
f32 varchar(1000) null collate "mycol",
g33 int8 null,
h34 varchar(2) null collate "mycol",
i35 varchar(2) null collate "mycol",
j36 varchar(2) null collate "mycol",
k37 varchar(250) null collate "mycol",
l38 varchar(250) null collate "mycol",
m39 varchar(2) null collate "mycol",
n40 timestamptz null,
o41 timestamptz null,
p42 int8 null,
q43 int8 null,
r44 int8 not null,
constraint ak_1 unique (r44),
constraint pk_1 primary key (a1)
)
;

-- 3. Load data into table
copy testtab from '{PATH_TO_YOUR_FILE}/data.csv' csv header;

-- 4. Create index and analyze table
create index on testtab using btree (i9);
analyze testtab;

-- ERROR
-- 5. This is where the error occurs
-- ERROR: could not find block containing chunk 0x
select count(*) from
(
select vou_d.* from
(
select
r44,
i9,
j10,
n14
from
testtab
) vou_d
order by i9 asc, j10 asc
) a
;


Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Tom Lane
Дата:
James Inform <james.inform@pharmapp.de> writes:
> How can I add an attachment?
> The bug reporting form doesn't have an upload function.

Just attach it to mail to pgsql-bugs@lists.postgresql.org.
The form is only another way to send mail to that list.

            regards, tom lane



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Tom Lane
Дата:
James Inform <james.inform@pharmapp.de> writes:
> Here is the attachment.

Thanks for the test case!  After tracing through it, I find that
the bug is not particularly related to ICU.  It's in
varstr_abbrev_convert(): that function will sometimes reallocate
buffers in CurrentMemoryContext, which could be a shorter-lived
context than the one the SortSupport object belongs to.  If so,
we'll eventually be scribbling on memory that doesn't belong
to us, and the observed problems are gripes from the scribble-ees.

I found this by valgrind'ing the test case, which eventually
printed this:

==00:00:00:27.370 497120== More than 10000000 total errors detected.  I'm not reporting any more.
==00:00:00:27.370 497120== Final error counts will be inaccurate.  Go fix your program!

and awhile later suffered an OOM kill.  I got a good laugh
out of that --- never saw that valgrind message before.

Attached is a quick draft fix.  Some notes:

* I'm not really proposing the added Asserts for commit,
though they helped provide confidence that I was on the
right track.

* We need to look around and see if the same mistake appears
in any other sortsupport code.

* The bug could never have existed at all if these buffer
resizings had been done with repalloc().  I get that the
idea is to avoid copying data we don't care about, but
this coding is still feeling like an antipattern.  I wonder
if it'd be worth inventing a variant of repalloc that makes
the chunk bigger without preserving its contents.

            regards, tom lane

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 919138eaf3..90688376a6 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -2330,12 +2330,14 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)

     if (len1 >= sss->buflen1)
     {
+        Assert(ssup->ssup_cxt == GetMemoryChunkContext(sss->buf1));
         pfree(sss->buf1);
         sss->buflen1 = Max(len1 + 1, Min(sss->buflen1 * 2, MaxAllocSize));
         sss->buf1 = MemoryContextAlloc(ssup->ssup_cxt, sss->buflen1);
     }
     if (len2 >= sss->buflen2)
     {
+        Assert(ssup->ssup_cxt == GetMemoryChunkContext(sss->buf2));
         pfree(sss->buf2);
         sss->buflen2 = Max(len2 + 1, Min(sss->buflen2 * 2, MaxAllocSize));
         sss->buf2 = MemoryContextAlloc(ssup->ssup_cxt, sss->buflen2);
@@ -2518,9 +2520,10 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
         /* By convention, we use buffer 1 to store and NUL-terminate */
         if (len >= sss->buflen1)
         {
+            Assert(ssup->ssup_cxt == GetMemoryChunkContext(sss->buf1));
             pfree(sss->buf1);
             sss->buflen1 = Max(len + 1, Min(sss->buflen1 * 2, MaxAllocSize));
-            sss->buf1 = palloc(sss->buflen1);
+            sss->buf1 = MemoryContextAlloc(ssup->ssup_cxt, sss->buflen1);
         }

         /* Might be able to reuse strxfrm() blob from last call */
@@ -2607,10 +2610,11 @@ varstr_abbrev_convert(Datum original, SortSupport ssup)
             /*
              * Grow buffer and retry.
              */
+            Assert(ssup->ssup_cxt == GetMemoryChunkContext(sss->buf2));
             pfree(sss->buf2);
             sss->buflen2 = Max(bsize + 1,
                                Min(sss->buflen2 * 2, MaxAllocSize));
-            sss->buf2 = palloc(sss->buflen2);
+            sss->buf2 = MemoryContextAlloc(ssup->ssup_cxt, sss->buflen2);
         }

         /*

Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Peter Geoghegan
Дата:
On Sat, Aug 13, 2022 at 5:30 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Attached is a quick draft fix.  Some notes:
>
> * I'm not really proposing the added Asserts for commit,
> though they helped provide confidence that I was on the
> right track.

It seems very unlikely that anybody will notice the added cycles in
debug builds. And while the risk of introducing unwanted side-effects
in assert-enabled builds is greater than zero, it might be worth it.

> * We need to look around and see if the same mistake appears
> in any other sortsupport code.

I just did -- I see no problems. The only other SortSupport code that
allocates a buffer that is associated its SortSupport state is used by
numeric. That is a fixed-sized buffer (it's dynamically allocated
because we only need it when abbreviated keys are used).

> * The bug could never have existed at all if these buffer
> resizings had been done with repalloc().  I get that the
> idea is to avoid copying data we don't care about, but
> this coding is still feeling like an antipattern.  I wonder
> if it'd be worth inventing a variant of repalloc that makes
> the chunk bigger without preserving its contents.

Or we could just use repalloc(). The vast majority of text sorts won't
need a buffer larger than the initial buffer size (TEXTBUFLEN/1024
bytes), so it seems like a fairly useless optimization.

I think that repalloc might even be mandatory. These buffers are used
as caches that allow us to avoid repeat strxfrm()/strcoll() calls,
which is quite a useful optimization. But we're not accounting for
that here -- we're not invalidating the cache. We wouldn't have to
bother with that if we just used repalloc().

-- 
Peter Geoghegan



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> I think that repalloc might even be mandatory. These buffers are used
> as caches that allow us to avoid repeat strxfrm()/strcoll() calls,
> which is quite a useful optimization. But we're not accounting for
> that here -- we're not invalidating the cache. We wouldn't have to
> bother with that if we just used repalloc().

Hmm, but if we're enlarging the buffer, doesn't that imply
a new input value?  Seems like invalidation must occur anyway.

            regards, tom lane



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Peter Geoghegan
Дата:
On Sat, Aug 13, 2022 at 8:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm, but if we're enlarging the buffer, doesn't that imply
> a new input value?  Seems like invalidation must occur anyway.

Yeah, on second thought it should be invalidated anyway (I had to
check because the details of how we cache strxfrm() and strcoll() are
tricky). Even still, I think that we should just use repalloc for
this.

--
Peter Geoghegan



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sat, Aug 13, 2022 at 8:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm, but if we're enlarging the buffer, doesn't that imply
>> a new input value?  Seems like invalidation must occur anyway.

> Yeah, on second thought it should be invalidated anyway (I had to
> check because the details of how we cache strxfrm() and strcoll() are
> tricky). Even still, I think that we should just use repalloc for
> this.

Works for me.  I'll see about pushing the fix tomorrow.  Thanks
for the additional investigation!

            regards, tom lane



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Tom Lane
Дата:
... BTW, I wondered how come we'd missed noticing such a serious problem
in any testing.  The code coverage report at

https://coverage.postgresql.org/src/backend/utils/adt/varlena.c.gcov.html

says that only the "if (sss->collate_c)" path in varstr_abbrev_convert
gets exercised in our regression tests.  I thought maybe that just
meant that the coverage.postgresql.org run uses C locale, but I see
the same lack-of-coverage in a local test under LANG=en_US.utf8.
How can we only be reaching this function with sss->collate_c true
when the prevailing locale isn't that?

            regards, tom lane



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Peter Geoghegan
Дата:
On Sat, Aug 13, 2022 at 11:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> says that only the "if (sss->collate_c)" path in varstr_abbrev_convert
> gets exercised in our regression tests.  I thought maybe that just
> meant that the coverage.postgresql.org run uses C locale, but I see
> the same lack-of-coverage in a local test under LANG=en_US.utf8.
> How can we only be reaching this function with sss->collate_c true
> when the prevailing locale isn't that?

We don't trust libc's strxfrm() following the debacle with
locale-aware abbreviated keys back in 2016 (unless the user builds
their own Postgres, and goes out of their way to #define
TRUST_STRXFRM). So the relevant strxfrm() code is arguably dead code.

ICU is different: our policy with ICU locales has always been to trust
ICU's strxfrm()-like function to agree with ICU's strcoll()-like
function.

-- 
Peter Geoghegan



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Sat, Aug 13, 2022 at 11:15 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How can we only be reaching this function with sss->collate_c true
>> when the prevailing locale isn't that?

> We don't trust libc's strxfrm() following the debacle with
> locale-aware abbreviated keys back in 2016 (unless the user builds
> their own Postgres, and goes out of their way to #define
> TRUST_STRXFRM). So the relevant strxfrm() code is arguably dead code.
>
> ICU is different: our policy with ICU locales has always been to trust
> ICU's strxfrm()-like function to agree with ICU's strcoll()-like
> function.

Ah ... so that explains the OP's observation that this is only
seen with ICU locales.

When I build with --with-icu and run coverage testing on the core
regression tests under LANG=en_US.utf8, I see that most of
varstr_abbrev_convert() is reached, but *not* the two buggy
buffer-enlargement stanzas.  So that explains why we've not seen this
in testing.  I wonder whether there is a reasonably cheap way to test
that.  The submitted test case is clearly out of the question to add
as a regression test...

            regards, tom lane



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Peter Geoghegan
Дата:
On Sun, Aug 14, 2022 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> When I build with --with-icu and run coverage testing on the core
> regression tests under LANG=en_US.utf8, I see that most of
> varstr_abbrev_convert() is reached, but *not* the two buggy
> buffer-enlargement stanzas.  So that explains why we've not seen this
> in testing.  I wonder whether there is a reasonably cheap way to test
> that.  The submitted test case is clearly out of the question to add
> as a regression test...

I don't think that it would be terribly difficult. I notice that the
similar "TEXTBUFLEN isn't big enough" code path in varstr_cmp()
already has test coverage. As does the corresponding point in the
equivalent SortSupport-based comparator, varstrfastcmp_locale().

Oh, hang on -- I see why it's tricky to test, now that I took a fresh look
the code. I see that ucol_nextSortKeyPart()'s interface (used only
when the DB encoding is UTF-8) allows us to only request however many
bytes of the conditioned binary key that we need, which for us is
always just the first sizeof(Datum) bytes. That's probably another
factor that made this bug hard to reach; I suspect that
ucol_nextSortKeyPart() has a tendency to avoid needing very much
scratch space to produce a usable abbreviated key.

I also suspect that the test case happens to tickle some obscure UCA
implementation detail in just the right/wrong way, where (for whatever
reason) it is necessary for the implementation to use a fairly large
buffer, despite the fact that it knows that its varlena.c caller will
only require enough conditioned binary key bytes to build an 8 byte
abbreviated key. It might be very rare and hard to hit (and/or depend
on the ICU version), which would explain why it took this long to hear
any complaints. So...I think that it might be quite difficult to test
this.

BTW, is the plan to get rid of the questionable coding pattern in both
varstr_abbrev_convert() and in varstrfastcmp_locale()? I am in favor
of just using repalloc() across the board.


--
Peter Geoghegan



Re: BUG #17584: SQL crashes PostgreSQL when using ICU collation

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> I also suspect that the test case happens to tickle some obscure UCA
> implementation detail in just the right/wrong way, where (for whatever
> reason) it is necessary for the implementation to use a fairly large
> buffer, despite the fact that it knows that its varlena.c caller will
> only require enough conditioned binary key bytes to build an 8 byte
> abbreviated key. It might be very rare and hard to hit (and/or depend
> on the ICU version), which would explain why it took this long to hear
> any complaints. So...I think that it might be quite difficult to test
> this.

I think that on our side, it also requires using incremental sort,
creating yet another obstacle to building a small test case.

> BTW, is the plan to get rid of the questionable coding pattern in both
> varstr_abbrev_convert() and in varstrfastcmp_locale()? I am in favor
> of just using repalloc() across the board.

Yeah, done that way already.

            regards, tom lane