Обсуждение: Re: [GENERAL] Strange problem with create table as select * from table;

Поиск
Список
Период
Сортировка

Re: [GENERAL] Strange problem with create table as select * from table;

От
Tom Lane
Дата:
I wrote:
> A different line of thought is that there's something about these
> specific source rows, and only these rows, that makes them vulnerable to
> corruption during INSERT/SELECT.  Do they by any chance contain any
> values that are unusual elsewhere in your table?  One thing I'm
> wondering about right now is the nulls bitmap --- so do these rows have
> nulls (or not-nulls) in any place that's unusual elsewhere?

Hah ... I have a theory.

I will bet that you recently added some column(s) to the source table
using ALTER TABLE ADD COLUMN and no default value, so that the added
columns were nulls and no table rewrite happened.  And that these
troublesome rows predate that addition, but contained no nulls before
that.  And that they are the only rows that, in addition to the above
conditions, contain data fields wide enough to require out-of-line
toasting.

These conditions together are enough to break the assumption in
toast_insert_or_update that the old and new tuples must have the same
value of t_hoff.  But it can only happen when the source tuple is an
original on-disk tuple, which explains why only INSERT ... SELECT *
causes the problem, not any variants that require projection of a new
column set.  When it does happen, toast_insert_or_update correctly
computes the required size of the new tuple ... but then it tells
heap_fill_tuple to fill the data part at offset olddata->t_hoff, which
is wrong (too small) and so the nulls bitmap that heap_fill_tuple
concurrently constructs will overwrite the first few data bytes.  In
your example, the table contains 49 columns so the nulls bitmap requires
7 bytes, just enough to overwrite the first 6 data bytes as observed.
(In fact, given the values we see being filled in, I can confidently say
that you have two added-since-creation null columns, no more, no less.)

I can reproduce the problem with the attached test case (using the
regression database).  With asserts enabled, the
        Assert(new_len == olddata->t_hoff);
fails.  With asserts off, corrupt data.

This is trivial to fix, now that we know there's a problem --- the
function is only using that assumption to save itself a couple lines
of code.  Penny wise, pound foolish :-(

            regards, tom lane


drop table wide;

create table wide as
select
ten as firstc,
unique1 as unique1_1,
unique2 as unique2_1,
two as two_1,
four as four_1,
ten as ten_1,
twenty as twenty_1,
hundred as hundred_1,
thousand as thousand_1,
twothousand as twothousand_1,
fivethous as fivethous_1,
tenthous as tenthous_1,
odd as odd_1,
even as even_1,
stringu1 as stringu1_1,
stringu2 as stringu2_1,
string4 as string4_1,
unique1 as unique1_2,
unique2 as unique2_2,
two as two_2,
four as four_2,
ten as ten_2,
twenty as twenty_2,
hundred as hundred_2,
thousand as thousand_2,
twothousand as twothousand_2,
fivethous as fivethous_2,
tenthous as tenthous_2,
odd as odd_2,
even as even_2,
stringu1 as stringu1_2,
stringu2 as stringu2_2,
string4 as string4_2,
unique1 as unique1_3,
unique2 as unique2_3,
two as two_3,
four as four_3,
ten as ten_3,
twenty as twenty_3,
hundred as hundred_3,
thousand as thousand_3,
twothousand as twothousand_3,
fivethous as fivethous_3,
tenthous as tenthous_3,
odd as odd_3,
even as even_3,
repeat('xyzzyxydlkadlkndvlelfzzy', 20000) as widec
from onek limit 10;

alter table wide add column nullc1 int;
alter table wide add column nullc2 int;

drop table widec;

create table widec as select * from wide;

select firstc, to_hex(unique1_1), unique2_1, to_hex(unique1_2) from widec;

Re: [GENERAL] Strange problem with create table as select * from table;

От
Martijn van Oosterhout
Дата:
On Fri, Nov 04, 2011 at 09:04:02PM -0400, Tom Lane wrote:
> Hah ... I have a theory.
>
> I will bet that you recently added some column(s) to the source table
> using ALTER TABLE ADD COLUMN and no default value, so that the added
> columns were nulls and no table rewrite happened.  And that these
> troublesome rows predate that addition, but contained no nulls before
> that.  And that they are the only rows that, in addition to the above
> conditions, contain data fields wide enough to require out-of-line
> toasting.
>
> These conditions together are enough to break the assumption in
> toast_insert_or_update that the old and new tuples must have the same
> value of t_hoff.

Wow! Good catch.

> This is trivial to fix, now that we know there's a problem --- the
> function is only using that assumption to save itself a couple lines
> of code.  Penny wise, pound foolish :-(

No doubt the assumption was true when the code was written, but still.

Hve a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer

Вложения

Re: [GENERAL] Strange problem with create table as select * from table;

От
Tom Lane
Дата:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> Any chance of getting the fix in patch format so we could test it on
> this system?

http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=23998fe99c1220ba3a9eefee194e37ec1f14ae07

            regards, tom lane

Re: [GENERAL] Strange problem with create table as select * from table;

От
hubert depesz lubaczewski
Дата:
On Fri, Nov 04, 2011 at 09:04:02PM -0400, Tom Lane wrote:
> that.  And that they are the only rows that, in addition to the above
> conditions, contain data fields wide enough to require out-of-line
> toasting.

checked lengths of the text/varchar columns in database.

there are 16 such columns in the table.
full report of lengths is in
http://www.depesz.com/various/lengths.report.gz

it was obtained using:
select length( "first_text_column" ) as length_1, count(*) from etsy_v2.receipts group by 1 order by 1;
and so on for every text column, and at the end I also made summary of
sum-of-lengths.

there is also:
http://www.depesz.com/various/lengths2.report.gz
which has the same summary, but only of the damaged rows.

As you can see the length of columns is not really special - somewhere
in the middle of all other rows. summarized length is also not special
in any way.

> These conditions together are enough to break the assumption in
> toast_insert_or_update that the old and new tuples must have the same
> value of t_hoff.  But it can only happen when the source tuple is an
> original on-disk tuple, which explains why only INSERT ... SELECT *
> causes the problem, not any variants that require projection of a new
> column set.  When it does happen, toast_insert_or_update correctly
> computes the required size of the new tuple ... but then it tells
> heap_fill_tuple to fill the data part at offset olddata->t_hoff, which
> is wrong (too small) and so the nulls bitmap that heap_fill_tuple
> concurrently constructs will overwrite the first few data bytes.  In
> your example, the table contains 49 columns so the nulls bitmap requires
> 7 bytes, just enough to overwrite the first 6 data bytes as observed.
> (In fact, given the values we see being filled in, I can confidently say
> that you have two added-since-creation null columns, no more, no less.)
>
> I can reproduce the problem with the attached test case (using the
> regression database).  With asserts enabled, the
>         Assert(new_len == olddata->t_hoff);
> fails.  With asserts off, corrupt data.

How can I make the onek table for the test? is it standard table from
something?

> This is trivial to fix, now that we know there's a problem --- the
> function is only using that assumption to save itself a couple lines
> of code.  Penny wise, pound foolish :-(

Any chance of getting the fix in patch format so we could test it on
this system?

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/

Re: [GENERAL] Strange problem with create table as select * from table;

От
Tom Lane
Дата:
hubert depesz lubaczewski <depesz@depesz.com> writes:
> checked lengths of the text/varchar columns in database.

> there are 16 such columns in the table.
> full report of lengths is in
> http://www.depesz.com/various/lengths.report.gz

> it was obtained using:
> select length( "first_text_column" ) as length_1, count(*) from etsy_v2.receipts group by 1 order by 1;
> and so on for every text column, and at the end I also made summary of
> sum-of-lengths.

BTW, that probably doesn't prove a lot since it takes no account of
compression, and different values could be compressible to varying
degrees.  It'd be more useful to look at pg_column_size() numbers
to get an idea of whether toasting happened or not.

            regards, tom lane

Re: [GENERAL] Strange problem with create table as select * from table;

От
hubert depesz lubaczewski
Дата:
On Sun, Nov 06, 2011 at 09:34:24AM -0500, Tom Lane wrote:
> hubert depesz lubaczewski <depesz@depesz.com> writes:
> > Any chance of getting the fix in patch format so we could test it on
> > this system?
>
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=patch;h=23998fe99c1220ba3a9eefee194e37ec1f14ae07

hi
just to close the loop - finally today I could restart the database with
patched 8.4. bug fixed, all works fine. thanks a lot.

Best regards,

depesz

--
The best thing about modern society is how easy it is to avoid contact with it.
                                                             http://depesz.com/