BUG #9741: Minimal case for BUG #9735: Error: "ERROR: tuple offset out of range: 0" during bitmap scan

From:
maxim.boguk@gmail.com
Date:
The following bug has been logged on the website:

Bug reference:      9741
Logged by:          Maxim Boguk
Email address:      maxim.boguk@gmail.com
PostgreSQL version: 9.3.3
Operating system:   Linux (Ubuntu)
Description:

Hi,

The situation with BUG #9735 has become even more curious.

One query executes successfully on the master DB but fails on both
streaming replicas.
After much digging I located the problem and reproduced it in a pretty simple way:

gate_platbox=# set enable_indexscan to 0;
SET
(force bitmap scan)
gate_platbox=# explain analyze select * from transactions where
id=53265020;
ERROR:  tuple offset out of range: 0

This tuple had been frozen not long ago:

gate_platbox=# select id,xmin,xmax,cmin,cmax,ctid from transactions where
id=53265020;
    id    | xmin | xmax | cmin | cmax |    ctid
----------+------+------+------+------+-------------
 53265020 |    2 |    0 |    1 |    1 | (2413168,1)

pageinspect over the index being used also shows nothing suspicious:
select gs.i,t.* from (select generate_series(1,94961) as i) as gs,
bt_page_items('transactions_pkey', i) as t where ctid::text='(2413168,1)';
   i   | itemoffset |    ctid     | itemlen | nulls | vars |          data
-------+------------+-------------+---------+-------+------+--------------------------
 88472 |         93 | (2413168,1) |      16 | f     | f    | 7c c2 2c 03 00 00 00 00
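
(Side note: bt_page_items comes from the pageinspect contrib extension; a minimal sketch of inspecting just the one page found above, instead of scanning all 94961 pages, could look like this:)

create extension if not exists pageinspect;
select itemoffset, ctid, itemlen, data
  from bt_page_items('transactions_pkey', 88472)
 where ctid::text = '(2413168,1)';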

Any ideas what could be wrong with the database?

Some playing with GDB led me to the btgetbitmap function in nbtree.c,
and to the following content of the
BTScanOpaque so = (BTScanOpaque) scan->opaque
structure:

(gdb) p *(BTScanOpaque) scan->opaque
$21 = {
  qual_ok = 1 '\001',
  numberOfKeys = 1,
  keyData = 0x7f099afaad80,
  arrayKeyData = 0x0,
  numArrayKeys = 0,
  arrayKeys = 0x0,
  arrayContext = 0x0,
  killedItems = 0x0,
  numKilled = 0,
  currTuples = 0x0,
  markTuples = 0x0,
  markItemIndex = -1,
  currPos = {
    buf = 137250,
    nextPage = 88473,
    moreLeft = 0 '\000',
    moreRight = 0 '\000',
    nextTupleOffset = 0,
    firstItem = 0,
    lastItem = 1,
    itemIndex = 1,
    items =       {[0] = {
        heapTid = {
          ip_blkid = {
            bi_hi = 36,
            bi_lo = 53872
          },
          ip_posid = 1
        },
        indexOffset = 93,
        tupleOffset = 0
      },
      [1] = {
        heapTid = {
          ip_blkid = {
            bi_hi = 35,
            bi_lo = 44171
          },
          ip_posid = 0
        },
        indexOffset = 94,
        tupleOffset = 0
      },
...

It can be seen that there is a weird second element in the items array with
ip_posid = 0.
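
(For context: heap item offsets, ip_posid, are 1-based, so an item pointer with ip_posid = 0 can never reference a real tuple; that is what the bitmap scan's "tuple offset out of range: 0" error is complaining about. A purely illustrative check on any table shows ctid offsets starting at 1:)

select ctid, id from transactions limit 3;
-- ctid values are (block, offset) pairs with offset >= 1, e.g. (0,1), (0,2), (0,3)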

That led me to check the next entry in the transactions_pkey index (with
offset=94):

select * from bt_page_items('transactions_pkey', 88472) where itemoffset in
(93,94);
 itemoffset |    ctid     | itemlen | nulls | vars |          data
------------+-------------+---------+-------+------+-------------------------
         93 | (2413168,1) |      16 | f     | f    | 7c c2 2c 03 00 00 00 00
         94 | (2337931,0) |      16 | f     | f    | 7c c2 2c 03 00 00 00 00

OK, so now we have two entries for the same value in the index.
The index on the master database has the same content.

But the second entry is very likely not visible on the master DB.

So it confirms my idea that there is some permanent corruption of the visibility
map on the replicas (and probably on the master database as well).

What's strange is that the problem could not be fixed via VACUUM FREEZE with
vacuum_freeze_table_age=0.
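
(The freeze attempt was roughly along these lines; this is a sketch, not the exact commands used:)

set vacuum_freeze_table_age to 0;
vacuum (freeze, verbose) transactions;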

Any suggestions on what I can do next?
Unfortunately dump/restore of the database is not an option at this moment (it is a
huge project with strict online read-write requirements).

Is there any way to force a rebuild of the visibility map for the table without
an access exclusive lock (i.e. without CLUSTER/VACUUM FULL)?

I am going to try upgrading the database to 9.3.4 today and see if it fixes the issue.
But I see nothing about the visibility map in the release notes.

Kind Regards,
Maksym
Maxim Boguk <maxim.boguk@gmail.com> writes:
> select * from bt_page_items('transactions_pkey', 88472) where itemoffset in
> (93,94);
>  itemoffset |    ctid     | itemlen | nulls | vars |          data
> ------------+-------------+---------+-------+------+-------------------------
>          93 | (2413168,1) |      16 | f     | f    | 7c c2 2c 03 00 00 00 00
>          94 | (2337931,0) |      16 | f     | f    | 7c c2 2c 03 00 00 00 00

This is sort of what I expected.  Surely reindexing the index will get
rid of that bogus entry?

            regards, tom lane
On Thu, Mar 27, 2014 at 10:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> This is sort of what I expected.  Surely reindexing the index will get
> rid of that bogus entry?

Yep, reindexing that particular index fixed the problem.
But this index had already been reindexed just 2 days ago in the same situation,
and now the problem has reappeared (with different table rows/index entries).

If the problem reappears after the latest reindex and with version 9.3.4, I will
investigate further.


It would be interesting to know how a simple integer btree index could end up with
an entry with a definitely wrong ctid.
It is very strange that the problem keeps reappearing on only this one table (there are a
couple of other tables in the database with a similar load but with no issues at
all); it looks too selective and repeatable for a hardware glitch.
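
One way to cross-check such a bogus entry against the heap is to look at the referenced block directly with pageinspect (a sketch; block 2337931 is taken from the bad ctid above):

select lp, lp_flags, t_xmin, t_xmax, t_ctid
  from heap_page_items(get_raw_page('transactions', 2337931));
-- line pointers (lp) start at 1, so nothing on this page can ever match an index ctid of (2337931,0)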

Kind Regards,
Maksym

Hi,

The situation reappeared just 3 hours after the reindex (and no database
crash/restart happened during that period).
The problem was found via a manual search for potentially troublesome ids:

Step 1:
search for potentially suspicious index entries/ids (where the index ctid has offset
zero):
select *,
       (get_byte(val, 0))
     + (get_byte(val, 1) << 8)
     + (get_byte(val, 2) << 16)
     + (get_byte(val, 3) << 24) as id
from (
      select gs.i, t.*,
             (E'\\x' || replace(trim(trailing ' 0' from data), ' ', ''))::bytea as val
      from (select generate_series(1,90000) as i) as gs,
           bt_page_items('transactions_pkey', i) as t
      where ctid::text like '%,0)'
     ) as _t
order by id, i, itemoffset;
   i   | itemoffset |    ctid     | itemlen | nulls | vars |           data           |    val     |    id
-------+------------+-------------+---------+-------+------+--------------------------+------------+----------
 83604 |         60 | (2233938,0) |      16 | f     | f    | df aa 11 03 00 00 00 00  | \xdfaa1103 | 51489503
 83648 |         92 | (2245568,0) |      16 | f     | f    | e7 e9 11 03 00 00 00 00  | \xe7e91103 | 51505639
 83819 |         37 | (2253801,0) |      16 | f     | f    | bc dc 12 03 00 00 00 00  | \xbcdc1203 | 51567804
 83907 |        184 | (2256376,0) |      16 | f     | f    | 1f 5b 13 03 00 00 00 00  | \x1f5b1303 | 51600159
...
Found 25 rows.
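
(The get_byte arithmetic above just reassembles the little-endian int4 key from the raw index tuple data; for example, the bytes df aa 11 03 from the first row decode back to the id 51489503:)

select (get_byte(E'\\xdfaa1103'::bytea, 0))
     + (get_byte(E'\\xdfaa1103'::bytea, 1) << 8)
     + (get_byte(E'\\xdfaa1103'::bytea, 2) << 16)
     + (get_byte(E'\\xdfaa1103'::bytea, 3) << 24) as id;
-- id = 51489503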

Now the situation has become much worse: the test query (
set enable_indexscan to 0;
select * from transactions where id in (25 rows from previous query result);
) produces the same error on the master database as well as on the replicas.

And reindex index transactions_pkey; doesn't help.
However, the filenode of the index definitely changed (so the reindex happened):

select *,
       pg_relation_filenode('transactions_pkey'),
       pg_relation_filepath('transactions_pkey')
from pg_stat_file(pg_relation_filepath('transactions_pkey'));

   size    |         access         |      modification      |         change         | creation | isdir | pg_relation_filenode | pg_relation_filepath
-----------+------------------------+------------------------+------------------------+----------+-------+----------------------+----------------------
 760471552 | 2014-03-28 04:02:22+04 | 2014-03-28 04:16:34+04 | 2014-03-28 04:16:34+04 |          | f     |               408909 | base/16424/408909
reindex index transactions_pkey;
REINDEX
select *,
       pg_relation_filenode('transactions_pkey'),
       pg_relation_filepath('transactions_pkey')
from pg_stat_file(pg_relation_filepath('transactions_pkey'));

   size    |         access         |      modification      |         change         | creation | isdir | pg_relation_filenode | pg_relation_filepath
-----------+------------------------+------------------------+------------------------+----------+-------+----------------------+----------------------
 760438784 | 2014-03-28 04:18:20+04 | 2014-03-28 04:19:39+04 | 2014-03-28 04:19:39+04 |          | f     |               408931 | base/16424/408931
set enable_indexscan to 0;
SET
select id from transactions where id in (51489503,51505639,51567804,... 25
total);
ERROR:  tuple offset out of range: 0
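
One more check that could help narrow this down (a sketch, with the 25-id list abbreviated): force a sequential scan and see whether those ids exist in the heap at all, which would show whether the bogus entries live only in the index:

set enable_indexscan to 0;
set enable_bitmapscan to 0;
-- with both index-based scan types penalized the planner should fall back to a seq scan
select id, ctid from transactions where id in (51489503, 51505639, 51567804 /* ... 25 total */);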

Any suggestions on what I can do next and where the problem could be?
Now it doesn't seem to be a replication issue... maybe some repeatable low-level
database corruption? But I have no idea what type of corruption could produce
such weird results.

Kind Regards,
Maksym