Re: BUG #11141: Duplicate primary key values corruption

From: Gerd Behrmann
Subject: Re: BUG #11141: Duplicate primary key values corruption
Msg-id: 53EB6DD6.6030703@ndgf.org
In reply to: Re: BUG #11141: Duplicate primary key values corruption (Alvaro Herrera <alvherre@2ndquadrant.com>)
List: pgsql-bugs

On 13/08/14 15.48, Alvaro Herrera wrote:
> Gerd Behrmann wrote:
>
>>  lp | lp_off | lp_flags | lp_len |  t_xmin   | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
>> ----+--------+----------+--------+-----------+--------+----------+--------+-------------+------------+--------+--------+-------
>>   5 |   7992 |        1 |     96 | 541168217 |      0 |        3 | (21,5) |       32778 |      10498 |     24 |        |
>> (1 row)
>>
>>  lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
>> ----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------
>>  62 |   8096 |        1 |     96 |      2 |      0 |        4 | (5,62) |       32778 |      10498 |     24 |        |
>> (1 row)
>
> So t_infomask is 0x2902, or
> HEAP_UPDATED | HEAP_XMAX_INVALID | HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH
>
> Note both tuples have the same t_infomask.  Andres Freund suggests these
> might be two updated versions from a common "ancestor" tuple.  I don't
> have reason to think different, except that t_xmin in one of them is
> frozen and so it's probably considerably older than the other one (if
> enough time has passed to have one of them frozen, then why didn't you
> detect this earlier?)
>
> Anyway this might be fixed in 9.3.5, per the commit below.  I suggest
> you upgrade to that one, remove one of the copies, and verify other
> tables for duplicates.
>
> commit c0bd128c81c2b23a1cbc53305180fca51b3b61c3
> Author: Alvaro Herrera <alvherre@alvh.no-ip.org>
> Date:   Thu Apr 24 15:41:55 2014 -0300
>
>      Fix race when updating a tuple concurrently locked by another process
>
>      If a tuple is locked, and this lock is later upgraded either to an
>      update or to a stronger lock, and in the meantime some other process
>      tries to lock, update or delete the same tuple, it (the tuple) could end
>      up being updated twice, or having conflicting locks held.
>
>      The reason for this is that the second updater checks for a change in
>      Xmax value, or in the HEAP_XMAX_IS_MULTI infomask bit, after noticing
>      the first lock; and if there's a change, it restarts and re-evaluates
>      its ability to update the tuple.  But it neglected to check for changes
>      in lock strength or in lock-vs-update status when those two properties
>      stayed the same.  This would lead it to take the wrong decision and
>      continue with its own update, when in reality it shouldn't do so but
>      instead restart from the top.
>
>      This could lead to either an assertion failure much later (when a
>      multixact containing multiple updates is detected), or duplicate copies
>      of tuples.
>
>      To fix, make sure to compare the other relevant infomask bits alongside
>      the Xmax value and HEAP_XMAX_IS_MULTI bit, and restart from the top if
>      necessary.
>
>      Also, in the belt-and-suspenders spirit, add a check to
>      MultiXactCreateFromMembers that a multixact being created does not have
>      two or more members that are claimed to be updates.  This should protect
>      against other bugs that might cause similar bogus situations.
>
>      Backpatch to 9.3, where the possibility of multixacts containing updates
>      was introduced.  (In prior versions it was possible to have the tuple
>      lock upgraded from shared to exclusive, and an update would not restart
>      from the top; yet we're protected against a bug there because there's
>      always a sleep to wait for the locking transaction to complete before
>      continuing to do anything.  Really, the fact that tuple locks always
>      conflicted with concurrent updates is what protected against bugs here.)
>
>      Per report from Andrew Dunstan and Josh Berkus in thread at
>      http://www.postgresql.org/message-id/534C8B33.9050807@pgexperts.com
>
>      Bug analysis by Andres Freund.
>
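
For readers who want to reproduce the flag decoding in the quoted message, here is a minimal Python sketch. It assumes the bit values from the 9.3-era src/include/access/htup_details.h; the tables below are transcribed from memory, so check them against the header for your server version. The function names are hypothetical helpers, not part of PostgreSQL or pageinspect.

```python
# Bit values assumed from PostgreSQL 9.3's htup_details.h (verify locally).
T_INFOMASK_FLAGS = {
    0x0001: "HEAP_HASNULL",
    0x0002: "HEAP_HASVARWIDTH",
    0x0004: "HEAP_HASEXTERNAL",
    0x0008: "HEAP_HASOID",
    0x0010: "HEAP_XMAX_KEYSHR_LOCK",
    0x0020: "HEAP_COMBOCID",
    0x0040: "HEAP_XMAX_EXCL_LOCK",
    0x0080: "HEAP_XMAX_LOCK_ONLY",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0200: "HEAP_XMIN_INVALID",
    0x0400: "HEAP_XMAX_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x1000: "HEAP_XMAX_IS_MULTI",
    0x2000: "HEAP_UPDATED",
    0x4000: "HEAP_MOVED_OFF",
    0x8000: "HEAP_MOVED_IN",
}

T_INFOMASK2_FLAGS = {
    0x2000: "HEAP_KEYS_UPDATED",
    0x4000: "HEAP_HOT_UPDATED",
    0x8000: "HEAP_ONLY_TUPLE",
}
HEAP_NATTS_MASK = 0x07FF  # low 11 bits of t_infomask2 hold the attribute count

FROZEN_XID = 2  # FrozenTransactionId; a t_xmin of 2 marks a frozen tuple


def decode_infomask(value):
    """Return the names of the t_infomask bits set in value, lowest bit first."""
    return [name for bit, name in sorted(T_INFOMASK_FLAGS.items()) if value & bit]


def decode_infomask2(value):
    """Return (attribute count, flag names) for a t_infomask2 value."""
    natts = value & HEAP_NATTS_MASK
    flags = [name for bit, name in sorted(T_INFOMASK2_FLAGS.items()) if value & bit]
    return natts, flags


if __name__ == "__main__":
    # Values taken from the heap_page_items output quoted above.
    print(hex(10498), decode_infomask(10498))
    print(decode_infomask2(32778))
```

Under these assumptions, 10498 decodes to the same four flags Alvaro lists, and 32778 decodes to a 10-attribute tuple with HEAP_ONLY_TUPLE set on both copies.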

Thanks.

One tuple is indeed considerably older (a month or two maybe). Since
only the new tuple shows up when filtering on the key, our application
continued to work, and hence nobody noticed the problem.
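
A lookup by key presumably goes through the primary key index and so returns only one of the copies, which would explain why the duplicate stayed hidden. To check other tables, as suggested, one option is to count key values with index scans disabled so that the heap itself is read. The Python sketch below only builds the SQL text; the function name and the example table and column names are hypothetical, and identifiers are interpolated unquoted, so pass trusted names only.

```python
def duplicate_key_sql(table, key_columns):
    """Build SQL that reports key values occurring more than once in table.

    The SET commands turn off index-based plans for the session so the
    GROUP BY is driven by a sequential scan of the heap rather than by
    the (possibly misleading) primary key index.
    """
    cols = ", ".join(key_columns)
    return (
        "SET enable_indexscan = off;\n"
        "SET enable_indexonlyscan = off;\n"
        "SET enable_bitmapscan = off;\n"
        f"SELECT {cols}, count(*) FROM {table}\n"
        f"GROUP BY {cols} HAVING count(*) > 1;"
    )


if __name__ == "__main__":
    # "my_table" and "id" are placeholders, not names from the report.
    print(duplicate_key_sql("my_table", ["id"]))
```

Any rows returned are candidate duplicates; their ctid values can then be inspected before deciding which copy to remove.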

I will do as suggested. You can close the report.

Cheers,

/gerd

--
Acting NT1 Area Coordinator, NeIC
