Thread: Violation of primary key constraint

Violation of primary key constraint

From: Toby Murray
Date:
I just had some interaction with RhodiumToad on IRC about a duplicated
primary key problem I ran into today. After some poking around he
suggested that I send this to -bugs since it seems like an interesting
error.

The short version is that I have a table ("ways") which has a primary
key ("id") that has managed to get 4 of its records duplicated (out of
~168 million):
select id, version, tstamp from ways where id in
(26245218,26709186,26709284,26926573) order by id, version;
    id    | version |       tstamp
----------+---------+---------------------
 26245218 |      12 | 2011-02-06 06:54:10
 26245218 |      13 | 2013-01-28 02:37:56
 26709186 |       4 | 2008-09-02 04:39:21
 26709186 |       4 | 2008-09-02 04:39:21
 26709284 |       4 | 2008-10-26 14:06:03
 26709284 |       5 | 2013-01-28 02:38:30
 26926573 |       4 | 2009-12-27 07:13:28
 26926573 |       4 | 2009-12-27 07:13:28

version and tstamp are just additional data to show that two of the
entries have identical values while two have different values. More on
that in a bit.

Here is the table definition:
    Column    |            Type             | Modifiers | Storage
--------------+-----------------------------+-----------+----------
 id           | bigint                      | not null  | plain
 version      | integer                     | not null  | plain
 user_id      | integer                     | not null  | plain
 tstamp       | timestamp without time zone | not null  | plain
 changeset_id | bigint                      | not null  | plain
 tags         | hstore                      |           | extended
 nodes        | bigint[]                    |           | extended
 linestring   | geometry(Geometry,4326)     |           | main
Indexes:
    "pk_ways" PRIMARY KEY, btree (id)
    "idx_ways_linestring" gist (linestring), tablespace "ssd"
    "idx_ways_tags" gin (tags)
Has OIDs: no
Tablespace: "ssd"

If you don't read your email with a monospaced font, here is a pastie'd version:
http://pastie.org/5979660


Some background:
I am running a postgis database filled with OpenStreetMap data. The
database was initially populated with data from the "planet" file in
early January. The process involves creating a set of data files and
then loading them into the database using \copy. The basic load
process can be seen in this file:
https://github.com/openstreetmap/osmosis/blob/master/package/script/pgsnapshot_load_0.6.sql

The file does have some sections that are optional depending on your
setup, so I did not execute everything in there. Taking out only the
statements that affect the problematic ways table:
ALTER TABLE ways DROP CONSTRAINT pk_ways;
DROP INDEX idx_ways_linestring;
\copy ways FROM 'ways.txt'
ALTER TABLE ONLY ways ADD CONSTRAINT pk_ways PRIMARY KEY (id);
CREATE INDEX idx_ways_linestring ON ways USING gist (linestring);
plus an index creation on the hstore "tags" column that isn't in that file.
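
Judging from the table definition above, that extra statement was
presumably along these lines (a sketch, not taken from the script):
CREATE INDEX idx_ways_tags ON ways USING gin (tags);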


After the copy and index building were done, I set it up to update from
the OpenStreetMap minutely diff files. This means that every minute,
updates are being applied to the database. This had been working fine
for about 3 weeks, until this morning when the minutely update crashed
with a duplicate key error:

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key
value violates unique constraint "pk_ways"
  Detail: Key (id)=(26926573) already exists.

Knowing how the update software works, my first response was "that's
impossible" since it actually checks for the existence of an object in
the database and if it exists, it issues an UPDATE query. Otherwise it
does an INSERT. Of course the problem is actually that there are
already two copies of this object in the database. So the update
software finds an existing record and issues an UPDATE query which
fails with the duplicate key error.
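
In rough SQL terms, the per-object logic is this pattern (a sketch of
the idea only, not the actual osmosis code; the id is just the failing
one from above):
-- does the object already exist?
SELECT 1 FROM ways WHERE id = 26926573;
-- a row came back, so an UPDATE is issued (real values come from the diff):
UPDATE ways SET version = version + 1, tstamp = now() WHERE id = 26926573;
-- had no row come back, an INSERT of the full row would be issued instead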

As I said, I did the initial import at the beginning of January. So if
you look back up at the data you will see that at least all the
earlier versions of the objects have a timestamp long before the
import. My first intuition was that this meant there were duplicates
in the ways.txt file that somehow went undetected when I added the PK
constraint. While I don't have the original files any more, I do still
have the planet file and the same software that generated them, so I
just ran it again and I do not see any duplicates in the file. This
leaves me completely clueless as to how the duplicates got into the
database. Especially the one with the ID 26709186, since it hasn't
been changed in OpenStreetMap in years, so there is no reason for it
to have been touched in any way since the import.
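
(For what it's worth, the duplicates inside the table itself are easy
to enumerate with a plain aggregate; this is roughly how the four ids
above turn up, though it is a full scan on a table this size:)
select id, count(*) from ways group by id having count(*) > 1;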

But looking at the timestamps again, it looks like an UPDATE query may
have successfully executed on two of the duplicates during minutely
diff application, updating one of the records for ways 26245218 and
26709284. The timestamp is when the object was last touched in
OpenStreetMap. These two ways were changed within seconds of each
other, although they don't seem to share anything; they were edited by
different users in different locations.

Here are some queries and their results that RhodiumToad had me run to
try and track things down:

select ctid,xmin,xmax,id from ways
where id in (26926573,26709284,26709186,26245218) order by id;
     ctid      |  xmin  |  xmax  |    id
---------------+--------+--------+----------
 (11249625,3)  | 979685 |      0 | 26245218
 (1501614,5)   | 955993 |      0 | 26245218
 (11247884,18) | 979686 |      0 | 26709186
 (1520052,19)  | 955993 |      0 | 26709186
 (1520056,15)  | 976128 |      0 | 26709284
 (11249780,8)  | 979686 |      0 | 26709284
 (1528888,14)  | 955993 |      0 | 26926573
 (11249622,6)  | 979685 | 982467 | 26926573

( http://pastie.org/5979497 )

The last one there seems to be showing the transaction that failed and
was rolled back earlier this morning.



select * from heap_page_items(get_raw_page('ways',11249622)) where lp=6;
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 |    t_ctid    | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid
----+--------+----------+--------+--------+--------+----------+--------------+-------------+------------+--------+--------+-------
  6 |   7888 |        1 |    304 | 979685 | 982467 |      143 | (11399071,5) |           8 |      10498 |     24 |        |

( http://pastie.org/pastes/5979566/text )


select i, (x).* from
  (select i, bt_page_items('pk_ways', i::integer) as x
     from generate_series(1, pg_relation_size('pk_ways')/8192 - 1) i offset 0) s
 where (x).ctid in
  ('(11249625,3)','(1501614,5)','(11247884,18)','(1520052,19)','(1520056,15)','(11249780,8)','(1528888,14)','(11249622,6)');
   i    | itemoffset |     ctid      | itemlen | nulls | vars |          data
--------+------------+---------------+---------+-------+------+--------------------------
  50039 |        181 | (11249625,3)  |      16 | f     | f    | 62 78 90 01 00 00 00 00
  50039 |        183 | (1501614,5)   |      16 | f     | f    | 62 78 90 01 00 00 00 00
 506579 |        214 | (11249622,6)  |      16 | f     | f    | ed dd 9a 01 00 00 00 00
 506579 |        219 | (1528888,14)  |      16 | f     | f    | ed dd 9a 01 00 00 00 00
 535456 |          4 | (11247884,18) |      16 | f     | f    | c2 8c 97 01 00 00 00 00
 535456 |          5 | (1520052,19)  |      16 | f     | f    | c2 8c 97 01 00 00 00 00
 535456 |         77 | (11249780,8)  |      16 | f     | f    | 24 8d 97 01 00 00 00 00
 535456 |         80 | (1520056,15)  |      16 | f     | f    | 24 8d 97 01 00 00 00 00

( http://pastie.org/5980159 )


Additional system information:
OS: Linux Mint 14 (based on Ubuntu 12.10)
Postgresql version: PostgreSQL 9.1.7 on x86_64-unknown-linux-gnu,
compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2, 64-bit
Postgis version: 2.0.1 r9979
The database is running on a 4-disk software RAID5 (yeah... not built
for speed) plus one SSD, which holds the ways table along with its
linestring index (but not the PK index).

And one more thing I just realized: I did turn off fsync for the
import, intending to turn it back on for normal usage, but it looks
like I never did. However, I haven't had any ungraceful shutdowns
since the import, so I don't think this should be a factor. The nature
of the database is that I can rebuild it at any time from the source
(although it does take several days), so I'm not overly vigilant about
this.
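
(For the record, the live setting is easy to check, and fsync only
needs a config reload, not a restart, to change:)
SHOW fsync;
-- after setting "fsync = on" in postgresql.conf:
SELECT pg_reload_conf();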

I have left the database in its current state for now so I can run
some more checks if needed.

Toby

Re: Violation of primary key constraint

From: Tom Lane
Date:
Toby Murray <toby.murray@gmail.com> writes:
> I just had some interaction with RhodiumToad on IRC about a duplicated
> primary key problem I ran into today. After some poking around he
> suggested that I send this to -bugs since it seems like an interesting
> error.

I poked around in the PK index file (thanks for sending that) and could
not find anything that looks wrong.  There are a lot of duplicate keys
(a few of the keys appear more than a thousand times) but I think this
is just the result of update activity that hasn't been vacuumed away
yet.  I count 181340233 leaf index tuples bearing 168352931 distinct
key values --- that makes for a dead-tuple fraction of 7.7% which is
not quite enough to trigger an autovacuum, so it's not terribly
surprising that the dups are still present.
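
Spelled out, that's just the extra leaf tuples over the distinct keys:
select round((181340233 - 168352931) * 100.0 / 168352931, 1);  -- => 7.7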

At this point it seems that it's not the index's fault.  What seems more
likely is that somehow the older heap entry failed to get marked "dead"
after an UPDATE.

> ... Especially the one with the ID
> 26709186 since it hasn't been changed in OpenStreetMap in years so
> there is no reason for it to have been touched in any way since the
> import.

Yeah, it's a bit hard to explain that this way unless there was an
UPDATE that didn't change the timestamp or version.  How sure are you
that the updating process always changes those?

> Here are some queries and their results that RhodiumToad had me run to
> try and track things down:

Could we see the full results of heap_page_items(get_raw_page()) for
each of the pages where any of these tuples live, viz

    11249625
    1501614
    11247884
    1520052
    1520056
    11249780
    1528888
    11249622
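
For instance, all eight in one go, using the same trick as the
bt_page_items query earlier (this assumes the pageinspect functions
you already used):

select blkno, (h).*
  from (select blkno, heap_page_items(get_raw_page('ways', blkno)) as h
          from unnest(array[11249625,1501614,11247884,1520052,
                            1520056,11249780,1528888,11249622]) as blkno) s;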

            regards, tom lane

Re: Violation of primary key constraint

From: Toby Murray
Date:
On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Toby Murray <toby.murray@gmail.com> writes:
>> I just had some interaction with RhodiumToad on IRC about a duplicated
>> primary key problem I ran into today. After some poking around he
>> suggested that I send this to -bugs since it seems like an interesting
>> error.
>
> I poked around in the PK index file (thanks for sending that) and could
> not find anything that looks wrong.  There are a lot of duplicate keys
> (a few of the keys appear more than a thousand times) but I think this
> is just the result of update activity that hasn't been vacuumed away
> yet.  I count 181340233 leaf index tuples bearing 168352931 distinct
> key values --- that makes for a dead-tuple fraction of 7.7% which is
> not quite enough to trigger an autovacuum, so it's not terribly
> surprising that the dups are still present.
>
> At this point it seems that it's not the index's fault.  What seems more
> likely is that somehow the older heap entry failed to get marked "dead"
> after an UPDATE.
>
>> ... Especially the one with the ID
>> 26709186 since it hasn't been changed in OpenStreetMap in years so
>> there is no reason for it to have been touched in any way since the
>> import.
>
> Yeah, it's a bit hard to explain that this way unless there was an
> UPDATE that didn't change the timestamp or version.  How sure are you
> that the updating process always changes those?

Pretty sure. The minutely change stream coming from OSM is generated
from all objects that were modified since the last diff was generated,
based on transaction numbers in their postgresql database. The way
changes get into that database is through the rails API which enforces
version number bumps on upload and sets its own timestamp based on
when the upload came in. The only way to apply an update that doesn't
change anything would be to apply a minutely diff from the past to an
up-to-date database. This does happen right when you start updating
from minutely diffs after a clean import but the overlap is a matter
of hours, just to ensure there is no gap - not something from 2008.


>> Here are some queries and their results that RhodiumToad had me run to
>> try and track things down:
>
> Could we see the full results of heap_page_items(get_raw_page()) for
> each of the pages where any of these tuples live, viz
>
>         11249625
>         1501614
>         11247884
>         1520052
>         1520056
>         11249780
>         1528888
>         11249622
>

Seems awkward to put inline so I made a text file with these results
and am attaching it. Hoping the mailing list allows attachments.

Toby

Attachments

Re: Violation of primary key constraint

From: Toby Murray
Date:
On Thu, Jan 31, 2013 at 10:57 PM, Toby Murray <toby.murray@gmail.com> wrote:
> On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Toby Murray <toby.murray@gmail.com> writes:
>>> I just had some interaction with RhodiumToad on IRC about a duplicated
>>> primary key problem I ran into today. After some poking around he
>>> suggested that I send this to -bugs since it seems like an interesting
>>> error.
>>
>> I poked around in the PK index file (thanks for sending that) and could
>> not find anything that looks wrong.  There are a lot of duplicate keys
>> (a few of the keys appear more than a thousand times) but I think this
>> is just the result of update activity that hasn't been vacuumed away
>> yet.  I count 181340233 leaf index tuples bearing 168352931 distinct
>> key values --- that makes for a dead-tuple fraction of 7.7% which is
>> not quite enough to trigger an autovacuum, so it's not terribly
>> surprising that the dups are still present.
>>
>> At this point it seems that it's not the index's fault.  What seems more
>> likely is that somehow the older heap entry failed to get marked "dead"
>> after an UPDATE.
>>
>>> ... Especially the one with the ID
>>> 26709186 since it hasn't been changed in OpenStreetMap in years so
>>> there is no reason for it to have been touched in any way since the
>>> import.
>>
>> Yeah, it's a bit hard to explain that this way unless there was an
>> UPDATE that didn't change the timestamp or version.  How sure are you
>> that the updating process always changes those?
>
> Pretty sure. The minutely change stream coming from OSM is generated
> from all objects that were modified since the last diff was generated,
> based on transaction numbers in their postgresql database. The way
> changes get into that database is through the rails API which enforces
> version number bumps on upload and sets its own timestamp based on
> when the upload came in. The only way to apply an update that doesn't
> change anything would be to apply a minutely diff from the past to an
> up-to-date database. This does happen right when you start updating
> from minutely diffs after a clean import but the overlap is a matter
> of hours, just to ensure there is no gap - not something from 2008.

Ignore this!

I was just talking some more with RhodiumToad on IRC and realized that
I completely forgot about the linestring column. Whenever a node that
is a member of a way gets changed, the way's linestring is updated by
the minutely updates. And checking again, the linestrings on the two
records with an ID of 26709186 are indeed different, but still the
same length. The node was only moved by a few centimeters I think.
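
(For reference, roughly how the two physical copies can be compared
side by side; a sketch using the PostGIS functions:)
select ctid, xmin, ST_NPoints(linestring) as npoints, ST_Length(linestring) as len
  from ways where id = 26709186;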

Also, the node change happened on January 28th. To be precise, the
timestamp of the node is 2013-01-28 02:38:29. So it is looking like
something may have gone wrong, possibly in a single minutely update,
on January 28th. I'll do a little more digging.

Toby

Re: Violation of primary key constraint

From: Toby Murray
Date:
On Thu, Jan 31, 2013 at 11:21 PM, Toby Murray <toby.murray@gmail.com> wrote:
> Also, the node change happened on January 28th. To be precise, the
> timestamp of the node is 2013-01-28 02:38:29. So it is looking like
> something may have gone wrong, possibly in a single minutely update,
> on January 28th. I'll do a little more digging.

I can confirm that all 4 of these errors can be accounted for in two
back-to-back minutely updates at 8:38 and 8:39 on January 28th:
http://planet.openstreetmap.org/replication/minute/000/198/193.osc.gz
http://planet.openstreetmap.org/replication/minute/000/198/194.osc.gz

It is possible that both of them were applied in a single transaction.
Sometimes replication falls behind by a few seconds and misses the
minutely update and then the next run picks up 2 minutes and combines
them before updating the database. However one of the queries from
earlier seemed to indicate that two different transactions were
involved.

So with a specific timeframe in mind, I looked at other things. It was
2:30 AM on a Monday morning here and it looks like I was in bed at the
time. I don't see any errors in log files at that time... however
about 6 hours later I did get my first notification from smartd of an
unreadable sector on one of my drives and a failed SMART self test.
This is one of the drives in the RAID5 array so even if it failed, it
should (in theory of course) not have corrupted anything. Also, the
affected table data lives on an SSD that is not part of the array. The
index is on the array but it sounds like this has been ruled out as a
cause of error.

We were discussing the possibility of hardware failure on IRC. If
something went wrong on the SSD, it sounds like it would have had to
have been a completely silent write failure. This is of course always
an option although I haven't seen any other indications that this
drive is failing. It is pretty new at 1,200 power on hours and a wear
leveling count of 5.

Toby

Re: Violation of primary key constraint

From: Tom Lane
Date:
Toby Murray <toby.murray@gmail.com> writes:
> On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Could we see the full results of heap_page_items(get_raw_page()) for
>> each of the pages where any of these tuples live, viz

> Seems awkward to put inline so I made a text file with these results
> and am attaching it. Hoping the mailing list allows attachments.

Thanks.  These are a bit odd, because none of the tuples in question
look like they've been updated, except for (11249622,6) which we
already thought had been re-modified by some later transaction.

What it looks like to me is that the writes of the source pages simply
got lost somehow, so that what's on disk now is the pre-update copies
of these tuples as well as the post-update copies.  I wonder whether
your SSD dropped a batch of updates.  Speculating wildly: maybe that
was triggered by a power glitch that also caused the reported error
on your RAID array?

One thing that I learned from examining the PK index is that for
three out of these four pairs, there were intermediate versions,
ie there must have been multiple UPDATEs issued, probably all in
the same transaction (does that square with what you know of the
update process?).  For instance, the index entries for 26245218
look like

 Item 181 -- Length:   16  Offset: 5024 (0x13a0)  Flags: NORMAL
  Block Id: 11249625  linp Index: 3  Size: 16
  Has Nulls: 0  Has Varwidths: 0

  13a0: ab00d9a7 03001000 62789001 00000000  ........bx......

 Item 182 -- Length:   16  Offset: 5040 (0x13b0)  Flags: DEAD
  Block Id: 1501614  linp Index: 16  Size: 16
  Has Nulls: 0  Has Varwidths: 0

  13b0: 1600aee9 10001000 62789001 00000000  ........bx......

 Item 183 -- Length:   16  Offset: 5328 (0x14d0)  Flags: NORMAL
  Block Id: 1501614  linp Index: 5  Size: 16
  Has Nulls: 0  Has Varwidths: 0

  14d0: 1600aee9 05001000 62789001 00000000  ........bx......

We can see the heap entries at (11249625,3) and (1501614,5)
but there's nothing at (1501614,16).  This is explainable if
you believe that a write of page 1501614 got lost.
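
(For anyone following along, that slot corresponds to this pageinspect
call; a sketch:)
select lp, lp_off, lp_flags, t_xmin, t_xmax, t_ctid
  from heap_page_items(get_raw_page('ways', 1501614))
 where lp = 16;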

It's conceivable that this was a software glitch not a hardware glitch,
ie Postgres forgetting the dirty-bits for a batch of pages, but we've
not seen any similar reports elsewhere.  So I'm leaning to the hardware
explanation.

            regards, tom lane

Re: Violation of primary key constraint

From: Toby Murray
Date:
On Fri, Feb 1, 2013 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Toby Murray <toby.murray@gmail.com> writes:
>> On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Could we see the full results of heap_page_items(get_raw_page()) for
>>> each of the pages where any of these tuples live, viz
>
>> Seems awkward to put inline so I made a text file with these results
>> and am attaching it. Hoping the mailing list allows attachments.
>
> Thanks.  These are a bit odd, because none of the tuples in question
> look like they've been updated, except for (11249622,6) which we
> already thought had been re-modified by some later transaction.

Well, that's the one that finally brought the house of cards down: the
update process attempted to modify it but failed because the database
noticed the duplicate PK.


> What it looks like to me is that the writes of the source pages simply
> got lost somehow, so that what's on disk now is the pre-update copies
> of these tuples as well as the post-update copies.  I wonder whether
> your SSD dropped a batch of updates.  Speculating wildly: maybe that
> was triggered by a power glitch that also caused the reported error
> on your RAID array?

Yeah... probably no way to confirm/deny this at this point. The
machine is on a UPS but of course nothing is 100%. I think the spinny
drive reporting the SMART errors definitely has a problem. It has
accumulated more errors since the 28th and I will be replacing it
soon.

> One thing that I learned from examining the PK index is that for
> three out of these four pairs, there were intermediate versions,
> ie there must have been multiple UPDATEs issued, probably all in
> the same transaction (does that square with what you know of the
> update process?).

Yes, that is possible. It is actually a big inefficiency in the update
process that I am planning on tinkering with at some point. If 3 nodes
that are part of the way are changed then the way's linestring will be
updated 3 times during the update process. And if the way itself is
changed then it will get updated twice - once for the way attribute
changes and once for the linestring update. Like I said...
inefficient.

The OSM object history does back up your observation that 3 of them
had multiple updates:
26245218: updated once
26709186: updated 2 times
26709284: updated 4 times
26926573: updated 5 times


> It's conceivable that this was a software glitch not a hardware glitch,
> ie Postgres forgetting the dirty-bits for a batch of pages, but we've
> not seen any similar reports elsewhere.  So I'm leaning to the hardware
> explanation.

Roger that. Unless there is anything else to look at with the database
in its current state, I will try to delete the duplicates and then
backtrack my OSM replication process back to the 28th so that any
other errors that may have gotten in during the botched write are
redone.
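
For the duplicates themselves, I'm thinking of something along these
lines, based on the ctid/xmin listing from earlier (the rows written
by the old transactions 955993/976128 look like the stale pre-update
copies; to be double-checked before committing, of course):
begin;
delete from ways where id = 26245218 and ctid = '(1501614,5)';
delete from ways where id = 26709186 and ctid = '(1520052,19)';
delete from ways where id = 26709284 and ctid = '(1520056,15)';
delete from ways where id = 26926573 and ctid = '(1528888,14)';
commit;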

Thanks for the detailed debugging.

Toby

Re: Violation of primary key constraint

From: Tom Lane
Date:
Toby Murray <toby.murray@gmail.com> writes:
> On Fri, Feb 1, 2013 at 3:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's conceivable that this was a software glitch not a hardware glitch,
>> ie Postgres forgetting the dirty-bits for a batch of pages, but we've
>> not seen any similar reports elsewhere.  So I'm leaning to the hardware
>> explanation.

> Roger that. Unless there is anything else to look at with the database
> in its current state, I will try to delete the duplicates and then
> backtrack my OSM replication process back to the 28th so that any
> other errors that may have gotten in during the botched write are
> redone.

I wish I could think of something else to look at, but right now
I can't.  You might as well revert the database.  I'd suggest REINDEXing
all the indexes too, since the theory of some dropped page updates would
imply that the indexes are probably not in sync.
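
For example (REINDEX locks out writes on the table while it runs, so
the minutely updates would have to wait for it):
reindex table ways;
-- or one index at a time:
reindex index pk_ways;
reindex index idx_ways_linestring;
reindex index idx_ways_tags;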

It might be worth cranking up logging (eg log_statement = all),
so that if this does happen again, we'll have more to go on.
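
For instance, in postgresql.conf (both settings take effect on a
reload; the prefix format here is only a suggestion):
log_statement = 'all'
log_line_prefix = '%t [%p] %x '    # timestamp, process id, transaction id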

            regards, tom lane