Re: Violation of primary key constraint

From: Toby Murray <toby(dot)murray(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Violation of primary key constraint
Date: 2013-02-01 04:57:02
Message-ID: CAJeqKgvgFRawT8tL21XtVdw7jjh7BOJXceBLVr1FA+Tf6Pc_-A@mail.gmail.com
Lists: pgsql-bugs

On Thu, Jan 31, 2013 at 5:43 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Toby Murray <toby(dot)murray(at)gmail(dot)com> writes:
>> I just had some interaction with RhodiumToad on IRC about a duplicated
>> primary key problem I ran into today. After some poking around he
>> suggested that I send this to -bugs since it seems like an interesting
>> error.
>
> I poked around in the PK index file (thanks for sending that) and could
> not find anything that looks wrong. There are a lot of duplicate keys
> (a few of the keys appear more than a thousand times) but I think this
> is just the result of update activity that hasn't been vacuumed away
> yet. I count 181340233 leaf index tuples bearing 168352931 distinct
> key values --- that makes for a dead-tuple fraction of 7.7% which is
> not quite enough to trigger an autovacuum, so it's not terribly
> surprising that the dups are still present.
>
> At this point it seems that it's not the index's fault. What seems more
> likely is that somehow the older heap entry failed to get marked "dead"
> after an UPDATE.
>
>> ... Especially the one with the ID
>> 26709186 since it hasn't been changed in OpenStreetMap in years so
>> there is no reason for it to have been touched in any way since the
>> import.
>
> Yeah, it's a bit hard to explain that this way unless there was an
> UPDATE that didn't change the timestamp or version. How sure are you
> that the updating process always changes those?

Pretty sure. The minutely change stream coming from OSM is generated
from all objects that were modified since the last diff was generated,
based on transaction numbers in their PostgreSQL database. Changes get
into that database through the Rails API, which enforces a version
number bump on every upload and sets its own timestamp based on when
the upload came in. The only way to apply an update that doesn't
change anything would be to apply a minutely diff from the past to an
up-to-date database. This does happen right when you start updating
from minutely diffs after a clean import, but the overlap is a matter
of hours, just to ensure there is no gap. It's not something from 2008.
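For anyone following along, Tom's dead-tuple arithmetic above can be
reproduced with a quick sketch. It assumes (1) that every index entry
beyond one per distinct key counts as a dead tuple, which only
approximates the heap's dead-tuple count, (2) that the live tuple count
roughly equals the number of distinct keys, and (3) PostgreSQL's stock
autovacuum rule (vacuum when dead tuples exceed
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples,
with defaults 50 and 0.2). The helper names are hypothetical.

```python
# Figures quoted from Tom Lane's message; helper names are hypothetical.
LEAF_TUPLES = 181_340_233    # leaf index tuples counted in the PK index
DISTINCT_KEYS = 168_352_931  # distinct key values among those tuples


def extra_tuple_fraction(total: int, distinct: int) -> float:
    """Entries beyond one per key, as a fraction of the distinct keys."""
    return (total - distinct) / distinct


def autovacuum_triggers(dead_tuples: int, reltuples: int,
                        threshold: int = 50,
                        scale_factor: float = 0.2) -> bool:
    """Stock autovacuum rule: dead > threshold + scale_factor * reltuples.

    Defaults mirror autovacuum_vacuum_threshold (50) and
    autovacuum_vacuum_scale_factor (0.2); a given server may override them.
    """
    return dead_tuples > threshold + scale_factor * reltuples


if __name__ == "__main__":
    dead = LEAF_TUPLES - DISTINCT_KEYS
    frac = extra_tuple_fraction(LEAF_TUPLES, DISTINCT_KEYS)
    print(f"extra entries: {dead:,} ({frac:.1%} of distinct keys)")
    # Assumption: distinct keys stand in for the table's live tuple count.
    print("autovacuum would trigger:", autovacuum_triggers(dead, DISTINCT_KEYS))
```

Under these assumptions roughly 13 million surplus entries sit well
below the ~33.7 million needed to cross the default 20% threshold, which
is consistent with the duplicates simply not having been vacuumed yet.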

>> Here are some queries and their results that RhodiumToad had me run to
>> try and track things down:
>
> Could we see the full results of heap_page_items(get_raw_page()) for
> each of the pages where any of these tuples live, viz
>
> 11249625
> 1501614
> 11247884
> 1520052
> 1520056
> 11249780
> 1528888
> 11249622
>

It seemed awkward to put these inline, so I made a text file with the
results and attached it. Hoping the mailing list allows attachments.

Toby

Attachment: heap_page_items.txt (text/plain, 14.6 KB)
