Re: BUG #14150: Attempted to delete invisible tuple

From: Peter Tripp <peter(at)chartio(dot)com>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: virendra(at)idyllic-software(dot)com, pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14150: Attempted to delete invisible tuple
Date: 2016-06-13 23:52:46
Message-ID: CACOLnRWad2M9wA70vvjQ2GMOZMrfNkVOYVtq8+=au6JJX1oKjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Thank you for your assistance in tracking this down Peter G. My apologies
for the delayed reply, I've was out sick in days following my original post.

I looked further at the code paths with DELETE statements and they are only
used for our test suite. Normally an external periodic task process
executes pruning, but this clean instance from Tuesday hadn't had a single
DELETE statement executed and the issue occurred anyways. I also forgot to
mention that in previous testing we had also tested an alternate version of
the SQL statement without the UPDATE which also resulted in the same
"attempted to delete invisible tuple"

````
INSERT INTO cache (key, value, expires_at) VALUES ($1, $2, $3) ON CONFLICT
(key) DO NOTHING
````

I've replied to your questions inline...

On Mon, Jun 13, 2016 at 1:45 PM, Peter Geoghegan <pg(at)heroku(dot)com> wrote:

> On Thu, May 19, 2016 at 5:33 AM, <virendra(at)idyllic-software(dot)com> wrote:
> > ## We have system where we perform lot of insert and update (no delete)
> and
> > some time upsert query.
> >
> > ## Error the ruby pg library report
> > PG::ObjectNotInPrerequisiteState: ERROR: attempted to delete
> invisible
> > tuple
>
> I am having difficulty isolating this bug, although I have some
> high-level theories [1]. (I won't have much time to work on it this
> week, but want to get ahead of that.)
>
> Any further details you can provide may be useful. For example:
>
> * How many clients are running when problems are seen?
>

This was with 92 concurrent client connections. These are pooled
connections for both reading/writing, I expect the actual number of
simultaneous transactions is far less, but 92 is an upper boundary.

> * How frequently can you see a problem? Is it easy for you to
> reproduce the problem, or is this something you only see every once in
> a long while?
>

Roughly 5-25 times an hour, every hour, relatively dispersed throughout the
hour. In a "bad hour" some of the errors are closely time correlated (2-3
within a second) but generally the errors are well dispersed throughout a
given hour and impact only a single insert with many minutes before it
occurs again. I can consistently reproduce the problem, but only under
production workloads. I've spent a couple hours trying to run simulations
but haven't come up with the correct mix/concurrency to trigger the issue.

* Do you notice any correlation between this issue and other things?
> For example, do you tend to notice that other queries are running at
> the same time? What do your logs look like around the time that the
> error is raised?

There doesn't appear to be an correlation. My metrics are somewhat coarse
for these instance though, so a short CPU burst might be lost in averages.
I haven't seen anything else in the logs/metrics/queries which correlates
with these events.

Ideally, you'd be able to build Postgres yourself, with the attached
> patch applied (the 9.5 release branch should be used, of course). This
> will provoke a hard crash in the event of hitting this problem. You
> can set things up such that this produces a core dump for further
> debugging here (for the directly affected backend):
>
>
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD
>
> I'm particularly interested in seeing a backtrace from the core dump.
> That should make the general nature of the problem clear.
>

If I can come up with a synthetic workload which triggers this I will
happily build a patched postgres and send along core dumps. I was
considering enabling full statement logging (including the up to 10MB JSON
objects) for half an hour to see if replaying all those statements triggers
the same error, but have not yet done so.

> Thanks
>
> [1]
> https://www.postgresql.org/message-id/CAM3SWZSf_R5g4w_md-etXyUdUtzYzWofFZ9qOzPOhbnrBUHz8A@mail.gmail.com
> --
> Peter Geoghegan
>

Thanks again for your help tracking this down.

Peter Tripp

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2016-06-14 07:16:58 Re: BUG #13907: Restore materialized view throw permission denied
Previous Message Peter Geoghegan 2016-06-13 21:58:58 Re: BUG #14150: Attempted to delete invisible tuple