Re: INSERT ... ON CONFLICT UPDATE and logical decoding

From: Peter Geoghegan <pg(at)heroku(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Subject: Re: INSERT ... ON CONFLICT UPDATE and logical decoding
Date: 2015-02-25 22:04:55
Message-ID: CAM3SWZT2rGLm9S-9JZ+1yrQkXsFhbKudFWYAb+57ho=B9yGJGA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 25, 2015 at 3:26 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> I'm pretty sure this will entirely fail if you have a transaction that's
> large enough to spill to disk. Calling ReorderBufferIterTXNNext() will
> reuse the memory for the in memory changes.
>
> It's also borked because it skips a large number of checks for the next
> change. You're entirely disregarding what the routine does otherwise -
> it could be a toast record, it could be a change to a system table, it
> could be a new snapshot...
>
> Also, this seems awfully fragile in th presence of toast rows and such?
> Can we be entirely sure that the next WAL record logged by that session
> would be the internal super deletion?

toast_save_datum() is called with a heap_insert() call before heap
insertion for the tuple proper. We're relying on the assumption that
if there is no immediate super deletion record, things are fine. We
cannot speculatively insert into catalogs, BTW. Why should we see a
REORDER_BUFFER_CHANGE_INTERNAL_COMMAND_ID case next, in the case that
we need to care about (when the tuple was super deleted immediately
afterwards)?

Now, I fully admit that as written, the quick sketch patch is
unacceptably fragile (I think it might accidentally work despite this,
leaving aside concerns about resource lifespans). The "peek ahead"
needs to be more robust, for sure, most obviously by not failing when
we spill to disk, but also by being paranoid about super deletion
records. Basically, by "more paranoid", I mean the next thing
immediately following a speculatively inserted tuple might not be a
super deletion record, but we still look until we conclude that there
won't be one (based on something else - some "terminating condition"
- happening, including the xact committing). Other examples of
terminating conditions would be new, unrelated
REORDER_BUFFER_CHANGE_INSERT/ REORDER_BUFFER_CHANGE_UPDATE/
REORDER_BUFFER_CHANGE_DELETE changes. We can make some assumptions
about when it's safe to conclude that there was no super deletion that
are not fragile.

As for the idea of writing WAL separately, I don't think it's worth
it. We'd need to go exclusive lock the buffer again, which seems like
a non-starter. While what I have here *is* very ugly, I see no better
alternative. By doing what you suggest, we'd be finding a special case
way of safely violating the general "WAL log-before-data" rule.
Performance aside, that seems like a worse, less isolated kludge...no?
--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Nasby 2015-02-25 22:06:02 Question about lazy_space_alloc() / linux over-commit
Previous Message Andres Freund 2015-02-25 21:19:45 Re: collations in shared catalogs?