Re: Online enabling of checksums

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Online enabling of checksums
Date: 2018-04-03 12:05:04
Message-ID: CABUevEyvSfDC_U+SPKp5xa9C3wUm0vRba2T-P4i4=rfVuYrmqA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander <magnus(at)hagander(dot)net> wrote:

> On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra <
> tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
>> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
>> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
>> > <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>>
>> wrote:
>> >
>> > ...
>> >
>> > I do think just waiting for all running transactions to complete is
>> > fine, and it's not the first place where we use it - CREATE
>> SUBSCRIPTION
>> > does pretty much exactly the same thing (and CREATE INDEX
>> CONCURRENTLY
>> > too, to some extent). So we have a precedent / working code we can
>> copy.
>> >
>> >
>> > Thinking again, I don't think it should be done as part of
>> > BuildRelationList(). We should just do it once in the launcher before
>> > starting, that'll be both easier and cleaner. Anything started after
>> > that will have checksums on it, so we should be fine.
>> >
>> > PFA one that does this.
>> >
>>
>> Seems fine to me. I'd however log waitforxid, not the oldest one. If
>> you're a DBA and you want to make the checksumming to proceed, knowing
>> the oldest running XID is useless for that. If we log waitforxid, it can
>> be used to query pg_stat_activity and interrupt the sessions somehow.
>>
>
> Yeah, makes sense. Updated.
>
>
>
>> > > And if you try this with a temporary table (not hidden in
>> transaction,
>> > > so the bgworker can see it), the worker will fail with this:
>> > >
>> > > ERROR: cannot access temporary tables of other sessions
>> > >
>> > > But of course, this is just another way how to crash without
>> updating
>> > > the result for the launcher, so checksums may end up being
>> enabled
>> > > anyway.
>> > >
>> > >
>> > > Yeah, there will be plenty of side-effect issues from that
>> > > crash-with-wrong-status case. Fixing that will at least make
>> things
>> > > safer -- in that checksums won't be enabled when not put on all
>> pages.
>> > >
>> >
>> > Sure, the outcome with checksums enabled incorrectly is a
>> consequence of
>> > bogus status, and fixing that will prevent that. But that wasn't my
>> main
>> > point here - not articulated very clearly, though.
>> >
>> > The bigger question is how to handle temporary tables gracefully, so
>> > that it does not terminate the bgworker like this at all. This
>> might be
>> > even bigger issue than dropped relations, considering that temporary
>> > tables are pretty common part of applications (and it also includes
>> > CREATE/DROP).
>> >
>> > For some clusters it might mean the online checksum enabling would
>> > crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>> >
>> > Unfortunately, try_relation_open() won't fix this, as the error
>> comes
>> > from ReadBufferExtended. And it's not a matter of simply creating a
>> > ReadBuffer variant without that error check, because temporary
>> tables
>> > use local buffers.
>> >
>> > I wonder if we could just go and set the checksums anyway, ignoring
>> the
>> > local buffers. If the other session does some changes, it'll
>> overwrite
>> > our changes, this time with the correct checksums. But it seems
>> pretty
>> > dangerous (I mean, what if they're writing stuff while we're
>> updating
>> > the checksums? Considering the various short-cuts for temporary
>> tables,
>> > I suspect that would be a boon for race conditions.)
>> >
>> > Another option would be to do something similar to running
>> transactions,
>> > i.e. wait until all temporary tables (that we've seen at the
>> beginning)
>> > disappear. But we're starting to wait on more and more stuff.
>> >
>> > If we do this, we should clearly log which backends we're waiting
>> for,
>> > so that the admins can go and interrupt them manually.
>> >
>> >
>> >
>> > Yeah, waiting for all transactions at the beginning is pretty simple.
>> >
>> > Making the worker simply ignore temporary tables would also be easy.
>> >
>> > One of the bigger issues here is temporary tables are *session* scope
>> > and not transaction, so we'd actually need the other session to finish,
>> > not just the transaction.
>> >
>> > I guess what we could do is something like this:
>> >
>> > 1. Don't process temporary tables in the checksumworker, period.
>> > Instead, build a list of any temporary tables that existed when the
>> > worker started in this particular database (basically anything that we
>> > got in our scan). Once we have processed the complete database, keep
>> > re-scanning pg_class until those particular tables are gone (search by
>> oid).
>> >
>> > That means that any temporary tables that are created *while* we are
>> > processing a database are ignored, but they should already be receiving
>> > checksums.
>> >
>> > It definitely leads to a potential issue with long running temp tables.
>> > But as long as we look at the *actual tables* (by oid), we should be
>> > able to handle long-running sessions once they have dropped their temp
>> > tables.
>> >
>> > Does that sound workable to you?
>> >
>>
>> Yes, that's pretty much what I meant by 'wait until all temporary tables
>> disappear'. Again, we need to make it easy to determine which OIDs are
>> we waiting for, which sessions may need DBA's attention.
>>
>> I don't think it makes sense to log OIDs of the temporary tables. There
>> can be many of them, and in most cases the connection/session is managed
>> by the application, so the only thing you can do is kill the connection.
>>
>
> Yeah, agreed. I think it makes sense to show the *number* of temp tables.
> That's also a predictable amount of information -- logging all temp tables
> may as you say lead to an insane amount of data.
>
> PFA a patch that does this. I've also added some docs for it.
>
> And I also noticed pg_verify_checksums wasn't installed, so fixed that too.
>
>
PFA a rebase on top of the just committed verify-checksums patch.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

Attachment Content-Type Size
online_checksums10.patch text/x-patch 81.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2018-04-03 13:01:53 Re: [HACKERS] Add support for tuple routing to foreign partitions
Previous Message Amit Langote 2018-04-03 12:02:50 Re: [HACKERS] path toward faster partition pruning