Quick Links

Re: Online enabling of checksums

From:	Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To:	Magnus Hagander <magnus(at)hagander(dot)net>
Cc:	Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject:	Re: Online enabling of checksums
Date:	2018-03-31 15:38:54
Message-ID:	fe76f96e-9dfe-3725-3f0a-ed3ffe6146a3@2ndquadrant.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com <mailto:tomas(dot)vondra(at)2ndquadrant(dot)com>> wrote:
>
> ...
>
> I do think just waiting for all running transactions to complete is
> fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
> does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
> too, to some extent). So we have a precedent / working code we can copy.
>
>
> Thinking again, I don't think it should be done as part of
> BuildRelationList(). We should just do it once in the launcher before
> starting, that'll be both easier and cleaner. Anything started after
> that will have checksums on it, so we should be fine.
>
> PFA one that does this.
>

Seems fine to me. I'd however log waitforxid, not the oldest one. If
you're a DBA and you want to make the checksumming to proceed, knowing
the oldest running XID is useless for that. If we log waitforxid, it can
be used to query pg_stat_activity and interrupt the sessions somehow.

>
> > And if you try this with a temporary table (not hidden in transaction,
> > so the bgworker can see it), the worker will fail with this:
> >
> > ERROR: cannot access temporary tables of other sessions
> >
> > But of course, this is just another way how to crash without updating
> > the result for the launcher, so checksums may end up being enabled
> > anyway.
> >
> >
> > Yeah, there will be plenty of side-effect issues from that
> > crash-with-wrong-status case. Fixing that will at least make things
> > safer -- in that checksums won't be enabled when not put on all pages.
> >
>
> Sure, the outcome with checksums enabled incorrectly is a consequence of
> bogus status, and fixing that will prevent that. But that wasn't my main
> point here - not articulated very clearly, though.
>
> The bigger question is how to handle temporary tables gracefully, so
> that it does not terminate the bgworker like this at all. This might be
> even bigger issue than dropped relations, considering that temporary
> tables are pretty common part of applications (and it also includes
> CREATE/DROP).
>
> For some clusters it might mean the online checksum enabling would
> crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
>
> Unfortunately, try_relation_open() won't fix this, as the error comes
> from ReadBufferExtended. And it's not a matter of simply creating a
> ReadBuffer variant without that error check, because temporary tables
> use local buffers.
>
> I wonder if we could just go and set the checksums anyway, ignoring the
> local buffers. If the other session does some changes, it'll overwrite
> our changes, this time with the correct checksums. But it seems pretty
> dangerous (I mean, what if they're writing stuff while we're updating
> the checksums? Considering the various short-cuts for temporary tables,
> I suspect that would be a boon for race conditions.)
>
> Another option would be to do something similar to running transactions,
> i.e. wait until all temporary tables (that we've seen at the beginning)
> disappear. But we're starting to wait on more and more stuff.
>
> If we do this, we should clearly log which backends we're waiting for,
> so that the admins can go and interrupt them manually.
>
>
>
> Yeah, waiting for all transactions at the beginning is pretty simple.
>
> Making the worker simply ignore temporary tables would also be easy.
>
> One of the bigger issues here is temporary tables are *session* scope
> and not transaction, so we'd actually need the other session to finish,
> not just the transaction.
>
> I guess what we could do is something like this:
>
> 1. Don't process temporary tables in the checksumworker, period.
> Instead, build a list of any temporary tables that existed when the
> worker started in this particular database (basically anything that we
> got in our scan). Once we have processed the complete database, keep
> re-scanning pg_class until those particular tables are gone (search by oid).
>
> That means that any temporary tables that are created *while* we are
> processing a database are ignored, but they should already be receiving
> checksums.
>
> It definitely leads to a potential issue with long running temp tables.
> But as long as we look at the *actual tables* (by oid), we should be
> able to handle long-running sessions once they have dropped their temp
> tables.
>
> Does that sound workable to you?
>

Yes, that's pretty much what I meant by 'wait until all temporary tables
disappear'. Again, we need to make it easy to determine which OIDs are
we waiting for, which sessions may need DBA's attention.

I don't think it makes sense to log OIDs of the temporary tables. There
can be many of them, and in most cases the connection/session is managed
by the application, so the only thing you can do is kill the connection.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Re: Online enabling of checksums at 2018-03-31 15:05:28 from Magnus Hagander

Responses

Re: Online enabling of checksums at 2018-04-01 12:04:38 from Magnus Hagander

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Alvaro Herrera	2018-03-31 15:41:32	Re: [HACKERS] Runtime Partition Pruning
Previous Message	Peter Eisentraut	2018-03-31 15:32:01	Re: pgsql: Add documentation for the JIT feature.