Re: Online enabling of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Banck <michael(dot)banck(at)credativ(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Online enabling of checksums
Date: 2018-04-03 22:11:49
Message-ID: 82296c92-7124-dc5b-4f00-053f302d7233@2ndquadrant.com
Lists: pgsql-hackers

On 04/03/2018 02:05 PM, Magnus Hagander wrote:
> On Sun, Apr 1, 2018 at 2:04 PM, Magnus Hagander
> <magnus(at)hagander(dot)net> wrote:
>
> On Sat, Mar 31, 2018 at 5:38 PM, Tomas Vondra
> <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
>
> On 03/31/2018 05:05 PM, Magnus Hagander wrote:
> > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra
> > <tomas(dot)vondra(at)2ndquadrant(dot)com> wrote:
> >
> > ...
> >
> >     I do think just waiting for all running transactions to complete is
> >     fine, and it's not the first place where we use it - CREATE SUBSCRIPTION
> >     does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY
> >     too, to some extent). So we have a precedent / working code we can copy.
> >
> >
> > Thinking again, I don't think it should be done as part of
> > BuildRelationList(). We should just do it once in the launcher before
> > starting, that'll be both easier and cleaner. Anything started after
> > that will have checksums on it, so we should be fine.
> >
> > PFA one that does this.
> >
>
> Seems fine to me. I'd however log waitforxid, not the oldest one. If
> you're a DBA and you want the checksumming to proceed, knowing the
> oldest running XID is useless for that. If we log waitforxid, it can be
> used to query pg_stat_activity and interrupt the sessions somehow.
>
>
> Yeah, makes sense. Updated.
>
>  
>
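
Just to make the "wait for pre-existing transactions" part concrete,
what I have in mind is roughly the loop below. This is only a sketch,
not the code from the patch, and the variable names are mine; but it
shows why waitforxid is the useful value to log.

/* headers: access/transam.h, storage/procarray.h */

/*
 * In the launcher, before any per-database workers are started: note
 * the next XID, then wait until every transaction that began before
 * that point has finished.  waitforxid is the value worth logging,
 * since it lets a DBA find the blocking sessions in pg_stat_activity.
 */
TransactionId waitforxid = ReadNewTransactionId();

ereport(LOG,
        (errmsg("checksumhelper waiting for transactions older than %u to finish",
                waitforxid)));

for (;;)
{
    TransactionId oldestxid = GetOldestActiveTransactionId();

    if (TransactionIdFollowsOrEquals(oldestxid, waitforxid))
        break;                  /* nothing older than waitforxid is left */

    pg_usleep(1000000L);        /* recheck once per second */
}
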
> >     >     And if you try this with a temporary table (not hidden in a
> >     >     transaction, so the bgworker can see it), the worker will
> >     >     fail with this:
> >     >
> >     >       ERROR:  cannot access temporary tables of other sessions
> >     >
> >     >     But of course, this is just another way to crash without
> >     >     updating the result for the launcher, so checksums may end
> >     >     up being enabled anyway.
> >     >
> >     >
> >     > Yeah, there will be plenty of side-effect issues from that
> >     > crash-with-wrong-status case. Fixing that will at least make
> >     > things safer -- in that checksums won't be enabled when not put
> >     > on all pages.
> >     >
> >
> >     Sure, the outcome with checksums enabled incorrectly is a
> >     consequence of bogus status, and fixing that will prevent it. But
> >     that wasn't my main point here - not articulated very clearly,
> >     though.
> >
> >     The bigger question is how to handle temporary tables gracefully,
> >     so that it does not terminate the bgworker like this at all. This
> >     might be an even bigger issue than dropped relations, considering
> >     that temporary tables are a pretty common part of applications
> >     (and it also includes CREATE/DROP).
> >
> >     For some clusters it might mean the online checksum enabling would
> >     crash+restart infinitely (well, until reaching MAX_ATTEMPTS).
> >
> >     Unfortunately, try_relation_open() won't fix this, as the error
> >     comes from ReadBufferExtended. And it's not a matter of simply
> >     creating a ReadBuffer variant without that error check, because
> >     temporary tables use local buffers.
> >
> >     I wonder if we could just go and set the checksums anyway,
> >     ignoring the local buffers. If the other session does some
> >     changes, it'll overwrite our changes, this time with the correct
> >     checksums. But it seems pretty dangerous (I mean, what if they're
> >     writing stuff while we're updating the checksums? Considering the
> >     various short-cuts for temporary tables, I suspect that would be a
> >     boon for race conditions.)
> >
> >     Another option would be to do something similar to running
> >     transactions, i.e. wait until all temporary tables (that we've
> >     seen at the beginning) disappear. But we're starting to wait on
> >     more and more stuff.
> >
> >     If we do this, we should clearly log which backends we're waiting
> >     for, so that the admins can go and interrupt them manually.
> >
> >
> >
> > Yeah, waiting for all transactions at the beginning is pretty simple.
> >
> > Making the worker simply ignore temporary tables would also be easy.
> >
> > One of the bigger issues here is that temporary tables are *session*
> > scoped and not transaction scoped, so we'd actually need the other
> > session to finish, not just the transaction.
> >
> > I guess what we could do is something like this:
> >
> > 1. Don't process temporary tables in the checksumworker, period.
> > Instead, build a list of any temporary tables that existed when the
> > worker started in this particular database (basically anything that
> > we got in our scan). Once we have processed the complete database,
> > keep re-scanning pg_class until those particular tables are gone
> > (search by oid).
> >
> > That means that any temporary tables that are created *while* we are
> > processing a database are ignored, but they should already be
> > receiving checksums.
> >
> > It definitely leads to a potential issue with long-running temp
> > tables. But as long as we look at the *actual tables* (by oid), we
> > should be able to handle long-running sessions once they have dropped
> > their temp tables.
> >
> > Does that sound workable to you?
> >
>
> Yes, that's pretty much what I meant by 'wait until all temporary
> tables disappear'. Again, we need to make it easy to determine which
> OIDs we are waiting for, i.e. which sessions may need the DBA's
> attention.
>
> I don't think it makes sense to log the OIDs of the temporary tables.
> There can be many of them, and in most cases the connection/session is
> managed by the application, so the only thing you can do is kill the
> connection.
>
>
> Yeah, agreed. I think it makes sense to show the *number* of temp
> tables. That's also a predictable amount of information -- logging all
> temp tables may, as you say, lead to an insane amount of data.
>
> PFA a patch that does this. I've also added some docs for it.
>
> And I also noticed pg_verify_checksums wasn't installed, so fixed
> that too.
>
>
> PFA a rebase on top of the just committed verify-checksums patch. 
>
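
For what it's worth, the "skip temporary tables but remember them" part
of the worker could look roughly like the sketch below. This is not the
patch, just the ordinary catalog-scan idiom; temp_oids is a hypothetical
list kept by the worker, and error handling is omitted.

Relation      rel = heap_open(RelationRelationId, AccessShareLock);
HeapScanDesc  scan = heap_beginscan_catalog(rel, 0, NULL);
HeapTuple     tup;
List         *temp_oids = NIL;

while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
{
    Form_pg_class classForm = (Form_pg_class) GETSTRUCT(tup);

    if (classForm->relpersistence == RELPERSISTENCE_TEMP)
    {
        /* don't touch it, just note that it existed when we started */
        temp_oids = lappend_oid(temp_oids, HeapTupleGetOid(tup));
        continue;
    }

    /* ... otherwise add the relation to the worker's todo list ... */
}

heap_endscan(scan);
heap_close(rel, AccessShareLock);

/*
 * After the whole database has been processed, re-scan pg_class
 * periodically and drop OIDs from temp_oids as they disappear; once
 * the list is empty the database can be marked done.  Only the length
 * of the list needs to be logged while waiting, per the above.
 */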

This seems OK in terms of handling errors in the worker and passing the
result to the launcher. I haven't managed to do any crash testing today,
but code-wise it seems sane.

It does, however, still fail to initialize the attempts field after
allocating the db entry in BuildDatabaseList, so if you run with
-DRANDOMIZE_ALLOCATED_MEMORY the field ends up with garbage values like
these:

WARNING: attempts = -1684366952
WARNING: attempts = 1010514489
WARNING: attempts = -1145390664
WARNING: attempts = 1162101570

I guess those are not the droids we're looking for?
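
The fix is trivial, of course: palloc() does not zero the allocation, so
the field has to be set explicitly (or the entry allocated with
palloc0). Roughly what the attached diff does; the struct and field
names other than attempts are illustrative and may not match the patch
exactly:

ChecksumHelperDatabase *db;

db = (ChecksumHelperDatabase *) palloc(sizeof(ChecksumHelperDatabase));
db->dboid = dboid;          /* illustrative field */
db->attempts = 0;           /* must be initialized explicitly, since
                             * palloc() returns uninitialized memory */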

Likewise, I don't see where ChecksumHelperShmemStruct->abort gets
initialized. I think it only ever gets set in launcher_exit(), but that
does not seem sufficient. I suspect it's the reason for this behavior:

test=# select pg_enable_data_checksums(10, 10);
ERROR: database "template0" does not allow connections
HINT: Allow connections using ALTER DATABASE and try again.
test=# alter database template0 allow_connections = true;
ALTER DATABASE
test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: already running
test=# select pg_disable_data_checksums();
pg_disable_data_checksums
---------------------------

(1 row)

test=# select pg_enable_data_checksums(10, 10);
ERROR: could not start checksumhelper: has been cancelled

At which point the only thing you can do is restart the cluster, which
seems somewhat unnecessary. But perhaps it's intentional?
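
If it's not intentional, I'd expect the fix to be along the lines of
clearing the flag whenever a new run is started, not only setting it in
launcher_exit(). A sketch only -- I haven't checked where the launcher
actually initializes its shmem state:

/* when starting a new checksumhelper run, clear any stale cancel flag */
ChecksumHelperShmemStruct->abort = false;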

Attached is a diff with a couple of minor comment tweaks, and correct
initialization of the attempts field.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: checksums-tweaks.diff (text/x-patch, 3.1 KB)
