Re: Online enabling of checksums

From: Michael Banck <michael(dot)banck(at)credativ(dot)de>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Subject: Re: Online enabling of checksums
Date: 2018-03-18 14:02:35
Message-ID: 20180318140235.GC469@nighthawk.caipicrew.dd-dns.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Thu, Mar 15, 2018 at 02:01:26PM +0100, Daniel Gustafsson wrote:
> > On 10 Mar 2018, at 16:09, Michael Banck <michael(dot)banck(at)credativ(dot)de> wrote:
> > I am aware that this is discussed already, but as-is the user experience
> > is pretty bad, I think pg_enable_data_checksums() should either bail
> > earlier if it cannot connect to all databases, or it should be better
> > documented.
>
> Personally I think we should first attempt to solve the "allow-connections in
> background workers” issue which has been raised on another thread. For now I’m
> documenting this better.

I had a look at that thread and it seems stalled, I am a bit worried
that this will not be solved before the end of the CF.

So I think unless the above gets solved, pg_enable_data_checksums()
should error out with the hint. I've had a quick look and it seems one
can partly duplicate the check from BuildDatabaseList() (or optionally
move it) to the beginning of StartChecksumHelperLauncher(), see
attached.

That results in:

postgres=# SELECT pg_enable_data_checksums();
ERROR: Database "template0" does not allow connections.
HINT: Allow connections using ALTER DATABASE and try again.
postgres=#

Which I think is much nice than what we have right now:

postgres=# SELECT pg_enable_data_checksums();
pg_enable_data_checksums
--------------------------
t
(1 row)

postgres=# \q
postgres(at)fock:~$ tail -3 pg1.log
2018-03-18 14:00:08.512 CET [25514] ERROR: Database "template0" does not allow connections.
2018-03-18 14:00:08.512 CET [25514] HINT: Allow connections using ALTER DATABASE and try again.
2018-03-18 14:00:08.513 CET [24930] LOG: background worker "checksum helper launcher" (PID 25514) exited with exit code 1

> Attached v4 of this patch, which addresses this review, and flipping status
> back in the CF app back to Needs Review.

Thanks!

The following errmsg() capitalize the error message without the first
word being a specific term, which I believe is not project style:

+ (errmsg("Checksum helper is currently running, cannot disable checksums"),
+ (errmsg("Database \"%s\" dropped, skipping", db->dbname)));
+ (errmsg("Checksums enabled, checksumhelper launcher shutting down")));
+ (errmsg("Database \"%s\" does not allow connections.", NameStr(pgdb->datname)),
+ (errmsg("Checksum worker starting for database oid %d", dboid)));
+ (errmsg("Checksum worker completed in database oid %d", dboid)));

Also, in src/backend/postmaster/checksumhelper.c there are few
multi-line comments (which are not function comments) that do not have a
full stop at the end, which I think is also project style:

+ * Failed to set means somebody else started

Could be changed to a one-line (/* ... */) comment?

+ * Force a checkpoint to get everything out to disk

Should have a full stop.

+ * checksummed, so skip

Should have a full stop.

+ * Enable vacuum cost delay, if any

Could be changed to a one-line comment?

+ * Create and set the vacuum strategy as our buffer strategy

Could be changed to a one-line comment?

Apart from that, I previously complained about the error in
pg_verify_checksums:

+ fprintf(stderr, _("%s: %s, block %d, invalid checksum in file %X, calculated %X\n"),
+ progname, fn, blockno, header->pd_checksum, csum);

I still propose something like in backend/storage/page/bufpage.c's
PageIsVerified(), e.g.:

|fprintf(stderr, _("%s: checksum verification failed in file \"%s\", block %d: calculated checksum %X but expected %X\n"),
| progname, fn, blockno, csum, header->pd_checksum);

Otherwise, I had a quick look over v4 and found no further issues.
Hopefully I will be able to test it on some bigger test databases next
week.

I'm switching the state back to 'Waiting on Author'; if you think the
above points are moot, maybe switch it back to 'Needs Review' as Andrey
Borodin also marked himself down as reviewer and might want to have
another look as well.

Cheers,

Michael

--
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax: +49 2166 9901-100
Email: michael(dot)banck(at)credativ(dot)de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Attachment Content-Type Size
online_checksums_check_datallowconn.patch text/x-diff 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2018-03-18 16:21:26 Re: Online enabling of checksums
Previous Message Joe Wildish 2018-03-18 12:29:50 Re: Implementing SQL ASSERTION