Re: Online checksums patch - once again

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online checksums patch - once again
Date: 2020-01-22 19:50:47
Message-ID: CABUevEwns5iuvJ_kH_9KXNyo3Z8=RHB-DM1-TKP26yoVo0=xvg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jan 20, 2020 at 12:14 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Jan 18, 2020 at 6:18 PM Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
> > Thanks again for reviewing (and working on the infrastructure required for this
> > patch to begin with)! Regarding the persisting the progress; that would be a
> > really neat feature but I don't have any suggestion on how to do that safely
> > for real use-cases.
>
> Leaving to one side the question of how much work is involved, could
> we do something conceptually similar to relfrozenxid/datfrozenxid,
> i.e. use catalog state to keep track of which objects have been
> handled and which not?
>
> Very rough sketch:
>
> * set a flag indicating that checksums must be computed for all page writes
> * use barriers and other magic to make sure everyone has gotten the
> memo from the previous step
> * use new catalog fields pg_class.relhaschecksums and
> pg_database.dathaschecksums to track whether checksums are enabled
> * keep launching workers for databases where !pg_class.dathaschecksums
> until none remain
> * mark checksums as fully enabled
> * party

We did discuss this back when we started work on this (I can't
remember if it was just me and Daniel and someone else or on a list --
but that's not important atm).

The reasoning that led us to *not* doing that is that it's a one-off
operation. That along with the fact that we hope to at some point be
able to change the default to chekcsums on (and t wouldn't be
necessary for the transition on->off as that is very fast), it would
become an increasingly rate on-off operation. And by adding these
flags to the catalogs, everybody is paying the overhead for this
one-off rare operation. Another option would be to add the flag on the
pg_database level which would decrease the overhead, but our guess was
that this would also decrease the usefulness in most cases to make it
not worth it (most people with big databases don't have many big
databases in the same cluster -- it's usually just one or two, so in
the end the results would be more or less the same was we have now as
it would have to keep re-doing the big ones)

Unless we actually want to support running systems more or less
permanently with some tables with checksums and other tables without
checksums. But that's going to have an effect on the validation of
checksums that would generate a huge overhead (since each buffer check
would have to look up the pg_class entry).

FYI, Daniel is working on an update that will include this -- so we
can see what the actual outcome is of it in th case of complexity as
well. Should hopefully be ready soon.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-01-22 19:57:01 Re: making the backend's json parser work in frontend code
Previous Message Alvaro Herrera 2020-01-22 19:26:33 Re: making the backend's json parser work in frontend code