Re: Online enabling of checksums

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Magnus Hagander <magnus(at)hagander(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject: Re: Online enabling of checksums
Date: 2018-02-25 21:26:05
Message-ID: 88e9e5ed-8eec-4bdd-14bb-8719394a5212@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 02/24/2018 03:51 AM, Stephen Frost wrote:
> Greetings,
>
> * Tomas Vondra (tomas(dot)vondra(at)2ndquadrant(dot)com) wrote:
>> On 02/24/2018 03:11 AM, Andres Freund wrote:
>>> On 2018-02-24 03:07:28 +0100, Tomas Vondra wrote:
>>>> I agree having to restart the whole operation after a crash is not
>>>> ideal, but I don't see how adding a flag actually solves it. The problem
>>>> is the large databases often store most of the data (>80%) in one or two
>>>> central tables (think fact tables in star schema, etc.). So if you
>>>> crash, it's likely half-way while processing this table, so the whole
>>>> table would still have relchecksums=false and would have to be processed
>>>> from scratch.
>>>
>>> I don't think it's quite as large a problem as you make it out to
>>> be. Even in those cases you'll usually have indexes, toast tables and so
>>> forth.
>>
>> Hmmm, right. I've been focused on tables and kinda forgot that the other
>> objects need to be transformed too ... :-/
>
> There's also something of a difference between just scanning a table or
> index, where you don't have to do much in the way of actual writes
> because most of the table already has valid checksums, and having to
> actually write out all the changes.
>
>>>> But perhaps you meant something like "position" instead of just a simple
>>>> true/false flag?
>>>
>>> I think that'd incur a much larger complexity cost.
>>
>> Yep, that was part of the point that I was getting to - that actually
>> addressing the issue would be more expensive than simple flags. But as
>> you pointed out, that was not quite ... well thought through.
>
> No, but it's also not entirely wrong. Huge tables aren't uncommon.
>
> That said, I'm not entirely convinced that these new flags would be
> as unnoticed as is being suggested here, but rather than focus on
> either side of that, I'm thinking about what we want to have *next*-
> we know that enabling/disabling checksums is an issue that needs to
> be solved, and this patch is making progress towards that, but the
> next question is what does one do when a page has been detected as
> corrupted? Are there flag fields which would be useful to have at a
> per-relation level to support some kind of corrective action or
> setting that says "don't care about checksums on this table, even
> though the entire database is supposed to have valid checksums, but
> instead do X with failed pages" or similar.
>

Those questions are definitely worth asking, and I agree our ability to
respond to data corruption (incorrect checksums) needs improvements. But
I don't really see how a single per-relation flag will make any that any
easier?

Perhaps there are other flags/fields that might help, like for example
the maximum number of checksum errors per relation (although I don't
consider that very useful in practice), but that seems rather unrelated
to this patch.

> Beyond dealing with corruption-recovery cases, are there other use
> cases for having a given table not have checksums?
>

Well, I see checksums are a way to detect data corruption caused by
storage, so if you have tablespaces backed by different storage systems,
you could disable checksums for objects on the storage you 100% trust.
That would limit the overhead of computing checksums.

But then again, this seems entirely unrelated to the patch discussed
here. That would obviously require flags in catalogs, and if the patch
eventually adds flags those would need to be separate.

> Would it make sense to introduce a flag or field which indicates that
> an entire table's pages has some set of attributes, of which
> 'checksums' is just one attribute? Perhaps a page version, which
> potentially allows us to have a way to change page layouts in the
> future?
>
> I'm happy to be told that we simply don't have enough information at
> this point to make anything larger than a relchecksums field-level
> decision, but perhaps these thoughts will spark an idea about how we
> could define something a bit broader with clear downstream
> usefulness that happens to also cover the "does this relation have
> checksums?" question.
>

I don't know. But I think we need to stop moving the goalposts further
and further away, otherwise we won't get anything until PostgreSQL 73.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2018-02-25 21:38:52 Re: TupleTableSlot abstraction
Previous Message Anthony Communier 2018-02-25 20:05:48 Using JSONB directly from application