Re: Online checksums patch - once again

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Online checksums patch - once again
Date: 2020-01-23 11:18:41
Message-ID: A5CA3D81-9C8A-4D04-987D-1BEB0559B3D3@yesql.se
Lists: pgsql-hackers

> On 22 Jan 2020, at 23:07, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jan 22, 2020 at 3:28 PM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
>>> I think the argument about adding catalog flags adding overhead is
>>> pretty much bogus. Fixed-width fields in catalogs are pretty cheap.
>>
>> If that's the general view, then yeah our "cost calculations" were
>> off. I guess I may have been colored by the cost of adding statistics
>> counters, and had that influence the thinking. Incorrect judgement on
>> that cost certainly contributed to the decision back then.
>
> For either statistics or for pg_class, the amount of data that we have
> to manage is proportional to the number of relations (which could be
> big) multiplied by the data stored for each relation. But the
> difference is that the stats file has to be rewritten, at least on a
> per-database basis, very frequently, while pg_class goes through
> shared-buffers and so doesn't provoke the same stupid
> write-the-whole-darn-thing behavior. That is a pretty key difference,
> IMHO.

I think the cost is less about performance and more about carrying around an
attribute which won't be terribly interesting during the cluster's lifetime,
except during the transition. But, as you say, it's probably a manageable expense.

A bigger question is how to handle the offline capabilities. pg_checksums can
enable or disable checksums in an offline cluster, which will put the cluster
in a state where the pg_control file and the catalog don't match at startup.
One strategy could be to always trust the pg_control file and alter the catalog
accordingly, but that still leaves a window of inconsistent cluster state.
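To make the "always trust pg_control" strategy concrete, here is a minimal sketch. Everything in it is hypothetical: the ClusterChecksumState enum stands in for whatever pg_checksums leaves in pg_control, RelChecksumFlag stands in for a per-relation catalog flag (e.g. a bool in pg_class), and reconcile_checksum_flags() is not an existing PostgreSQL function. It only shows the reconciliation step; until it runs, the catalog and pg_control disagree, which is the window mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cluster-wide state as recorded in pg_control after an
 * offline pg_checksums run. */
typedef enum
{
    CHECKSUMS_OFF,
    CHECKSUMS_ON
} ClusterChecksumState;

/* Hypothetical per-relation catalog flag; relid stands in for an Oid. */
typedef struct
{
    unsigned int relid;
    bool         has_checksums;
} RelChecksumFlag;

/*
 * Hypothetical startup reconciliation: pg_control is authoritative, so any
 * catalog flag that disagrees with it is overwritten. Returns the number of
 * catalog entries that had to be changed.
 */
size_t
reconcile_checksum_flags(ClusterChecksumState control_state,
                         RelChecksumFlag *rels, size_t nrels)
{
    bool   want = (control_state == CHECKSUMS_ON);
    size_t changed = 0;

    for (size_t i = 0; i < nrels; i++)
    {
        if (rels[i].has_checksums != want)
        {
            rels[i].has_checksums = want;
            changed++;
        }
    }
    return changed;
}
```

The catalog side is never consulted to decide the outcome, which keeps the rule simple but means any catalog write failing midway leaves a partially reconciled state.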

cheers ./daniel
