Re: 16-bit page checksums for 9.2

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, david(at)fetter(dot)org, aidan(at)highrise(dot)ca, stark(at)mit(dot)edu, pgsql-hackers(at)postgresql(dot)org
Subject: Re: 16-bit page checksums for 9.2
Date: 2012-02-20 09:18:44
Message-ID: CA+U5nMLRtUokXQTnipof6+y3SdyXsYMU-3E8W9Jy=YiAWSLOYA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Feb 20, 2012 at 12:42 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> I think we could do worse than this patch, but I don't really believe
> it's ready for commit.  We don't have a single performance number
> showing how much of a performance regression this causes, either on
> the master or on the standby, on any workload, much less those where a
> problem is reasonably forseeable from the design you've chosen.  Many
> controversial aspects of the patch, such as the way you're using the
> buffer header spinlocks as a surrogate for x-locking the buffer, or
> the right way of obtaining the bit-space the patch requires, haven't
> really been discussed, and to the extent that they have been
> discussed, they have not been agreed.

These points are being discussed here. If you have rational
objections, say what they are.

> On the former point, you haven't updated
> src/backend/storage/buffer/README at all;

I've updated the appropriate README file which relates to page LSNs to explain.

> but updating is not by
> itself sufficient.  Before we change the buffer-locking rules, we
> ought to have some discussion of whether it's OK to do that, and why
> it's necessary.  "Why it's necessary" would presumably include
> demonstrating that the performance of the straightforward
> implementation stinks, and that with this change is better.

What straightforward implementation is that?? This *is* the straightforward one.

God knows what else we'd break if we drop the lock, reacquire as an
exclusive, then drop it again and reacquire in shared mode. Code tends
to assume that when you take a lock you hold it until you release;
doing otherwise would allow all manner of race conditions to emerge.

And do you really think that would be faster?

> You
> haven't made any effort to do that whatsoever, or if you have, you
> haven't posted the results here.  I'm pretty un-excited by that
> change, first because I think it's a modularity violation and possibly
> unsafe, and second because I believe we've already got a problem with
> buffer header spinlock contention which this might exacerbate.
>
>>> Another disadvantage of the current scheme is that there's no
>>> particularly easy way to know that your whole cluster has checksums.
>>> No matter how we implement checksums, you'll have to rewrite every
>>> table in the cluster in order to get them fully turned on.  But with
>>> the current design, there's no easy way to know how much of the
>>> cluster is actually checksummed.
>>
>> You can read every block and check.
>
> Using what tool?

Pageinspect?

>>> If you shut checksums off, they'll
>>> linger until those pages are rewritten, and there's no easy way to
>>> find the relations from which they need to be removed, either.
>>
>> We can't have it both ways. Either we have an easy upgrade, or we
>> don't. I'm told that an easy upgrade is essential.
>
> Easy upgrade and removal of checksums are unrelated issues AFAICS.
>
>>> I'm tempted to suggest a relation-level switch: when you want
>>> checksums, you use ALTER TABLE to turn them on, and when you don't
>>> want them any more you use ALTER TABLE to shut them off again, in each
>>> case rewriting the table.  That way, there's never any ambiguity about
>>> what's in the data pages in a given relation: either they're either
>>> all checksummed, or none of them are.  This moves the decision about
>>> whether checksums are enabled or disabled quite a but further away
>>> from the data itself, and also allows you to determine (by catalog
>>> inspection) which parts of the cluster do in fact have checksums.  It
>>> might be kind of a pain to implement, though: you'd have to pass the
>>> information about how any given relation was configured down to the
>>> place where we validate page sanity.  I'm not sure whether that's
>>> practical.
>>
>> It's not practical as the only mechanism, given the requirement for upgrade.
>
> Why not?
>
>> The version stuff is catered for by the current patch.
>
> Your patch reuses the version number field for an unrelated purpose
> and includes some vague hints of how we might do versioning using some
> of the page-level flag bits.  Since bit-space is fungible, I think it
> makes more sense to keep the version number where it is and carve the
> checksum out of whatever bit space you would have moved the version
> number into.  Whether that's pd_tli or pd_flags is somewhat secondary;
> the point is that you can certainly arrange to leave the 8 bits that
> currently contain the version number as they are.

If you've got some rational objections, please raise them.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jan Urbański 2012-02-20 10:05:43 Re: Potential reference miscounts and segfaults in plpython.c
Previous Message Albe Laurenz 2012-02-20 09:18:33 Re: pgsql_fdw, FDW for PostgreSQL server