Re: Checksum errors in pg_stat_database

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Michael Paquier <michael(at)paquier(dot)xyz>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksum errors in pg_stat_database
Date: 2019-04-02 05:43:12
Message-ID: CAOBaU_ad_VpR-WS5YxDh2fE+Gz1_sDs+sCeaKtjJrL3KNN=yXg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 2, 2019 at 6:56 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
>
> On Sat, Mar 30, 2019 at 06:15:11PM +0100, Julien Rouhaud wrote:
> > I'd also have to get more feedback on this. For now, I'll add this
> > thread to the pg12 open items, as a follow up of the initial code
> > drop.
>
> Catching up here... I think that having a completely separate view
> with one row for each database and one row for shared objects makes
> the most sense based on what has been proposed on this thread. Being
> able to track checksum failures for shared catalogs is really
> something I'd like to be able to see easily, and I have seen
> corruption involving such objects from time to time. I think that we
> should have a design which is extensible.

Ok!

> One thing which is not
> proposed on this patch, and I am fine with it as a first draft, is
> that we don't have any information about the broken block number and
> the file involved. My gut tells me that we'd want a separate view,
> like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
> blck) to be complete. But that's just for future work.

That could indeed be nice.

> For the progress part, we would most likely have a separate view for
> that as well, as the view should show no rows if there is no operation
> in progress.

Ok.

> The patch looks rather clean to me, I have some comments.
>
> - <application>pg_checksums</application>. The exit status is zero if there
> - are no checksum errors when checking them, and nonzero if at least one
> - checksum failure is detected. If enabling or disabling checksums, the
> - exit status is nonzero if the operation failed.
> + <application>pg_checksums</application>. As a consequence, the
> + <structname>pg_stat_checksums</structnameview won't reflect this activity.
> + The exit status is zero if there are no checksum errors when checking them,
> + and nonzero if at least one checksum failure is detected. If enabling or
> + disabling checksums, the exit status is nonzero if the operation failed.
>
> The docs of pg_checksums already clearly state that the cluster needs
> to be offline, so I am not sure that this addition is necessary.

Agreed, removed.

> @@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid,
> int failurecount)
>
> Please note that there is no need to have the list of arguments in the
> comment block at the top of pgstat_report_checksum_failures_in_db().

Indeed, fixed.

> + if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
> + result = 0;
> + else
> + result = dbentry->last_checksum_failure;
> +
> + if (result == 0)
> + PG_RETURN_NULL();
> + else
> + PG_RETURN_TIMESTAMPTZ(result);
> +}
>
> No need for two ifs here. What about just that?
> if (NULL)
> PG_RETURN_NULL();
> else
> PG_RETURN_TIMESTAMPTZ(last_checksum_failure);

I do agree, but this is done like this everywhere in pgstatfuncs.c, so
I think it's better to keep it as-is for consistency.

Attachment Content-Type Size
pg_stat_checksums-v3.diff text/x-patch 11.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Павлухин Иван 2019-04-02 05:48:31 Re: Column lookup in a row performance
Previous Message Pavel Stehule 2019-04-02 05:39:49 Re: proposal: type info support functions for functions that use "any" type