Re: Checksum errors in pg_stat_database

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Magnus Hagander <magnus(at)hagander(dot)net>, David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksum errors in pg_stat_database
Date: 2019-04-02 04:56:19
Message-ID: 20190402045619.GM16093@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 30, 2019 at 06:15:11PM +0100, Julien Rouhaud wrote:
> I'd also have to get more feedback on this. For now, I'll add this
> thread to the pg12 open items, as a follow up of the initial code
> drop.

Catching up here... I think that having a completely separate view
with one row for each database and one row for shared objects makes
the most sense based on what has been proposed on this thread. Being
able to track checksum failures for shared catalogs is really
something I'd like to be able to see easily, and I have seen
corruption involving such objects from time to time. I think that we
should have a design which is extensible. One thing which is not
proposed on this patch, and I am fine with it as a first draft, is
that we don't have any information about the broken block number and
the file involved. My gut tells me that we'd want a separate view,
like pg_stat_checksums_details with one tuple per (dboid, rel, fork,
blck) to be complete. But that's just for future work.

For the progress part, we would most likely have a separate view for
that as well, as the view should show no rows if there is no operation
in progress.

The patch looks rather clean to me, I have some comments.

- <application>pg_checksums</application>. The exit status is zero if there
- are no checksum errors when checking them, and nonzero if at least one
- checksum failure is detected. If enabling or disabling checksums, the
- exit status is nonzero if the operation failed.
+ <application>pg_checksums</application>. As a consequence, the
+ <structname>pg_stat_checksums</structnameview won't reflect this activity.
+ The exit status is zero if there are no checksum errors when checking them,
+ and nonzero if at least one checksum failure is detected. If enabling or
+ disabling checksums, the exit status is nonzero if the operation failed.

The docs of pg_checksums already clearly state that the cluster needs
to be offline, so I am not sure that this addition is necessary.

@@ -1539,6 +1539,8 @@ pgstat_report_checksum_failures_in_db(Oid dboid,
int failurecount)

Please note that there is no need to have the list of arguments in the
comment block at the top of pgstat_report_checksum_failures_in_db().

+ if ((dbentry = pgstat_fetch_stat_dbentry(dbid)) == NULL)
+ result = 0;
+ else
+ result = dbentry->last_checksum_failure;
+
+ if (result == 0)
+ PG_RETURN_NULL();
+ else
+ PG_RETURN_TIMESTAMPTZ(result);
+}

No need for two ifs here. What about just that?
if (NULL)
PG_RETURN_NULL();
else
PG_RETURN_TIMESTAMPTZ(last_checksum_failure);
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yuzuko Hosoya 2019-04-02 05:02:08 RE: Problem with default partition pruning
Previous Message Michael Paquier 2019-04-02 03:05:14 Re: Unified logging system for command-line programs