Re: Online checksums verification in the backend

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-03-16 04:53:35
Message-ID: CA+fd4k5_gxSrcjXoxBk_47vB+xYGKzvcLGtwaLCbcBmW3v5oGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, 11 Mar 2020 at 16:18, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Tue, Dec 10, 2019 at 11:12:34AM +0100, Julien Rouhaud wrote:
> > On Tue, Dec 10, 2019 at 3:26 AM Michael Paquier <michael(at)paquier(dot)xyz> wrote:
> > >
> > > On Mon, Dec 09, 2019 at 07:02:43PM +0100, Julien Rouhaud wrote:
> > > > On Mon, Dec 9, 2019 at 5:21 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > >> Some people might prefer notices, because you can get those while the
> > > >> thing is still running, rather than a result set, which you will only
> > > >> see when the query finishes. Other people might prefer an SRF, because
> > > >> they want to have the data in structured form so that they can
> > > >> postprocess it. Not sure what you mean by "more globally."
> > > >
> > > > I meant having the results available system-wide, not only to the
> > > > caller. I think that emitting a log/notice level should always be
> > > > done on top on whatever other communication facility we're using.
> > >
> > > The problem of notice and logs is that they tend to be ignored. Now I
> > > don't see no problems either in adding something into the logs which
> > > can be found later on for parsing on top of a SRF returned by the
> > > caller which includes all the corruption details, say with pgbadger
> > > or your friendly neighborhood grep. I think that any backend function
> > > should also make sure to call pgstat_report_checksum_failure() to
> > > report a report visible at database-level in the catalogs, so as it is
> > > possible to use that as a cheap high-level warning. The details of
> > > the failures could always be dug from the logs or the result of the
> > > function itself after finding out that something is wrong in
> > > pg_stat_database.
> >
> > I agree that adding extra information in the logs and calling
> > pgstat_report_checksum_failure is a must do, and I changed that
> > locally. However, I doubt that the logs is the right place to find
> > the details of corrupted blocks. There's no guarantee that the file
> > will be accessible to the DBA, nor that the content won't get
> > truncated by the time it's needed. I really think that corruption is
> > important enough to justify more specific location.
>
>
> The cfbot reported a build failure, so here's a rebased v2 which also contains
> the pg_stat_report_failure() call and extra log info.

In addition to comments from Michael-san, here are my comments:

1.
+ if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
+ ereport(ERROR,
+ (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+ errmsg("only superuser or a member of the
pg_read_server_files role may use this function")));
+
+ if (!DataChecksumsEnabled())
+ elog(ERROR, "Data checksums are not enabled");

I think it's better to reverse the order of the above checks.

2.
+#define CRF_COLS 5 /* Number of output arguments in the SRF */

Should it be SRF_COLS?

3.
+static void
+check_delay_point(void)
+{
+ /* Always check for interrupts */
+ CHECK_FOR_INTERRUPTS();
+
+ /* Nap if appropriate */
+ if (!InterruptPending && VacuumCostBalance >= VacuumCostLimit)
+ {
+ int msec;
+
+ msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;
+ if (msec > VacuumCostDelay * 4)
+ msec = VacuumCostDelay * 4;
+
+ pg_usleep(msec * 1000L);
+
+ VacuumCostBalance = 0;
+
+ /* Might have gotten an interrupt while sleeping */
+ CHECK_FOR_INTERRUPTS();
+ }
+}

Even if we use vacuum delay for this function, I think we need to set
VacuumDelayActive and return if it's false, or it's better to just
return if VacuumCostDelay == 0.

4.
+static void
+check_all_relations(TupleDesc tupdesc, Tuplestorestate *tupstore,
+ ForkNumber forknum)

I also agree with Michael-san to remove this function. Instead we can
check all relations by:

select pg_check_relation(oid) from pg_class;

6.
Other typos

s/dirted/dirtied/
s/explictly/explicitly/

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Suraj Kharage 2020-03-16 05:07:35 Re: backup manifests
Previous Message Justin Pryzby 2020-03-16 04:34:25 Re: Berserk Autovacuum (let's save next Mandrill)