Re: Online checksums verification in the backend

From: Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Online checksums verification in the backend
Date: 2020-04-05 09:08:06
Message-ID: CA+fd4k5VS_QAkdGwU-mwsr1z78XM7egixZ2bDhfeSJm4HCrRGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 5 Apr 2020 at 17:44, Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Sun, Apr 05, 2020 at 01:13:30PM +0900, Masahiko Sawada wrote:
> >
> > Thank you for updating the patch! The patch looks good to me. Here are
> > some random comments mostly about cosmetic changes.
> >
>
> Thanks a lot for the review!

Thank you for updating the patch.

>
> >
> > 1.
> > I think we can have two separate SQL functions:
> > pg_check_relation(regclass, text) and pg_check_relation(regclass),
> > instead of setting NULL by default to the second argument.
> >
>
> I'm fine with it, so implemented this way with the required documentation
> changes.

Why do we need two rows in the doc? For instance, replication slot
functions have some optional arguments but there is only one row in
the doc. So I think we don't need to change the doc from the previous
version patch.

And I think these are not necessary as we already defined in
include/catalog/pg_proc.dat:

+CREATE OR REPLACE FUNCTION pg_check_relation(
+ IN relation regclass,
+ OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
+ OUT expected_checksum integer, OUT found_checksum integer)
+ RETURNS SETOF record STRICT VOLATILE LANGUAGE internal AS 'pg_check_relation'
+ PARALLEL RESTRICTED;
+
+CREATE OR REPLACE FUNCTION pg_check_relation(
+ IN relation regclass, IN fork text,
+ OUT relid oid, OUT forknum integer, OUT failed_blocknum bigint,
+ OUT expected_checksum integer, OUT found_checksum integer)
+ RETURNS SETOF record STRICT VOLATILE LANGUAGE internal
+ AS 'pg_check_relation_fork'
+ PARALLEL RESTRICTED;

>
> >
> > 2.
> > + * Check data sanity for a specific block in the given fork of the given
> > + * relation, always retrieved locally with smgrred even if a version exists in
> >
> > s/smgrred/smgrread/
>
> Fixed.
>
> >
> > 3.
> > + /* The buffer will have to check checked. */
> > + Assert(checkit);
> >
> > Should it be "The buffer will have to be checked"?
> >
>
> Oops indeed, fixed.
>
> >
> > 4.
> > + if (!is_member_of_role(GetUserId(), DEFAULT_ROLE_READ_SERVER_FILES))
> > + ereport(ERROR,
> > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > + errmsg("only superuser or a member of the
> > pg_read_server_files role may use this function")));
> >
> > Looking at the definition of pg_stat_read_server_files role, this role
> > seems to be for operations that could read non-database files such as
> > csv files. Therefore, currently this role is used by file_fdw and COPY
> > command. I personally think pg_stat_scan_tables would be more
> > appropriate for this function but I'm not sure.
> >
>
> That's a very good point, especially since the documentation of this default
> role is quite relevant for those functions:
>
> "Execute monitoring functions that may take ACCESS SHARE locks on tables,
> potentially for a long time."
>
> So changed!
>
> >
> > 5.
> > + /* Set cost-based vacuum delay */
> > + ChecksumCostActive = (ChecksumCostDelay > 0);
> > + ChecksumCostBalance = 0;
> >
> > s/vacuum/checksum verification/
> >
>
> Fixed.
>
> >
> > 6.
> > + ereport(WARNING,
> > + (errcode(ERRCODE_DATA_CORRUPTED),
> > + errmsg("invalid page in block %u of relation %s",
> > + blkno,
> > + relpath(relation->rd_smgr->smgr_rnode, forknum))));
> >
> > I think it's better to show the relation name instead of the relation path here.
> >
>
> I'm here using the same pattern as what ReadBuffer_common() would display if a
> corrupted block is read. I think it's better to keep the format for both, so
> any existing log analyzer will keep working with those new functions.

Ok, I agree with you.

Regards,

--
Masahiko Sawada http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2020-04-05 09:44:59 Re: Online checksums verification in the backend
Previous Message Julien Rouhaud 2020-04-05 08:43:55 Re: Online checksums verification in the backend