Re: Checksum errors in pg_stat_database

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Magnus Hagander <magnus(at)hagander(dot)net>
Cc: David Steele <david(at)pgmasters(dot)net>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksum errors in pg_stat_database
Date: 2019-03-13 15:54:42
Message-ID: CAOBaU_ZTFe-xrYNVu4pDPsi3JUOoncXBm5=hh7ocv_gd-hxDig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 13, 2019 at 4:53 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
>
> On Sun, Mar 10, 2019 at 1:13 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> >
> > On Sat, Mar 9, 2019 at 7:58 PM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > >
> > > On Sat, Mar 9, 2019 at 7:50 PM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> > > >
> > > > On Sat, Mar 9, 2019 at 10:41 AM Julien Rouhaud <rjuju123(at)gmail(dot)com> wrote:
> > > >>
> > > >> Sorry, I have again new comments after a little bit more thinking.
> > > >> I'm wondering if we can do something about shared objects while we're
> > > >> at it. They don't belong to any database, so it's a little bit
> > > >> orthogonal to this proposal, but it seems quite important to track
> > > >> error on those too!
> > > >>
> > > >> What about adding a new field in PgStat_GlobalStats for that? We can
> > > >> use the same lastDir to easily detect such objects and slightly adapt
> > > >> sendFile again, which seems quite straightforward.
> > >
> > > > Question is then what number that should show -- only the checksum counter in non-database-fields, or the total number across the cluster?
> > >
> > > I'd say only for non-database-fields errors, especially if we can
> > > reset each counters separately. If necessary, we can add a new view
> > > to give a global overview of checksum errors for DBA convenience.
> >
> > I'm considering adding a new PgStat_ChecksumStats for that purpose
> > instead, but I don't know if that's acceptable to do so in the last
> > commitfest. It seems worthwhile to add it eventually, since we'll
> > probably end up having more things to report to users related to
> > checksum. Online enabling of checksum could be the most immediate
> > potential target.
>
> I wasn't aware that we were already storing informations about shared
> objects in PgStat_StatDBEntry, with an InvalidOid as databaseid
> (though we don't have any system view that are actually showing
> information for such objects).
>
> As a result I ended up simply adding counters for the number of total
> checks and the timestamp of the last failure in PgStat_StatDBEntry,
> making attached patch very lightweight. I moved all the checksum
> related counters out of pg_stat_database in a new pg_stat_checksum
> view. It avoids to make pg_stat_database too wide, and also allows to
> display information about shared object in this new view (some of the
> other counters don't really make sense for shared objects or could
> break existing monitoring query). While at it, I tried to add a
> little bit of documentation wrt. checksum monitoring.

and of course I forgot to attach the patch.

Attachment Content-Type Size
pg_stat_checksum-v1.diff application/octet-stream 19.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Ramsey 2019-03-13 16:05:29 Re: Compressed TOAST Slicing
Previous Message Dagfinn Ilmari Mannsåker 2019-03-13 15:54:26 Using the return value of strlcpy() and strlcat()