Re: Checksum errors in pg_stat_database

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Checksum errors in pg_stat_database
Date: 2019-01-11 20:25:56
Message-ID: CABUevExzWLiQrAf-UohmTn5seLcNet0X5-HVP9Wq_TRRm_43xw@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jan 11, 2019 at 9:20 PM Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
wrote:

> On 1/11/19 7:40 PM, Robert Haas wrote:
> > On Fri, Jan 11, 2019 at 5:21 AM Magnus Hagander <magnus(at)hagander(dot)net> wrote:
> >> Would it make sense to add a column to pg_stat_database showing
> >> the total number of checksum errors that have occurred in a database?
> >>
> >> It's really a ">1 means it's bad", but it's a lot easier to monitor
> >> that in the statistics views, and given how much a lot of people
> >> set their systems up to log, it's far too easy to miss individual
> >> checksum mismatches in the logs.
> >>
> >> If we track it at the database level, I don't think the overhead
> >> of adding one more counter would be very high either.
> >
> > It's probably not the ideal way to track it. If you have a terabyte or
> > fifty of data, and you see that you have some checksum failures, good
> > luck finding the offending blocks.
> >
>
> Isn't that somewhat similar to deadlocks, which we also track in
> pg_stat_database? The number of deadlocks is rather useless on its own,
> you need to dive into the server log to find the details. Same for
> checksum errors.
>

It is a bit similar, yeah. Though a checksum counter is really a "you need
to look at fixing this right away" signal, much more so than deadlocks. But
yes, the fact that we already track deadlocks there is a good example. (Of
course, I believe I added that one at some point as well, so I'm clearly
biased there.)

> > But I'm tentatively in favor of your proposal anyway, because it's
> > pretty simple and cheap and might help people, and doing something
> > noticeably better is probably annoyingly complicated.
> >
>
> +1
>

Yeah, that's the idea behind it -- it's cheap, and it's an early-warning
indicator.
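
To make the early-warning idea a bit more concrete, here is a minimal sketch
of how a monitoring script might consume such a counter. It assumes the
counter is exposed per database and has already been sampled twice into plain
dictionaries keyed by database name; the function name and the sample data
are made up for illustration and are not part of the proposal.

```python
from typing import Dict


def new_checksum_errors(previous: Dict[str, int],
                        current: Dict[str, int]) -> Dict[str, int]:
    """Return {database name: newly seen errors} for counters that grew.

    Both arguments are hypothetical snapshots of a per-database
    checksum error counter, taken at two points in time.
    """
    alerts = {}
    for datname, count in current.items():
        baseline = previous.get(datname, 0)
        # A counter below the baseline suggests the statistics were reset,
        # so treat everything seen since the reset as new.
        delta = count - baseline if count >= baseline else count
        if delta > 0:
            alerts[datname] = delta
    return alerts


if __name__ == "__main__":
    before = {"postgres": 0, "app": 2}
    after = {"postgres": 0, "app": 5}
    for datname, delta in new_checksum_errors(before, after).items():
        print(f"{datname}: {delta} new checksum error(s), check the server log")
```

The counter only says that something is wrong and in which database; finding
the offending blocks still means going to the server log, same as with
deadlocks.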

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
