Re: Inconsistency in determining the timestamp of the db statfile.

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Subject: Re: Inconsistency in determining the timestamp of the db statfile.
Date: 2020-09-09 09:45:20
Message-ID: CABUevEwr=D1s5ZrCG8HKMg__8MgTq5dCvmMEC3yw0Br_w3q-1Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 9, 2020 at 5:04 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:

> On Tue, Sep 8, 2020 at 7:03 PM Magnus Hagander <magnus(at)hagander(dot)net>
> wrote:
> >
> > On Tue, Sep 8, 2020 at 3:11 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
> wrote:
> >>
> >>
> >>
> >> On 2020/09/08 19:28, Magnus Hagander wrote:
> >> >
> >> >
> >> > On Tue, Sep 8, 2020 at 8:10 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com
> <mailto:amit(dot)kapila16(at)gmail(dot)com>> wrote:
> >> >
> >> > We use the timestamp of the global statfile if we are not able to
> >> > determine it for a particular database either because the entry
> for
> >> > that database doesn't exist or there is an error while reading the
> >> > specific database entry. This was not taken care of while reading
> >> > other entries like ArchiverStats or SLRUStats. See
> >> > pgstat_read_db_statsfile_timestamp. I have observed this while
> >> > reviewing Sawada-San's patch related to logical replication stats
> [1].
> >> >
> >> > I think this can only happen if due to some reason the statfile
> got
> >> > corrupt or we
> >> > have some bug in stats writing code, the chances of both are rare
> and even
> >> > if that happens we will use stale statistics.
> >> >
> >> > The attached patch by Sawada-San will fix this issue. As the
> chances of this
> >> > the problem is rare and nobody has reported any issue related to
> this,
> >> > so it might be okay not to backpatch this.
> >> >
> >> > Thoughts?
> >> >
> >> >
> >> > Why are we reading the archiver statis and and slru stats in
> "pgstat_read_db_statsfile_timestamp" in the first place?
> >>
> >> Maybe because they are written before database stats entries? That is,
> >> to read the target database stats entry and get the timestamp from it,
> >> we need to read (or lseek) all the global stats entries written before
> them.
> >>
> >
> > Oh meh. Yeah, I'm reading this thing completely wrong :/ Ignore my
> comments :)
> >
> >
> >
> >> > That seems well out of scope for that function.
> >> >
> >> > If nothing else the comment at the top of that function is out of
> touch with reality. We do seem to read it into local buffers and then
> ignore the contents. I guess we're doing it just to verify that it's not
> corrupt -- so perhaps the function should actually have a different name
> than it does now, since it certainly seems to do more than actually read
> the statsfile timestamp.
> >> >
> >> > Specifically in this patch it looks wrong -- in the case of say the
> SLRU stats being corrupt, we will now return the timestamp of the global
> stats file even if there is one existing for the database requested, which
> definitely breaks the contract of the function.
> >>
> >> Yes.
> >> We should return false when fread() for database entry fails, instead?
> That is,
> >>
> >> 1. If corrupted stats file is found, the function always returns false.
> >> 2. If the file is not currupted and the target database entry is found,
> its timestamp is returned.
> >> 3. If the file is not corrupted and the target is NOT found, the
> timestamp of global entry is returned
> >
> >
> > Yeah, with more coffee and re-reading it, I'm not sure how we could have
> the database stats being OK if the archiver or slru stats are not.
> >
> > That said, I think it still makes sense to return false if the stats
> file is corrupt. How much can we trust the first block, if the block right
> after it is corrupt? So yeah, I think your described order seems correct.
> But that's also what the code already did before this patch, isn't it?
> >
>
> No, before patch as well, if we can't read the DB entry say because
> the file is corrupt, we return true and use timestamp of global stats
> file and this is what is established by the original commit 187492b6.
>

Uh, the patch changes:
- return false;
+ return true;

In the "codepath of corruption". After also setting the value.

So AFAICT before the patch, it returns false if the file is corrupt
(there's a set of such scenarios, all returning false), just after logging
that it's corrupt.The patch changes it to log that it's corrupt and then
return true.

The fact that it doesn't find the database doesn't mean the file is
corrupt, it just means the database is inactive. But missing global,
archiver or slru stats means it's corrupt.

> So, if we consider that as correct then the functionality is something
> like once we have read the timestamp of the global statfile, we use
> that if we can't read the actual db entry for whatever reason. It
> seems if we return false, the caller will call this function again in
> the loop. Now, I see the point that if we can't read some part of the
> file we should return false instead but not sure if that is helpful
> from the perspective of the caller of
> pgstat_read_db_statsfile_timestamp.
>

But we are not talking about the "if we can't read the actual db entry"
case, we are talking about the *global* parts of the file.

Though in fact the one inconsistent place in the code now is that if it is
corrupt in the db entry part of the file it returns true and the global
timestamp, which I would argue is perhaps incorrect and it should return
false.

By returning false in the corrupt case we make the caller sleep and try
again, which seems to be the correct thing to do (since the stats collector
will be rewriting the file regularly)

I have included Alvaro as he is a committer for 187492b6, so he might
> remember something and let us know if this is a mistake or there is
> some reason for doing so (return true even when the db entry we are
> trying to read is corrupt).
>

Again, it looks to me like 187492b6 is doing exactly that -- it returns
false on corrupt file. It returns true if the file is OK, regardless of if
it finds the database, using the global timestamp if the database is not
found.

--
Magnus Hagander
Me: https://www.hagander.net/ <http://www.hagander.net/>
Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2020-09-09 09:46:07 Re: PG 13 release notes, first draft
Previous Message Alexey Kondratov 2020-09-09 09:42:09 Re: [POC] Fast COPY FROM command for the table with foreign partitions