Re: RFE: Make statistics robust for unplanned events

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Patrik Novotny <panovotn(at)redhat(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: RFE: Make statistics robust for unplanned events
Date: 2021-04-23 08:21:20
Message-ID: CABUevEz-2dBWhqz2edHEGCyFvHj-0oWwTuQZyNPMDmBqwsP9cg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 23, 2021 at 12:41 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2021-04-21 14:38:44 +0200, Magnus Hagander wrote:
> > Andres mentioned at least once over in the thread about shared memory
> > stats collection that being able to have persistent stats could come
> > out of that one in the future. Whatever is done on the topic should
> > probably be done based on that work, as it provides a better starting
> > point and also one that will stay around.
>
> Yea. I think the main benefit from the shared memory stats patch that
> would make this easier is that it tracks (with one small hole that can
> probably be addressed) dropped objects accurately, even across crashes /
> replication. Having old stats around runs the danger of stats for an
> old dropped object being combined with stats for a new object.
>
> I don't think making pgstat.c fully durable by continually storing the
> data in a table or something like that is an option. For one, the stats
> for a replica and primary are independent. For another, the overhead
> would be prohibitive.
>
> But after we gain the accurate dropping of stats we can store a stats
> snapshot corresponding to certain WAL records (by serializing to
> something like pg_stats_%redo_lsn%) without ending up with dropped stats
> surviving.
>
> A big question around this is how often we'd want to write out the
> stats. Obviously, the more often we do, the higher the overhead. And the
> less frequently, the more stats updates might be lost.

Yeah, that's what I was thinking as well -- dumping snapshots at
regular intervals, so that on crash recovery we lose a "controlled
amount" of recent stats instead of losing *everything*.

I think in most situations a fairly long interval is OK -- if you have
tables that take so many hits that you need a really quick reaction
from autovacuum, it will probably pick that up quickly enough even
after a reset. And if it's more the long-term tracking that's
important, then a longer interval is probably fine.

But perhaps make it configurable with a timeout and a "reasonable default"?
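
Roughly along these lines on the writer side, say in the checkpointer
or whatever process ends up owning this -- stats_snapshot_interval is
an invented name and 5 minutes an arbitrary default, just to sketch
the idea:

    #include "postgres.h"
    #include "utils/timestamp.h"

    /* made-up GUC: seconds between snapshots, 0 disables them */
    int         stats_snapshot_interval = 300;

    static TimestampTz last_snapshot_time = 0;

    static bool
    pgstat_snapshot_due(void)
    {
        if (stats_snapshot_interval <= 0)
            return false;

        return TimestampDifferenceExceeds(last_snapshot_time,
                                          GetCurrentTimestamp(),
                                          stats_snapshot_interval * 1000);
    }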

> Patrik, for your use cases, would losing at most the stats since the
> start of the last checkpoint be an issue?

Unless there's a particular benefit to tying it specifically to the
checkpoint occurring, I'd rather keep it as a separate setting. They
might both come with the same default of course, but I can certainly
envision cases where you want to increase the checkpoint distance
while keeping the stats interval lower, for example. Many people
increase the checkpoint timeout quite a lot...
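
I.e. something like this (the second setting is the same invented name
as above, purely to illustrate the two knobs moving independently):

    checkpoint_timeout = 30min        # checkpoints stretched out
    stats_snapshot_interval = 5min    # stats still flushed more often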

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
