Re: Eager page freeze criteria clarification

From: Joe Conway <mail(at)joeconway(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Jeff Davis <pgsql(at)j-davis(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: Eager page freeze criteria clarification
Date: 2023-12-21 16:36:43
Message-ID: 145ad2d0-8c9b-4dd8-9385-636c0e29708a@joeconway.com
Lists: pgsql-hackers

On 12/21/23 10:56, Melanie Plageman wrote:
> On Sat, Dec 9, 2023 at 9:24 AM Joe Conway <mail(at)joeconway(dot)com> wrote:
>> However, even if we assume a more-or-less normal distribution, we should
>> consider using subgroups in a way similar to Statistical Process
>> Control[1]. The reasoning is explained in this quote:
>>
>> The Math Behind Subgroup Size
>>
>> The Central Limit Theorem (CLT) plays a pivotal role here. According
>> to CLT, as the subgroup size (n) increases, the distribution of the
>> sample means will approximate a normal distribution, regardless of
>> the shape of the population distribution. Therefore, as your
>> subgroup size increases, your control chart limits will narrow,
>> making the chart more sensitive to special cause variation and more
>> prone to false alarms.
>
> I haven't read anything about statistical process control until you
> mentioned this. I read the link you sent and also googled around a
> bit. I was under the impression that the more samples we have, the
> better. But, it seems like this may not be the assumption in
> statistical process control?
>
> It may help us to get more specific. I'm not sure what the
> relationship between "unsets" in my code and subgroup members would
> be. The article you linked suggests that each subgroup should be of
> size 5 or smaller. Translating that to my code, were you imagining
> subgroups of "unsets" (each time we modify a page that was previously
> all-visible)?

Basically, yes.

It might not make sense, but I think we could test the theory by
plotting a histogram of the raw data, and then also plotting a histogram
of the means obtained by sub-grouping every 5 sequential values in your
accumulator.

If the former does not look very normal (I would guess that for most
workloads it will be skewed with a long tail) and the latter looks more
normal, then it would say we are on the right track.
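
A rough sketch of that comparison, in Python rather than anything from
the patch. The lognormal data is only a synthetic stand-in for the real
accumulator contents, and subgroup_means, skewness and text_histogram
are names made up for this illustration:

```python
import random
import statistics

def subgroup_means(values, size=5):
    """Mean of every `size` sequential values; a trailing partial group is dropped."""
    usable = len(values) - len(values) % size
    return [statistics.fmean(values[i:i + size]) for i in range(0, usable, size)]

def skewness(values):
    """Population skewness: near 0 when symmetric, > 0 with a long right tail."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

def text_histogram(values, bins=10, width=50):
    """Crude fixed-width text histogram, enough to eyeball the shape."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / span * bins), bins - 1)] += 1
    peak = max(counts)
    return "\n".join(
        f"{lo + span * i / bins:10.2f} | " + "#" * round(c * width / peak)
        for i, c in enumerate(counts)
    )

# ASSUMPTION: synthetic long-tailed data standing in for the accumulator.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
grouped = subgroup_means(raw, size=5)

print("raw values, skewness =", round(skewness(raw), 2))
print(text_histogram(raw))
print("subgroup means (n=5), skewness =", round(skewness(grouped), 2))
print(text_histogram(grouped))
```

If the subgroup means show noticeably lower skewness and a more
symmetric histogram than the raw values, that is the effect the CLT
quote above describes. Real data with strong correlation between
sequential values may behave differently from this independent
synthetic sample.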

There are statistical tests for "normalness" that could be applied too
(<quickly looks> e.g.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6350423/#sec2-13title ),
which would be a more rigorous approach, but a quick look at histograms
might be sufficiently convincing.
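
As one concrete possibility, here is a minimal sketch of the
Jarque-Bera test, which combines sample skewness and kurtosis into a
single statistic. I picked it because it is easy to write with only
the standard library; it is my choice for illustration and not
necessarily what the linked article would recommend. The lognormal
input is again a synthetic stand-in, and jarque_bera is a hypothetical
helper name:

```python
import math
import random
import statistics

def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for the null hypothesis of normality."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Under the null, JB is asymptotically chi-squared with 2 degrees of
    # freedom, whose survival function is exactly exp(-x / 2).
    return jb, math.exp(-jb / 2.0)

# ASSUMPTION: synthetic long-tailed data standing in for the accumulator.
rng = random.Random(42)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]
# Means of every 5 sequential values (5000 divides evenly by 5).
grouped = [statistics.fmean(raw[i:i + 5]) for i in range(0, len(raw), 5)]

jb_raw, p_raw = jarque_bera(raw)
jb_grouped, p_grouped = jarque_bera(grouped)
print(f"raw:            JB = {jb_raw:.1f}, p = {p_raw:.3g}")
print(f"subgroup means: JB = {jb_grouped:.1f}, p = {p_grouped:.3g}")
```

One caveat: with thousands of samples a test like this tends to reject
normality for even modest departures, so comparing the size of the
statistic before and after sub-grouping may be more informative than
the p-value alone.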

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
