Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Date: 2023-03-10 00:05:27
Message-ID: CA+hUKGLaL4n=WVWkuK5ETUkk6BDDU10St5q+p7VotpPa+dA4Zg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Mar 10, 2023 at 11:37 AM Nathan Bossart
<nathandbossart(at)gmail(dot)com> wrote:
>
> On Thu, Mar 09, 2023 at 05:27:08PM -0500, Tom Lane wrote:
> > Is it reasonable to assume that all modern platforms can time
> > millisecond delays accurately? Ten years ago I'd have suggested
> > truncating the delay to a multiple of 10msec and using this logic
> > to track the remainder, but maybe now that's unnecessary.
>
> If so, it might also be worth updating or removing this comment in
> pgsleep.c:
>
> * NOTE: although the delay is specified in microseconds, the effective
> * resolution is only 1/HZ, or 10 milliseconds, on most Unixen. Expect
> * the requested delay to be rounded up to the next resolution boundary.
>
> I've had doubts for some time about whether this is still accurate...

What I see with the old select(), or a more modern clock_nanosleep()
call, is that Linux, FreeBSD, macOS are happy sleeping for .1ms, .5ms,
1ms, 2ms, 3ms, and through innaccuracies and scheduling overheads etc
it works out to about 5-25% extra sleep time (I expect that can be
affected by choice of time source/available hardware, and perhaps
various system calls use different tricks). I definitely recall the
behaviour described, back in the old days where more stuff was
scheduler-tick based. I have no clue for Windows; quick googling
tells me that it might still be pretty chunky, unless you do certain
other stuff that I didn't follow up; we could probably get more
accurate sleep times by rummaging through nt.dll. It would be good to
find out how well WaitEventSet does on Windows; perhaps we should have
a little timing accuracy test in the tree to collect build farm data?

FWIW epoll has a newer _pwait2() call that has higher res timeout
argument, and Windows WaitEventSet could also do high res timers if
you add timer events rather than using the timeout argument, and I
guess conceptually even the old poll() thing could do the equivalent
with a signal alarm timer, but it sounds a lot like a bad idea to do
very short sleeps to me, burning so much CPU on scheduling. I kinda
wonder if the 10ms + residual thing might even turn out to be a better
idea... but I dunno.

The 1ms residual thing looks pretty good to me as a fix to the
immediate problem report, but we might also want to adjust the wording
in config.sgml?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2023-03-10 00:25:52 Re: Sub-millisecond [autovacuum_]vacuum_cost_delay broken
Previous Message Nathan Bossart 2023-03-10 00:03:13 Re: improving user.c error messages