Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?

From: David Geier <geidav(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Lukas Fittl <lukas(at)fittl(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, Maciek Sakrejda <m(dot)sakrejda(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Date: 2023-01-23 17:49:37
Message-ID: 5a09bfe1-8a6c-de2e-d2b0-62abc5a8fe7a@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 1/21/23 05:12, Andres Freund wrote:
> We do currently do the conversion quite frequently. Admittedly I was
> partially motivated by trying to get the per-loop overhead in pg_test_timing
> down ;)
>
> But I think it's a real issue. Places where we do, but shouldn't, convert:
>
> - ExecReScan() - quite painful, we can end up with a lot of those
> - InstrStopNode() - adds a good bit of overhead to simple
InstrStopNode() doesn't convert in the general case but only for the
first tuple or when async. So it goes somewhat hand in hand with
ExecReScan().
> - PendingWalStats.wal_write_time - this is particularly bad because it happens
> within very contended code
> - calls to pgstat_count_buffer_read_time(), pgstat_count_buffer_write_time() -
> they can be very frequent
> - pgbench.c, as we already discussed
> - pg_stat_statements.c
> - ...
>
> These all will get a bit slower when moving to a "variable" frequency.
I wonder if we will be able to measure any of them easily. But given
that it's many more places than I had realized and given that the
optimized code is not too involved, let's give it a try.
> What was your approach for avoiding the costly operation? I ended up with a
> integer multiplication + shift approximation for the floating point
> multiplication (which in turn uses the inverse of the division by the
> frequency). To allow for sufficient precision while also avoiding overflows, I
> had to make that branch conditional, with a slow path for large numbers of
> nanoseconds.

It seems like we ended up with the same. I do:

sec = ticks / frequency_hz
ns  = ticks / frequency_hz * 1,000,000,000
ns  = ticks * (1,000,000,000 / frequency_hz)
ns  = ticks * (1,000,000 / frequency_khz) <-- now in kilohertz

Now, the constant scaling factor in parentheses is typically a floating
point number. For example for a frequency of 2.5 GHz it would be 2.5. To
work around that we can do something like:

ns  = ticks * (1,000,000 * scaler / frequency_khz) / scaler

Where scaler is a power-of-2, big enough to maintain enough precision
while allowing for a shift to implement the division.

The additional multiplication with scaler makes that the maximum range
go down, because we must ensure we never overflow. I'm wondering if we
cannot pick scaler in such a way that remaining range of cycles is large
enough for our use case and we can therefore live without bothering for
the overflow case. What would be "enough"? 1 year? 10 years? ...

Otherwise, we indeed need code that cares for the potential overflow. My
hunch is that it can be done branchless, but it for sure adds dependent
instructions. Maybe in that case a branch is better that almost
certainly will never be taken?

I'll include the code in the new patch set which I'll latest submit
tomorrow.

> I think it'd be great - but I'm not sure we're there yet, reliability and
> code-complexity wise.
Thanks to your commits, the diff of the new patch set will be already
much smaller and easier to review. What's your biggest concern in terms
of reliability?
> I think it might be worth makign the rdts aspect somewhat
> measurable. E.g. allowing pg_test_timing to use both at the same time, and
> have it compare elapsed time with both sources of counters.
I haven't yet looked into pg_test_timing. I'll do that while including
your patches into the new patch set.

--
David Geier
(ServiceNow)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Geier 2023-01-23 17:52:44 Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Previous Message Pavel Stehule 2023-01-23 17:42:55 Re: Schema variables - new implementation for Postgres 15 (typo)