Re: Early hint bit setting

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Ants Aasma <ants(at)cybertec(dot)at>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Early hint bit setting
Date: 2012-05-30 22:01:04
Message-ID: CAHyXU0w_4eOSRg9q3skH-W_idg7enWW9cY0Mf45ca0ZdqXnMRg@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 30, 2012 at 4:42 PM, Ants Aasma <ants(at)cybertec(dot)at> wrote:
> I was thinking about the earliest point at which we could set hint
> bits. This would be just after the commit has been made visible. When
> the transaction completes and the commit confirmation is sent to the
> client, the backend will usually go to sleep on the network socket,
> waiting for further commands. Because most clients wait for the
> commit confirmation before proceeding, this means that we have at least
> one network RTT before this backend is expected to respond again.
>
> The idea is to keep a small backend local ring buffer of pages that
> have been modified. When a transaction has just committed, we do a
> non-blocking read on the socket. When nothing is available we take the
> opportunity to go and set hint bits in the recently modified buffers.
>
> Hurting latency for single-threaded workloads using lots of
> transactions is bad. It follows that it would be a bad idea to do
> anything that could take a long time while waiting for the next
> command. Because early hinting is a performance optimisation we can
> safely skip it if it becomes bothersome. Anything that causes IO can
> take too long. So we only set the hint bits when the page is still in
> shared buffers to avoid reading in the page. Furthermore, we only hint
> the tuples that the recently completed transaction modified to avoid
> IO from CLOG (we could hint other tuples if their xid happens to be in
> the SLRU, but it probably won't be very useful).
>
> Hint bits are set sooner or later. Setting them earlier is a
> throughput win for any workload because we avoid generating extra
> load. We avoid doing any IO and we might save some, so for IO this is a
> pure win. The hinting CPU work needs to be done sooner or later, so
> that's a tie, except for extremely bursty write-heavy loads with lots
> of transactions. Memory loads could in principle hurt other backends.
> Refilling the whole last level cache of modern processors takes a few
> hundred microseconds at peak speed. If the WAL is on fast storage
> (BBWC, SSD) there's a pretty good chance that the page being hinted is
> still in the CPU cache, avoiding the memory bandwidth overhead.
>
> Abstraction-wise, I think we need to set up a mechanism to run very
> short maintenance jobs from backends waiting for new commands.
> SocketBackend could check whether any such job is pending and, if so,
> call pq_getbyte_if_available to confirm that no command has arrived
> before proceeding to run it.
>
> Setting hint bits early would help workloads with small synchronously
> writing transactions. Async commits could also benefit from proactive
> hint bit setting, but this would require some global cooperation and
> isn't as clear a win. One idea would be to copy the local ring
> buffer entries to a global one tagged with the LSN when the
> transaction has been made visible. When someone flushes xlog, they
> also check if it enables some background hinting and set the
> corresponding flag for any backend with spare cycles to pick up.
>
> Comments?

I think this is a really neat idea, and it could solve a lot of problems.
Since you don't have to do any clog checks (you know the outcome when
you commit), I think it's a win all around. So much so that it might be
worth measuring the worst-case latency hit if you always force one page
out before doing the socket check. Hm, could you shave CPU cycles by just
storing the specific offsets of the hint bit bytes you want to set, or
is that too hacky?
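
To make the offsets question concrete, here is a rough sketch of what I
have in mind, not a patch. Everything in it is made up for illustration:
HintSpot, apply_hint_spots and SKETCH_BLCKSZ are hypothetical names, the
page is a plain byte array, and the mask is whatever bit you recorded at
modification time rather than a real infomask value. It also assumes the
tuple has not moved within the page since the offset was recorded, which
real code would have to verify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 8192      /* hypothetical name; 8192 is the default block size */

/* One remembered hint-bit location (hypothetical structure). */
typedef struct HintSpot
{
    uint8_t *page;              /* start of the in-memory page */
    uint16_t offset;            /* byte offset of the hint-bit byte in the page */
    uint8_t  mask;              /* bit(s) to set in that byte */
} HintSpot;

/* Set every remembered hint bit with one OR per spot; no tuple header decoding. */
static void
apply_hint_spots(const HintSpot *spots, size_t nspots)
{
    for (size_t i = 0; i < nspots; i++)
    {
        assert(spots[i].page != NULL);
        assert(spots[i].offset < SKETCH_BLCKSZ);
        spots[i].page[spots[i].offset] |= spots[i].mask;
    }
}
```

The per-tuple work at commit time shrinks to a single OR into a known
byte, at the price of remembering a few extra bytes per modified tuple.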

merlin
