
Avoiding repeated snapshot computation

From: Pavan Deolasee <pavan(dot)deolasee(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Avoiding repeated snapshot computation
Date: 2011-11-26 15:52:50
Message-ID: CABOikdMsJ4OsxtA7XBV2quhKYUo_4105fJF4N+uyRoyBAzSuuQ@mail.gmail.com
Lists: pgsql-hackers
In some recent benchmarks and profile data, I saw GetSnapshotData at or
near the top. With a smaller number of clients it can account for
10-20% of the time, but with more clients I have seen it take as much
as 40% of the sample time. Unfortunately, the machine on which I was
running these tests is currently not available, so I don't have the
exact numbers, but the overall observation stands. Our recent work on
separating the hot members of PGPROC into a separate array should
definitely reduce data cache misses and cut the GetSnapshotData time,
but it probably still accounts for a large enough critical section on
a highly contended lock.

Now that we have reduced the run time of the function itself, I think
we should try to reduce the number of times the function is called.
Robert proposed a way to reduce the number of calls per transaction. I
think we can go one step further and reduce the number of computations
across transactions.

One major problem today could be the way LWLock works. If the lock is
currently held in SHARED mode by some backend and another backend
requests it in SHARED mode, it gets it immediately. That's probably
the right thing to do, because you don't want a reader to wait when
the lock is readily available. But in the case of GetSnapshotData(),
every reader is doing exactly the same thing: computing a snapshot
from the same shared state, so every one of them would compute exactly
the same snapshot (ignoring the minor detail that we don't include the
caller's own XID in the xip array). And because of the way LWLock
works, more and more readers keep getting in to compute their
snapshots, until the exclusive waiters finally get a window to sneak
in as processes slowly go to sleep waiting for exclusive access. To
depict it: below, four transactions make overlapping calls to
GetSnapshotData(), so the total critical section starts when the first
caller enters it and ends when the last caller exits.

Txn1 ------[          SHARED        ]--------------------------------------
Txn2 --------[          SHARED        ]------------------------------------
Txn3 -----------------[            SHARED        ]-------------------------
Txn4 -------------------------------------------[           SHARED        ]
     |<---------------------------- Total Time -------------------------->|

A couple of ideas come to mind to solve this issue.

A snapshot, once computed, remains valid for every caller, irrespective
of its origin, until at least one transaction ends. So we can store
the last computed snapshot in some shared area and reuse it for all
subsequent GetSnapshotData calls. The shared snapshot gets invalidated
when some transaction ends by calling ProcArrayEndTransaction(). I
tried this approach and saw a 15% improvement for 32-80 clients on the
32-core HP IA box with pgbench -s 100 -N tests. Not bad, but I think
this can be improved further.

What we can do is this: when a transaction comes to compute its
snapshot, it checks whether some other backend is already computing
one. If so, it just sleeps on the lock. When the other process
finishes computing the snapshot, it saves the snapshot in a shared
area and wakes up all processes waiting for it. Those processes then
simply copy the snapshot from the shared area and are done. This will
not only reduce the total CPU consumption by avoiding repetitive work,
but will also reduce the total time for which ProcArrayLock is held in
SHARED mode by avoiding the pipeline of GetSnapshotData calls. I am
currently trying the shared work queue mechanism to implement this,
but I am sure we can do it in some other way too.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB     http://www.enterprisedb.com
