Re: Speed up transaction completion faster after many relations are accessed in a transaction

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, "'Simon Riggs'" <simon(at)2ndquadrant(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date: 2019-04-06 03:03:11
Message-ID: 22385.1554519791@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> writes:
> I can't detect any performance improvement with the patch applied to
> current master, using the test case from Yoshikazu Imai (2019-03-19).

FWIW, I tried this patch against current HEAD (959d00e9d).
Using the test case described by Amit at
<be25cadf-982e-3f01-88b4-443a6667e16a(at)lab(dot)ntt(dot)co(dot)jp>
I do measure an undeniable speedup, close to 35%.

However ... I submit that that's a mighty extreme test case.
(I had to increase max_locks_per_transaction to get it to run
at all.) We should not be using that sort of edge case to drive
performance optimization choices.

If I reduce the number of partitions in Amit's example from 8192
to something more real-world, like 128, I do still measure a
performance gain, but it's ~1.5%, which is below what I'd consider
a reproducible win. I'm accustomed to seeing changes up to 2%
in narrow benchmarks like this one, even when "nothing changes"
except unrelated code.

With a standard pgbench test case (pgbench -M prepared -S, one client,
an -s 10 database), the patch comes out about 0.5% slower than HEAD.
Again, that's below the noise threshold,
but it's not promising for the net effects of this patch on workloads
that aren't specifically about large and prunable partition sets.

I'm also fairly concerned about the effects of the patch on
sizeof(LOCALLOCK) --- on a 64-bit machine it goes from 72 to 88
bytes, a 22% increase. That's a lot if you're considering cases
with many locks.

On the whole I don't think there's an adequate case for committing
this patch.

I'd also point out that this is hardly the only place where we've
seen hash_seq_search on nearly-empty hash tables become a bottleneck.
So I'm not thrilled about attacking that with one-table-at-a-time patches.
I'd rather see us do something to let hash_seq_search win across
the board.

I spent some time wondering whether we could adjust the data structure
so that all the live entries in a hashtable are linked into one chain,
but I don't quite see how to do it without adding another list link to
struct HASHELEMENT, which seems pretty expensive.

I'll sketch the idea I had, just in case it triggers a better idea
in someone else. Assuming we are willing to add another pointer to
HASHELEMENT, use the two pointers to create a doubly-linked circular
list that includes all live entries in the hashtable, with a list
header in the hashtable's control area. (Maybe we'd use a dlist for
this, but it's not essential.) Require this list to be organized so
that all entries that belong to the same hash bucket are consecutive in
the list, and make each non-null hash bucket header point to the first
entry in the list for its bucket. To allow normal searches to detect
when they've run through their bucket, add a flag to HASHELEMENT that
is set only in entries that are the first, or perhaps last, of their
bucket (so you'd detect end-of-bucket by checking the flag instead of
testing for a null pointer). Adding a bool field is free due to
alignment considerations, at least on 64-bit machines. Given this,
I think normal hash operations are more-or-less the same cost as
before, while hash_seq_search just has to follow the circular list.
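
For concreteness, here is a minimal stand-alone sketch of that layout
(invented, simplified names, not the real dynahash structures, and only
meant to illustrate the shape of the idea):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NBUCKETS 8

/* Simplified stand-ins for HASHELEMENT and the hashtable control area;
 * every name here is invented for illustration. */
typedef struct Elem
{
    struct Elem *next;          /* circular list over ALL live entries */
    struct Elem *prev;
    uint32_t    hashvalue;      /* cached hash, as in HASHELEMENT today */
    bool        bucket_head;    /* true for the first entry of its bucket */
    int         key;            /* payload, stands in for the real entry */
} Elem;

typedef struct Table
{
    Elem        head;           /* list header in the "control area" */
    Elem       *buckets[NBUCKETS];  /* non-empty buckets point at their first entry */
} Table;

static void
table_init(Table *t)
{
    t->head.next = t->head.prev = &t->head;
    for (int i = 0; i < NBUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Insert, keeping each bucket's entries consecutive in the overall list. */
static void
table_insert(Table *t, Elem *e, uint32_t hashvalue, int key)
{
    uint32_t    b = hashvalue % NBUCKETS;
    Elem       *pos;            /* existing element we splice in front of */

    e->hashvalue = hashvalue;
    e->key = key;
    e->bucket_head = true;
    if (t->buckets[b] != NULL)
    {
        pos = t->buckets[b];    /* become the new first entry of this bucket */
        pos->bucket_head = false;
    }
    else
        pos = t->head.next;     /* empty bucket: a one-entry run fits anywhere */
    e->next = pos;
    e->prev = pos->prev;
    pos->prev->next = e;
    pos->prev = e;
    t->buckets[b] = e;
}

/* Normal lookup: walk the bucket's run, detecting end-of-bucket by the
 * bucket_head flag (or the list header) instead of a NULL pointer. */
static Elem *
table_lookup(Table *t, uint32_t hashvalue, int key)
{
    Elem       *e = t->buckets[hashvalue % NBUCKETS];

    if (e == NULL)
        return NULL;
    do
    {
        if (e->hashvalue == hashvalue && e->key == key)
            return e;
        e = e->next;
    } while (e != &t->head && !e->bucket_head);
    return NULL;
}

/* Sequential scan: cost proportional to live entries, not to table size. */
static void
table_seq_print(Table *t)
{
    for (Elem *e = t->head.next; e != &t->head; e = e->next)
        printf("key %d (hash %u)\n", e->key, (unsigned) e->hashvalue);
}

int
main(void)
{
    Table       t;
    Elem        a, b, c;

    table_init(&t);
    table_insert(&t, &a, 3, 42);
    table_insert(&t, &b, 11, 7);    /* same bucket as 'a': 11 % 8 == 3 */
    table_insert(&t, &c, 5, 99);
    table_seq_print(&t);
    printf("lookup hash 11 key 7: %s\n",
           table_lookup(&t, 11, 7) ? "found" : "missing");
    return 0;
}

Deletion is where the prev pointer earns its keep: with the doubly-linked
list, unlinking any entry, including the first one of its bucket, is O(1).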

I tried to figure out how to do the same thing with a singly-linked
instead of doubly-linked list, but it doesn't quite work: if you need
to remove the first element of a bucket, you have no cheap way to find
its predecessor in the overall list (which belongs to some other
bucket, but you don't know which one). Maybe we could just mark such
entries dead (there's plenty of room for another flag bit) and plan
to clean them up later? But it's not clear how to ensure that they'd
get cleaned up in any sort of timely fashion.
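
Sketching that failure mode in the same style (again, invented names, and
only the shape of the mark-dead workaround, not a worked-out proposal):

#include <stdbool.h>
#include <stddef.h>

/* Singly-linked variant of the sketch above.  Removing a non-first entry
 * of a bucket is fine: its predecessor is in the same bucket, so we can
 * find it by walking from the bucket's first entry.  Removing the first
 * entry of a bucket is the problem: its predecessor in the overall list
 * is the last entry of some other bucket we can't identify, so here it is
 * merely flagged dead, to be unlinked by some later cleanup pass (how to
 * make that cleanup timely is the open question). */
typedef struct SElem
{
    struct SElem *next;         /* single forward link, circular overall list */
    bool        bucket_head;    /* first entry of its bucket */
    bool        dead;           /* logically removed; scans must skip it */
} SElem;

typedef struct STable
{
    SElem       head;           /* overall list header */
    SElem     **buckets;
} STable;

static void
stable_remove(STable *t, SElem *e, int bucket)
{
    SElem      *first = t->buckets[bucket];

    if (e != first)
    {
        /* Predecessor is within this bucket's run: unlink normally. */
        SElem      *prev = first;

        while (prev->next != e)
            prev = prev->next;
        prev->next = e->next;
        return;
    }

    /* e heads its bucket; we can't reach its list predecessor cheaply,
     * so leave it linked but mark it dead. */
    e->dead = true;
    if (e->next != &t->head && !e->next->bucket_head)
    {
        /* The bucket's run continues; repoint the bucket header. */
        e->next->bucket_head = true;
        t->buckets[bucket] = e->next;
    }
    else
        t->buckets[bucket] = NULL;
}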

Another issue is that probably none of this works for the partitioned
hash tables we use for some of the larger shared-memory hashes. But
I'm not sure we care about hash_seq_search for those, so maybe we just
say those are a different data structure.

regards, tom lane
