
From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, "Imai, Yoshikazu" <imai(dot)yoshikazu(at)jp(dot)fujitsu(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Simon Riggs <simon(at)2ndquadrant(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Speed up transaction completion faster after many relations are accessed in a transaction
Date: 2019-07-24 03:05:37
Message-ID: CAKJS1f-7T9F1xLw5PqgOApcV6YX3WYC4XJHHCpxh8hzcZsA-xA@mail.gmail.com
Lists: pgsql-hackers

Thanks for having a look at this.

On Wed, 24 Jul 2019 at 04:13, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> David Rowley <david(dot)rowley(at)2ndquadrant(dot)com> writes:
> > I'm pretty happy with v7 now. If anyone has any objections to it,
> > please speak up very soon. Otherwise, I plan to push it about this
> > time tomorrow.
>
> I dunno, this seems close to useless in this form. As it stands,
> once hash_get_max_bucket has exceeded the threshold, you will
> arbitrarily reset the table 1000 transactions later (since the
> max bucket is certainly not gonna decrease otherwise). So that's
> got two big problems:
>
> 1. In the assumed-common case where most of the transactions take
> few locks, you wasted cycles for 999 transactions.
>
> 2. You'll reset the table independently of subsequent history,
> even if the session's usage is pretty much always over the
> threshold. Admittedly, if you do this only once per 1K
> transactions, it's probably not a horrible overhead --- but you
> can't improve point 1 without making it a bigger overhead.

This is true, but I think you might be overestimating just how much
effort is wasted with #1. We're only seeing this overhead in small,
very-fast-to-execute xacts. In the test case in [1], I was getting
about 10k TPS unpatched and about 15k TPS patched. This means that, on
average, an unpatched xact takes 100 microseconds and a patched xact
takes about 66 microseconds, so the additional time spent doing the
hash_seq_search() must be about 34 microseconds. So we'll waste a
total of 34 *milliseconds* if we wait for 1000 xacts before we reset
the table. At 10k TPS we're going to react to the change within 0.1
seconds.
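
Spelling that arithmetic out (with the same rounded per-xact times):

\[
\frac{1\ \mathrm{s}}{10000\ \mathrm{xacts}} = 100\ \mu\mathrm{s}, \qquad
\frac{1\ \mathrm{s}}{15000\ \mathrm{xacts}} \approx 66\ \mu\mathrm{s}, \qquad
100\ \mu\mathrm{s} - 66\ \mu\mathrm{s} \approx 34\ \mu\mathrm{s}
\]
\[
34\ \mu\mathrm{s} \times 1000\ \mathrm{xacts} = 34\ \mathrm{ms}, \qquad
\frac{1000\ \mathrm{xacts}}{10000\ \mathrm{xacts/s}} = 0.1\ \mathrm{s}
\]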

I think you'd struggle to measure a single xact taking 34
microseconds longer without running a few thousand queries. My view is
that nobody is ever going to notice that it takes 1k xacts to shrink
the table, and I've already shown that the overhead of shrinking
every 1k xacts is tiny. I mentioned 0.34% in [1] using v6. This is
likely a bit smaller in v7 due to swapping the order of the if
condition to put the less likely case first. Since the overhead of
rebuilding the table was 7% when done every xact, it stands to
reason that this becomes 0.007% doing it every 1k xacts and that the
additional overhead making up that 0.34% is from testing if the reset
condition has been met (or noise). That's not something we can remove
completely with any solution that resets the hash table.
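
For what it's worth, the amortisation is just:

\[
\frac{7\%}{1000} = 0.007\%, \qquad 0.34\% - 0.007\% \approx 0.33\%
\]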

> I did complain about the cost of tracking the stats proposed by
> some earlier patches, but I don't think the answer to that is to
> not track any stats at all. The proposed hash_get_max_bucket()
> function is quite cheap, so I think it wouldn't be out of line to
> track the average value of that at transaction end over the
> session's lifespan, and reset if the current value is more than
> some-multiple of the average.
>
> The tricky part here is that if some xact kicks that up to
> say 64 entries, and we don't choose to reset, then the reading
> for subsequent transactions will be 64 even if they use very
> few locks. So we probably need to not use a plain average,
> but account for that effect somehow. Maybe we could look at
> how quickly the number goes up after we reset?
>
> [ thinks for awhile... ] As a straw-man proposal, I suggest
> the following (completely untested) idea:
>
> * Make the table size threshold variable, not constant.
>
> * If, at end-of-transaction when the table is empty,
> the table bucket count exceeds the threshold, reset
> immediately; but if it's been less than K transactions
> since the last reset, increase the threshold (by say 10%).
>
> I think K can be a constant; somewhere between 10 and 100 would
> probably work. At process start, we should act as though the last
> reset were more than K transactions ago (i.e., don't increase the
> threshold at the first reset).
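
For concreteness, here's an untested sketch of how I read that
straw-man. The value chosen for K, the starting threshold, and
CreateLocalLockHash() (whatever we'd use to rebuild LockMethodLocalHash
at its initial size) are all just placeholders; hash_get_max_bucket()
is the function proposed in this patch:

/* Untested sketch of the straw-man; names and constants are placeholders. */
#define THRESHOLD_REINCREASE_XACTS	50		/* Tom's "K", somewhere 10..100 */

static long	localHashResetThreshold = 64;	/* variable, not constant */
static int	xactsSinceHashReset = THRESHOLD_REINCREASE_XACTS;	/* pretend the
								 * last reset was long ago */

static void
ConsiderLocalLockHashReset(void)
{
	/* saturate the counter so it can't overflow */
	if (xactsSinceHashReset < THRESHOLD_REINCREASE_XACTS)
		xactsSinceHashReset++;

	/* only interesting when the table is empty at end-of-transaction */
	if (hash_get_num_entries(LockMethodLocalHash) > 0)
		return;

	if (hash_get_max_bucket(LockMethodLocalHash) > localHashResetThreshold)
	{
		/*
		 * Needing another reset within K xacts means the threshold is too
		 * low for this workload, so raise it by 10% before resetting.
		 */
		if (xactsSinceHashReset < THRESHOLD_REINCREASE_XACTS)
			localHashResetThreshold += localHashResetThreshold / 10;

		hash_destroy(LockMethodLocalHash);
		CreateLocalLockHash();	/* placeholder: recreate at the initial size */
		xactsSinceHashReset = 0;
	}
}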

I think the problem with this idea is that once the threshold has
been enlarged, there is no way to recover so that it works well again
for workloads that require very few locks. If we end up with some
large value for the variable threshold, there's no way to decrease
it again. All this method stops is the needless hash table resets
when the typical case requires many locks. The only way to know if we
can reduce the threshold again is to count the locks released during
LockReleaseAll(). Some version of the patch did that, and you
objected.

> The main advantage this has over v7 is that we don't have the
> 1000-transaction delay before reset, which ISTM is giving up
> much of the benefit of the whole idea. Also, if the session
> is consistently using lots of locks, we'll adapt to that after
> awhile and not do useless table resets.

True, but you neglected to mention the critical drawback that pretty
much makes the idea useless: all we'd need to do is give it a
workload that pushes the variable threshold up high. It would then be
pretty simple to show that LockReleaseAll() is still slow with
workloads that require only a small number of locks... permanently,
with no means to recover.

To be able to reduce the threshold again we'd need to make a
hash_get_num_entries(LockMethodLocalHash) call before performing the
guts of LockReleaseAll(). We could then fold that into a running
average counter with a weight of, say, 10, so we react to changes
fairly quickly, but not instantly. We could then have some sort of
logic like "rebuild the hash table if the running average is 4 times
less than max_bucket".

I've attached a spreadsheet mocking up that idea and the algorithm we
could use to track the running average. Initially, I've mocked up a
series of transactions that use 1000 locks, then at row 123 dropped
that to 10 locks. If we assume the max_bucket is 1000, then it takes
until row 136 for the running average to drop below the max_bucket
count, i.e. 13 xacts. At that point we'd reset the table to its init
size of 16. If the average went up again, then we'd automatically
expand the table as we do now. To make this work we'd need an
additional call to hash_get_num_entries() before we release the
locks, so there is more overhead.

[1] https://www.postgresql.org/message-id/CAKJS1f_ycJ-6QTKC70pZRYdwsAwUo+t0_CV0eXC=J-b5A2MegQ@mail.gmail.com

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
hash_reset_logic.ods application/vnd.oasis.opendocument.spreadsheet 28.7 KB
