Re: [HACKERS] [POC] Faster processing at Gather node

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [POC] Faster processing at Gather node
Date: 2018-02-27 21:03:17
Message-ID: CA+TgmobkN-g10qNU4-qPApAPG9w7euFJgU1Y-TGtXr9TUeZ3=w@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 7, 2018 at 1:41 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> Well, it's more than just systems like that - for 64bit atomics we
> sometimes do fall back to spinlock based atomics on 32bit systems, even
> if they support 32 bit atomics.
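
For anyone following along, that fallback emulates each 64-bit atomic
with a spinlock, roughly like this (a simplified sketch of what
src/backend/port/atomics.c does; the hppa special case and friends are
omitted):

/* simplified from the fallback in src/include/port/atomics/fallback.h */
typedef struct pg_atomic_uint64
{
    int             sema;       /* backs an slock_t protecting value */
    volatile uint64 value;
} pg_atomic_uint64;

static uint64
pg_atomic_fetch_add_u64_impl(volatile pg_atomic_uint64 *ptr, int64 add_)
{
    uint64      oldval;

    /* every "atomic" operation takes and releases the spinlock */
    SpinLockAcquire((slock_t *) &ptr->sema);
    oldval = ptr->value;
    ptr->value += add_;
    SpinLockRelease((slock_t *) &ptr->sema);

    return oldval;
}

So on a 32-bit build, every update of mq_bytes_read or mq_bytes_written
pays a spinlock acquire/release rather than a single atomic
instruction; hence the experiment below.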

I built with -m32 on my laptop and tried "select aid, count(*) from
pgbench_accounts group by 1 having count(*) > 1" on pgbench at scale
factor 100 with pgbench_accounts_pkey dropped and
max_parallel_workers_per_gather set to 10 on (a) commit
0b5e33f667a2042d7022da8bef31a8be5937aad1 (I know this is a little old,
but I think it doesn't matter), (b) same plus
shm-mq-less-spinlocks-v3, and (c) same plus shm-mq-less-spinlocks-v3
and shm-mq-reduce-receiver-latch-set-v2.

(a) 16563.790 ms, 16625.257 ms, 16496.062 ms
(b) 17217.051 ms, 17157.745 ms, 17225.755 ms [median to median +3.9% vs. (a)]
(c) 15491.947 ms, 15455.840 ms, 15452.649 ms [median to median -7.0% vs. (a), -10.2% vs. (b)]

Do you think that's a problem? If it is, what do you think we should
do about it? It seems to me that it's probably OK because (1) with
both patches we still come out ahead and (2) 32-bit systems will
presumably continue to become rarer as time goes on, but you might
disagree.
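
For anyone who wants to reproduce this, the setup boils down to the
following (assuming a pgbench database already initialized at scale
factor 100, e.g. "pgbench -i -s 100", on a server built with
CFLAGS=-m32):

-- drop the index so the aggregate has to scan everything in parallel
ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
SET max_parallel_workers_per_gather = 10;
-- the timed query: pushes lots of tuples through the Gather node
SELECT aid, count(*) FROM pgbench_accounts GROUP BY 1 HAVING count(*) > 1;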

>> > Hm, do all paths here guarantee that mq->mq_detached won't be stored on
>> > the stack / register in the first iteration, and then not reread any
>> > further? I think it's fine because every branch of the if below ends up
>> > in a syscall / barrier, but it might be worth noting that here.
>>
>> Aargh. I hate compilers. I added a comment. Should I think about
>> changing mq_detached to pg_atomic_uint32 instead?
>
> I think a pg_compiler_barrier() would suffice to alleviate my concern,
> right? If you wanted to go for an atomic, using pg_atomic_flag would
> probably be more appropriate than pg_atomic_uint32.

Hmm, all right, I'll add pg_compiler_barrier().
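
Concretely, I'm thinking of something like the following standalone
sketch of the hazard (the struct and barrier definitions are stand-ins
for the real ones in shm_mq.c and the atomics headers, and the loop
body is invented):

#include <stdbool.h>

/* stand-ins for the real definitions (assumptions for this sketch) */
typedef struct shm_mq
{
    bool    mq_detached;        /* set when the other side detaches */
    /* ... ring buffer bookkeeping elided ... */
} shm_mq;

#define pg_compiler_barrier()   __asm__ __volatile__("" ::: "memory")

static void
poll_queue(shm_mq *mq)
{
    for (;;)
    {
        /*
         * Nothing tells the compiler that mq_detached can change behind
         * our back, so without a barrier it may load the flag once and
         * keep testing the stale register copy on every iteration.
         */
        if (mq->mq_detached)
            return;

        /* ... try to move some ring buffer data ... */

        /*
         * Compiler barrier only: forces mq_detached to be re-read from
         * memory on the next iteration.  CPU-level ordering is provided
         * by the syscall / memory barrier on the other paths through
         * the real loop.
         */
        pg_compiler_barrier();
    }
}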

>> >> - /* Write as much data as we can via a single memcpy(). */
>> >> + /*
>> >> + * Write as much data as we can via a single memcpy(). Make sure
>> >> + * these writes happen after the read of mq_bytes_read, above.
>> >> + * This barrier pairs with the one in shm_mq_inc_bytes_read.
>> >> + */
>> >
>> > s/above/above. Otherwise a newer mq_bytes_read could become visible
>> > before the corresponding reads have fully finished./?
>>
>> I don't find that very clear. A newer mq_bytes_read could become
>> visible whenever, and a barrier doesn't prevent that from happening.
>
> Well, my point was that the barrier prevents the write to
> mq_bytes_read from becoming visible before the corresponding reads have
> finished. Which then would mean the memcpy() could overwrite them. And a
> barrier *does* prevent that from happening.

I think we're talking about the same thing, but not finding each
other's explanations very clear.
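
Maybe code is clearer than prose here. This is the pairing as I
understand it, assuming the patched layout where mq_bytes_read is a
pg_atomic_uint64 and mq_ring is the data array (a sketch, not the
verbatim patch; assumes the usual postgres.h / port/atomics.h
environment):

/* receiver side: we have finished reading n bytes out of mq_ring */
static void
receiver_inc_bytes_read(shm_mq *mq, Size n)
{
    /*
     * Order the prior reads of mq_ring before the update of
     * mq_bytes_read.  A read barrier suffices because the increment is
     * a read followed by a dependent write.  Pairs with the full
     * barrier on the sender side below.
     */
    pg_read_barrier();
    pg_atomic_write_u64(&mq->mq_bytes_read,
                        pg_atomic_read_u64(&mq->mq_bytes_read) + n);
}

/* sender side: reuse space the receiver has advertised as consumed */
static void
sender_write_chunk(shm_mq *mq, Size offset, const char *data, Size chunk)
{
    uint64  rb = pg_atomic_read_u64(&mq->mq_bytes_read);

    (void) rb;  /* ... compute free space from rb; bail if chunk won't fit ... */

    /*
     * The read of mq_bytes_read above must not be reordered relative to
     * the writes to mq_ring below, or we could clobber bytes the
     * receiver hasn't actually finished reading.  Ordering a read
     * against a later write needs a full barrier.
     */
    pg_memory_barrier();
    memcpy(&mq->mq_ring[offset], data, chunk);
}

Take away either barrier and the memcpy() can scribble on bytes the
receiver is still consuming; that's all I was trying to say in the
comment.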

>> Hmm, I'm not sure I understand what you're suggesting, here. In
>> general, we return with the data for the current message unconsumed,
>> and then consume it the next time the function is called, so that
>> (except when the message wraps the end of the buffer) we can return a
>> pointer directly into the buffer rather than having to memcpy(). What
>> this patch does is postpone consuming the data further, either until
>> we can free up at least a quarter of the ring buffer or until we need
>> to wait for more data. It seemed worthwhile to free up space in the
>> ring buffer occasionally even if we weren't to the point of waiting
>> yet, so that the sender has an opportunity to write new data into that
>> space if it happens to still be running.
>
> What I'm trying to suggest is that instead of postponing an update of
> mq_bytes_read (by storing amount of already performed reads in
> mqh_consume_pending), we continue to update mq_bytes_read but only set
> the latch if your above thresholds are crossed. That way a burst of
> writes can fully fill the ringbuffer, but the cost of doing a SetLatch()
> is amortized. In my testing SetLatch() was the expensive part, not the
> necessary barriers in shm_mq_inc_bytes_read().

OK, I'll try to check how feasible that would be.
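
If I'm following, that would look roughly like this (mqh_unreported is
a made-up field name for illustration, and the quarter-of-the-ring
threshold is just borrowed from my earlier description, so treat this
as a sketch of the idea rather than a design):

/* receiver: always advance mq_bytes_read, but amortize the SetLatch() */
static void
inc_bytes_read_amortized(shm_mq_handle *mqh, Size n, bool about_to_wait)
{
    shm_mq *mq = mqh->mqh_queue;

    /* same barrier as in shm_mq_inc_bytes_read */
    pg_read_barrier();
    pg_atomic_write_u64(&mq->mq_bytes_read,
                        pg_atomic_read_u64(&mq->mq_bytes_read) + n);

    /*
     * The sender sees the freed space as soon as it re-reads
     * mq_bytes_read, so a burst of writes can fill the whole ring even
     * if we never set its latch.  Only pay for SetLatch() once we've
     * freed a decent chunk of the ring, or when we're about to sleep
     * anyway.  (mqh_unreported is hypothetical.)
     */
    mqh->mqh_unreported += n;
    if (about_to_wait || mqh->mqh_unreported >= mq->mq_ring_size / 4)
    {
        PGPROC *sender = mq->mq_sender;

        mqh->mqh_unreported = 0;
        if (sender != NULL)
            SetLatch(&sender->procLatch);
    }
}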

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
