Re: Fix performance of generic atomics

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Jesper Pedersen <jesper(dot)pedersen(at)redhat(dot)com>, Sokolov Yura <funny(dot)falcon(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fix performance of generic atomics
Date: 2017-09-06 04:23:25
Message-ID: 22159.1504671805@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> What scale factor and client count? How many cores per socket? It looks
> like Sokolov was just starting to see gains at 200 clients on 72 cores,
> using -N transaction.

I spent some time poking at this with the test scaffolding I showed at
https://www.postgresql.org/message-id/17694.1504665058%40sss.pgh.pa.us

AFAICT, there are not huge differences between different coding methods
even for two processes in direct competition in the tightest possible
atomic-ops loops. So if you're testing SQL operations, you're definitely
not going to see anything without a whole lot of concurrent processes.

Moreover, it matters which primitive you're testing, on which platform,
with which compiler, because we have a couple of layers of atomic ops
implementations.

I started out testing pg_atomic_fetch_add_u32(), as shown in the above-
mentioned message. On x86_x64 with gcc, that is implemented by the
code for it in src/include/port/atomics/arch-x86.h. If you dike out
that support, it falls back to the version in atomics/generic-gcc.h,
which seems no worse and possibly better. Only if you also dike out
generic-gcc.h do you get to the version in atomics/generic.h that this
patch wants to change. However, at that point you're also down to a
spinlock-based implementation of the underlying pg_atomic_read_u32_impl
and pg_atomic_compare_exchange_u32_impl, which means that performance
is going to be less than great anyway. No platform that we consider
well-supported ought to be hitting the spinlock paths. This means
that Sokolov's proposed changes in atomics/generic.h ought to be
uninteresting for performance on any platform we care about --- at
least for pg_atomic_fetch_add_u32().

However, Sokolov also proposes adding gcc-intrinsic support for
pg_atomic_fetch_and_xxx and pg_atomic_fetch_or_xxx. This is a
different kettle of fish. I repeated the experiments I'd done
for pg_atomic_fetch_add_u32(), per the above message, using the "or"
primitive, and found largely the same results as for "add": it's
slightly better under contention than the generic code, and
significantly better if the result value of the atomic op isn't needed.

So I think we should definitely take the part of the patch that
changes generic-gcc.h --- and we should check to see if we're missing
out on any other gcc primitives we should be using there.

I'm less excited about the proposed changes in generic.h. There's
nothing wrong with them in principle, but they mostly shouldn't
make any performance difference on any platform we care about,
because we shouldn't get to that code in the first place on any
platform we care about. If we are getting to that code for any
specific primitive, then either there's a gap in the platform-specific
or compiler-specific support, or it's debatable that we ought to
consider that primitive to be very primitive.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-09-06 04:34:16 GatherMerge misses to push target list
Previous Message Pavel Stehule 2017-09-06 03:56:34 Re: proposal psql \gdesc