Re: Patch: fix lock contention for HASHHDR.mutex

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Anastasia Lubennikova <a(dot)lubennikova(at)postgrespro(dot)ru>
Subject: Re: Patch: fix lock contention for HASHHDR.mutex
Date: 2016-03-01 13:09:16
Message-ID: CAA4eK1+ORDtkbtis4NWMLOWzs6akKmpkA5ekMCyj5N0FYmOXQA@mail.gmail.com

On Fri, Feb 12, 2016 at 9:04 PM, Aleksander Alekseev <a(dot)alekseev(at)postgrespro(dot)ru> wrote:

> Hello, Robert
>
> > It also strikes me that it's probably quite likely that slock_t
> > mutex[NUM_FREELISTS] is a poor way to lay out this data in memory.
> > For example, on a system where slock_t is just one byte, most likely
> > all of those mutexes are going to be in the same cache line, which
> > means you're going to get a LOT of false sharing. It seems like it
> > would be sensible to define a new struct that contains an slock_t, a
> > long, and a HASHELEMENT *, and then make an array of those structs.
> > That wouldn't completely eliminate false sharing, but it would reduce
> > it quite a bit. My guess is that if you did that, you could reduce
> > the number of freelists to 8 or less and get pretty much the same
> > performance benefit that you're getting right now with 32. And that,
> > too, seems likely to be good for single-client performance.
>
> I experimented for a while trying to fit every spinlock in a separate
> cache line. Indeed we can gain some speedup this way. Here are
> benchmark results on 12-core server for NUM_LOCK_PARTITIONS = 32 (in
> this case difference is more notable):
>
> | FREELISTS | SIZE = 32 | SIZE = 64 | SIZE = 128 | SIZE = 256 |
> |-----------|-----------|-----------|------------|------------|
> |         4 |    +25.4% |    +28.7% |     +28.4% |     +28.3% |
> |         8 |    +28.2% |    +29.4% |     +31.7% |     +31.4% |
> |        16 |    +29.9% |    +32.6% |     +31.6% |     +30.8% |
> |        32 |    +33.7% |    +34.2% |     +33.6% |     +32.6% |
>
> Here SIZE is short for FREELIST_BUFFER_SIZE (part of a hack I used to
> align FREELIST structure, see attached patch).

I am not sure whether this is exactly what Robert suggested, so it is not
straightforward to see if his suggestion would allow us to use
NUM_FREELISTS as 8 rather than 32. I think instead of trying to use
FREELISTBUFF, why not do it as Robert has suggested and try with
different values of NUM_FREELISTS?
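
For clarity, here is a rough sketch of the layout I understand Robert to
be proposing (the names FreeListData/FreeListPadded and the 64-byte pad
size are just mine for illustration, not from the patch):

```
typedef struct
{
    slock_t      mutex;     /* spinlock protecting this freelist */
    long         nentries;  /* number of entries in associated buckets */
    HASHELEMENT *freeList;  /* chain of free elements */
} FreeListData;

/*
 * Pad each freelist to a full cache line so that no two spinlocks
 * share a line; 64 bytes is assumed here, the real line size varies
 * by platform.
 */
typedef union
{
    FreeListData data;
    char         pad[64];
} FreeListPadded;

FreeListPadded freeList[NUM_FREELISTS];
```

With such a layout each spinlock sits on its own cache line, so the
remaining contention comes from genuine sharing of a freelist rather
than false sharing between adjacent ones.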

>
> > I am however wondering if it would be better to set the freelist
> > affinity based on
> > something other than the hash value, like say the PID, so that the
> > same process doesn't keep switching to a different freelist for every
> > buffer eviction.
>
> Also I tried to use PID to determine freeList number:
>
> ```
> #include <sys/types.h>
> #include <unistd.h>
>
> ...
>
> #define FREELIST_IDX(hctl, hashcode) \
> (IS_PARTITIONED(hctl) ? ((uint32)getpid()) % NUM_FREELISTS : 0)
>
> ...
>
> // now nentries could be negative in this case
> // Assert(FREELIST(hctl, freelist_idx).nentries > 0);
> ```
>
In which case do you think nentries can go negative? I think nentries is
incremented and decremented in the same way as without the patch, so I am
not seeing what could make it go negative.
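
To spell out my understanding, the accounting paths look roughly like
this (a simplified sketch using the patch's FREELIST_IDX/FREELIST
macros, not the actual dynahash code):

```
/* HASH_ENTER: take an element off a freelist and count it */
idx = FREELIST_IDX(hctl, hashcode);
SpinLockAcquire(&(FREELIST(hctl, idx).mutex));
newElement = FREELIST(hctl, idx).freeList;
FREELIST(hctl, idx).freeList = newElement->link;
FREELIST(hctl, idx).nentries++;
SpinLockRelease(&(FREELIST(hctl, idx).mutex));

/* HASH_REMOVE: put the element back and uncount it */
idx = FREELIST_IDX(hctl, hashcode);
SpinLockAcquire(&(FREELIST(hctl, idx).mutex));
currElement->link = FREELIST(hctl, idx).freeList;
FREELIST(hctl, idx).freeList = currElement;
FREELIST(hctl, idx).nentries--;
SpinLockRelease(&(FREELIST(hctl, idx).mutex));
```

Every increment is matched by a decrement against some freelist index,
which is why I am not able to see how the count goes below zero.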

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
