Re: LWLocks in DSM memory

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LWLocks in DSM memory
Date: 2016-07-25 17:50:48
Message-ID: CA+TgmoYfQyYZTKLf+-HkXNafX25TkY8JAb8yQXy9O8gqd-K2Mw@mail.gmail.com
Lists: pgsql-hackers

On Sun, Jul 24, 2016 at 11:02 PM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sun, Jul 24, 2016 at 1:10 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> One solution could be to provide a non-circular variant of the dlist
>> interface that uses NULL list termination. I've attached a quick
>> sketch of something like that which seems to work correctly. It is
>> only lightly tested so far and probably buggy, but shows the general
>> idea.
>
> On reflection, it wouldn't make much sense to put "noncircular" in the
> names of interfaces, as that is an internal detail. Maybe
> "relocatable" or "position independent" or something like that, since
> that's a user-visible property of the dlist_head. Here is a version
> of the patch that uses _relocatable.
>
>> Any thoughts on this approach, or better ways to solve this problem?
>> How valuable is the branch-free property of the circular push and
>> delete operations?
>
> I investigated this a bit. If I do this on my laptop (clang, no
> asserts, -O2), it takes 3895 milliseconds, or 4.8ns per push/delete
> operation:
>
> dlist_init(&head);
> for (i = 0; i < 100000000; ++i)
> {
>     dlist_push_head(&head, &nodes[0]);
>     dlist_push_tail(&head, &nodes[1]);
>     dlist_push_head(&head, &nodes[2]);
>     dlist_push_tail(&head, &nodes[3]);
>     dlist_delete(&nodes[2]);
>     dlist_delete(&nodes[3]);
>     dlist_delete(&nodes[0]);
>     dlist_delete(&nodes[1]);
> }
>
> The relocatable version takes 5907 milliseconds, or 7.4ns per
> push/delete operation:
>
> dlist_init_relocatable(&head);
> for (i = 0; i < 100000000; ++i)
> {
>     dlist_push_head_relocatable(&head, &nodes[0]);
>     dlist_push_tail_relocatable(&head, &nodes[1]);
>     dlist_push_head_relocatable(&head, &nodes[2]);
>     dlist_push_tail_relocatable(&head, &nodes[3]);
>     dlist_delete_relocatable(&head, &nodes[2]);
>     dlist_delete_relocatable(&head, &nodes[3]);
>     dlist_delete_relocatable(&head, &nodes[0]);
>     dlist_delete_relocatable(&head, &nodes[1]);
> }
>
> Those operations are ~50% slower. So getting rid of dlist's clever
> branch-free code generally seems like a bad idea.
>
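
To make the trade-off concrete for anyone reading along, here is my own
condensed sketch of the two shapes (not the code from ilist.h or from your
patch; the relocatable side is only a guess at its layout). The circular
version can update pointers unconditionally because the head's embedded
node is part of the ring, but that means list nodes (and an empty head)
end up storing the address of the dlist_head itself, which is only
meaningful within one process's mapping of a DSM segment. A NULL-terminated
head never stores its own address, at the price of branching on the
empty/first/last cases:

/* ilist.h-style types, condensed */
typedef struct dlist_node dlist_node;
struct dlist_node
{
    dlist_node *prev;
    dlist_node *next;
};

typedef struct dlist_head
{
    dlist_node head;            /* sentinel node embedded in the head */
} dlist_head;

/* Circular push, condensed: unconditional stores, but the new node's prev
 * now holds the address of the dlist_head's embedded sentinel. */
static inline void
dlist_push_head(dlist_head *head, dlist_node *node)
{
    node->next = head->head.next;
    node->prev = &head->head;
    node->next->prev = node;
    head->head.next = node;
}

/* Circular delete: branch-free, and doesn't even need the head. */
static inline void
dlist_delete(dlist_node *node)
{
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

/* Hypothetical NULL-terminated ("relocatable") head: no pointer to itself,
 * so it works at different addresses in different mappings, but every
 * operation branches on the empty/first/last cases. */
typedef struct dlist_head_relocatable
{
    dlist_node *first;          /* NULL when the list is empty */
    dlist_node *last;
} dlist_head_relocatable;

static inline void
dlist_push_head_relocatable(dlist_head_relocatable *head, dlist_node *node)
{
    node->prev = NULL;
    node->next = head->first;
    if (head->first != NULL)
        head->first->prev = node;
    else
        head->last = node;              /* list was empty */
    head->first = node;
}

static inline void
dlist_delete_relocatable(dlist_head_relocatable *head, dlist_node *node)
{
    if (node->prev != NULL)
        node->prev->next = node->next;
    else
        head->first = node->next;       /* node was first */
    if (node->next != NULL)
        node->next->prev = node->prev;
    else
        head->last = node->prev;        /* node was last */
}
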
> Next I wondered if it would be a bad idea to use slower relocatable
> dlist heads for all LWLocks. Obviously LWLocks are used all over the
> place and it would be bad to slow them down, but I wondered if maybe
> dlist manipulation might not be a very significant part of their work.
> So I put a LWLock and a counter in shared memory, and had N background
> workers run a tight loop that locks, increments and unlocks
> concurrently until the counter reaches 1 billion. This creates
> different degrees of contention and wait list sizes. The worker loop
> looks like this:
>
> while (!finished)
> {
>     LWLockAcquire(&control->lock, LW_EXCLUSIVE);
>     ++control->counter;
>     if (control->counter >= control->goal)
>         finished = true;
>     LWLockRelease(&control->lock);
> }
>
> I measured the following times for unpatched master, on my 4 core laptop:
>
> 16 workers = 73.067s, 74.869s, 75.338s
> 8 workers = 65.846s, 67.622s, 68.039s
> 4 workers = 68.763s, 68.980s, 69.035s <-- curiously slower here
> 3 workers = 59.701s, 59.991s, 60.133s
> 2 workers = 53.620s, 55.300s, 55.790s
> 1 worker = 21.578s, 21.535s, 21.598s
>
> With the attached patched I got:
>
> 16 workers = 75.341s, 77.445s, 77.635s <- +3.4%
> 8 workers = 67.462s, 68.622s, 68.851s <- +1.4%
> 4 workers = 64.983s, 65.021s, 65.496s <- -5.7%
> 3 workers = 60.247s, 60.425s, 60.492s <- +0.7%
> 2 workers = 57.544s, 57.626s, 58.133s <- +2.3%
> 1 worker = 21.403s, 21.486s, 21.661s <- -0.2%
>
> Somewhat noisy data and different effects at different numbers of workers.
>
> I can post the source for those tests if anyone is interested. If you
> have any other ideas for access patterns to test, or clever ways to
> keep push and delete branch-free while also avoiding internal pointers
> back to dlist_head, I'm all ears. Otherwise, if a change affecting
> all LWLocks turns out to be unacceptable, maybe we would need to have
> a different LWLock interface for relocatable LWLocks to make them
> suitable for use in DSM segments. Any thoughts?

Thanks for looking into this. Any theory on why this is slower in the
abstract but not slower, perhaps even faster, for LWLocks? I wonder
if it has to do with how likely it is for a list to be completely
empty.
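
For instance, a variant of your microbenchmark along these lines (untested,
just sketching the case I have in mind) would exercise only the
empty-or-one-element pattern, which is the case I'm wondering about:

dlist_init_relocatable(&head);
for (i = 0; i < 100000000; ++i)
{
    /* list is empty before every push and empty again after every delete */
    dlist_push_head_relocatable(&head, &nodes[0]);
    dlist_delete_relocatable(&head, &nodes[0]);
}

If that variant shows little or no gap against the equivalent circular-dlist
loop, it would suggest the ~50% penalty comes mostly from the longer-list
shapes, which an LWLock wait list may rarely reach.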

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
