From: | Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Merlin Moncure <mmoncure(at)gmail(dot)com>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, YUriy Zhuravlev <u(dot)zhuravlev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Move PinBuffer and UnpinBuffer to atomics |
Date: | 2016-04-13 11:41:32 |
Message-ID: | CAPpHfdt9bbHj5nNGazuDyw=ZotNbOVznofmwPMeBBxu3sxyYNA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Apr 12, 2016 at 5:12 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
wrote:
> On Tue, Apr 12, 2016 at 3:48 PM, Alexander Korotkov <
> a(dot)korotkov(at)postgrespro(dot)ru> wrote:
>
>> On Tue, Apr 12, 2016 at 12:40 AM, Andres Freund <andres(at)anarazel(dot)de>
>> wrote:
>>
>>> I did get access to the machine (thanks!). My testing shows that
>>> performance is sensitive to various parameters influencing memory
>>> allocation. E.g. twiddling with max_connections changes
>>> performance. With max_connections=400 and the previous patches applied I
>>> get ~1220000 tps, with 402 ~1620000 tps. This sorta confirms that we're
>>> dealing with an alignment/sharing related issue.
>>>
>>> Padding PGXACT to a full cache-line seems to take care of the largest
>>> part of the performance irregularity. I looked at perf profiles and saw
>>> that most cache misses stem from there, and that the percentage (not
>>> absolute amount!) changes between fast/slow settings.
>>>
>>> To me it makes intuitive sense why you'd want PGXACTs to be on separate
>>> cachelines - they're constantly dirtied via SnapshotResetXmin(). Indeed
>>> making it immediately return propels performance up to 1720000, without
>>> other changes. Additionally cacheline-padding PGXACT speeds things up to
>>> 1750000 tps.
>>>
>>
>> It seems like padding PGXACT to a full cache-line is a great
>> improvement. There are not so many PGXACTs that we need to worry about
>> the bytes wasted on padding.
>>
>
> Yes, it generally seems like a good idea, but I am not sure it is a
> complete fix for the variation in performance we see when we change
> shared memory structures. Andres suggested on IM that I collect performance
> data on an x86 machine with PGXACT padded; the data is below:
>
> median of 3, 5-min runs
>
> Client_Count/Patch_ver        8       64      128
> HEAD                      59708   329560   173655
> PATCH                     61480   379798   157580
>
> Here, at a client count of 128, performance with the patch still seems to
> vary. The highest tps with the patch (170363) is close to HEAD (175718).
> This could be run-to-run variation, but I think it indicates that there are
> more places where we might need such padding, or might need to optimize
> them so that they are aligned.
>
> I can do some more experiments on similar lines, but I am out on vacation
> and might not be able to access the m/c for 3-4 days.
>
Could you share details of the hardware you used? I could try to find
something similar to reproduce this.
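
For anyone following along, the padding idea discussed above can be sketched roughly as below. This is only an illustration under stated assumptions, not the patch itself: DemoXact, DemoXactPadded, CACHE_LINE_SIZE and the two helper functions are hypothetical names, the struct is not the real PGXACT layout, and 64 bytes is an assumed cache line size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, typical for x86 but not universal. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical stand-in for a small, frequently written per-backend struct.
 * This is NOT the real PGXACT layout.
 */
typedef struct DemoXact
{
    uint32_t    xid;
    uint32_t    xmin;
    uint8_t     flags;
} DemoXact;

/*
 * Padded variant: the union forces every array element to occupy a full
 * cache line, so writes to one element do not dirty a neighbour's line.
 */
typedef union DemoXactPadded
{
    DemoXact    xact;
    char        pad[CACHE_LINE_SIZE];
} DemoXactPadded;

static_assert(sizeof(DemoXact) <= CACHE_LINE_SIZE,
              "entry must fit in one cache line");
static_assert(sizeof(DemoXactPadded) == CACHE_LINE_SIZE,
              "padded entry must be exactly one cache line");

/*
 * Index of the cache line holding the first byte of element i, assuming
 * the array base itself is cache-line aligned.
 */
static size_t
unpadded_line_of(size_t i)
{
    return (i * sizeof(DemoXact)) / CACHE_LINE_SIZE;
}

static size_t
padded_line_of(size_t i)
{
    return (i * sizeof(DemoXactPadded)) / CACHE_LINE_SIZE;
}
```

With the unpadded layout several neighbouring entries land on the same line, while with the padded layout each entry gets its own. Note that the union only fixes the element size; the array base also has to be cache-line aligned for this to hold.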
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company