
Proposal of tunable fix for scalability of 8.4

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: pgsql-performance(at)postgresql(dot)org
Subject: Proposal of tunable fix for scalability of 8.4
Date: 2009-03-11 20:53:49
Message-ID: 49B824DD.7090302@sun.com
Lists: pgsql-performance
Hello All,

As you know, one of the things I have been constantly doing is using
benchmark kits to see how we can scale PostgreSQL on the UltraSPARC T2
based 1-socket (64 threads) and 2-socket (128 threads) servers that
Sun sells.

During last PgCon 2008 
http://www.pgcon.org/2008/schedule/events/72.en.html you might remember 
that I mentioned that ProcArrayLock is pretty hot when you have many users.

Rerunning similar tests on a 64-thread UltraSPARC T2 Plus based server
config, I found that even with a recent 8.4 snapshot I was still hitting
similar problems (IO is not a problem... everything is in RAM, no disks):
Time:Users:Type:TPM: Response Time
60: 100: Medium Throughput: 10552.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22897.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33099.000 Avg Medium Resp: 0.009
240: 400: Medium Throughput: 44692.000 Avg Medium Resp: 0.007
300: 500: Medium Throughput: 56455.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67220.000 Avg Medium Resp: 0.008
420: 700: Medium Throughput: 77592.000 Avg Medium Resp: 0.009
480: 800: Medium Throughput: 87277.000 Avg Medium Resp: 0.011
540: 900: Medium Throughput: 98029.000 Avg Medium Resp: 0.012
600: 1000: Medium Throughput: 102547.000 Avg Medium Resp: 0.023
660: 1100: Medium Throughput: 100503.000 Avg Medium Resp: 0.044
720: 1200: Medium Throughput: 99506.000 Avg Medium Resp: 0.065
780: 1300: Medium Throughput: 95474.000 Avg Medium Resp: 0.089
840: 1400: Medium Throughput: 86254.000 Avg Medium Resp: 0.130
900: 1500: Medium Throughput: 91947.000 Avg Medium Resp: 0.139
960: 1600: Medium Throughput: 94838.000 Avg Medium Resp: 0.147
1020: 1700: Medium Throughput: 92446.000 Avg Medium Resp: 0.173
1080: 1800: Medium Throughput: 91032.000 Avg Medium Resp: 0.194
1140: 1900: Medium Throughput: 88236.000 Avg Medium Resp: 0.221
 runDynamic: uCount =  2000delta = 1900
 runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1352555.000 Avg Medium Resp: 0.071
1260: 2000: Medium Throughput: 88872.000 Avg Medium Resp: 0.238
1320: 2000: Medium Throughput: 88484.000 Avg Medium Resp: 0.248
1380: 2000: Medium Throughput: 90777.000 Avg Medium Resp: 0.231
1440: 2000: Medium Throughput: 90769.000 Avg Medium Resp: 0.229

You will notice that throughput drops around 1000 users. Nothing new;
you have already heard me mention that a zillion times.

Now while working on this today I was going through LWLockRelease, as I
have probably done quite a few times before, to see what could be done.
The quick synopsis: LWLockRelease releases the lock and wakes up the
next waiter to take over. If the next waiter is waiting for exclusive
access, it wakes only that waiter; if the next waiter is waiting for
shared access, it walks the queue and wakes up all the consecutive
shared waiters that follow.

Earlier last year I had tried various ways of doing intelligent wake-ups
(finding all the shared waiters together and waking them up, coming up
with a different lock type and waking multiple waiters simultaneously,
even defining a new lock mode), and none of them were stellar enough to
make an impact.

Today I tried something else: forget the distinction between exclusive
and shared and just wake them all up. I changed the code from
                        /*
                         * Remove the to-be-awakened PGPROCs from the queue.  If the front
                         * waiter wants exclusive lock, awaken him only.  Otherwise awaken
                         * as many waiters as want shared access.
                         */
                        proc = head;
                        if (!proc->lwExclusive)
                        {
                                while (proc->lwWaitLink != NULL &&
                                       !proc->lwWaitLink->lwExclusive)
                                        proc = proc->lwWaitLink;
                        }
                        /* proc is now the last PGPROC to be released */
                        lock->head = proc->lwWaitLink;
                        proc->lwWaitLink = NULL;
                        /* prevent additional wakeups until retryer gets to run */
                        lock->releaseOK = false;

to basically wake them all up:
                        /*
                         * Remove the to-be-awakened PGPROCs from the queue.  If the front
                         * waiter wants exclusive lock, awaken him only.  Otherwise awaken
                         * as many waiters as want shared access.
                         */
                        proc = head;
                        /* if (!proc->lwExclusive) */
                        if (1)
                        {
                                while (proc->lwWaitLink != NULL)
                                        /* && !proc->lwWaitLink->lwExclusive */
                                        proc = proc->lwWaitLink;
                        }
                        /* proc is now the last PGPROC to be released */
                        lock->head = proc->lwWaitLink;
                        proc->lwWaitLink = NULL;
                        /* prevent additional wakeups until retryer gets to run */
                        lock->releaseOK = false;


This wakes all the waiters up and lets them sort it out among themselves
(technically causing the thundering herd that the original logic was
trying to avoid). I reran the test and saw these results:

Time:Users:Type:TPM: Response Time
60: 100: Medium Throughput: 10457.000 Avg Medium Resp: 0.006
120: 200: Medium Throughput: 22809.000 Avg Medium Resp: 0.006
180: 300: Medium Throughput: 33665.000 Avg Medium Resp: 0.008
240: 400: Medium Throughput: 45042.000 Avg Medium Resp: 0.006
300: 500: Medium Throughput: 56655.000 Avg Medium Resp: 0.007
360: 600: Medium Throughput: 67170.000 Avg Medium Resp: 0.007
420: 700: Medium Throughput: 78343.000 Avg Medium Resp: 0.008
480: 800: Medium Throughput: 87979.000 Avg Medium Resp: 0.008
540: 900: Medium Throughput: 100369.000 Avg Medium Resp: 0.008
600: 1000: Medium Throughput: 110697.000 Avg Medium Resp: 0.009
660: 1100: Medium Throughput: 121255.000 Avg Medium Resp: 0.010
720: 1200: Medium Throughput: 132915.000 Avg Medium Resp: 0.010
780: 1300: Medium Throughput: 141505.000 Avg Medium Resp: 0.012
840: 1400: Medium Throughput: 147084.000 Avg Medium Resp: 0.021
light: customer: No result set for custid 0
900: 1500: Medium Throughput: 157906.000 Avg Medium Resp: 0.018
light: customer: No result set for custid 0
960: 1600: Medium Throughput: 160289.000 Avg Medium Resp: 0.026
1020: 1700: Medium Throughput: 152191.000 Avg Medium Resp: 0.053
1080: 1800: Medium Throughput: 157949.000 Avg Medium Resp: 0.054
1140: 1900: Medium Throughput: 161923.000 Avg Medium Resp: 0.063
 runDynamic: uCount =  2000delta = 1900
 runDynamic: ALL Threads Have Been created
1200: 2000: Medium Throughput: -1781969.000 Avg Medium Resp: 0.019
light: customer: No result set for custid 0
1260: 2000: Medium Throughput: 140741.000 Avg Medium Resp: 0.115
light: customer: No result set for custid 0
1320: 2000: Medium Throughput: 165379.000 Avg Medium Resp: 0.070
1380: 2000: Medium Throughput: 166585.000 Avg Medium Resp: 0.070
1440: 2000: Medium Throughput: 169163.000 Avg Medium Resp: 0.063
1500: 2000: Medium Throughput: 157508.000 Avg Medium Resp: 0.086
light: customer: No result set for custid 0
1560: 2000: Medium Throughput: 170112.000 Avg Medium Resp: 0.063

That is a 1.89X improvement in peak throughput, and it is still not
dropping drastically, which means I can now keep stressing PostgreSQL
8.4 toward the limits of the box.

My proposal is that we add a quick tunable for 8.4, say
wake-up-all-waiters=on (or something to that effect) in postgresql.conf,
before the beta. People could then try the option on the various other
benchmarks they are running and report back whether it helps. That way
the change would be non-intrusive this late in the game while still
getting an important scaling fix in. Of course, as usual, this is open
for debate. I know avoiding the thundering herd was the goal here, but
from what I have seen so far, waking up a single exclusive waiter that
may not even be on CPU is pretty expensive.

What do you all think ?

Regards,
Jignesh

