Skip site navigation (1) Skip section navigation (2)

Re: User concurrency thresholding: where do I look?

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 15:27:30
Message-ID: 46A8BD62.2080904@sun.com (view raw or flat)
Thread:
Lists: pgsql-performance
I will look for runs with longer samples..

Now the script could have mislabeled lock names.. Anyway digging into 
the one that seems to increase over time... I did stack profiles on how 
that increases... and here are some numbers..


For  600-850 Users: that potential mislabeled CheckPointStartLock  or 
LockID==12 comes from various sources where the top source (while system 
is still doing great) comes from:



              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage_ReadOnly+0xc
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
          8202100

              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
         30822100


However at 900 Users where the big drop in throughput occurs:
It gives a different top "consumer" of time:



              postgres`LWLockAcquire+0x1c8
              postgres`TransactionIdSetStatus+0x1c
              postgres`RecordTransactionCommit+0x2a8
              postgres`CommitTransaction+0xc8
              postgres`CommitTransactionCommand+0x90
              postgres`finish_xact_command+0x60
              postgres`exec_execute_message+0x3d8
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        406601300

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesUpdate+0x360
              postgres`heap_lock_tuple+0x27c
              postgres`ExecutePlan+0x33c
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
              postgres`main+0x394
              postgres`_start+0x108
        444523100

              postgres`LWLockAcquire+0x1c8
              postgres`SimpleLruReadPage+0x1ac
              postgres`TransactionIdGetStatus+0x14
              postgres`TransactionLogFetch+0x58
              postgres`TransactionIdDidCommit+0x4
              postgres`HeapTupleSatisfiesSnapshot+0x234
              postgres`heap_release_fetch+0x1a8
              postgres`index_getnext+0xf4
              postgres`IndexNext+0x7c
              postgres`ExecScan+0x8c
              postgres`ExecProcNode+0xb4
              postgres`ExecutePlan+0xdc
              postgres`ExecutorRun+0xb0
              postgres`PortalRunSelect+0x9c
              postgres`PortalRun+0x244
              postgres`exec_execute_message+0x3a0
              postgres`PostgresMain+0x1300
              postgres`BackendRun+0x278
              postgres`ServerLoop+0x63c
              postgres`PostmasterMain+0xc40
       1661300000



Maybe you all will understand more than I do on what it does here.. 
Looks like IndexNext has a problem at high number of users to me.. but I 
could be wrong..

-Jignesh




Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>   
>> The count is only for a 10-second snapshot.. Plus remember there are 
>> about 1000 users running so the connection  being profiled only gets 
>> 0.01 of the period on CPU..  And the count is for that CONNECTION only.
>>     
>
> OK, that explains the low absolute levels of the counts, but if the
> counts are for a regular backend then there's still a lot of bogosity
> here.  Backends won't be taking the CheckpointLock at all, nor do they
> take CheckpointStartLock in exclusive mode.  The bgwriter would do that
> but it'd not be taking most of these other locks.  So I think the script
> is mislabeling the locks somehow.
>
> Also, elementary statistics should tell you that a sample taken as above
> is going to have enormous amounts of noise.  You should be sampling over
> a much longer period, say on the order of a minute of runtime, to have
> numbers that are trustworthy.
>
> 			regards, tom lane
>   

In response to

Responses

pgsql-performance by date

Next:From: Tom LaneDate: 2007-07-26 15:42:48
Subject: Re: User concurrency thresholding: where do I look?
Previous:From: Simon RiggsDate: 2007-07-26 15:25:31
Subject: Re: User concurrency thresholding: where do I look?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group