Re: User concurrency thresholding: where do I look?

From: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-performance(at)postgresql(dot)org, Alvaro Herrera <alvherre(at)commandprompt(dot)com>
Subject: Re: User concurrency thresholding: where do I look?
Date: 2007-07-26 15:27:30
Message-ID: 46A8BD62.2080904@sun.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I will look for runs with longer samples..

Now the script could have mislabeled lock names.. Anyway digging into
the one that seems to increase over time... I did stack profiles on how
that increases... and here are some numbers..

For 600-850 Users: that potential mislabeled CheckPointStartLock or
LockID==12 comes from various sources where the top source (while system
is still doing great) comes from:

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage_ReadOnly+0xc
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
8202100

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
30822100

However at 900 Users where the big drop in throughput occurs:
It gives a different top "consumer" of time:

postgres`LWLockAcquire+0x1c8
postgres`TransactionIdSetStatus+0x1c
postgres`RecordTransactionCommit+0x2a8
postgres`CommitTransaction+0xc8
postgres`CommitTransactionCommand+0x90
postgres`finish_xact_command+0x60
postgres`exec_execute_message+0x3d8
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
406601300

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesUpdate+0x360
postgres`heap_lock_tuple+0x27c
postgres`ExecutePlan+0x33c
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
postgres`main+0x394
postgres`_start+0x108
444523100

postgres`LWLockAcquire+0x1c8
postgres`SimpleLruReadPage+0x1ac
postgres`TransactionIdGetStatus+0x14
postgres`TransactionLogFetch+0x58
postgres`TransactionIdDidCommit+0x4
postgres`HeapTupleSatisfiesSnapshot+0x234
postgres`heap_release_fetch+0x1a8
postgres`index_getnext+0xf4
postgres`IndexNext+0x7c
postgres`ExecScan+0x8c
postgres`ExecProcNode+0xb4
postgres`ExecutePlan+0xdc
postgres`ExecutorRun+0xb0
postgres`PortalRunSelect+0x9c
postgres`PortalRun+0x244
postgres`exec_execute_message+0x3a0
postgres`PostgresMain+0x1300
postgres`BackendRun+0x278
postgres`ServerLoop+0x63c
postgres`PostmasterMain+0xc40
1661300000

Maybe you all will understand more than I do on what it does here..
Looks like IndexNext has a problem at high number of users to me.. but I
could be wrong..

-Jignesh

Tom Lane wrote:
> "Jignesh K. Shah" <J(dot)K(dot)Shah(at)Sun(dot)COM> writes:
>
>> The count is only for a 10-second snapshot.. Plus remember there are
>> about 1000 users running so the connection being profiled only gets
>> 0.01 of the period on CPU.. And the count is for that CONNECTION only.
>>
>
> OK, that explains the low absolute levels of the counts, but if the
> counts are for a regular backend then there's still a lot of bogosity
> here. Backends won't be taking the CheckpointLock at all, nor do they
> take CheckpointStartLock in exclusive mode. The bgwriter would do that
> but it'd not be taking most of these other locks. So I think the script
> is mislabeling the locks somehow.
>
> Also, elementary statistics should tell you that a sample taken as above
> is going to have enormous amounts of noise. You should be sampling over
> a much longer period, say on the order of a minute of runtime, to have
> numbers that are trustworthy.
>
> regards, tom lane
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2007-07-26 15:42:48 Re: User concurrency thresholding: where do I look?
Previous Message Simon Riggs 2007-07-26 15:25:31 Re: User concurrency thresholding: where do I look?