Re: Any better plan for this query?..

From: Dimitri <dimitrik(dot)fr(at)gmail(dot)com>
To: Scott Carey <scott(at)richrelevance(dot)com>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Alvaro Herrera <alvherre(at)commandprompt(dot)com>, Aidan Van Dyk <aidan(at)highrise(dot)ca>, Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Performance <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Any better plan for this query?..
Date: 2009-05-18 22:32:54
Message-ID: 5482c80a0905181532h304ede79jb9e92c6ef99ab94a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On 5/18/09, Scott Carey <scott(at)richrelevance(dot)com> wrote:
> Great data Dimitri!'

Thank you! :-)

>
> I see a few key trends in the poor scalability:
>
> The throughput scales roughly with %CPU fairly well. But CPU used doesn't
> go past ~50% on the 32 core tests. This indicates lock contention.
>

You should not look on #1 STATs, but on #2 - they are all with the
latest "fixes" - on all of them CPU is used well (90% in pic on
32cores).
Also, keep in mind these cores are having 2 threads, and from Solaris
point of view they are seen as CPU (so 64 CPU) and %busy is accounted
as for 64 CPU

> Other proof of lock contention are the mutex locks / sec graph which climbs

exactly, except no locking was seen on processes while I tried to
trace them.. What I think will be needed here is a global and
corelated tracing of all PG processes - I did not expect to do it now,
but next time

> rapidly as the system gets more inefficient (along with context switches).
>
> Another trend is the system calls/sec which caps out with the test, at about
> 400,000 per sec on the peak (non-prepared statement) result. Note that when
> the buffer size is 256MB, the performance scales much worse and is slower.
> And correlated with this the system calls/sec per transaction is more than
> double, at slower throughput.

of course, because even the data were cached by filesystem to get them
you still need to call a read() system call..

>
> Using the OS to cache pages is not as fast as pages in shared_buffers, by a
> more significant amount with many cores and higher concurrency than in the
> low concurrency case.

exactly, it's what I also wanted to demonstrate because I often hear
"PG is delegating caching to the filesystem" - and I don't think it's
optimal :-)

>
> The system is largely lock limited in the poor scaling results. This holds
> true with or without the use of prepared statements -- which help a some,
> but not a lot and don't affect the scalability.

we are agree here, but again - 20K mutex spins/sec is a quite low
value, that's why I hope on the bigger server it'll be more clear
where is a bottleneck :-)

Rgds,
-Dimitri

>
>
> 4096MB shared buffers, 32 cores, 8.4, read only:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html
>
> 256MB cache, 32 cores, 8.4, read-only:
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_52.html
>
> 4096MB shared buffs, 32 cores, 8.4, read only, prepared statements
> http://dimitrik.free.fr/Report_20090505/5539_dim_STAT_70.html
>
> On 5/18/09 11:00 AM, "Dimitri" <dimitrik(dot)fr(at)gmail(dot)com> wrote:
>
>> Folks, I've just published a full report including all results here:
>> http://dimitrik.free.fr/db_STRESS_PostgreSQL_837_and_84_May2009.html
>>
>> From my point of view it needs first to understand where the time is
>> wasted on a single query (even when the statement is prepared it runs
>> still slower comparing to MySQL).
>>
>> Then to investigate on scalability issue I think a bigger server will
>> be needed here (I'm looking for 64cores at least :-))
>>
>> If you have some other ideas or patches (like Simon) - don't hesitate
>> to send them - once I'll get an access to the server again the
>> available test time will be very limited..
>>
>> Best regards!
>> -Dimitri
>>
>>
>> On 5/18/09, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>>>
>>> On Thu, 2009-05-14 at 20:25 +0200, Dimitri wrote:
>>>
>>>> # lwlock_wait_8.4.d `pgrep -n postgres`
>>>
>>>> Lock Id Mode Combined Time (ns)
>>>> FirstLockMgrLock Exclusive 803700
>>>> BufFreelistLock Exclusive 3001600
>>>> FirstLockMgrLock Shared 4586600
>>>> FirstBufMappingLock Exclusive 6283900
>>>> FirstBufMappingLock Shared 21792900
>>>
>>> I've published two patches to -Hackers to see if we can improve the read
>>> only numbers on 32+ cores.
>>>
>>> Try shared_buffer_partitions = 256
>>>
>>> --
>>> Simon Riggs www.2ndQuadrant.com
>>> PostgreSQL Training, Services and Support
>>>
>>>
>>
>> --
>> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-performance
>>
>
>

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Dimitri 2009-05-18 22:33:07 Re: Any better plan for this query?..
Previous Message Dimitri 2009-05-18 22:32:43 Re: Any better plan for this query?..