Re: hyperthreaded cpu still an issue in 8.4?

From: Scott Carey <scott(at)richrelevance(dot)com>
To: Dave Youatt <dave(at)meteorsolutions(dot)com>, Greg Smith <gsmith(at)gregsmith(dot)com>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: hyperthreaded cpu still an issue in 8.4?
Date: 2009-07-27 19:05:47
Message-ID: C693489B.D9D8%scott@richrelevance.com
Lists: pgsql-performance


On 7/27/09 11:05 AM, "Dave Youatt" <dave(at)meteorsolutions(dot)com> wrote:

> On 01/-10/-28163 11:59 AM, Greg Smith wrote:
>> On Tue, 21 Jul 2009, Doug Hunley wrote:
>>
> Also, and this is getting maybe too far off topic, beyond the buzzwords,
> what IS the new "hyperthreading" in Nehalems? -- opportunistic
> superpipelined cpus?, superscalar? What's shared by the cores
> (bandwidth, cache(s))? What's changed about the new hyperthreading
> that makes it actually seem to work (or at least not cause other
> problems)? smarter scheduling of instructions to take advantage of
> stalls, hazards in another thread's instruction stream? Fixed
> instruction-level locking/interlocks, or avoiding locking whenever
> possible? better cache coherency mechanisms (related to the
> interconnects)? Jedi mind tricks???
>

The Nehalems are an iteration on the "Core" processor line, which is a
4-way superscalar, out-of-order CPU. It also has some very sophisticated
memory access reordering capability.
So the HyperThreading here (Simultaneous Multi-Threading, SMT, is the
academic name) takes advantage of that processor's inefficiencies -- a mix
of stalls due to waiting for memory, and unused execution 'width' resources.
If both threads are active and not stalled on memory access or other
execution bubbles, there are a lot of internal processor resources to share.
And if one of them is misbehaving and spinning, it won't dominate those
resources.

The old Pentium-4 based HyperThreading was also SMT, but those
processors were built to be high frequency and 'narrow' in terms of
superscalar execution (2-way superscalar, I believe). So the main advantage
of HT there was that one thread could schedule work while another was
waiting on memory access. If both were putting demands on the core
execution resources, there was not much to gain unless one thread stalled on
memory access a lot, and if one of them was spinning it would eat up most of
the shared resources.
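
To make the wide-versus-narrow argument concrete, here is a toy model in
Python. It is only a sketch under loud assumptions of mine, not a simulator
and not anything from Intel: each thread is independently stalled in a given
cycle with probability stall_prob, a running thread can supply up to ilp
instructions, and the core issues at most width instructions per cycle. The
function names and all parameter values are hypothetical.

```python
from itertools import product


def expected_ipc(width, ilp, stall_prob, threads):
    """Expected instructions per cycle for a toy SMT core.

    Assumptions (illustrative only): every thread is independently stalled
    with probability stall_prob each cycle, a running thread can supply up
    to `ilp` instructions, and the core issues at most `width` per cycle.
    """
    total = 0.0
    # Enumerate every combination of running (True) / stalled (False) threads.
    for running in product([False, True], repeat=threads):
        prob = 1.0
        for is_running in running:
            prob *= (1.0 - stall_prob) if is_running else stall_prob
        supplied = ilp * sum(running)
        total += prob * min(width, supplied)
    return total


def smt_gain(width, ilp, stall_prob):
    """Throughput of two threads on one core relative to one thread."""
    one = expected_ipc(width, ilp, stall_prob, 1)
    two = expected_ipc(width, ilp, stall_prob, 2)
    return two / one


if __name__ == "__main__":
    for label, width in (("wide (4-way)", 4), ("narrow (2-way)", 2)):
        for stall_prob in (0.05, 0.3):
            gain = smt_gain(width, ilp=2, stall_prob=stall_prob)
            print(f"{label:15s} stall={stall_prob:.2f} gain={gain:.2f}x")
```

In this model the narrow core only gains from a second thread when the
first one stalls a lot, while the wide core also gains from the execution
width a single thread leaves unused. Real cores share far more than issue
slots, so treat the numbers as illustration, not prediction.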

In both cases, the main execution resources get split up between the two
threads: L1 cache, instruction buffers and decoders, instruction reorder
buffers, etc. But in Nehalem, Intel increased several of these beyond what
is optimal for one thread, to make the HT more efficient.
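
If you want to see which logical CPUs share one core (and therefore split
the resources above), Linux exposes that in sysfs. The following is a
minimal sketch, assuming a Linux kernel that provides the
topology/thread_siblings_list files; the helper names are mine, and on
other systems the glob simply matches nothing.

```python
import glob


def parse_cpulist(text):
    """Parse a kernel cpulist string such as "0,4" or "2-3" into a set."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def sibling_groups(
    pattern="/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list",
):
    """Return the distinct groups of logical CPUs that share a core."""
    groups = set()
    for path in glob.glob(pattern):
        with open(path) as handle:
            groups.add(frozenset(parse_cpulist(handle.read())))
    return sorted(sorted(group) for group in groups)


if __name__ == "__main__":
    for group in sibling_groups():
        print("logical CPUs sharing one core:", group)
```

A group with more than one entry means SMT is active on that core; groups
of size one mean it is disabled or not supported.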

But the type of applications that will benefit the most from this HT is not
always the same as the older one, since the two CPU lines have different
weaknesses for SMT to mask or strengths to enhance.

> I'm guessing it's the better interconnect, but work interferes with
> finding the time to research and benchmark.

The new memory and interconnect architecture has a huge impact on
performance, but it is separate from HyperThreading and the other big
features (Turbo being the one not discussed here). For scalability to many
CPUs, however, it is probably the most significant change.

Note that these CPUs have some good power saving technology that helps
quite a bit when idle or using just one core or thread, but when all threads
are ramped up and all the memory banks are filled, the systems draw a LOT of
power.

AMD's latest CPUs still do quite well if you're on a power budget.

