Re: getting the most of out multi-core systems for repeated complex SELECT statements

From: Andy Colson <andy(at)squeakycode(dot)net>
To: Greg Smith <greg(at)2ndquadrant(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, gnuoytr(at)rcn(dot)com, pgsql-performance(at)postgresql(dot)org
Subject: Re: getting the most of out multi-core systems for repeated complex SELECT statements
Date: 2011-02-04 03:21:21
Message-ID: 4D4B70B1.2080809@squeakycode.net
Lists: pgsql-performance

On 02/03/2011 04:56 PM, Greg Smith wrote:
> Scott Marlowe wrote:
>> On Thu, Feb 3, 2011 at 8:57 AM, <gnuoytr(at)rcn(dot)com> wrote:
>>
>>> Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And
>>>
>>
>> I'm pretty sure multi-core query processing is in the TODO list. Not
>> sure anyone's working on it tho. Writing a big check might help.
>>
>
> Work on the exciting parts people are interested in is blocked behind completely mundane tasks like coordinating how the multiple sessions are going to end up with a consistent view of the database. See "Export snapshots to other sessions" at http://wiki.postgresql.org/wiki/ClusterFeatures for details on that one.
>
> Parallel query works well for accelerating CPU-bound operations that are executing in RAM. The reality here is that while the feature sounds important, these situations don't actually show up that often. There are exactly zero clients I deal with regularly who would be helped out by this. The ones running web applications whose workloads do fit into memory are more concerned about supporting large numbers of users, not optimizing things for a single one. And the ones who have so much data that single users running large reports would seemingly benefit from this are usually disk-bound instead.
>
> The same sort of situation exists with SSDs. Take out the potential users whose data can fit in RAM instead, take out those who can't possibly get an SSD big enough to hold all their stuff anyway, and what's left in the middle is not very many people. In a database context I still haven't found anything better to do with a SSD than to put mid-sized indexes on them, ones a bit too large for RAM but not so big that only regular hard drives can hold them.
>
> I would rather strongly disagree with the suggestion that embracing either of these fancy but not really as functional as they appear at first approaches is critical to PostgreSQL's future. They're specialized techniques useful to only a limited number of people.
>
> --
> Greg Smith 2ndQuadrant US greg(at)2ndQuadrant(dot)com Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us
> "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
>

4 cores are cheap and popular now, 6 in a bit, 8 next year, 16/24 cores in 5 years. You can do 16 cores now, but it's a bit expensive. I figure hundreds of cores will be expensive in 5 years, but possible, and available.

CPUs won't get faster, but HDs and SSDs will. For one database connection running one query to run fast, it's going to need multi-core support.

That's not to say we need "parallel queries", or that we need multiple backends to work on one query. We need one backend, working on one query, using mostly the same architecture, to just use more than one core.

You'll notice I used _mostly_ and _just_, and have no knowledge of PG internals, so I fully expect to be wrong.

My point is, there must be levels of threading, yes? If a backend has data to sort, has it collected, nothing locked, what would it hurt to use multi-core sorting?
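
To make the sorting idea concrete, here is a rough sketch in Python (not PG code, and not a claim about how the backend sorts). It assumes the rows are already collected in memory and nothing else touches them: split them into chunks, sort each chunk on its own worker, then merge the sorted runs. The names parallel_sort, workers and executor_cls are made up for the illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from heapq import merge


def parallel_sort(rows, workers=4, executor_cls=ProcessPoolExecutor):
    """Sort `rows` by sorting chunks concurrently, then merging the sorted runs.

    Hypothetical helper: `executor_cls` defaults to processes so that CPython
    can use several cores; ThreadPoolExecutor can be swapped in for testing.
    """
    rows = list(rows)
    if workers < 2 or len(rows) < 2:
        return sorted(rows)
    # Split into roughly equal chunks, at most one per worker.
    chunk_size = -(-len(rows) // workers)  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # Each chunk is private to one worker, so no locking is needed.
    with executor_cls(max_workers=workers) as pool:
        sorted_runs = list(pool.map(sorted, chunks))
    # The k-way merge of the sorted runs happens back on the calling side.
    return list(merge(*sorted_runs))
```

Calling parallel_sort(data, workers=4) should give the same list as sorted(data). With processes, the chunks have to be copied to the workers and back, so this would only pay off when the sort itself is the expensive part.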

-- OR --

Threading (and multicore), to me, always means queues. What if new types of backends were created that did "simple" things, that normal backends could distribute work to, then go off and do other things, and come back to collect the results?
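
A minimal sketch of that queue idea, again in Python and again purely illustrative: helper and run_with_helpers are hypothetical names, and Python threads stand in for the "simple" backends. Threads in CPython share one interpreter lock, so this shows the message flow (task queue in, result queue out), not a real multi-core speedup.

```python
import queue
import threading


def helper(tasks, results):
    """Hypothetical 'simple' worker: take a task message, post a result message."""
    while True:
        msg = tasks.get()
        if msg is None:  # sentinel: no more work for this helper
            break
        task_id, func, arg = msg
        results.put((task_id, func(arg)))


def run_with_helpers(jobs, n_helpers=2):
    """Hand `jobs` (a list of (func, arg) pairs) to helpers, collect results in order."""
    tasks, results = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=helper, args=(tasks, results))
               for _ in range(n_helpers)]
    for t in threads:
        t.start()
    # Distribute the work as messages; helpers share nothing else with the caller.
    for task_id, (func, arg) in enumerate(jobs):
        tasks.put((task_id, func, arg))
    # The coordinating "backend" would be free to do other work at this point.
    collected = dict(results.get() for _ in jobs)
    # Tell every helper to stop, then wait for them to exit.
    for _ in threads:
        tasks.put(None)
    for t in threads:
        t.join()
    return [collected[i] for i in range(len(jobs))]
```

For example, run_with_helpers([(sum, [1, 2, 3]), (len, "abcd")]) returns the results in submission order even if the helpers finish out of order. The sketch does no error handling: a task that raises would kill its helper and leave the caller waiting.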

I thought I read a paper someplace that said shared-cache (L1/L2/etc.) multicore CPUs would start getting really slow at 16/32 cores, and that message passing was the way forward past that. If PG started aiming for 128-core support right now, it should use some kind of message passing with queues, yes?

-Andy
