
Re: Scaling up PostgreSQL in Multiple CPU / Dual Core

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: Scaling up PostgreSQL in Multiple CPU / Dual Core
Date: 2006-03-24 18:21:23
Lists: pgsql-performance
jnasby(at)pervasive(dot)com ("Jim C. Nasby") writes:
> On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
>> Martha Stewart called it a Good Thing when smarlowe(at)g2switchworks(dot)com (Scott Marlowe) wrote:
>> > On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>> >> > Has someone been working on the problem of splitting a query into pieces
>> >> > and running it on multiple CPUs / multiple machines?  Yes.  Bizgress has
>> >> > done that.  
>> >> 
>> >> I believe that is limited to Bizgress MPP yes?
>> >
>> > Yep.  I hope that someday it will be released to the postgresql global
>> > dev group for inclusion.  Or at least parts of it.
>> Question: Does the Bizgress/MPP use threading for this concurrency?
>> Or forking?
>> If it does so via forking, that's more portable, and less dependent on
>> specific complexities of threading implementations (which amounts to
>> non-portability ;-)).
>> Most times Jan comes to town, we spend a few minutes musing about the
>> "splitting queries across threads" problem, and dismiss it again; if
>> there's the beginning of a "split across processes," that's decidedly
>> neat :-).
> Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> that without having some dedicated extra processes laying around that
> you can use to execute the queries, no? In other words, the cost of a
> fork() during query execution would be too prohibitive...


The sort of scenario we keep musing about is where you split off a
(thread|process) for each partition of a big table.  There is in fact
a natural such partitioning, in that tables get split at the 1GB mark,
by default.

Consider doing a join against 2 tables that are each 8GB in size
(e.g. - they consist of 8 data files).  Let's assume that the query
plan indicates doing seq scans on both.

You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
"prohibitively expensive."
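A quick back-of-envelope supports that. The figures below are assumptions for illustration, not measurements: say a fork() costs on the order of a millisecond, and a sequential scan reads about 200 MB/s.

```python
# Illustrative arithmetic only; both input figures are assumed, not measured.
FORK_COST_S = 0.001        # assumed cost of one fork(): ~1 ms
SCAN_RATE_MB_S = 200.0     # assumed sequential read throughput
SEGMENT_MB = 1024.0        # PostgreSQL's default segment size: 1 GB

scan_time_s = SEGMENT_MB / SCAN_RATE_MB_S   # ~5 s to read one segment
overhead = FORK_COST_S / scan_time_s        # fork cost as a fraction of the scan
print(f"fork overhead per segment: {overhead:.4%}")
```

Even if the assumed fork cost is off by an order of magnitude, it stays far below 1% of the time spent actually reading the segment.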

A naive read on this is that you might start with one backend process,
which then spawns 16 more.  Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.
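A toy sketch of that naive scheme, with plain Python multiprocessing standing in for backends and a queue standing in for shared memory; the segment data, predicate, and function names are all invented for illustration, not PostgreSQL internals:

```python
# Sketch only: a "coordinator" spawns one worker process per segment; each
# worker seq-scans its segment and sends matching tuples back through a
# queue, which the coordinator drains and merges.
import multiprocessing as mp

# Use an explicit fork() context, matching the thread-vs-process discussion
# above.  (The "fork" start method is Unix-only.)
ctx = mp.get_context("fork")

def scan_segment(rows, predicate, out_q):
    """Worker: sequentially scan one segment, emit the matching tuples."""
    out_q.put([row for row in rows if predicate(row)])

def parallel_seq_scan(segments, predicate):
    """Coordinator: one forked worker per segment; gather and merge results."""
    out_q = ctx.Queue()
    workers = [ctx.Process(target=scan_segment, args=(seg, predicate, out_q))
               for seg in segments]
    for w in workers:
        w.start()
    matches = []
    for _ in workers:               # drain the queue before join()ing
        matches.extend(out_q.get())
    for w in workers:
        w.join()
    return matches

def big_value(row):
    """Stand-in qual: keep rows whose second column is >= 100."""
    return row[1] >= 100

if __name__ == "__main__":
    # Four toy "segments" of four rows each, in place of 16 x 1 GB files.
    segments = [[(i, i * 10) for i in range(s * 4, s * 4 + 4)]
                for s in range(4)]
    print(sorted(parallel_seq_scan(segments, big_value)))
```

The same shape works whether the workers come from fork() at query time or from a pre-spawned pool; only the startup cost differs.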

That particular scenario is one where the fork()s would hardly be noticeable.

> FWIW, DB2 executes all queries in a dedicated set of processes. The
> process handling the connection from the client will pass a query
> request off to one of the executor processes. I can't remember which
> process actually plans the query, but I know that the executor runs
> it.

It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.
let name="cbbrowne" and tld="" in String.concat "@" [name;tld];;
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon

