jnasby(at)pervasive(dot)com ("Jim C. Nasby") writes:
> On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote:
>> Martha Stewart called it a Good Thing when smarlowe(at)g2switchworks(dot)com (Scott Marlowe) wrote:
>> > On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote:
>> >> > Has someone been working on the problem of splitting a query into
>> >> > pieces and running it on multiple CPUs / multiple machines?
>> >> Yes. Bizgress has done that.
>> >> I believe that is limited to Bizgress MPP yes?
>> > Yep. I hope that someday it will be released to the postgresql global
>> > dev group for inclusion. Or at least parts of it.
>> Question: Does the Bizgress/MPP use threading for this concurrency?
>> Or forking?
>> If it does so via forking, that's more portable, and less dependent on
>> specific complexities of threading implementations (which amounts to
>> non-portability ;-)).
>> Most times Jan comes to town, we spend a few minutes musing about the
>> "splitting queries across threads" problem, and dismiss it again; if
>> there's the beginning of a "split across processes," that's decidedly
>> neat :-).
> Correct me if I'm wrong, but there's no way to (reasonably) accomplish
> that without having some dedicated extra processes lying around that
> you can use to execute the queries, no? In other words, the cost of a
> fork() during query execution would be too prohibitive...
The sort of scenario we keep musing about is where you split off a
(thread|process) for each partition of a big table. There is in fact
a natural such partitioning, in that table storage gets split into
1GB segment files on disk.
Consider doing a join against 2 tables that are each 8GB in size
(i.e., each consists of 8 data files). Let's assume that the query
plan indicates doing seq scans on both.
You *know* you'll be reading through 16 files, each 1GB in size.
Spawning a process for each of those files doesn't strike me as
unreasonable.
A naive reading of this is that you might start with one backend process,
which then spawns 16 more. Each of those backends is scanning through
one of those 16 files; they then throw relevant tuples into shared
memory to be aggregated/joined by the central one.
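The coordinator-plus-workers shape described above can be sketched in a few
lines. This is only an illustrative model, not PostgreSQL internals: the
segment data, the filter predicate, and the function names are all invented,
and a queue stands in for the shared-memory area the workers would write into.

```python
# Hypothetical sketch of the naive plan: one coordinator spawns a worker
# per table segment; each worker "scans" its segment, pushes qualifying
# tuples onto a shared queue, and the coordinator gathers the results.
import multiprocessing as mp

def scan_segment(seg_id, rows, out_q):
    """Worker: scan one segment and emit tuples matching a predicate."""
    matches = [r for r in rows if r % 2 == 0]   # stand-in for a seq-scan filter
    out_q.put((seg_id, matches))

def parallel_seqscan(segments):
    """Coordinator: one worker process per segment, results merged centrally."""
    out_q = mp.Queue()
    workers = [mp.Process(target=scan_segment, args=(i, rows, out_q))
               for i, rows in enumerate(segments)]
    for w in workers:
        w.start()
    results = {}
    for _ in workers:            # one result per worker, order not guaranteed
        seg_id, matches = out_q.get()
        results[seg_id] = matches
    for w in workers:
        w.join()
    return results

if __name__ == "__main__":
    # four toy "segments" standing in for the sixteen 1GB files
    segments = [list(range(i * 10, (i + 1) * 10)) for i in range(4)]
    results = parallel_seqscan(segments)
    print(sum(len(v) for v in results.values()))  # 20 even rows in total
```

The central process only ever sees tuples that survived the per-segment
filter, which is the point of the scheme: the expensive sequential reads
happen concurrently, one per segment.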
That particular scenario is one where the fork()s would hardly be
a material cost.
> FWIW, DB2 executes all queries in a dedicated set of processes. The
> process handling the connection from the client will pass a query
> request off to one of the executor processes. I can't remember which
process actually plans the query, but I know that the executor runs it.
It seems to me that the kinds of cases where extra processes/threads
would be warranted are quite likely to be cases where fork()ing may be
an immaterial cost.
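A quick way to convince yourself of that: time a full fork()/exit/wait cycle
and compare it against the seconds a 1GB sequential scan takes. The numbers
below are machine-dependent and this is a rough sketch, not a benchmark, but
the ratio is typically several orders of magnitude.

```python
# Measure the cost of a fork()/exit/wait round trip (POSIX only).
# A 1GB seq scan is measured in seconds; a fork is measured in
# microseconds-to-milliseconds, so fork overhead is noise by comparison.
import os
import time

def mean_fork_cost(n=20):
    """Time n fork/exit/wait cycles; return mean seconds per cycle."""
    start = time.perf_counter()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)          # child does nothing and exits at once
        os.waitpid(pid, 0)
    return (time.perf_counter() - start) / n

if __name__ == "__main__":
    per_fork = mean_fork_cost()
    print(f"~{per_fork * 1e6:.0f} microseconds per fork/exit/wait")
```

Amortized over a multi-second scan of a 1GB segment, a sub-millisecond
fork() is indeed an immaterial cost.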
let name="cbbrowne" and tld="ntlug.org" in String.concat "@" [name;tld];;
TECO Madness: a moment of convenience, a lifetime of regret.
-- Dave Moon