From: | Hans-Jürgen Schönig <hs(at)cybertec(dot)at> |
---|---|
To: | Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us> |
Cc: | Martin Rusoff <mrusoff(at)columbus(dot)rr(dot)com>, pgsql-hackers(at)postgresql(dot)org, eg(at)cybertec(dot)at |
Subject: | Re: Parallel postgresql |
Date: | 2003-10-09 10:38:26 |
Message-ID: | 3F853AA2.7080809@cybertec.at |
Lists: | pgsql-hackers |
Bruce Momjian wrote:
> Martin Rusoff wrote:
>
>>I was just contemplating how to make postgres parallel (for DSS
>>applications)... Has anyone done work on this? It looks to me like there
>>are a couple of obvious places to add parallel operation:
>>
>>Stage 1) I/O , perhaps through MPIO - would improve tablescanning and
>>load/unload operations. One (or more) Postgresql servers would use
>>MPIO/ROMIO to access a parallel file system like PVFS or GPFS(IBM).
>>
>>Stage 2) Parallel Postgres Servers, with the postmaster spawning off the
>>server on a different node (possibly borrowing some code from GNU queue)
>>and doing any buffer twiddling with RPC for that connection, The client
>>connection would still be through the proxy on the postmaster node? (kind
>>of like MOSIX)
>
>
> One idea would be to throw parts of the executor (like a table sort) to
> different machines or to different processors on the same machine,
> perhaps via dblink. You could use threads to send several requests and
> wait for their results.
>
> Threading the entire backend would be hard, but we could thread some
> parts of it by having slave backends doing some of the work in parallel.
This would be nice, especially for the huge queries needed in data warehouses.
It might even make sense to run things in parallel when there is just one
machine (e.g. computing a function while a sort process is waiting for
I/O).
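
To make the idea a bit more concrete, here is a rough sketch of what "slave" workers doing part of a sort could look like. It is plain Python, not backend code: `parallel_sort` and its `workers` parameter are made-up names for illustration, and threads stand in for the slave backends. Each worker sorts one chunk of the rows, and the coordinator waits for all of the sorted runs and merges them.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq


def parallel_sort(rows, workers=4):
    """Hypothetical helper: sort rows by handing chunks to worker threads.

    Threads only stand in for slave backends here; in CPython they will
    not speed up a CPU-bound sort, so this shows the structure only.
    """
    if not rows:
        return []
    # Ceiling division so that at most `workers` chunks are produced.
    chunk_size = -(-len(rows) // workers)
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    # Send one sort request per chunk and wait for all of the results.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_runs = list(pool.map(sorted, chunks))
    # The coordinator merges the already sorted runs into one result.
    return list(heapq.merge(*sorted_runs))


if __name__ == "__main__":
    print(parallel_sort([5, 3, 9, 1, 7, 2, 8], workers=3))
```

The hard part is not this split-and-merge step but teaching the planner when such a split is worth its cost.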
Which operations can run in parallel? What do you think?
I guess implementing something like that means 20 more years of work on the
planner ...
By the way: NCR has quite a nice solution for problems like that.
Teradata was designed to run everything on multiple nodes (they
call them AMPs).
Teradata was designed for A LOT OF data and for reporting purposes.
There are just three problems:
- not Open Source
- ~$70k / node
- runs on Windows and NCR's UNIX implementation.
Is anybody familiar with Teradata?
Hans
--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at