Re: Parallel postgresql

From: Hans-Jürgen Schönig <hs(at)cybertec(dot)at>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
Cc: Martin Rusoff <mrusoff(at)columbus(dot)rr(dot)com>, pgsql-hackers(at)postgresql(dot)org, eg(at)cybertec(dot)at
Subject: Re: Parallel postgresql
Date: 2003-10-09 10:38:26
Message-ID: 3F853AA2.7080809@cybertec.at
Lists: pgsql-hackers

Bruce Momjian wrote:
> Martin Rusoff wrote:
>
>>I was just contemplating how to make postgres parallel (for DSS
>>applications)... Has anyone done work on this? It looks to me like there
>>are a couple of obvious places to add parallel operation:
>>
>>Stage 1) I/O, perhaps through MPIO, which would improve table scanning and
>>load/unload operations. One (or more) PostgreSQL servers would use
>>MPIO/ROMIO to access a parallel file system like PVFS or GPFS (IBM).
>>
>>Stage 2) Parallel Postgres servers, with the postmaster spawning off the
>>server on a different node (possibly borrowing some code from GNU Queue)
>>and doing any buffer twiddling with RPC for that connection. The client
>>connection would still go through the proxy on the postmaster node (kind
>>of like MOSIX)?
>
>
> One idea would be to throw parts of the executor (like a table sort) to
> different machines or to different processors on the same machine,
> perhaps via dblink. You could use threads to send several requests and
> wait for their results.
>
> Threading the entire backend would be hard, but we could thread some
> parts of it by having slave backends doing some of the work in parallel.

This would be nice, especially for the huge queries typical of data
warehouses. It could even make sense to do things in parallel on a single
machine (e.g. computing a function while a sort is waiting for I/O).

Which operations could run in parallel? What do you think?
I guess implementing something like that means 20 more years of work on
the planner ...
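A minimal sketch of the sort case Bruce describes above, assuming Python and its standard library: split the input into chunks, send each chunk off as a separate request, wait for all the results, then do a k-way merge of the sorted runs. The names `parallel_sort` and `remote_sort` are hypothetical; `remote_sort` stands in for a slave backend or a remote machine (e.g. reached via dblink) and here simply sorts locally, so under CPython's GIL this shows the dispatch-and-wait control flow rather than a real speedup.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def remote_sort(chunk):
    # Hypothetical stand-in for a request shipped to a slave backend or
    # another machine (e.g. via dblink); in this sketch it sorts locally.
    return sorted(chunk)


def parallel_sort(rows, workers=4):
    """Split rows into chunks, sort each chunk concurrently, then merge."""
    if not rows:
        return []
    size = -(-len(rows) // workers)  # ceiling division: rows per chunk
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() submits every request up front, then we wait for all results.
        sorted_runs = list(pool.map(remote_sort, chunks))
    # k-way merge of the sorted runs, like the final pass of a merge sort.
    return list(heapq.merge(*sorted_runs))


if __name__ == "__main__":
    print(parallel_sort([5, 3, 9, 1, 7, 2, 8, 6, 4, 0], workers=3))
    # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The merge step is what makes the split worthwhile: each worker only has to produce a sorted run, and combining k sorted runs is a cheap linear pass compared with re-sorting everything.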

By the way: NCR has a rather nice solution for problems like that.
Teradata has been designed to run everything on multiple nodes (they
call them AMPs).
Teradata has been designed for huge amounts of data and for reporting.
There are just three problems:
- not Open Source
- ~$70k / node
- runs only on Windows and NCR's own UNIX implementation.

Is anybody familiar with Teradata?

Hans

--
Cybertec Geschwinde u Schoenig
Ludo-Hartmannplatz 1/14, A-1160 Vienna, Austria
Tel: +43/2952/30706 or +43/660/816 40 77
www.cybertec.at, www.postgresql.at, kernel.cybertec.at
