Re: [HACKERS] Mariposa

From: "Ross J(dot) Reedstrom" <reedstrm(at)wallace(dot)ece(dot)rice(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [HACKERS] Mariposa
Date: 1999-08-02 22:23:54
Message-ID: 19990802172354.B17969@wallace.ece.rice.edu
Lists: pgsql-hackers

On Mon, Aug 02, 1999 at 04:44:10PM -0400, Bruce Momjian wrote:
>
> We still have a directory called tioga which is also related to
> Mariposa. Basically, at the time, no one understood the academic stuff,
> and we had tons of bugs in general areas. We just didn't see any reason
> to keep around unusual features while our existing code was so poorly
> maintained from Berkeley.

The right thing to do, I concur. Get the basics stable and working well,
_then_ tack on the interesting stuff :-) A common complaint about us
academics: we only want to do the interesting stuff.

>
> The mariposa remote access features looked like they were heavily done
> in the executor directory. This makes sense assuming they wanted the
> access to be done remotely. They also tried to fix some things while
> doing Mariposa. A few of those fixes have been added over the years.
>

Right. As best I've been able to make out so far, in Mariposa a query passes
through the regular parser and single-site optimizer, then the selected
plan tree is handed to a 'fragmenter' which breaks the work up into chunks.
These chunks go to a 'broker', which uses a microeconomic 'bid' process to
parcel them out to both local and remote executors. The results from each
site then go through a local 'coordinator', which merges the result sets
and hands them back to the original client.

Whew!
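
To make that flow concrete, here's a rough outline in C -- every name in it
is made up for illustration, nothing is lifted from the actual mariposa/
sources:

    #include <stdio.h>

    typedef struct Fragment
    {
        const char *work;           /* a chunk of the single-site plan */
        const char *site;           /* where the broker decided to run it */
    } Fragment;

    static void
    broker_bid(Fragment *f)
    {
        /* the real broker runs a bidding protocol; here, just pick a winner */
        if (f->site == NULL)
            f->site = "remote-site-with-lowest-bid";
    }

    int
    main(void)
    {
        /* pretend the fragmenter split the optimized plan in two */
        Fragment    frags[2] = {
            {"scan bigtable, part 1", "local"},
            {"scan bigtable, part 2", NULL}
        };
        int         i;

        for (i = 0; i < 2; i++)
        {
            broker_bid(&frags[i]);
            printf("run \"%s\" at %s\n", frags[i].work, frags[i].site);
        }

        /* a coordinator would merge the per-site result sets here */
        return 0;
    }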

It's interesting to compare the theory describing the workings of Mariposa
(such as the paper in VLDB) with the code. For the fragmenter, the paper
describes an essentially rational decomposition of the plan, while the code
applies non-deterministic, but tunable, methods (lots of calls to random
and comparisons to user-specified odds ratios).
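
Just to give the flavor of it (the names here are mine, not the ones in the
source), the fragmenter's decisions boil down to weighted coin flips along
these lines:

    #include <stdlib.h>

    /* frag_odds stands in for the user-settable odds ratio */
    static double frag_odds = 0.5;

    static int
    should_split(void)
    {
        /* random() yields 0 .. 2^31-1; compare against the odds ratio */
        return ((double) random() / 2147483647.0) < frag_odds;
    }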

It strikes me as a bit odd to optimize the plan for a single site,
then break it all apart again. My thought is to implement two new node
types: one for a remote table, and one representing access to a remote
table. Remote tables would carry host info, and would always be added to
the plan with a remote-access node directly above them. Remote-access
nodes would be kept separate from their remote-table, to allow the
communications cost to be slid up the plan tree and merged with other
remote-access nodes talking to the same server. This should maintain the
order-agnostic nature of the optimizer. The executor would need to build
SQL statements from the sub-plans and submit them via the standard network
db access client libraries.
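
Very roughly, and purely hypothetically (none of this exists in the backend
today), the two node types might carry something like:

    typedef struct RemoteTable
    {
        char   *host;               /* where the relation actually lives */
        int     port;
        char   *relname;            /* its name on that server */
    } RemoteTable;

    typedef struct RemoteAccess
    {
        struct RemoteTable *rtable;     /* remote table under this node */
        char               *sql;        /* query text to ship to that host */
        double              comm_cost;  /* communications cost, so it can
                                         * slide up the tree and merge with
                                         * other accesses to the same server */
    } RemoteAccess;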

First step: create a remote-table node, and teach the executor how to get
info from it. Later, add the separable remote-access node.
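
For the executor side I'm picturing plain libpq calls, something along these
lines (connection string and query are placeholders, of course):

    #include <stdio.h>
    #include <libpq-fe.h>

    static int
    fetch_remote(const char *conninfo, const char *sql)
    {
        PGconn     *conn = PQconnectdb(conninfo);
        PGresult   *res;
        int         i;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connect failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return -1;
        }

        res = PQexec(conn, sql);
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
            PQclear(res);
            PQfinish(conn);
            return -1;
        }

        /* hand each tuple back to the local executor (just print here) */
        for (i = 0; i < PQntuples(res); i++)
            printf("%s\n", PQgetvalue(res, i, 0));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }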

How insane does this sound now? Am I still a mad scientist? (...always!)

Ross
--
Ross J. Reedstrom, Ph.D., <reedstrm(at)rice(dot)edu>
NSBRI Research Scientist/Programmer
Computer and Information Technology Institute
Rice University, 6100 S. Main St., Houston, TX 77005
