Re: Exposing quals

From: Jan Wieck <JanWieck(at)Yahoo(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Exposing quals
Date: 2008-07-08 18:41:45
Message-ID: 4873B4E9.4000203@Yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 7/8/2008 11:38 AM, David Fetter wrote:
> On Tue, Jul 08, 2008 at 06:22:23PM +0300, Heikki Linnakangas wrote:
>> Simon Riggs wrote:
>>> SQL, in text form, is the interface to other databases. You can't
>>> pass half a plan tree to Oracle, especially not a PostgreSQL plan
>>> tree. It has to be text if you wish to send a query to another
>>> RDBMS, or another version of PostgreSQL.
>>
>> Oh, I see. Agreed.
>>
>> Though note that there's big differences in SQL dialects, so a
>> one-size-fits-all approach to generating SQL to be executed in the
>> remote database won't work. (not that I think anyone has suggested
>> that)
>>
>>> So if I understand you, you want to pass the partial plan tree and
>>> then have a plugin construct the SQL text.
>>
>> Exactly.
>>
>>> Maybe you thought I meant internal interfaces should be in text?
>>
>> Yeah, that's exactly what I thought you meant.
>>
>>> No, that would be bizarre. I meant we should not attempt to pass
>>> partial plan trees outside of the database, since that would limit
>>> the feature to only working with the same version of PostgreSQL
>>> database.
>>
>> Agreed. I'm glad we're on the same page now.
>
> Everybody's weighed in on this thread except the guy who's actually
> doing the work.
>
> Jan?

Here,

I talked to my supervisor here in Toronto (that's where I am this week)
and Afilias actually sees enough value in this for me to go and spend
time officially on it.

The ideas I have so far are as follows:

Down in the exec nodes like SeqScan or IndexScan, there are several
parts available that are important.

- Scanned relation
- Targetlist
- Filter (for SeqScan)
- IndexQual (for IndexScan)

These pieces are available at least in the scans Init function and
actually can be converted back into some SQL statement that effectively
represents this one single table scan. However, parsing it back at that
point is nonsense, as we cannot expect everything out there to actually
be an SQL database.

Also, both the qualification as well as the targetlist can contain
things like user defined function calls. We neither want to deny nor
require that this sort of construct is actually handed over to the
external data source, so the interface needs to be more flexible.
Therefore it is best to divide the functionality into several user exit
functions.

The several functions that implement a scan type inside of the executor
very much resemble opening a cursor for a single table query, fetching
rows from it, eventually (in the case of a nested loop for example)
close and reopen the cursor with different key values from the outer
tuple, close the cursor. So it makes total sense to actually require an
implementation of an external data source to provide functions to open a
cursor, fetch rows, close the cursor.

There will be some connection and transaction handling around all this
that I have in mind but think it would distract from the problem to be
solved right here, so more about that another time.

The C implementation for open cursor would be called with a scan handle,
containing the connection, the relname, the targetlist and the
qualification subtrees. These are modified from the real ones in the
scan node so that all Var's have varno=1 and that all OUTER Var's have
been replaced with a Const that reflects the current outer tuples
values. From here there are several support functions available to "dumb
down" each of those to whatever the external data source may support. In
case of the targetlist, this could mean to filter out a unique list of
Var nodes only, removing all expressions from it. In case of the
qualification, this could mean remove everything that isn't a standard
operator (=, <>, ...), or remove everything that isn't Postgres builtin.
Finally, there is a support function that will build a SQL statement
according to what's left inside that scan handle.

The scan handle would track which modifications have been done to the
various pieces so that the outer support framework knows if it gets back
the originally requested targetlist, or if it has to run the projection
on the returned unique list of Var's. And if it has to recheck the
returned tuples for qualification, because some of the qual's had been
removed.

In order to allow the user exits to be written in PL's, I can think of
makiing a complex data type containing the scan handle. The subtrees
could be accessed by the PL via support functions that return them in
nodeToString() or other formats.

I'll try to write up a more complete proposal until end of next week.

Jan

--
Anyone who trades liberty for security deserves neither
liberty nor security. -- Benjamin Franklin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Teodor Sigaev 2008-07-08 18:43:20 Re: [PATCHES] GIN improvements
Previous Message Tom Lane 2008-07-08 18:23:15 Re: Identifier case folding notes