Re: subselects

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: vadim(at)sable(dot)krasnoyarsk(dot)su (Vadim B(dot) Mikheev)
Cc: hackers(at)postgresql(dot)org, lockhart(at)alumni(dot)caltech(dot)edu
Subject: Re: subselects
Date: 1998-01-11 05:59:23
Message-ID: 199801110559.AAA11801@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I would like to have something done in parser near Jan 17 to get
> subqueries working by Feb 1. I vote for support of all standard
> things (1. - 3.) in parser right now - if there will be no time
> to implement something like (a, b, c) then optimizer will call
> elog(WARN) (oh, sorry, - elog(ERROR)).

First, let me say I am glad we are still on schedule for Feb 1. I was
panicking because I thought we wouldn't make it in time.

> > > (is it allowable by standards ?) - in this case it's better
> > > to don't add tabA to 1st subselect but add tabA to second one
> > > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> > > this gives us 2-tables join in 1st subquery instead of 3-tables join.
> > > (And I'm still not sure that using temp tables is best of what can be
> > > done in all cases...)
> >
> > I don't see any use for temp tables in subselects anymore. After having
> > implemented UNIONS, I now see how much can be done in the upper
> > optimizer. I see you just putting the subquery PLAN into the proper
> > place in the plan tree, with some proper JOIN nodes for IN, NOT IN.
>
> When saying about temp tables, I meant tables created by node Material
> for subquery plan. This is one of two ways - run subquery once for all
> possible upper plan tuples and then just join result table with upper
> query. Another way is re-run subquery for each upper query tuple,
> without temp table but may be with caching results by some ways.
> Actually, there is special case - when subquery can be alternatively
> formulated as joins, - but this is just special case.

This is interesting. It really only applies for correlated subqueries,
and certainly it may help sometimes to just evaluate the subquery for
valid values that are going to come from the upper query than for all
possible values. Perhaps we can use the 'cost' value of each query to
decide how to handle this.

>
> > > > In the parent query, to parse the WHERE clause, we create a new operator
> > > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> > > ^^^^^^^^^^^^^^^^^^
> > > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > > '_a_constant_' OP (select ...) - I don't know is last in standards,
> > > Sybase has this.
> >
> > I have never seen this in my eight years of SQL. Perhaps we can leave
> > this for later, maybe much later.
>
> Are you saying about (a, b, c) or about 'a_constant' ?
> Again, can someone comment on are they in standards or not ?
> Tom ?
> If yes then please add parser' support for them now...

OK, Thomas says it is, so we will put in as much code as we can to handle
it.

> Should we say users that subselect will work for standard data types only ?
> I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
> Is there difference between handling = ANY and ~ ANY ? I don't see any.
> Currently we can't get IN working properly for boxes (and may be for others too)
> and I don't like to try to resolve these problems now, but hope that someday
> we'll be able to do this. At the moment - just convert IN into = ANY and
> NOT IN into <> ALL in parser.

OK.

>
> (BTW, do you know how DISTINCT is implemented ? It doesn't use = but
> use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)

I did not know that either.

> There is big difference between subqueries and queries in UNION -
> there are not dependences between UNION queries.

Yes, I know UNIONS are trivial compared to subselects.

>
> Ok, opened issues:
>
> 1. Is using upper query' vars in all subquery levels in standard ?
> 2. Is (a, b, c) OP (subselect) in standard ?
> 3. What types of expressions (Var, Const, ...) are allowed on the left
> side of operator with subquery on the right ?
> 4. What types of operators should we support (=, >, ..., like, ~, ...) ?
> (My vote for all boolean operators).
>
> And - did we get consensus on presentation subqueries stuff in Query,
> Expr and Var ?

OK, here are my concrete ideas on changes and structures.

I think we all agreed that Query needs new fields:

Query *parentQuery;
List *subqueries;

Maybe query level too, but I don't think so (see later ideas on Var).

We need a new Node structure, call it Sublink:

int linkType (IN, NOTIN, ANY, EXISTS, OPERATOR...)
Oid operator /* subquery must return single row */
List *lefthand; /* parent stuff */
Node *subquery; /* represents nodes from parser */
Index Subindex; /* filled in to index Query->subqueries */

Of course, the names are just suggestions. Every time we run through
the parsenodes of a query to create a Query* structure, when we do the
WHERE clause, if we come upon one of these Sublink nodes (created in the
parser), we move the supplied Query* in Sublink->subquery to a local
List variable, and we set Subquery->subindex to equal the index of the
new query, i.e. is it the first subquery we found, 1, or the second, 2,
etc.

After we have created the parent Query structure, we run through our
local List variable of subquery parsenodes we created above, and add
Query* entries to Query->subqueries. In each subquery Query*, we set
the parentQuery pointer.

Also, when parsing the subqueries, we need to keep track of correlated
references. I recommend we add a field to the Var structure:

Index sublevel; /* range table reference:
= 0 current level of query
< 0 parent above this many levels
> 0 index into subquery list
*/

This way, a Var node with sublevel 0 is the current level, and is true
in most cases. This helps us not have to change much code. sublevel =
-1 means it references the range table in the parent query. sublevel =
-2 means the parent's parent. sublevel = 2 means it references the range
table of the second entry in Query->subqueries. Varno and varattno are
still meaningful. Of course, we can't reference variables in the
subqueries from the parent in the parser code, but Vadim may want to.

When doing a Var lookup in the parser, we look in the current level
first, but if not found, if it is a subquery, we can look at the parent
and parent's parent to set the sublevel, varno, and varatno properly.

We create no phantom range table entries in the subquery, and no phantom
target list entries. We can leave that all for the upper optimizer.

--
Bruce Momjian
maillist(at)candle(dot)pha(dot)pa(dot)us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas G. Lockhart 1998-01-11 06:10:27 Re: [HACKERS] 6.3 patches
Previous Message Thomas G. Lockhart 1998-01-11 05:58:01 Re: [HACKERS] Re: subselects