Re: subselects

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: vadim(at)sable(dot)krasnoyarsk(dot)su (Vadim B(dot) Mikheev)
Cc: hackers(at)postgresql(dot)org
Subject: Re: subselects
Date: 1998-01-09 22:31:41
Message-ID: 199801092231.RAA24282@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>
> Bruce Momjian wrote:
> >
> > Vadim, I know you are still thinking about subselects, but I have some
> > more clarification that may help.
> >
> > We have to add phantom range table entries to correlated subselects so
> > they will pass the parser. We might as well add those fields to the
> > target list of the subquery at the same time:
> >
> > select *
> > from taba
> > where col1 = (select col2
> > from tabb
> > where taba.col3 = tabb.col4)
> >
> > becomes:
> >
> > select *
> > from taba
> > where col1 = (select col2, tabb.col4 <---
> > from tabb, taba <---
> > where taba.col3 = tabb.col4)
> >
> > We add a field to TargetEntry and RangeTblEntry to mark the fact that it
> > was entered as a correlation entry:
> >
> > bool isCorrelated;
>
> No, I don't like to add anything in parser. Example:
>
> select *
> from tabA
> where col1 = (select col2
> from tabB
> where tabA.col3 = tabB.col4
> and exists (select *
> from tabC
> where tabB.colX = tabC.colX and
> tabC.colY = tabA.col2)
> )
>
> : a column of tabA is referenced in sub-subselect

This is a strange case that I don't think we need to handle in our first
implementation.

> (is it allowable by standards ?) - in this case it's better
> to don't add tabA to 1st subselect but add tabA to second one
> and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> this gives us 2-tables join in 1st subquery instead of 3-tables join.
> (And I'm still not sure that using temp tables is best of what can be
> done in all cases...)

I don't see any use for temp tables in subselects anymore. After having
implemented UNIONS, I now see how much can be done in the upper
optimizer. I see you just putting the subquery PLAN into the proper
place in the plan tree, with some proper JOIN nodes for IN, NOT IN.

>
> Instead of using isCorrelated in TE & RTE we can add
>
> Index varlevel;

OK. Sounds good.

>
> to Var node to reflect (sub)query from where this Var is come
> (where is range table to find var's relation using varno). Upmost query
> will have varlevel = 0, all its (dirrect) children - varlevel = 1 and so on.
> ^^^ ^^^^^^^^^^^^
> (I don't see problems with distinguishing Vars of different children
> on the same level...)
>
> >
> > Second, we need to hook the subselect to the main query. I recommend we
> > add two fields to Query for this:
> >
> > Query *parentQuery;
> > List *subqueries;
>
> Agreed. And maybe Index queryLevel.

Sure. If it helps.

>
> > In the parent query, to parse the WHERE clause, we create a new operator
> > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> ^^^^^^^^^^^^^^^^^^
> No. We have to handle (a,b,c) OP (select x, y, z ...) and
> '_a_constant_' OP (select ...) - I don't know is last in standards,
> Sybase has this.

I have never seen this in my eight years of SQL. Perhaps we can leave
this for later, maybe much later.

>
> Well,
>
> typedef enum OpType
> {
> OP_EXPR, FUNC_EXPR, OR_EXPR, AND_EXPR, NOT_EXPR
>
> + OP_EXISTS, OP_ALL, OP_ANY
>
> } OpType;
>
> typedef struct Expr
> {
> NodeTag type;
> Oid typeOid; /* oid of the type of this expr */
> OpType opType; /* type of the op */
> Node *oper; /* could be Oper or Func */
> List *args; /* list of argument nodes */
> } Expr;
>
> OP_EXISTS: oper is NULL, lfirst(args) is SubSelect (index in subqueries
> List, following your suggestion)
>
> OP_ALL, OP_ANY:
>
> oper is List of Oper nodes. We need in list because of data types of
> a, b, c (above) can be different and so Oper nodes will be different too.
>
> lfirst(args) is List of expression nodes (Const, Var, Func ?, a + b ?) -
> left side of subquery' operator.
> lsecond(args) is SubSelect.
>
> Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> by parser into corresponding ANY and ALL. At the moment we can do:
>
> IN --> = ANY, NOT IN --> <> ALL
>
> but this will be "known bug": this breaks OO-nature of Postgres, because of
> operators can be overrided and '=' can mean s o m e t h i n g (not equality).
> Example: box data type. For boxes, = means equality of _areas_ and =~
> means that boxes are the same ==> =~ ANY should be used for IN.

That is interesting, to use =~ for ANY.

Yes, but how many operators take a SUBQUERY as an operand. This is a
special case to me.

I think I see where you are trying to go. You want subselects to behave
like any other operator, with a subselect type, and you do all the
subselect handling in the optimizer, with special Nodes and actions.

I think this may be just too much of a leap. We have such clean query
logic for single queries, I can't imagine having an operator that has a
Query operand, and trying to get everything to properly handle it.
UNIONS were very easy to implement as a List off of Query, with some
foreach()'s in rewrite and the high optimizer.

Subselects are SQL standard, and are never going to be over-ridden by a
user. Same with UNION. They want UNION, they get UNION. They want
Subselect, we are going to spin through the Query structure and give
them what they want.

The complexities of subselects and correlated queries and range tables
and stuff is so bizarre that trying to get it to work inside the type
system could be a huge project.

>
> > right side is an index to a slot in the subqueries List.

I guess the question is what can we have by February 1?

I have been reading some postings, and it seems to me that subselects
are the litmus test for many evaluators when deciding if a database
engine is full-featured.

Sorry to be so straightforward, but I want to keep hashing this around
until we get a conclusion, so coding can start.

My suggestions have been, I believe, trying to get subselects working
with the fullest functionality by adding the least amount of code, and
keeping the logic clean.

Have you checked out the UNION code? It is very small, but it works. I
think it could make a good sample for subselects.

--
Bruce Momjian
maillist(at)candle(dot)pha(dot)pa(dot)us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shiby Thomas 1998-01-09 22:40:45 Re: [HACKERS] Disk block size issues.
Previous Message Darren King 1998-01-09 22:24:04 Re: [HACKERS] Disk block size issues.