Re: subselects

From: "Vadim B(dot) Mikheev" <vadim(at)sable(dot)krasnoyarsk(dot)su>
To: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
Cc: hackers(at)postgresql(dot)org, "Thomas G(dot) Lockhart" <lockhart(at)alumni(dot)caltech(dot)edu>
Subject: Re: subselects
Date: 1998-01-10 17:19:08
Message-ID: 34B7AD8C.5ED59CB5@sable.krasnoyarsk.su
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Bruce Momjian wrote:
>
> > No, I don't like to add anything in parser. Example:
> >
> > select *
> > from tabA
> > where col1 = (select col2
> > from tabB
> > where tabA.col3 = tabB.col4
> > and exists (select *
> > from tabC
> > where tabB.colX = tabC.colX and
> > tabC.colY = tabA.col2)
> > )
> >
> > : a column of tabA is referenced in sub-subselect
>
> This is a strange case that I don't think we need to handle in our first
> implementation.

I don't know is this strange case or not :)
But I would like to know is this allowed by standards - can someone
comment on this ?
And I don't see problems with handling this...

>
> > (is it allowable by standards ?) - in this case it's better
> > to don't add tabA to 1st subselect but add tabA to second one
> > and change tabA.col3 in 1st to reference col3 in 2nd subquery temp table -
> > this gives us 2-tables join in 1st subquery instead of 3-tables join.
> > (And I'm still not sure that using temp tables is best of what can be
> > done in all cases...)
>
> I don't see any use for temp tables in subselects anymore. After having
> implemented UNIONS, I now see how much can be done in the upper
> optimizer. I see you just putting the subquery PLAN into the proper
> place in the plan tree, with some proper JOIN nodes for IN, NOT IN.

When saying about temp tables, I meant tables created by node Material
for subquery plan. This is one of two ways - run subquery once for all
possible upper plan tuples and then just join result table with upper
query. Another way is re-run subquery for each upper query tuple,
without temp table but may be with caching results by some ways.
Actually, there is special case - when subquery can be alternatively
formulated as joins, - but this is just special case.

> > > In the parent query, to parse the WHERE clause, we create a new operator
> > > type, called IN or NOT_IN, or ALL, where the left side is a Var, and the
> > ^^^^^^^^^^^^^^^^^^
> > No. We have to handle (a,b,c) OP (select x, y, z ...) and
> > '_a_constant_' OP (select ...) - I don't know is last in standards,
> > Sybase has this.
>
> I have never seen this in my eight years of SQL. Perhaps we can leave
> this for later, maybe much later.

Are you saying about (a, b, c) or about 'a_constant' ?
Again, can someone comment on are they in standards or not ?
Tom ?
If yes then please add parser' support for them now...

> > Note, that there are no OP_IN, OP_NOTIN in OpType-s for Expr. We need in
> > IN, NOTIN in A_Expr (parser node), but both of them have to be transferred
> > by parser into corresponding ANY and ALL. At the moment we can do:
> >
> > IN --> = ANY, NOT IN --> <> ALL
> >
> > but this will be "known bug": this breaks OO-nature of Postgres, because of
> > operators can be overrided and '=' can mean s o m e t h i n g (not equality).
> > Example: box data type. For boxes, = means equality of _areas_ and =~
> > means that boxes are the same ==> =~ ANY should be used for IN.
>
> That is interesting, to use =~ for ANY.
>
> Yes, but how many operators take a SUBQUERY as an operand. This is a
> special case to me.
>
> I think I see where you are trying to go. You want subselects to behave
> like any other operator, with a subselect type, and you do all the
> subselect handling in the optimizer, with special Nodes and actions.
>
> I think this may be just too much of a leap. We have such clean query
> logic for single queries, I can't imagine having an operator that has a
> Query operand, and trying to get everything to properly handle it.
> UNIONS were very easy to implement as a List off of Query, with some
> foreach()'s in rewrite and the high optimizer.
>
> Subselects are SQL standard, and are never going to be over-ridden by a
> user. Same with UNION. They want UNION, they get UNION. They want
> Subselect, we are going to spin through the Query structure and give
> them what they want.
>
> The complexities of subselects and correlated queries and range tables
> and stuff is so bizarre that trying to get it to work inside the type
> system could be a huge project.

PostgreSQL is a robust, next-generation, Object-Relational DBMS (ORDBMS),
derived from the Berkeley Postgres database management system. While
PostgreSQL retains the powerful object-relational data model, rich data types and
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
easy extensibility of Postgres, it replaces the PostQuel query language with an
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
extended subset of SQL.
^^^^^^^^^^^^^^^^^^^^^^

Should we say users that subselect will work for standard data types only ?
I don't see why subquery can't be used with ~, ~*, @@, ... operators, do you ?
Is there difference between handling = ANY and ~ ANY ? I don't see any.
Currently we can't get IN working properly for boxes (and may be for others too)
and I don't like to try to resolve these problems now, but hope that someday
we'll be able to do this. At the moment - just convert IN into = ANY and
NOT IN into <> ALL in parser.

(BTW, do you know how DISTINCT is implemented ? It doesn't use = but
use type_out funcs and uses strcmp()... DISTINCT is standard SQL thing...)

> >
> > > right side is an index to a slot in the subqueries List.
>
> I guess the question is what can we have by February 1?
>
> I have been reading some postings, and it seems to me that subselects
> are the litmus test for many evaluators when deciding if a database
> engine is full-featured.
>
> Sorry to be so straightforward, but I want to keep hashing this around
> until we get a conclusion, so coding can start.
>
> My suggestions have been, I believe, trying to get subselects working
> with the fullest functionality by adding the least amount of code, and
> keeping the logic clean.
>
> Have you checked out the UNION code? It is very small, but it works. I
> think it could make a good sample for subselects.

There is big difference between subqueries and queries in UNION -
there are not dependences between UNION queries.

Ok, opened issues:

1. Is using upper query' vars in all subquery levels in standard ?
2. Is (a, b, c) OP (subselect) in standard ?
3. What types of expressions (Var, Const, ...) are allowed on the left
side of operator with subquery on the right ?
4. What types of operators should we support (=, >, ..., like, ~, ...) ?
(My vote for all boolean operators).

And - did we get consensus on presentation subqueries stuff in Query,
Expr and Var ?
I would like to have something done in parser near Jan 17 to get
subqueries working by Feb 1. I vote for support of all standard
things (1. - 3.) in parser right now - if there will be no time
to implement something like (a, b, c) then optimizer will call
elog(WARN) (oh, sorry, - elog(ERROR)).

Vadim

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vadim B. Mikheev 1998-01-10 17:41:19 Re: [HACKERS] subselects
Previous Message Thomas G. Lockhart 1998-01-10 16:34:25 Re: [HACKERS] grant broken