Re: [HACKERS] Subselects are in CVS...

From: Bruce Momjian <maillist(at)candle(dot)pha(dot)pa(dot)us>
To: vadim(at)sable(dot)krasnoyarsk(dot)su (Vadim B(dot) Mikheev)
Cc: hackers(at)postgreSQL(dot)org
Subject: Re: [HACKERS] Subselects are in CVS...
Date: 1998-02-13 20:57:34
Message-ID: 199802132058.PAA20488@candle.pha.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Want to add this as a README in the optimizer/planner directory?

>
> This is some implementation notes and opened issues...
>
> First, implementation uses new type of parameters - PARAM_EXEC - to deal
> with correlation Vars. When query_planner() is called, it first tries to
> replace all upper queries Var referenced in current query with Param of
> this type. Some global variables are used to keep mapping of Vars to
> Params and Params to Vars.
>
> After this, all current query' SubLinks are processed: for each SubLink
> found in query' qual union_planner() (old planner() function) will be
> called to plan corresponding subselect (union_planner() calls
> query_planner() for "simple" query and supports UNIONs). After subselect
> are planned, optimizer knows about is this correlated, un-correlated or
> _undirect_ correlated (references some grand-parent Vars but no parent
> ones: uncorrelated from the parent' point of view) query.
>
> For uncorrelated and undirect correlated subqueries of EXPRession or
> EXISTS type SubLinks will be replaced with "normal" clauses from
> SubLink->Oper list (I changed this list to be list of EXPR nodes,
> not just Oper ones). Right sides of these nodes are replaced with
> PARAM_EXEC parameters. This is second use of new parameter type.
> At run-time these parameters get value from result of subquery
> evaluation (i.e. - from target list of subquery). Execution plan of
> subquery itself becomes init plan of parent query. InitPlan knows
> what parameters are to get values from subquery' results and will be
> executed "on-demand" (for query select * from table where x > 0 and
> y > (select max(a) from table_a) subquery will not be executed at all
> if there are no tuples with x > 0 _and_ y is not used in index scan).
>
> SubLinks for subqueries of all other types are transformed into
> new type of Expr node - SUBPLAN_EXPR. Expr->args are just correlation
> variables from _parent_ query. Expr->oper is new SubPlan node.
>
> This node is used for InitPlan too. It keeps subquery range table,
> indices of Params which are to get value from _parent_ query Vars
> (i.e. - from Expr->args), indices of Params into which subquery'
> results are to be substituted (this is for InitPlans), SubLink
> and subquery' execution plan.
>
> Plan node was changed to know about dependencies on Params from
> parent queries and InitPlans, to keep list of changed Params
> (from the above) and so be re-scanned if this list is not NULL.
> Also, added list of InitPlans (actually, all of them for current
> query are in topmost plan node now) and other SubPlans (from
> plan->qual) - to initialize them and let them know about changed
> Params (from the list of their "interests").
>
> After all SubLinks are processed, query_planner() calls qual'
> canonificator and does "normal" work. By using Params optimizer
> is mostly unchanged.
>
> Well, Executor. To get subplans re-evaluated without ExecutorStart()
> and ExecutorEnd() (without opening and closing relations and indices
> and without many palloc() and pfree() - this is what SQL-funcs does
> on each call) ExecReScan() now supports most of Plan types...
>
> Explanation of EXPLAIN.
>
> vac=> explain select * from tmp where x >= (select max(x2) from test2
> where y2 = y and exists (select * from tempx where tx = x));
> NOTICE: QUERY PLAN:
>
> Seq Scan on tmp (cost=40.03 size=101 width=8)
> SubPlan
> ^^^^^^^ subquery is in Seq Scan' qual, its plan is below
> -> Aggregate (cost=2.05 size=0 width=0)
> InitPlan
> ^^^^^^^^ EXISTS subsubquery is InitPlan of subquery
> -> Seq Scan on tempx (cost=4.33 size=1 width=4)
> -> Result (cost=2.05 size=0 width=0)
> ^^^^^^ EXISTS subsubquery was transformed into Param
> and so we have Result node here
> -> Index Scan on test2 (cost=2.05 size=1 width=4)
>
>
> Opened issues.
>
> 1. No read permissions checking (easy, just not done yet).
> 2. readfuncs.c can't read subplan-s (easy, not critical, because of
> we currently nowhere use ascii representation of execution plans).
> 3. ExecReScan() doesn't support all plan types. At least support for
> MergeJoin has to be implemented.
> 4. Memory leaks in ExecReScan().
> 5. I need in advice: if subquery introduced with NOT IN doesn't return
> any tuples then qualification is failed, yes ?
> 6. Regression tests !!!!!!!!!!!!!!!!!!!!
> (Could we use data/queries from MySQL' crash.me ?
> Copyright-ed ? Could they give us rights ?)
> 7. Performance.
> - Should be good when subquery is transformed into InitPlan.
> - Something should be done for uncorrelated subqueries introduced
> with ANY/ALL - keep thinking. Currently, subplan will be re-scanned
> for each parent tuple - very slow...
>
> Results of some test. TMP is table with x,y (int4-s), x in 0-9,
> y = 100 - x, 1000 tuples (10 duplicates of each tuple). TEST2 is table
> with x2, y2 (int4-s), x2 in 1-99, y2 = 100 -x2, 10000 tuples (100 dups).
>
> Trying
>
> select * from tmp where x >= (select max(x2) from test2 where y2 = y);
>
> and
>
> begin;
> select y as ty, max(x2) as mx into table tsub from test2, tmp
> where y2 = y group by ty;
> vacuum tsub;
> select x, y from tmp, tsub where x >= mx and y = ty;
> drop table tsub;
> end;
>
> Without index on test2(y2):
>
> SubSelect -> 320 sec
> Using temp table -> 32 sec
>
> Having index
>
> SubSelect -> 17 sec (2M of memory)
> Using temp table -> 32 sec (12M of memory: -S 8192)
>
> Vadim
>
>

--
Bruce Momjian
maillist(at)candle(dot)pha(dot)pa(dot)us

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 1998-02-13 21:29:27 Open 6.3 issues
Previous Message Kent S. Gordon 1998-02-13 20:55:17 Re: [HACKERS] postgres initdb on ALPHA/Digital Unix