Re: FW: was [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: FW: was [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1
Date: 2008-03-24 17:44:08
Message-ID: 200803241744.m2OHi8b08048@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance


Patch rejected because of Tom Lane's comments:

> This is useless since it does not represent a complete patch; the
> provided code calls a lot of Greenplum-private routines that weren't
> provided. It's not even reviewable let alone a candidate to apply.

---------------------------------------------------------------------------

Luke Lonergan wrote:
> Below is a patch against Greenplum Database that fixes the problem.
>
> - Luke
>
> ------ Forwarded Message
> From: Luke Lonergan <llonergan(at)greenplum(dot)com>
> Date: Fri, 24 Aug 2007 09:25:53 -0700
> To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Anton <anton200(at)gmail(dot)com>
> Cc: <pgsql-performance(at)postgresql(dot)org>
> Conversation: [PERFORM] partitioned table and ORDER BY indexed_field DESC
> LIMIT 1
> Subject: Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC
> LIMIT 1
>
> Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> it?
>
> This enables the use of index scan of a child table by recognizing sort
> order of the append node. Kurt Harriman did the work.
>
> - Luke
>
> Index: cdb-pg/src/backend/optimizer/path/indxpath.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/indxpath.c,v
> diff -u -N -r1.22 -r1.22.2.1
> --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21
> -0000 1.22
> +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15
> -0000 1.22.2.1
> @@ -379,8 +379,51 @@
> index_pathkeys = build_index_pathkeys(root, index,
> ForwardScanDirection,
> true);
> - useful_pathkeys = truncate_useless_pathkeys(root, rel,
> - index_pathkeys);
> + /*
> + * CDB: For appendrel child, pathkeys contain Var nodes in
> terms
> + * of the child's baserel. Transform the pathkey list to refer
> to
> + * columns of the appendrel.
> + */
> + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> + {
> + AppendRelInfo *appinfo = NULL;
> + RelOptInfo *appendrel = NULL;
> + ListCell *appcell;
> + CdbPathLocus notalocus;
> +
> + /* Find the appendrel of which this baserel is a child. */
> + foreach(appcell, root->append_rel_list)
> + {
> + appinfo = (AppendRelInfo *)lfirst(appcell);
> + if (appinfo->child_relid == rel->relid)
> + break;
> + }
> + Assert(appinfo);
> + appendrel = find_base_rel(root, appinfo->parent_relid);
> +
> + /*
> + * The pathkey list happens to have the same format as the
> + * partitioning key of a Hashed locus, so by disguising it
> + * we can use cdbpathlocus_pull_above_projection() to do
> the
> + * transformation.
> + */
> + CdbPathLocus_MakeHashed(&notalocus, index_pathkeys);
> + notalocus =
> + cdbpathlocus_pull_above_projection(root,
> + notalocus,
> + rel->relids,
> + rel->reltargetlist,
> +
> appendrel->reltargetlist,
> + appendrel->relid);
> + if (CdbPathLocus_IsHashed(notalocus))
> + index_pathkeys = truncate_useless_pathkeys(root,
> appendrel,
> +
> notalocus.partkey);
> + else
> + index_pathkeys = NULL;
> + }
> +
> + useful_pathkeys = truncate_useless_pathkeys(root, rel,
> + index_pathkeys);
> }
> else
> useful_pathkeys = NIL;
> Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/pathkeys.c,v
> diff -u -N -r1.18 -r1.18.2.1
> --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07
> -0000 1.18
> +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15
> -0000 1.18.2.1
> @@ -1403,55 +1403,53 @@
> {
> PathKeyItem *item;
> Expr *newexpr;
> + AttrNumber targetindex;
>
> Assert(pathkey);
>
> - /* Use constant expr if available. Will be at head of list. */
> - if (CdbPathkeyEqualsConstant(pathkey))
> + /* Find an expr that we can rewrite to use the projected columns. */
> + item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> + relids,
> + targetlist,
> + &targetindex); // OUT
> +
> + /* If not found, see if the equiv class contains a constant expr. */
> + if (!item &&
> + CdbPathkeyEqualsConstant(pathkey))
> {
> item = (PathKeyItem *)linitial(pathkey);
> newexpr = (Expr *)copyObject(item->key);
> }
>
> - /* New vars for old! */
> - else
> - {
> - AttrNumber targetindex;
> + /* Fail if no usable expr. */
> + else if (!item)
> + return NULL;
>
> - /* Find an expr that we can rewrite to use the projected columns.
> */
> - item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> - relids,
> - targetlist,
> - &targetindex); // OUT
> - if (!item)
> - return NULL;
> + /* If found matching targetlist item, make a Var that references it. */
> + else if (targetindex > 0)
> + newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> + targetindex,
> + newvarlist,
> + (Expr *)item->key);
>
> - /* If found matching targetlist item, make a Var that references
> it. */
> - if (targetindex > 0)
> - newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> - targetindex,
> - newvarlist,
> - (Expr *)item->key);
> + /* Replace expr's Var nodes with new ones referencing the targetlist.
> */
> + else
> + newexpr = cdbpullup_expr((Expr *)item->key,
> + targetlist,
> + newvarlist,
> + newrelid);
>
> - /* Replace expr's Var nodes with new ones referencing the
> targetlist. */
> - else
> - newexpr = cdbpullup_expr((Expr *)item->key,
> - targetlist,
> - newvarlist,
> - newrelid);
> + /* Pull up RelabelType node too, unless tlist expr has right type. */
> + if (IsA(item->key, RelabelType))
> + {
> + RelabelType *oldrelabel = (RelabelType *)item->key;
>
> - /* Pull up RelabelType node too, unless tlist expr has right type.
> */
> - if (IsA(item->key, RelabelType))
> - {
> - RelabelType *oldrelabel = (RelabelType *)item->key;
> -
> - if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> - newexpr = (Expr *)makeRelabelType(newexpr,
> - oldrelabel->resulttype,
> - oldrelabel->resulttypmod,
> -
> oldrelabel->relabelformat);
> - }
> + if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> + newexpr = (Expr *)makeRelabelType(newexpr,
> + oldrelabel->resulttype,
> + oldrelabel->resulttypmod,
> + oldrelabel->relabelformat);
> }
> Insist(newexpr);
>
> Index: cdb-pg/src/backend/optimizer/util/pathnode.c
> ===================================================================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/util/pathnode.c,v
> diff -u -N -r1.52.2.4 -r1.52.2.5
> --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44
> -0000 1.52.2.4
> +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15
> -0000 1.52.2.5
> @@ -1563,7 +1563,15 @@
> pathnode->path.rescannable = false;
> }
>
> - return pathnode;
> + /*
> + * CDB: If there is exactly one subpath, its ordering is preserved.
> + * Child rel's pathkey exprs are already expressed in terms of the
> + * columns of the parent appendrel. See find_usable_indexes().
> + */
> + if (list_length(subpaths) == 1)
> + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
> +
> + return pathnode;
> }
>
> /*
>
>
> On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> wrote:
>
> > Anton wrote:
> >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> >>>> QUERY PLAN
> >>> ----------------------------------------------------------------------------
> >>> -----------------------------
> >>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
> >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
> >>>> Sort Key: public.n_traf.date_time
> >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
> >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
> >>>> -> Seq Scan on n_traf (cost=0.00..22.30
> >>>> rows=1230 width=32)
> >>>> -> Seq Scan on n_traf_y2007m01 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >> ...
> >>>> -> Seq Scan on n_traf_y2007m12 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >>>> (18 rows)
> >>>>
> >>>> Why it no uses indexes at all?
> >>>> -------------------------------------------
> >>> I'm no expert but I'd guess that the the planner doesn't know which
> >>> partition holds the latest time so it has to read them all.
> >>
> >> Agree. But why it not uses indexes when it reading them?
> >
> > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> > below the append node. Therefore it needs to fetch all rows from all the
> > tables, and the fastest way to do that is a seq scan.
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: You can help support the PostgreSQL project by donating at
>
> http://www.postgresql.org/about/donate
>
> ------ End of Forwarded Message
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2008-03-24 17:52:37 Re: [CORE] New email list for emergency communications
Previous Message Andrew Sullivan 2008-03-24 17:44:05 Re: New email list for emergency communications

Browse pgsql-performance by date

  From Date Subject
Next Message Steve Poe 2008-03-24 20:23:08 Planning hot/live backups?
Previous Message Tom Lane 2008-03-24 16:30:11 Re: Turn correlated in subquery into join