Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>
Cc: Luke Lonergan <llonergan(at)greenplum(dot)com>, Anton <anton200(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1
Date: 2008-03-12 19:16:39
Message-ID: 200803121916.m2CJGdQ24439@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Heikki Linnakangas wrote:
> Bruce, would you please add this to the 8.4 patch queue so we remember
> to look at this later?

OK, added to queue, but Tom's patch queue comment is:

This is useless since it does not represent a complete patch; the
provided code calls a lot of Greenplum-private routines that weren't
provided. It's not even reviewable let alone a candidate to apply.

---------------------------------------------------------------------------

>
> It didn't occur to me that we can do that in the degenerate case when
> there's just a single node below the Append. A more general solution
> would be to check if the pathkeys of all the child nodes match, and do a
> "merge append" similar to a merge join.
>
> Luke Lonergan wrote:
> > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> > it?
> >
> > This enables the use of index scan of a child table by recognizing sort
> > order of the append node. Kurt Harriman did the work.
> >
> > - Luke
> >
> > Index: cdb-pg/src/backend/optimizer/path/indxpath.c
> > ===================================================================
> > RCS file:
> > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> > er/path/indxpath.c,v
> > diff -u -N -r1.22 -r1.22.2.1
> > --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21
> > -0000 1.22
> > +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15
> > -0000 1.22.2.1
> > @@ -379,8 +379,51 @@
> > index_pathkeys = build_index_pathkeys(root, index,
> > ForwardScanDirection,
> > true);
> > - useful_pathkeys = truncate_useless_pathkeys(root, rel,
> > - index_pathkeys);
> > + /*
> > + * CDB: For appendrel child, pathkeys contain Var nodes in
> > terms
> > + * of the child's baserel. Transform the pathkey list to refer
> > to
> > + * columns of the appendrel.
> > + */
> > + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> > + {
> > + AppendRelInfo *appinfo = NULL;
> > + RelOptInfo *appendrel = NULL;
> > + ListCell *appcell;
> > + CdbPathLocus notalocus;
> > +
> > + /* Find the appendrel of which this baserel is a child. */
> > + foreach(appcell, root->append_rel_list)
> > + {
> > + appinfo = (AppendRelInfo *)lfirst(appcell);
> > + if (appinfo->child_relid == rel->relid)
> > + break;
> > + }
> > + Assert(appinfo);
> > + appendrel = find_base_rel(root, appinfo->parent_relid);
> > +
> > + /*
> > + * The pathkey list happens to have the same format as the
> > + * partitioning key of a Hashed locus, so by disguising it
> > + * we can use cdbpathlocus_pull_above_projection() to do
> > the
> > + * transformation.
> > + */
> > + CdbPathLocus_MakeHashed(&notalocus, index_pathkeys);
> > + notalocus =
> > + cdbpathlocus_pull_above_projection(root,
> > + notalocus,
> > + rel->relids,
> > + rel->reltargetlist,
> > +
> > appendrel->reltargetlist,
> > + appendrel->relid);
> > + if (CdbPathLocus_IsHashed(notalocus))
> > + index_pathkeys = truncate_useless_pathkeys(root,
> > appendrel,
> > +
> > notalocus.partkey);
> > + else
> > + index_pathkeys = NULL;
> > + }
> > +
> > + useful_pathkeys = truncate_useless_pathkeys(root, rel,
> > + index_pathkeys);
> > }
> > else
> > useful_pathkeys = NIL;
> > Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
> > ===================================================================
> > RCS file:
> > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> > er/path/pathkeys.c,v
> > diff -u -N -r1.18 -r1.18.2.1
> > --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07
> > -0000 1.18
> > +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15
> > -0000 1.18.2.1
> > @@ -1403,55 +1403,53 @@
> > {
> > PathKeyItem *item;
> > Expr *newexpr;
> > + AttrNumber targetindex;
> >
> > Assert(pathkey);
> >
> > - /* Use constant expr if available. Will be at head of list. */
> > - if (CdbPathkeyEqualsConstant(pathkey))
> > + /* Find an expr that we can rewrite to use the projected columns. */
> > + item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> > + relids,
> > + targetlist,
> > + &targetindex); // OUT
> > +
> > + /* If not found, see if the equiv class contains a constant expr. */
> > + if (!item &&
> > + CdbPathkeyEqualsConstant(pathkey))
> > {
> > item = (PathKeyItem *)linitial(pathkey);
> > newexpr = (Expr *)copyObject(item->key);
> > }
> >
> > - /* New vars for old! */
> > - else
> > - {
> > - AttrNumber targetindex;
> > + /* Fail if no usable expr. */
> > + else if (!item)
> > + return NULL;
> >
> > - /* Find an expr that we can rewrite to use the projected columns.
> > */
> > - item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> > - relids,
> > - targetlist,
> > - &targetindex); // OUT
> > - if (!item)
> > - return NULL;
> > + /* If found matching targetlist item, make a Var that references it. */
> > + else if (targetindex > 0)
> > + newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> > + targetindex,
> > + newvarlist,
> > + (Expr *)item->key);
> >
> > - /* If found matching targetlist item, make a Var that references
> > it. */
> > - if (targetindex > 0)
> > - newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> > - targetindex,
> > - newvarlist,
> > - (Expr *)item->key);
> > + /* Replace expr's Var nodes with new ones referencing the targetlist.
> > */
> > + else
> > + newexpr = cdbpullup_expr((Expr *)item->key,
> > + targetlist,
> > + newvarlist,
> > + newrelid);
> >
> > - /* Replace expr's Var nodes with new ones referencing the
> > targetlist. */
> > - else
> > - newexpr = cdbpullup_expr((Expr *)item->key,
> > - targetlist,
> > - newvarlist,
> > - newrelid);
> > + /* Pull up RelabelType node too, unless tlist expr has right type. */
> > + if (IsA(item->key, RelabelType))
> > + {
> > + RelabelType *oldrelabel = (RelabelType *)item->key;
> >
> > - /* Pull up RelabelType node too, unless tlist expr has right type.
> > */
> > - if (IsA(item->key, RelabelType))
> > - {
> > - RelabelType *oldrelabel = (RelabelType *)item->key;
> > -
> > - if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> > - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> > - newexpr = (Expr *)makeRelabelType(newexpr,
> > - oldrelabel->resulttype,
> > - oldrelabel->resulttypmod,
> > -
> > oldrelabel->relabelformat);
> > - }
> > + if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> > + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> > + newexpr = (Expr *)makeRelabelType(newexpr,
> > + oldrelabel->resulttype,
> > + oldrelabel->resulttypmod,
> > + oldrelabel->relabelformat);
> > }
> > Insist(newexpr);
> >
> > Index: cdb-pg/src/backend/optimizer/util/pathnode.c
> > ===================================================================
> > RCS file:
> > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> > er/util/pathnode.c,v
> > diff -u -N -r1.52.2.4 -r1.52.2.5
> > --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44
> > -0000 1.52.2.4
> > +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15
> > -0000 1.52.2.5
> > @@ -1563,7 +1563,15 @@
> > pathnode->path.rescannable = false;
> > }
> >
> > - return pathnode;
> > + /*
> > + * CDB: If there is exactly one subpath, its ordering is preserved.
> > + * Child rel's pathkey exprs are already expressed in terms of the
> > + * columns of the parent appendrel. See find_usable_indexes().
> > + */
> > + if (list_length(subpaths) == 1)
> > + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
> > +
> > + return pathnode;
> > }
> >
> > /*
> >
> >
> > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> wrote:
> >
> >> Anton wrote:
> >>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> >>>>> QUERY PLAN
> >>>> ----------------------------------------------------------------------------
> >>>> -----------------------------
> >>>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
> >>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
> >>>>> Sort Key: public.n_traf.date_time
> >>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
> >>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
> >>>>> -> Seq Scan on n_traf (cost=0.00..22.30
> >>>>> rows=1230 width=32)
> >>>>> -> Seq Scan on n_traf_y2007m01 n_traf
> >>>>> (cost=0.00..22.30 rows=1230 width=32)
> >>> ...
> >>>>> -> Seq Scan on n_traf_y2007m12 n_traf
> >>>>> (cost=0.00..22.30 rows=1230 width=32)
> >>>>> (18 rows)
> >>>>>
> >>>>> Why it no uses indexes at all?
> >>>>> -------------------------------------------
> >>>> I'm no expert but I'd guess that the the planner doesn't know which
> >>>> partition holds the latest time so it has to read them all.
> >>> Agree. But why it not uses indexes when it reading them?
> >> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> >> below the append node. Therefore it needs to fetch all rows from all the
> >> tables, and the fastest way to do that is a seq scan.
> >
> >
>
>
> --
> Heikki Linnakangas
> EnterpriseDB http://www.enterprisedb.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo(at)postgresql(dot)org so that your
> message can get through to the mailing list cleanly

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dave Page 2008-03-12 19:20:01 Re: [HACKERS] Move the developers wiki?
Previous Message Tom Lane 2008-03-12 19:09:20 Re: [HACKERS] Move the developers wiki?

Browse pgsql-performance by date

  From Date Subject
Next Message Mark Lewis 2008-03-12 19:23:41 Re: Hardware question for a DB server
Previous Message Pascal Cohen 2008-03-12 18:58:51 Hardware question for a DB server