FW: was [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "PostgreSQL-development" <pgsql-hackers(at)postgresql(dot)org>
Subject: FW: was [PERFORM] partitioned table and ORDER BY indexed_field DESC LIMIT 1
Date: 2007-08-25 16:56:42
Message-ID: C2F5AD5A.3CB86%llonergan@greenplum.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Below is a patch against Greenplum Database that fixes the problem.

- Luke

------ Forwarded Message
From: Luke Lonergan <llonergan(at)greenplum(dot)com>
Date: Fri, 24 Aug 2007 09:25:53 -0700
To: Heikki Linnakangas <heikki(at)enterprisedb(dot)com>, Anton <anton200(at)gmail(dot)com>
Cc: <pgsql-performance(at)postgresql(dot)org>
Conversation: [PERFORM] partitioned table and ORDER BY indexed_field DESC
LIMIT 1
Subject: Re: [PERFORM] partitioned table and ORDER BY indexed_field DESC
LIMIT 1

Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
it?

This enables the use of index scan of a child table by recognizing sort
order of the append node. Kurt Harriman did the work.

- Luke

Index: cdb-pg/src/backend/optimizer/path/indxpath.c
===================================================================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/indxpath.c,v
diff -u -N -r1.22 -r1.22.2.1
--- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21
-0000 1.22
+++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15
-0000 1.22.2.1
@@ -379,8 +379,51 @@
index_pathkeys = build_index_pathkeys(root, index,
ForwardScanDirection,
true);
- useful_pathkeys = truncate_useless_pathkeys(root, rel,
- index_pathkeys);
+ /*
+ * CDB: For appendrel child, pathkeys contain Var nodes in
terms
+ * of the child's baserel. Transform the pathkey list to refer
to
+ * columns of the appendrel.
+ */
+ if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+ {
+ AppendRelInfo *appinfo = NULL;
+ RelOptInfo *appendrel = NULL;
+ ListCell *appcell;
+ CdbPathLocus notalocus;
+
+ /* Find the appendrel of which this baserel is a child. */
+ foreach(appcell, root->append_rel_list)
+ {
+ appinfo = (AppendRelInfo *)lfirst(appcell);
+ if (appinfo->child_relid == rel->relid)
+ break;
+ }
+ Assert(appinfo);
+ appendrel = find_base_rel(root, appinfo->parent_relid);
+
+ /*
+ * The pathkey list happens to have the same format as the
+ * partitioning key of a Hashed locus, so by disguising it
+ * we can use cdbpathlocus_pull_above_projection() to do
the
+ * transformation.
+ */
+ CdbPathLocus_MakeHashed(&notalocus, index_pathkeys);
+ notalocus =
+ cdbpathlocus_pull_above_projection(root,
+ notalocus,
+ rel->relids,
+ rel->reltargetlist,
+
appendrel->reltargetlist,
+ appendrel->relid);
+ if (CdbPathLocus_IsHashed(notalocus))
+ index_pathkeys = truncate_useless_pathkeys(root,
appendrel,
+
notalocus.partkey);
+ else
+ index_pathkeys = NULL;
+ }
+
+ useful_pathkeys = truncate_useless_pathkeys(root, rel,
+ index_pathkeys);
}
else
useful_pathkeys = NIL;
Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
===================================================================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/pathkeys.c,v
diff -u -N -r1.18 -r1.18.2.1
--- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07
-0000 1.18
+++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15
-0000 1.18.2.1
@@ -1403,55 +1403,53 @@
{
PathKeyItem *item;
Expr *newexpr;
+ AttrNumber targetindex;

Assert(pathkey);

- /* Use constant expr if available. Will be at head of list. */
- if (CdbPathkeyEqualsConstant(pathkey))
+ /* Find an expr that we can rewrite to use the projected columns. */
+ item = cdbpullup_findPathKeyItemInTargetList(pathkey,
+ relids,
+ targetlist,
+ &targetindex); // OUT
+
+ /* If not found, see if the equiv class contains a constant expr. */
+ if (!item &&
+ CdbPathkeyEqualsConstant(pathkey))
{
item = (PathKeyItem *)linitial(pathkey);
newexpr = (Expr *)copyObject(item->key);
}

- /* New vars for old! */
- else
- {
- AttrNumber targetindex;
+ /* Fail if no usable expr. */
+ else if (!item)
+ return NULL;

- /* Find an expr that we can rewrite to use the projected columns.
*/
- item = cdbpullup_findPathKeyItemInTargetList(pathkey,
- relids,
- targetlist,
- &targetindex); // OUT
- if (!item)
- return NULL;
+ /* If found matching targetlist item, make a Var that references it. */
+ else if (targetindex > 0)
+ newexpr = (Expr *)cdbpullup_makeVar(newrelid,
+ targetindex,
+ newvarlist,
+ (Expr *)item->key);

- /* If found matching targetlist item, make a Var that references
it. */
- if (targetindex > 0)
- newexpr = (Expr *)cdbpullup_makeVar(newrelid,
- targetindex,
- newvarlist,
- (Expr *)item->key);
+ /* Replace expr's Var nodes with new ones referencing the targetlist.
*/
+ else
+ newexpr = cdbpullup_expr((Expr *)item->key,
+ targetlist,
+ newvarlist,
+ newrelid);

- /* Replace expr's Var nodes with new ones referencing the
targetlist. */
- else
- newexpr = cdbpullup_expr((Expr *)item->key,
- targetlist,
- newvarlist,
- newrelid);
+ /* Pull up RelabelType node too, unless tlist expr has right type. */
+ if (IsA(item->key, RelabelType))
+ {
+ RelabelType *oldrelabel = (RelabelType *)item->key;

- /* Pull up RelabelType node too, unless tlist expr has right type.
*/
- if (IsA(item->key, RelabelType))
- {
- RelabelType *oldrelabel = (RelabelType *)item->key;
-
- if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
- oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
- newexpr = (Expr *)makeRelabelType(newexpr,
- oldrelabel->resulttype,
- oldrelabel->resulttypmod,
-
oldrelabel->relabelformat);
- }
+ if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
+ oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
+ newexpr = (Expr *)makeRelabelType(newexpr,
+ oldrelabel->resulttype,
+ oldrelabel->resulttypmod,
+ oldrelabel->relabelformat);
}
Insist(newexpr);

Index: cdb-pg/src/backend/optimizer/util/pathnode.c
===================================================================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/util/pathnode.c,v
diff -u -N -r1.52.2.4 -r1.52.2.5
--- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44
-0000 1.52.2.4
+++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15
-0000 1.52.2.5
@@ -1563,7 +1563,15 @@
pathnode->path.rescannable = false;
}

- return pathnode;
+ /*
+ * CDB: If there is exactly one subpath, its ordering is preserved.
+ * Child rel's pathkey exprs are already expressed in terms of the
+ * columns of the parent appendrel. See find_usable_indexes().
+ */
+ if (list_length(subpaths) == 1)
+ pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
+
+ return pathnode;
}

/*

On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> wrote:

> Anton wrote:
>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>> QUERY PLAN
>>> ----------------------------------------------------------------------------
>>> -----------------------------
>>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
>>>> Sort Key: public.n_traf.date_time
>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
>>>> -> Seq Scan on n_traf (cost=0.00..22.30
>>>> rows=1230 width=32)
>>>> -> Seq Scan on n_traf_y2007m01 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>> ...
>>>> -> Seq Scan on n_traf_y2007m12 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>> (18 rows)
>>>>
>>>> Why it no uses indexes at all?
>>>> -------------------------------------------
>>> I'm no expert but I'd guess that the the planner doesn't know which
>>> partition holds the latest time so it has to read them all.
>>
>> Agree. But why it not uses indexes when it reading them?
>
> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> below the append node. Therefore it needs to fetch all rows from all the
> tables, and the fastest way to do that is a seq scan.

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

------ End of Forwarded Message

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dawid Kuroczko 2007-08-25 17:22:13 LDAP service lookup
Previous Message Edoardo 2007-08-25 07:50:05 Re: simple replication

Browse pgsql-performance by date

  From Date Subject
Next Message Benjamin Arai 2007-08-25 18:12:54 Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views
Previous Message Joshua D. Drake 2007-08-25 16:18:18 Re: [PERFORM] Partioning tsearch2 a table into chunks and accessing via views