Scrollable cursors and Sort performance

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Scrollable cursors and Sort performance
Date: 2006-02-10 13:32:44
Message-ID: 1139578364.1258.469.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

I'm interested in the behaviour of ExecSort, which *for all queries*
prepares the sort result for randomAccess, even when part of a plan that
will *never* go backwards/rewind etc.

A recent performance test shows this output from mid-way through a heap
sort with trace_sort=on (the query itself is not relevant here)

> LOG: 00000: finished writing final run 65 to tape 64: CPU
57.50s/484.51u sec elapsed 597.46 sec
> LOG: 00000: finished merge step: CPU 107.90s/653.53u sec elapsed
941.83 sec

Which shows that the *unnecessary* final merge takes 344 secs, adding
approximately 60% to the elapsed time of the query and nearly doubling
the CPU requirement.

[Aside: you'll notice the above test was performed with my recent sort
improvement patch applied, but the behaviour of ExecSort is identical in
both cases. However in the current cvstip case, you simply don't notice
the extra expense of the request for randomAccess because of the
additional time taken by the sort]

So, why does the planner think random access is required? Well, only for
use in queries; CREATE INDEX for example does not force this.

To allow support for CURSORs, of course.

>From the code, we never call ExecSort with a direction other than
Forward unless we are issuing a FETCH with a direction other than one
identified internally as FETCH_FORWARD. According to the SQL standard,
that can only happen when a scrollable cursor has been declared using
DECLARE ... SCROLL. The current PostgreSQL manual says the following:

FETCH: "The cursor should be declared with the SCROLL option if one
intends to use any variants of FETCH other than FETCH NEXT or FETCH
FORWARD with a positive count. For simple queries PostgreSQL will allow
backwards fetch from cursors not declared with SCROLL, but this behavior
is best not relied on. If the cursor is declared with NO SCROLL, no
backward fetches are allowed."

DECLARE: "The SCROLL option should be specified when defining a cursor
that will be used to fetch backwards. This is required by the SQL
standard. However, for compatibility with earlier versions, PostgreSQL
will allow backward fetches without SCROLL, if the cursor's query plan
is simple enough that no extra overhead is needed to support it.
However, application developers are advised not to rely on using
backward fetches from a cursor that has not been created with SCROLL. If
NO SCROLL is specified, then backward fetches are disallowed in any
case."

The current behaviour is to plan every query as if it would allow
backwards scans, then immediately disallow backwards scans *if* it fails
the "no extra overhead" test later on in the Declare Cursor processing.
[portalcmds.c:PerformCursorOpen()]. (i.e. you pay, but get no benefit)

My suggestion is that the backwards-compatible behaviour of allowing
backwards/absolute FETCHes *without* a specific SCROLL command be
deprecated in the next release, so that the default is *disallow*. We've
warned people and now its time to turn it off by default. (This would be
re-enabled using default_cursor_scroll = on). If that is not acceptable,
then we should re-evaluate the idea that sorts *always* allow backward
scans [execAmi.c:ExecSupportsBackwardScan()], replacing this either with
*never* or some kind of query cost test (but that seems much less
preferable). Materialize already provides the infrastructure required to
do this.

This will then allow us to use the firm knowledge that a plan will only
ever be scanned in a Forwards direction at plan time. ...and that will
allow us to optimize out the rather large step taken during Sort to
freeze the result unnecessarily for randomAccess. This will then give a
good perfomance gain for larger joins and aggregations, neither of which
would ever allow backwards scans using them current method anyway.

I intend to add a short patch to pass down the cursor state during
planning, so that when it is explicitly specified the sort node is able
to recognise this and avoid work. Also, to add a GUC to force the
not-explicitly-specified case to be the same as the NO SCROLL case, as
the standard requires.

Comments?

Best Regards, Simon Riggs

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Simon Riggs 2006-02-10 13:39:52 Re: TODO-Item: B-tree fillfactor control
Previous Message anonymus.crux 2006-02-10 13:23:46 Re: Compiling UDF DLL under Win32

Browse pgsql-patches by date

  From Date Subject
Next Message Simon Riggs 2006-02-10 13:39:52 Re: TODO-Item: B-tree fillfactor control
Previous Message ITAGAKI Takahiro 2006-02-10 10:12:48 Re: TODO-Item: B-tree fillfactor control