Re: ECPG FETCH readahead

From: Noah Misch <noah(at)leadboat(dot)com>
To: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Meskes <meskes(at)postgresql(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: ECPG FETCH readahead
Date: 2012-03-29 00:43:23
Message-ID: 20120329004323.GA17329@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote:
> Sorry for the delay, I had been busy with other tasks and I rewrote this code
> to better cope with unknown result size, scrollable cursors and negative
> cursor positions.
>
> I think all points raised by Noah is addressed: per-cursor readahead window size,
> extensive comments, documentation and not enabling result set size discovery.

The new comments are a nice improvement; thanks.

> The problem with WHERE CURRENT OF is solved by a little more grammar
> and ecpglib code, which effectively does a MOVE ABSOLUTE N before
> executing the DML with WHERE CURRENT OF clause. No patching of the
> backend. This way, the new ECPG caching code is compatible with older
> servers but obviously reduces the efficiency of caching.

Good plan.

> diff -dcrpN postgresql.orig/doc/src/sgml/ecpg.sgml postgresql/doc/src/sgml/ecpg.sgml
> *** postgresql.orig/doc/src/sgml/ecpg.sgml 2012-03-12 09:24:31.699560098 +0100
> --- postgresql/doc/src/sgml/ecpg.sgml 2012-03-24 10:15:00.538924601 +0100
> *************** EXEC SQL COMMIT;
> *** 454,459 ****
> --- 454,479 ----
> details.
> </para>
>
> + <para>
> + ECPG may use cursor readahead to improve performance of programs
> + that use single-row FETCH statements. Option <literal>-R number</literal>
> + option for ECPG modifies the default for all cursors from <literal>NO READAHEAD</literal>

This sentence duplicates the word "option".

> + to <literal>READAHEAD number</literal>. Explicit <literal>READAHEAD number</literal> or
> + <literal>NO READAHEAD</literal> turns cursor readahead on (with <literal>number</literal>
> + as the size of the readahead window) or off for the specified cursor,
> + respectively. For cursors using a non-0 readahead window size is 256 rows,

The number 256 is no longer present in your implementation.

> + the window size may be modified by setting the <literal>ECPGFETCHSZ</literal>
> + environment variable to a different value.

I had in mind that DECLARE statements adorned with READAHEAD syntax would
always behave precisely as written, independent of ECPGFETCHSZ or "ecpg -R".
Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value
passed to "ecpg -R" if provided, and finally a value of 1 (no readahead) in
the absence of both ECPGFETCHSZ and "ecpg -R". Did you do it differently for
a particular reason? I don't particularly object to what you've implemented,
but I'd be curious to hear your thinking.

> + </para>
> +
> + <para>
> + <command>UPDATE</command> or <command>DELETE</command> with the
> + <literal>WHERE CURRENT OF</literal> clause, cursor readahead may or may not
> + actually improve performance, as <command>MOVE</command> statement has to be
> + sent to the backend before the DML statement to ensure correct cursor position
> + in the backend.
> + </para>

This sentence seems to be missing a word near its beginning.

> *************** DECLARE <replaceable class="PARAMETER">c
> *** 6639,6649 ****
> </listitem>
> </varlistentry>
> </variablelist>
>
> <para>
> ! For the meaning of the cursor options,
> ! see <xref linkend="sql-declare">.
> </para>
> </refsect1>
>
> <refsect1>
> --- 6669,6728 ----
> </listitem>
> </varlistentry>
> </variablelist>
> + </refsect1>
> +
> + <refsect1>
> + <title>Cursor options</title>
>
> <para>
> ! For the meaning of other cursor options, see <xref linkend="sql-declare">.
> </para>
> +
> + <variablelist>
> + <varlistentry>
> + <term><literal>READAHEAD number</literal></term>
> + <term><literal>NO READAHEAD</literal></term>
> + <listitem>
> + <para>
> + <literal>READAHEAD number</literal> makes the ECPG preprocessor and
> + runtime library use a client-side cursor accounting and data readahead
> + during <command>FETCH</command>. This improves performance for programs
> + that use single-row <command>FETCH</command> statements.
> + </para>
> +
> + <para>
> + <literal>NO READAHEAD</literal> disables data readahead in case
> + <parameter>-R number</parameter> is used for compiling the file.
> + </para>
> + </listitem>
> + </varlistentry>

One of the new sections about readahead should somehow reference the hazard
around volatile functions.

> +
> + <varlistentry>
> + <term><literal>OPEN RETURNS LAST ROW POSITION</literal></term>
> + <term><literal>OPEN RETURNS 0 FOR LAST ROW POSITION</literal></term>
> + <listitem>
> + <para>
> + When the cursor is opened, it's possible to discover the size of the result set
> + using <command>MOVE ALL</command> which traverses the result set and move
> + the cursor back to the beginning using <command>MOVE ABSOLUTE 0</command>.
> + The size of the result set is returned in sqlca.sqlerrd[2].
> + </para>
> +
> + <para>
> + This slows down opening the cursor from the application point of view
> + but may also have side effects. If the cursor query contains volatile function
> + calls with side effects, they will be evaluated twice because of traversing
> + the result set this way during <command>OPEN</command>.
> + </para>
> +
> + <para>
> + The default is not to discover.
> + </para>

I mildly oppose having syntax to enable this per-cursor. Readahead is a
generally handy feature that I might use in new programs, but this feature is
more of a hack for compatibility with some old Informix version. For new
code, authors should do their own MOVEs instead of using this option.

The patch also adds an undocumented ECPGOPENRETURNSRESULTSETSIZE environment
variable to control this. I see no use for such a thing, because a program's
dependency on sqlerrd[2] is fixed at build time. If a program does not
reference sqlerrd[2] for a cursor, there's no benefit from choosing at runtime
to populate that value anyway. If a program does reference it, any option
needed to have it set correctly had better be set at build time and apply to
every run of the final program.

IOW, I suggest having only the "ecpg"-time option to enable this behavior.
Thoughts?

> + </listitem>
> + </varlistentry>
> +
> + </variablelist>
> +
> </refsect1>
>
> <refsect1>
> diff -dcrpN postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml postgresql/doc/src/sgml/ref/ecpg-ref.sgml
> *** postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml 2011-08-07 11:29:16.003256959 +0200
> --- postgresql/doc/src/sgml/ref/ecpg-ref.sgml 2012-03-24 10:13:08.679284063 +0100
> *************** PostgreSQL documentation
> *** 171,176 ****
> --- 171,196 ----
> </varlistentry>
>
> <varlistentry>
> + <term><option>-R <replaceable>number</replaceable></option></term>
> + <listitem>
> + <para>
> + Turn on cursor readahead by default using <replaceable>number</replaceable>
> + as readahead window size.
> + </para>
> + </listitem>
> + </varlistentry>
> +
> + <varlistentry>
> + <term><option>--detect-cursor-resultset-size</option></term>
> + <listitem>
> + <para>
> + Detect the cursor result set size during <command>OPEN</command> and
> + return it in sqlca.sqlerrd[2].

Make reference to this option in the ecpg-sqlca section, where sqlerrd[2] has
its primary documentation.

> + </para>
> + </listitem>
> + </varlistentry>
> +
> + <varlistentry>
> <term><option>-t</option></term>
> <listitem>
> <para>

I did not re-review the code changes in the same detail as before, but nothing
caught my attention when scanning through them. If some particular section
has changed in a tricky way and deserves a close look, let me know.

The documentation-cosmetic and policy questions I raise above shouldn't entail
large-scale patch churn, so I've marked the patch Ready for Committer.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2012-03-29 00:46:00 Re: patch for parallel pg_dump
Previous Message Joachim Wieland 2012-03-29 00:28:54 Re: patch for parallel pg_dump