Skip site navigation (1) Skip section navigation (2)

Re: ECPG FETCH readahead

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Michael Meskes <meskes(at)postgresql(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>
Subject: Re: ECPG FETCH readahead
Date: 2012-03-29 10:59:40
Message-ID: 4F74409C.3050904@cybertec.at (view raw or flat)
Thread:
Lists: pgsql-hackers
2012-03-29 02:43 keltezéssel, Noah Misch írta:
> On Sat, Mar 24, 2012 at 10:49:07AM +0100, Boszormenyi Zoltan wrote:
>> Sorry for the delay, I had been busy with other tasks and I rewrote this code
>> to better cope with unknown result size, scrollable cursors and negative
>> cursor positions.
>>
>> I think all points raised by Noah is addressed: per-cursor readahead window size,
>> extensive comments, documentation and not enabling result set size discovery.
> The new comments are a nice improvement; thanks.
>
>> The problem with WHERE CURRENT OF is solved by a little more grammar
>> and ecpglib code, which effectively does a MOVE ABSOLUTE N before
>> executing the DML with WHERE CURRENT OF clause. No patching of the
>> backend. This way, the new ECPG caching code is compatible with older
>> servers but obviously reduces the efficiency of caching.
> Good plan.
>
>> diff -dcrpN postgresql.orig/doc/src/sgml/ecpg.sgml postgresql/doc/src/sgml/ecpg.sgml
>> *** postgresql.orig/doc/src/sgml/ecpg.sgml	2012-03-12 09:24:31.699560098 +0100
>> --- postgresql/doc/src/sgml/ecpg.sgml	2012-03-24 10:15:00.538924601 +0100
>> *************** EXEC SQL COMMIT;
>> *** 454,459 ****
>> --- 454,479 ----
>>       details.
>>      </para>
>>
>> +<para>
>> +    ECPG may use cursor readahead to improve performance of programs
>> +    that use single-row FETCH statements. Option<literal>-R number</literal>
>> +    option for ECPG modifies the default for all cursors from<literal>NO READAHEAD</literal>
> This sentence duplicates the word "option".

Fixed.

>
>> +    to<literal>READAHEAD number</literal>. Explicit<literal>READAHEAD number</literal>  or
>> +<literal>NO READAHEAD</literal>  turns cursor readahead on (with<literal>number</literal>
>> +    as the size of the readahead window) or off for the specified cursor,
>> +    respectively. For cursors using a non-0 readahead window size is 256 rows,
> The number 256 is no longer present in your implementation.

Indeed. Oversight, that part of the sentence is deleted.

>
>> +    the window size may be modified by setting the<literal>ECPGFETCHSZ</literal>
>> +    environment variable to a different value.
> I had in mind that DECLARE statements adorned with READAHEAD syntax would
> always behave precisely as written, independent of ECPGFETCHSZ or "ecpg -R".
> Unadorned DECLARE statements would use ECPGFETCHSZ if set, then the value
> passed to "ecpg -R" if provided, and finally a value of 1 (no readahead) in
> the absence of both ECPGFETCHSZ and "ecpg -R".  Did you do it differently for
> a particular reason?  I don't particularly object to what you've implemented,
> but I'd be curious to hear your thinking.

What I had in mind was:

NO READAHEAD == READAHEAD 0. This will translate into 1 tuple sized
readahead window that cannot be modified by ECPGFETCHSZ.

READAHEAD 1 also means uncached by default but ECPGFETCHSZ may
modify the readahead window size.

Values greater than 1 means cached by default with the specified window size
but it can also be overridden (even disabled) just in case. This part can be debated,
I agree. This is in add_cursor():

-----8<-----8<-----8<-----8<-----8<-----8<-----
         desc->cachable = (ra_size > 0);
         /*
          * If this cursor is allowed to be cached, the readahead
          * windows is also allowed to be overridden (possibly
          * even disabled) by default_fetch_size if it's set.
          */
         if (desc->cachable)
                 desc->ra_size = (default_fetch_size > 0 ?
                                                 default_fetch_size :
                                                 ra_size);
         else
                 desc->ra_size = 1;
-----8<-----8<-----8<-----8<-----8<-----8<-----

The check (default_fetch_size > 0) can be modified so it reads
(default_fetch_size > ra_size) so the override can only happen upwards.

This part is policy that can be debated and modified accordingly,
it doesn't affect the internals of the caching code.

>
>> +</para>
>> +
>> +<para>
>> +<command>UPDATE</command>  or<command>DELETE</command>  with the
>> +<literal>WHERE CURRENT OF</literal>  clause, cursor readahead may or may not
>> +    actually improve performance, as<command>MOVE</command>  statement has to be
>> +    sent to the backend before the DML statement to ensure correct cursor position
>> +    in the backend.
>> +</para>
> This sentence seems to be missing a word near its beginning.

Sounds like a corner case to me, but I am not a native english speaker.
Which one sounds better:

    UPDATE or DELETE with the WHERE CURRENT OF clause...

or

    UPDATE or DELETE statements with the WHERE CURRENT OF clause...
?

I went with the second. The committer can change the wording in any way
he wants.

>
>> *************** DECLARE<replaceable class="PARAMETER">c
>> *** 6639,6649 ****
>>          </listitem>
>>         </varlistentry>
>>        </variablelist>
>>
>>        <para>
>> !      For the meaning of the cursor options,
>> !      see<xref linkend="sql-declare">.
>>        </para>
>>       </refsect1>
>>
>>       <refsect1>
>> --- 6669,6728 ----
>>          </listitem>
>>         </varlistentry>
>>        </variablelist>
>> +</refsect1>
>> +
>> +<refsect1>
>> +<title>Cursor options</title>
>>
>>        <para>
>> !      For the meaning of other cursor options, see<xref linkend="sql-declare">.
>>        </para>
>> +
>> +<variablelist>
>> +<varlistentry>
>> +<term><literal>READAHEAD number</literal></term>
>> +<term><literal>NO READAHEAD</literal></term>
>> +<listitem>
>> +<para>
>> +<literal>READAHEAD number</literal>  makes the ECPG preprocessor and
>> +          runtime library use a client-side cursor accounting and data readahead
>> +          during<command>FETCH</command>. This improves performance for programs
>> +          that use single-row<command>FETCH</command>  statements.
>> +</para>
>> +
>> +<para>
>> +<literal>NO READAHEAD</literal>  disables data readahead in case
>> +<parameter>-R number</parameter>  is used for compiling the file.
>> +</para>
>> +</listitem>
>> +</varlistentry>
> One of the new sections about readahead should somehow reference the hazard
> around volatile functions.

Done.

>> +
>> +<varlistentry>
>> +<term><literal>OPEN RETURNS LAST ROW POSITION</literal></term>
>> +<term><literal>OPEN RETURNS 0 FOR LAST ROW POSITION</literal></term>
>> +<listitem>
>> +<para>
>> +          When the cursor is opened, it's possible to discover the size of the result set
>> +          using<command>MOVE ALL</command>  which traverses the result set and move
>> +          the cursor back to the beginning using<command>MOVE ABSOLUTE 0</command>.
>> +          The size of the result set is returned in sqlca.sqlerrd[2].
>> +</para>
>> +
>> +<para>
>> +          This slows down opening the cursor from the application point of view
>> +          but may also have side effects. If the cursor query contains volatile function
>> +          calls with side effects, they will be evaluated twice because of traversing
>> +          the result set this way during<command>OPEN</command>.
>> +</para>
>> +
>> +<para>
>> +          The default is not to discover.
>> +</para>
> I mildly oppose having syntax to enable this per-cursor.  Readahead is a
> generally handy feature that I might use in new programs, but this feature is
> more of a hack for compatibility with some old Informix version.  For new
> code, authors should do their own MOVEs instead of using this option.
>
> The patch also adds an undocumented ECPGOPENRETURNSRESULTSETSIZE environment
> variable to control this.  I see no use for such a thing, because a program's
> dependency on sqlerrd[2] is fixed at build time.  If a program does not
> reference sqlerrd[2] for a cursor, there's no benefit from choosing at runtime
> to populate that value anyway.  If a program does reference it, any option
> needed to have it set correctly had better be set at build time and apply to
> every run of the final program.
>
> IOW, I suggest having only the "ecpg"-time option to enable this behavior.
> Thoughts?

I wanted to make it available for non-Informix mode and a way to
disable it if the command line option enables it.

But the extra environment variable seems to be silly indeed.
I deleted that part of the code.

>> +</listitem>
>> +</varlistentry>
>> +
>> +</variablelist>
>> +
>>       </refsect1>
>>
>>       <refsect1>
>> diff -dcrpN postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml postgresql/doc/src/sgml/ref/ecpg-ref.sgml
>> *** postgresql.orig/doc/src/sgml/ref/ecpg-ref.sgml	2011-08-07 11:29:16.003256959 +0200
>> --- postgresql/doc/src/sgml/ref/ecpg-ref.sgml	2012-03-24 10:13:08.679284063 +0100
>> *************** PostgreSQL documentation
>> *** 171,176 ****
>> --- 171,196 ----
>>        </varlistentry>
>>
>>        <varlistentry>
>> +<term><option>-R<replaceable>number</replaceable></option></term>
>> +<listitem>
>> +<para>
>> +        Turn on cursor readahead by default using<replaceable>number</replaceable>
>> +        as readahead window size.
>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +
>> +<varlistentry>
>> +<term><option>--detect-cursor-resultset-size</option></term>
>> +<listitem>
>> +<para>
>> +        Detect the cursor result set size during<command>OPEN</command>  and
>> +        return it in sqlca.sqlerrd[2].
> Make reference to this option in the ecpg-sqlca section, where sqlerrd[2] has
> its primary documentation.

Done.

>> +</para>
>> +</listitem>
>> +</varlistentry>
>> +
>> +<varlistentry>
>>         <term><option>-t</option></term>
>>         <listitem>
>>          <para>
>
> I did not re-review the code changes in the same detail as before, but nothing
> caught my attention when scanning through them.  If some particular section
> has changed in a tricky way and deserves a close look, let me know.

Well, the extra comments should explain everything. The rewrite did not change
the execute.c function split, only comments were added. The changes in cursor.c
were mainly about being able to properly deal with negative cursor positions,
i.e. traversing the cursor backwards, e.g. with FETCH LAST or FETCH ABSOLUTE -N.

> The documentation-cosmetic and policy questions I raise above shouldn't entail
> large-scale patch churn, so I've marked the patch Ready for Committer.

Thanks. The patch with the above changes is attached.
Also attached the second patch from the previous mail,
in case the committer wants to consider it.

Best regards,
Zoltán Böszörményi

-- 
----------------------------------
Zoltán Böszörményi
Cybertec Schönig&  Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
      http://www.postgresql.at/


Attachment: ecpg-cursor-readahead-all-cursors.patch.gz
Description: application/x-tar (19.9 KB)
Attachment: ecpg-cursor-readahead-9.2-git-v13.patch.gz
Description: application/x-tar (38.2 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Simon RiggsDate: 2012-03-29 11:06:50
Subject: Re: Standbys, txid_current_snapshot, wraparound
Previous:From: Robert HaasDate: 2012-03-29 10:57:16
Subject: Re: Command Triggers patch v18

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group