ECPG FETCH readahead

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Michael Meskes <meskes(at)postgresql(dot)org>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject: ECPG FETCH readahead
Date: 2010-06-17 12:09:47
Message-ID: 4C1A108B.5080704@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

we improved ECPG quite a lot in 9.0 because we worked and
still working with an Informix to PostgreSQL migration project.

We came across a pretty big performance problem that can be seen in
every "naive" application that uses only FETCH 1, FETCH RELATIVE
or FETCH ABSOLUTE. These are almost the only FETCH variations
usable in Informix, i.e. it doesn't have the grammar for fetching N rows
at once. Instead, the Client SDK libraries do caching themselves
behind the scenes to reduce network turnaround time.

This is what we implemented for ECPG, so by default it fetches 256 rows
at once if possible and serves the application from memory. The number
of cached rows can be changed using the ECPGFETCHSZ environment
variable. The cursor readahead is activated by "ecpg -r fetch_readahead".

The implementation splits ECPGdo() and ecpg_execute() in ecpglib/execute.c
so the different parts are callable from the newly introduced cursor.c code.
Three new API calls are introduced: ECPGopen(), ECPGfetch() and
ECPGclose(). Obviously, OPEN and CLOSE use ECPGopen() and
ECPGclose(), respectively. They build and tear down the cache structures
besides calling the main ECPGdo() behind the scenes.

ECPGopen() also discovers the total number of records in the recordset,
so the previous ECPG "deficiency" (backend limitation) that sqlca.sqlerrd[2]
didn't report the (possibly estimated) number of rows in the resultset
is now
overcome. This slows down OPEN for cursors serving larger datasets
but it makes possible to position the readahead window using MOVE
ABSOLUTE no matter what FORWARD/BACKWARD/ABSOLUTE/RELATIVE
variants are used by the application. And the caching is more than
overweighs
the slowdown in OPEN it seems.

ECPGfetch() is the more interesting one, this handles FETCH and MOVE
statements and follows the absolute position of the cursor in the
client, too.

In Informix, the DECLARE statement is used for creating a cursor descriptor,
it can be OPENed/CLOSEd several times and the "FREE curname" statement
tears down the cursor descriptor. In our implementation, OPEN and CLOSE
sets up and tears down the caching structure, The DECLARE statement
didn't lose its declarative nature and the FREE statement is still only
usable
only for prepared statements. I chose this path because this way this
feature
can be used in native mode as well. It is usable even if the application
itself
uses FETCH N. The readahead window can be set externally to the
application to squeeze out more performance in batch programs.

The patch size is over 2MB because I introduced a new regression test
called fetch2.pgc that does a lot of work on a recordset having 400 rows.
It browses the recordset back and forth with:
- FETCH FORWARD 1/FETCH BACKWARD 1
- FETCH FORWARD 5/FETCH BACKWARD 5
- FETCH ABSOLUTE +N/FETCH ABSOLUTE -N
- FETCH FORWARD 3+MOVE FORWARD 7, also backwards
- FETCH RELATIVE +2/FETCH RELATIVE -2
This test is compiled both with and without "-r fetch_readahead", so
I was able to verify that the two runs produce the same output.
Also, fetch.pgc, dyntest.pgc and sqlda.pgc are also compiled with and
without "-r fetch_readahead", for verifying that both SQL and SQLDA
descriptors are working the same way as before. E.g. PGresult for
SQL descriptors are not simply assigned anymore, they are copied
using PQcopyResult() without tuples and a bunch of PQsetvalue() calls
to copy only the proper rows from the cache or all rows if no cache.

The split parts of ecpg_execute() are intentionally kept the original
wording (especially the "ecpg_execute" function name) in ecpg_log()
messages to eliminate any impact on other regression tests. If this is
not desired, a patch for this can come later.

Because of the patch size, the compressed version is attached.

Comments?

Best regards,
Zoltán Böszörményi

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

Attachment Content-Type Size
pg91-fetch-readahead-6-ctxdiff.patch.gz application/x-tar 70.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thom Brown 2010-06-17 12:13:05 Re: pg_dump does not honor namespaces when functions are used in index
Previous Message Robert Haas 2010-06-17 11:34:40 Re: Should the JSON datatype be a specialization of text?