Re: Proposal: efficient iter on named cursors

From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Federico Di Gregorio <federico(dot)digregorio(at)dndg(dot)it>
Cc: psycopg(at)postgresql(dot)org
Subject: Re: Proposal: efficient iter on named cursors
Date: 2011-02-15 12:22:10
Message-ID: AANLkTi=y7x32LTLA65JKm5BSD3kN0rDRq+wC6HhjAqcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: psycopg

On Thu, Jan 13, 2011 at 4:46 PM, Federico Di Gregorio
<federico(dot)digregorio(at)dndg(dot)it> wrote:
> On 13/01/2011 17:37, Daniele Varrazzo wrote:

>> So, I'd like to modify the cursor so that in case of __iter__, a
>> certain number of record is fetched and iteration is performed on
>> them. The cursor already has the state to keep the dataset so probably
>> only the code would require change, not so much the data structures.

> I mostly agree. I'd like to see a .fetchsize parameter both on the
> module, on the connection and on the cursor (usual psycopg cascade
> initialization). A good default value, supposing rows of 10-20 columns
> with a mix of textual and numeric data, is ~2000 because it will
> probably fetch less than 100KB of data and that seems a good compromise
> between the number of rows transferred and data you'll throw away if you
> decide to leave the loop early.

The feature discussed here is included in the current betas. In
particular, in the released implementation, we are using the already
existing 'arraysize' attribute (used as default for 'fetchmany()') as
the number of records to fetch per roundtrip. But because the default
for arraysize is 1 (per DB-API), and this value basically disables the
feature, 'iter()' currently uses the proposed value of 2000 as
default.

However the ticket #41
<http://psycopg.lighthouseapp.com/projects/62710/tickets/41>, while
invalid for me as I don't see "for record in cur" as a synonym for
"fetch one record at time", has made me wonder if we are too
aggressive with the default: maybe discarding arraysize=1 is not the
best option. Albeit somebody can still use the "while True + /
fetchone() + break" pattern to force record-per-record fetching, I
think if she is careful enough to use named cursors for its task she
may also care to set an appropriate value > 1 for arraysize. Named
cursors are still easier to use, but I don't want to make people think
they can be a replacement for *all* cursors - they still require more
resources on the server, so are better used only when required.

On this reasoning, I've committed this patch
<https://github.com/dvarrazzo/psycopg/commit/1dd71947bff415f1c79a5e6b81b6ba89d717ecb9>
in a separate branch: it makes iteration respect arraysize in its
default value 1 too, and improves the documentation explaining the
complete picture. If ok, the patch will be merged in devel.

Comments?

-- Daniele

In response to

Responses

Browse psycopg by date

  From Date Subject
Next Message Daniele Varrazzo 2011-02-15 13:10:30 Re: psycopg used in a ASP page fails
Previous Message Daniele Varrazzo 2011-02-15 11:52:21 Re: psycopg used in a ASP page fails