Re: speed concerns with executemany()

From: Federico Di Gregorio <fog(at)dndg(dot)it>
To: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
Cc: "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: speed concerns with executemany()
Date: 2017-01-05 22:12:26
Message-ID: 2b88cb87-7ff6-b801-2a7c-b8e6c3f78183@dndg.it
Lists: psycopg

On 05/01/17 20:00, Daniele Varrazzo wrote:
> On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog(at)dndg(dot)it> wrote:
>> On 02/01/17 17:07, Daniele Varrazzo wrote:
>>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
>>> wrote:
>>>> With NRECS=10000 and page size=100:
>>>>
>>>> aklaver(at)tito:~> python psycopg_executemany.py -p 100
>>>> classic: 427.618795156 sec
>>>> joined: 7.55754685402 sec
>>> Ugh! :D
>>
>> That's great. Just a minor point: I wouldn't overload executemany() with
>> this feature; I'd add a new method instead, UNLESS the semantics are
>> exactly the same, especially regarding session isolation. Also, right now
>> psycopg keeps track of the number of affected rows over executemany()
>> calls: I'd like not to lose that, because losing it would be a breaking
>> change to the API.
> It seems to me that the semantics would stay the same, even in the
> presence of volatile functions. Unfortunately, however, rowcount would
> break. That's just sad.
>
> We could, without problems, add an extra argument to executemany:
> page_size, defaulting to 1 (the previous behaviour), which could be
> bumped. It's sad the default cannot be 100.
>
> Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491)
> that SQLAlchemy actually uses the aggregated rowcount for concurrency
> control.
>
> So, how much of a deal-breaker is it? Can we afford to lose the
> aggregated rowcount to obtain a juicy speedup in default usage, or would
> we rather leave the behaviour untouched and have people "opt in for
> speed"?
>
> ponder, ponder...
>
> Pondered: as the feature has had little testing and I don't want to
> delay releasing 2.7 further, I'd rather release the feature with a
> page_size default of 1. People could use it and report any failures
> they see with a page_size > 1. If testing shows that the database
> behaves OK, we could think about changing the default in the future.
> We may want to drop the aggregated rowcount in the future, but with
> better planning, e.g. to allow SQLAlchemy to ignore the aggregated
> rowcount from psycopg >= 2.8...
>
> How does it sound?

Fine for me.

federico

--
Federico Di Gregorio federico(dot)digregorio(at)dndg(dot)it
DNDG srl http://dndg.it
Unfortunately the creationists have not gone extinct yet. -- vodka
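
The page_size idea discussed in the quoted message could be sketched roughly
as below. This is only an illustration under stated assumptions, not the
code that was proposed or released: execute_paged() and iter_pages() are
hypothetical names, and the sketch relies only on the cursor's mogrify() and
execute() methods, which psycopg2 cursors provide. It binds each argument
tuple client-side, joins page_size statements with ";" and sends each page
in a single round trip.

```python
def iter_pages(seq, page_size):
    """Yield successive lists of at most page_size items from seq."""
    page = []
    for item in seq:
        page.append(item)
        if len(page) >= page_size:
            yield page
            page = []
    if page:
        yield page


def execute_paged(cur, sql, argslist, page_size=1):
    """Hypothetical helper, not part of psycopg's API.

    Runs sql once per argument tuple, sending page_size statements per
    round trip by joining them with ';'. Returns the number of round trips.

    With page_size > 1, cur.rowcount is not the aggregate over all
    statements, which is the breaking change discussed in the thread.
    """
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    trips = 0
    for page in iter_pages(argslist, page_size):
        # mogrify() merges the arguments into the query and returns bytes
        statements = [cur.mogrify(sql, args) for args in page]
        cur.execute(b";".join(statements))
        trips += 1
    return trips
```

With 10000 records and a page size of 100, as in the benchmark quoted above,
a helper like this would make 100 round trips instead of 10000. With
page_size=1 it degenerates to one execute() per argument tuple, which is why
that default would preserve the previous behaviour.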
