Re: speed concerns with executemany()

From: mike bayer <mike_mp(at)zzzcomputing(dot)com>
To: psycopg(at)postgresql(dot)org
Subject: Re: speed concerns with executemany()
Date: 2017-01-09 16:04:32
Message-ID: 52fc9715-b357-d6fe-1003-472af95c3ad3@zzzcomputing.com
Lists: psycopg

On 01/05/2017 02:00 PM, Daniele Varrazzo wrote:
> On Thu, Jan 5, 2017 at 5:32 PM, Federico Di Gregorio <fog(at)dndg(dot)it> wrote:
>> On 02/01/17 17:07, Daniele Varrazzo wrote:
>>>
>>> On Mon, Jan 2, 2017 at 4:35 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com>
>>> wrote:
>>>>
>>>> With NRECS=10000 and page size=100:
>>>>
>>>> aklaver(at)tito:~> python psycopg_executemany.py -p 100
>>>> classic: 427.618795156 sec
>>>> joined: 7.55754685402 sec
>>>
>>> Ugh! :D
>>
>>
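(For anyone following along, this is my reading of the "joined" approach being benchmarked, sketched here without the actual patch: instead of one round trip per parameter set, each page of parameters is folded into a single multi-row INSERT ... VALUES statement. The table name, columns, and page size below are purely illustrative.)

```python
# Illustrative sketch of page-joined inserts (not the actual psycopg code).
# Parameters are split into pages; each page becomes one multi-row
# INSERT ... VALUES (...), (...), ... statement, so a page of 100 rows
# costs one server round trip instead of 100.

def pages(argslist, page_size):
    """Yield argslist in chunks of at most page_size items."""
    for i in range(0, len(argslist), page_size):
        yield argslist[i:i + page_size]

def joined_insert(table, columns, page):
    """Build one multi-row INSERT statement for a page of parameter tuples."""
    row_tpl = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ", ".join(columns), ", ".join([row_tpl] * len(page)))
    # flatten the page so the placeholders line up with one flat params tuple
    params = tuple(p for row in page for p in row)
    return sql, params

args = [(n, n * 2) for n in range(250)]
stmts = [joined_insert("t", ("a", "b"), page) for page in pages(args, 100)]
# 250 rows at page_size=100 -> 3 statements instead of 250
```

With the real driver each joined statement would then be executed in turn; the server-reported rowcount reflects pages rather than individual parameter sets, which is the rowcount concern raised in this thread.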
>> That's great. Just a minor point: I wouldn't overload executemany() with
>> this feature but would add a new method, UNLESS the semantics are exactly
>> the same, especially regarding session isolation. Also, right now psycopg
>> keeps track of the number of affected rows over executemany() calls: I'd
>> like not to lose that, because losing it is a breaking change to the API.
>
> It seems to me that the semantics would stay the same, even in the
> presence of volatile functions. Unfortunately, however, rowcount would
> break. That's just sad.
>
> We can, with no problem, add an extra argument to executemany: a
> page_size defaulting to 1 (the previous behaviour), which could be
> bumped. It's sad that the default cannot be 100.
>
> Mike Bayer reported (https://github.com/psycopg/psycopg2/issues/491)
> that SQLAlchemy actually uses the aggregated rowcount for concurrency
> control.
>
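(For context, the concurrency-control use is SQLAlchemy's versioned-row check: each UPDATE in the batch is expected to match exactly one row, so the aggregated rowcount is compared against the number of parameter sets. A toy sketch of the idea, with made-up names, no database involved:)

```python
# Toy sketch (illustrative names) of why aggregated rowcount matters:
# each optimistic-concurrency UPDATE should match exactly one row, so
# after executemany() the total must equal the number of parameter sets.

def verify_batch(expected_sets, aggregated_rowcount):
    """Raise if the batch matched fewer rows than parameter sets supplied."""
    if aggregated_rowcount != expected_sets:
        raise RuntimeError(
            "stale data: expected %d rows, matched %d"
            % (expected_sets, aggregated_rowcount))

verify_batch(3, 3)          # all rows were current: fine
try:
    verify_batch(3, 2)      # one row was changed elsewhere: conflict
except RuntimeError as err:
    caught = str(err)
```

If the driver instead reports rowcount per joined page (or -1), this check can no longer be trusted, which is exactly the deal-breaker question below.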
> So, how much of a deal-breaker is it? Can we afford to lose the
> aggregated rowcount to obtain a juicy speedup in default usage, or would
> we rather leave the behaviour untouched and have people "opt in for
> speed"?
>
> ponder, ponder...
>
> Pondered: as the feature has had little testing and I don't want to
> delay releasing 2.7 further, I'd rather release it with a page_size
> default of 1. People could use it and report any failures they hit with
> a page_size > 1. If testing shows that the database behaves ok, we could
> think about changing the default in the future. We may also want to drop
> the aggregated rowcount eventually, but with better planning, e.g. to
> allow SQLAlchemy to ignore the aggregated rowcount from psycopg >= 2.8...

SQLAlchemy can definitely ignore the aggregated rowcount, as most DBAPIs
don't support it anyway, so we can flip the flag off once we know exactly
which psycopg version breaks it. The ORM prefers to use executemany in
any case, unless the mapping has specified a versioning column, in which
case it has to use the method that supplies an accurate rowcount.

Ideally, being able to control whether we get "aggregated rowcount" or
"speed" via an alternate API / flags / etc. would be nice. It seems
SQLAlchemy will need downstream changes to support this in any case.
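On the SQLAlchemy side this would likely come down to flipping the dialect's per-driver rowcount capability flag (the real dialect attribute is named supports_sane_multi_rowcount). A rough standalone sketch; the version cutoff and page_size rule below are hypothetical, not actual SQLAlchemy or psycopg logic:

```python
# Rough sketch (hypothetical logic, not actual SQLAlchemy code): gate the
# versioned-row rowcount check on whether the driver's executemany() still
# reports an accurate aggregated rowcount. Assumes, per the thread, that
# batching might only change behaviour from psycopg >= 2.8 onward and only
# when a page_size > 1 is in effect.

def supports_sane_multi_rowcount(psycopg2_version, page_size=1):
    """Return True if aggregated rowcount from executemany() is trustworthy."""
    major, minor = psycopg2_version[0], psycopg2_version[1]
    if (major, minor) < (2, 8):
        return True          # older drivers: one statement per parameter set
    return page_size == 1    # joined pages would break per-row counts
```

A dialect could then skip the versioning check, or fall back to per-row execute(), whenever this returns False.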

>
> How does it sound?
>
> -- Daniele
>
>
