Re: speed concerns with executemany()

From: Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com>
To: Aryeh Leib Taurog <python(at)aryehleib(dot)com>
Cc: "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org>
Subject: Re: speed concerns with executemany()
Date: 2017-01-30 11:18:36
Message-ID: CA+mi_8a9f0QDBskxae1WB8uLWwov8B1ZjDMwos3vDY0-YqnjQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: psycopg

On Thu, Jan 19, 2017 at 12:23 PM, Aryeh Leib Taurog
<python(at)aryehleib(dot)com> wrote:
> On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:

>> aklaver(at)tito:~> python psycopg_executemany.py -p 100
>> classic: 427.618795156 sec
>> joined: 7.55754685402 sec
>
> This is really interesting. I have long been using a utility I put
> together to insert using BINARY COPY. In fact I just brushed it up a
> bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy>
>
> I'm curious to run a benchmark against the improved executemany. I'd
> hoped that pgcopy would be generally useful, but it may no longer be
> necessary. A fast executemany() certainly suits more use cases.

(Sorry, mant to write this message earlier but forgot it in my drafts.)

There's always the case that a sequence of:

1) psycopg executemany as is now: separate statements
- insert into table values (...);
- insert into table values (...);
- insert into table values (...);

is slower than

2) psycopg executemany as proposed: a single statement containing
- insert into table values (...);
insert into table values (...);
insert into table values (...);

which is slower than

3) a single insert with many params. Plays well with PQexecParams
but would need some form of generation by the client
- insert into table values (...), (...), (...);

which is slower than

4) copy.

While the proposed executemany is a nice low hanging fruit it will
break on PQexecParams and it's far from being optimal anyway. Wonder
if there is a way to help users at least to have 3 without bothering
with mogrify (due to break too with the PQexecParams switch).

Brainstorming from here: expect no consistency.

Another good property of 2 is that it supports any statements: update,
delete, select (which may call a stored procedure repeatedly to
perform some data manipulation) whereas a manipulation into form 3 is
specific to inserts (which can be a starting point for a fast
update/select anyway: fast-insert into a temp table, then select or
update with a join).

fast-forward to last message:

On Sun, Jan 29, 2017 at 5:44 PM, Aryeh Leib Taurog <python(at)aryehleib(dot)com> wrote:

> Over network, with NRECS=10000 and page size=100:
> classic: 716.759769917 sec
> joined: 15.141461134 sec
> pgcopy: 3.70594096184 sec

Aryeh thank you for this benchmark. Could you please add a test like
(code untested):

def insertmany(self, sql, argslist, page_size=100):
tmpl = None
for page in paginate(argslist, page_size=page_size):
if tmpl is None and page:
tmpl = '(%s)' % ','.join([%s] * len(page[0]))
self.execute(sql % ",".join(self.mogrify(tmpl, args) for
args in page))

where sql should be an insert with a single %s placeholder, which
would be replaced by one or more arguments records, and see where we
end up? I expect somewhere between "joined" and "pgcopy" but wonder
closer to which.

-- Daniele

In response to

Responses

Browse psycopg by date

  From Date Subject
Next Message Aryeh Leib Taurog 2017-01-30 21:51:51 Re: speed concerns with executemany()
Previous Message Aryeh Leib Taurog 2017-01-29 17:44:10 Re: speed concerns with executemany()