From: | Daniele Varrazzo <daniele(dot)varrazzo(at)gmail(dot)com> |
---|---|
To: | Aryeh Leib Taurog <python(at)aryehleib(dot)com> |
Cc: | "psycopg(at)postgresql(dot)org" <psycopg(at)postgresql(dot)org> |
Subject: | Re: speed concerns with executemany() |
Date: | 2017-01-30 11:18:36 |
Message-ID: | CA+mi_8a9f0QDBskxae1WB8uLWwov8B1ZjDMwos3vDY0-YqnjQA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | psycopg |
On Thu, Jan 19, 2017 at 12:23 PM, Aryeh Leib Taurog
<python(at)aryehleib(dot)com> wrote:
> On Mon, Jan 2, 2017 at 3:35 PM, Adrian Klaver <adrian(dot)klaver(at)aklaver(dot)com> wrote:
>> aklaver(at)tito:~> python psycopg_executemany.py -p 100
>> classic: 427.618795156 sec
>> joined: 7.55754685402 sec
>
> This is really interesting. I have long been using a utility I put
> together to insert using BINARY COPY. In fact I just brushed it up a
> bit and put it on PyPi: <https://pypi.python.org/pypi/pgcopy>
>
> I'm curious to run a benchmark against the improved executemany. I'd
> hoped that pgcopy would be generally useful, but it may no longer be
> necessary. A fast executemany() certainly suits more use cases.
(Sorry, mant to write this message earlier but forgot it in my drafts.)
There's always the case that a sequence of:
1) psycopg executemany as is now: separate statements
- insert into table values (...);
- insert into table values (...);
- insert into table values (...);
is slower than
2) psycopg executemany as proposed: a single statement containing
- insert into table values (...);
insert into table values (...);
insert into table values (...);
which is slower than
3) a single insert with many params. Plays well with PQexecParams
but would need some form of generation by the client
- insert into table values (...), (...), (...);
which is slower than
4) copy.
While the proposed executemany is a nice low hanging fruit it will
break on PQexecParams and it's far from being optimal anyway. Wonder
if there is a way to help users at least to have 3 without bothering
with mogrify (due to break too with the PQexecParams switch).
Brainstorming from here: expect no consistency.
Another good property of 2 is that it supports any statements: update,
delete, select (which may call a stored procedure repeatedly to
perform some data manipulation) whereas a manipulation into form 3 is
specific to inserts (which can be a starting point for a fast
update/select anyway: fast-insert into a temp table, then select or
update with a join).
fast-forward to last message:
On Sun, Jan 29, 2017 at 5:44 PM, Aryeh Leib Taurog <python(at)aryehleib(dot)com> wrote:
> Over network, with NRECS=10000 and page size=100:
> classic: 716.759769917 sec
> joined: 15.141461134 sec
> pgcopy: 3.70594096184 sec
Aryeh thank you for this benchmark. Could you please add a test like
(code untested):
def insertmany(self, sql, argslist, page_size=100):
tmpl = None
for page in paginate(argslist, page_size=page_size):
if tmpl is None and page:
tmpl = '(%s)' % ','.join([%s] * len(page[0]))
self.execute(sql % ",".join(self.mogrify(tmpl, args) for
args in page))
where sql should be an insert with a single %s placeholder, which
would be replaced by one or more arguments records, and see where we
end up? I expect somewhere between "joined" and "pgcopy" but wonder
closer to which.
-- Daniele
From | Date | Subject | |
---|---|---|---|
Next Message | Aryeh Leib Taurog | 2017-01-30 21:51:51 | Re: speed concerns with executemany() |
Previous Message | Aryeh Leib Taurog | 2017-01-29 17:44:10 | Re: speed concerns with executemany() |