Re: Benchmark Data requested

From: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Cc: Greg Smith <gsmith(at)gregsmith(dot)com>
Subject: Re: Benchmark Data requested
Date: 2008-02-06 10:29:42
Message-ID: 200802061129.44482.dfontaine@hi-media.com
Lists: pgsql-performance

On Wednesday 06 February 2008, Greg Smith wrote:
> pgloader is a great tool for a lot of things, particularly if there's any
> chance that some of your rows will get rejected. But the way things pass
> through the Python/psycopg layer made it uncompetitive (more than 50%
> slowdown) against the straight COPY path from a rows/second perspective
> the last time (V2.1.0?)

I've yet to add in the psycopg wrapper Marko wrote for skytools: at the moment
I'm using the psycopg1 interface even when psycopg2 is used, and it seems the
new version brings some great performance improvements. I didn't bother until
now because I thought this wouldn't affect COPY.

> I did what I thought was a fair test of it (usual
> caveat of "with the type of data I was loading"). Maybe there's been some
> gigantic improvement since then, but it's hard to beat COPY when you've
> got an API layer or two in the middle.

Did you compare to COPY or \copy? I'd expect the psycopg COPY API not to be
that much more costly than psql's, after all.
Where pgloader is really left behind (in terms of tuples inserted per second)
compared to COPY is when it has to juggle the data a lot, I'd say
(reformat, reorder, add constants, etc). But I've tried to design it so that
when it is not configured to rearrange (massage?) the data, the code path is
the simplest possible.
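
To make the "reformat, reorder, add constants" cost concrete, here is a minimal
sketch of that kind of per-row work done in Python before the data reaches
COPY. It is not pgloader's actual code: the function names (massage_rows,
escape_field, to_copy_text) and the sample data are hypothetical, and it only
builds an in-memory buffer in COPY text format. Such a buffer could then be
handed to psycopg2's cursor.copy_from(), which is assumed here and not called.

```python
import csv
import io


def massage_rows(rows, order, constants):
    """Reorder each row by source column index and append constant values.

    Hypothetical helper: `order` lists source column indexes in target order,
    `constants` are extra values added to every row.
    """
    for row in rows:
        yield [row[i] for i in order] + list(constants)


def escape_field(value, null="\\N"):
    """Escape one value for COPY text format (tab-separated, \\N for NULL)."""
    if value is None:
        return null
    return (
        str(value)
        .replace("\\", "\\\\")  # backslash must be escaped first
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_copy_text(rows):
    """Serialize rows into a file-like buffer, one COPY text line per row."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(escape_field(v) for v in row) + "\n")
    buf.seek(0)
    return buf


# Sample input: two CSV rows with columns (id, name).
source = io.StringIO("1,alice\n2,bob\n")
buf = to_copy_text(
    massage_rows(csv.reader(source), order=[1, 0], constants=["batch-1"])
)
print(buf.getvalue(), end="")
```

Every row goes through a Python-level list rebuild and string escaping here,
which is the kind of overhead a straight COPY of an already well-formed file
never pays. When no reordering or constants are configured, the input can in
principle be streamed to COPY untouched.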

Do you want to test pgloader again with Marko's psycopgwrapper code to see if
this helps? If so, I'll arrange to push it to CVS ASAP.

Maybe in the end the PostgreSQL backend code will be smarter than pgloader
(wrt error handling and data massaging) and we'll be able to drop the
project, but in the meantime I'll try my best to make pgloader as fast as
possible :)
--
dim
