Re: Issues with \copy from file

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Sigurgeir Gunnarsson <sgunnars(at)gmail(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Issues with \copy from file
Date: 2009-12-18 15:23:01
Message-ID: 603c8f070912180723g766e6810o2bbfb5a1f0f928f3@mail.gmail.com
Lists: pgsql-performance

On Fri, Dec 18, 2009 at 7:46 AM, Sigurgeir Gunnarsson
<sgunnars(at)gmail(dot)com> wrote:
> I hope the issue is still open though I haven't replied to it before.
>
> Euler mentioned that I did not provide any details about my system. I'm
> using version 8.3 and with most settings default on an old machine with 2 GB
> of mem. The table definition is simple, four columns; id, value, x, y where
> id is primary key and x, y are combined into an index.
>
> I'm not sure if it matters but unlike Euler's suggestion I'm using \copy
> instead of COPY. Regarding my comparison to MySQL, it is completely valid.
> This is done on the same computer, using the same disk on the same platform.
> From that I would derive that IO is not my problem, unless postgresql is
> doing IO twice while MySQL only once.
>
> I guess my tables are InnoDB since that is the default type (or so I think).
> I did not find that BEGIN/COMMIT changed much. Are there any other suggestions?

Did you read Matthew Wakeling's reply? Arranging to skip WAL will
help a lot here. To do that, you need to either create or truncate
the table in the same transaction that does the COPY.
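
A minimal sketch of that arrangement, assuming the table can safely be
emptied before the load and that WAL archiving is turned off (the
optimization does not apply when archiving is enabled). The helper
name build_load_script, the table name "points", and the file path are
hypothetical, chosen only for illustration. The helper just emits a
psql script that puts the TRUNCATE and the \copy inside one
transaction:

```python
def build_load_script(table: str, data_file: str) -> str:
    """Return a psql script that truncates and reloads `table` in one transaction.

    Hypothetical helper for illustration; the table and file names are
    supplied by the caller and are not taken from the original thread.
    """
    lines = [
        "BEGIN;",
        # TRUNCATE in the same transaction as the load is what lets
        # PostgreSQL skip WAL for the copied rows (with archiving off).
        # Note that this discards any rows already in the table.
        f"TRUNCATE {table};",
        # psql requires a \copy meta-command to fit on a single line.
        f"\\copy {table} from '{data_file}'",
        "COMMIT;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example values only; substitute the real table and data file.
    print(build_load_script("points", "/tmp/points.dat"), end="")
```

The output can be saved to a file and run with psql -f. Creating the
table inside the transaction instead of truncating it should have the
same effect; the point is only that the table is new or emptied in the
transaction that loads it.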

The problem with the MySQL comparison is that it's not really
relevant. It isn't that the PostgreSQL code just sucks and if we
wrote it properly it would be as fast as MySQL. If that were the
case, everyone would be up in arms, and it would have been fixed long
ago. Rather, the problem is almost certainly that it's not an
apples-to-apples comparison. MySQL is probably doing something
different, such as perhaps not properly arranging for recovery if the
system goes down in the middle of the copy, or just after it
completes. But I don't know MySQL well enough to know exactly what
the difference is, and I'm not particularly interested in spending a
lot of time figuring it out. I think you'll get that reaction from
others on this list as well, but that's up to them. Everybody here
is a volunteer, and our interest is principally PostgreSQL.

On the other hand, we can certainly give you lots of information about
what PostgreSQL is doing and why that takes the amount of time that it
does, or give you information on how you can find out more about what
it's doing.

...Robert
