Re: COPY FROM performance improvements

From: "Luke Lonergan" <llonergan(at)greenplum(dot)com>
To: "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, "Alon Goldshuv" <agoldshuv(at)greenplum(dot)com>, pgsql-patches(at)postgresql(dot)org
Subject: Re: COPY FROM performance improvements
Date: 2005-08-10 17:05:18
Message-ID: BF1F81DE.C26A%llonergan@greenplum.com
Lists: pgsql-patches

Alvaro,

On 8/10/05 9:46 AM, "Alvaro Herrera" <alvherre(at)alvh(dot)no-ip(dot)org> wrote:

> AFAIR he never claimed otherwise ... his point was that to gain that
> additional speedup, the code has to be made considerably "worse" (in
> maintainability terms.) Have you (or Alon) tried to port the rest of the
> speed improvement to the new code? Maybe it's possible to have at least
> some of it without worsening the maintainability too badly.

As I suggested previously, there is another, more maintainable way to get
more performance from the parsing logic.

It involves replacing something like this:

============================
char c = input_routine();
if (c == '\n') {
    ...
} else if (...) {
    ...
}
============================

With something like this:

============================
char carr[32];

nread = Input_routine_new(carr, 32);

for (i = 0; i < nread; i++) {
    if (carr[i] == '\n') {
        ...
    }
}
============================

And this section would run much faster (3x?).

This is what I think could make the overall patch 50% faster than it is now
(on the parsing part).
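
To make the idea above concrete, here is a minimal, self-contained sketch of the chunked approach. It is not the patch code: the Source struct, input_routine_new() and count_lines() are hypothetical names made up for illustration, the input is an in-memory string rather than a file or socket, and the only "parsing" done is counting newlines. No speedup is claimed or measured here.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 32   /* same chunk size as the pseudocode above */

/* Hypothetical in-memory input source standing in for the real input routine. */
typedef struct {
    const char *data;   /* bytes to be parsed */
    size_t      len;    /* total number of bytes */
    size_t      pos;    /* read position */
} Source;

/* Hypothetical chunked reader: copies up to max bytes into buf and
 * returns how many bytes were copied (0 at end of input). */
static size_t input_routine_new(Source *src, char *buf, size_t max)
{
    size_t avail = src->len - src->pos;
    size_t n = avail < max ? avail : max;

    memcpy(buf, src->data + src->pos, n);
    src->pos += n;
    return n;
}

/* Scan the input one chunk at a time instead of making one call per
 * character. Here the per-character work is just counting '\n'. */
static size_t count_lines(Source *src)
{
    char   carr[CHUNK];
    size_t nread, i, lines = 0;

    while ((nread = input_routine_new(src, carr, CHUNK)) > 0) {
        for (i = 0; i < nread; i++) {
            if (carr[i] == '\n')
                lines++;
        }
    }
    return lines;
}
```

A caller would fill in a Source with a pointer, a length and pos = 0, then call count_lines(&src). The point of the structure is that the inner for loop runs over a local array with no function call per character.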

The issue that I expect we'll hear about is that since the parsing is
already 500% faster, it has vanished in the profile. That's why Tom's
testing is not showing much difference between his code and Alon's. We
actually drop the other sections to bring the parsing forward, and that is
where we see the bigger difference.

However, what I'm arguing here and elsewhere is that there's still a lot
more of this kind of optimization to be done. 12 MB/s COPY speed is not
enough. There's 40% of the time in processing left to smack down.

> Another question that comes to mind is: have you tried another compiler?
> I see you are all using GCC at most 3.4; maybe the new optimizing
> infrastructure in GCC 4.1 means you can have most of the speedup without
> uglifying the code. What about Intel's compiler?

We have routinely distributed PostgreSQL with the Intel compiler, up until
recently. Interestingly, GCC now beats it handily in our tests on Opteron
and matches it on Xeon, which is too bad - it's my fav compiler.

The problem with this code is that it doesn't have enough micro-parallelism
without loops on the character parsing core. The compiler can only do
register optimizations and branch prediction (poorly) unless it is given
more to work with.
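
As a rough illustration of "giving the compiler more to work with", the sketch below counts delimiter and newline bytes with a simple loop over a buffer and no per-character function calls or if/else chain. The function name count_specials() and its parameters are hypothetical, not taken from the COPY code. Whether a given compiler unrolls or vectorizes this loop depends on the compiler version and flags, so treat it as a shape to experiment with, not a measured result.

```c
#include <stddef.h>

/* Hypothetical helper: count delimiter and newline bytes in buf[0..n).
 * Each comparison yields 0 or 1, which is added to a counter, so the
 * loop body contains no data-dependent branches. */
static void count_specials(const char *buf, size_t n, char delim,
                           size_t *ndelim, size_t *nnewline)
{
    size_t d = 0, nl = 0, i;

    for (i = 0; i < n; i++) {
        d  += (buf[i] == delim);
        nl += (buf[i] == '\n');
    }
    *ndelim = d;
    *nnewline = nl;
}
```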

>> PostgreSQL needs major improvement to compete with Oracle and even MySQL on
>> speed. No whacking on the head is going to change that.
>
> Certainly. I think the point is what cost do we want to pay for the
> speedup. I think we all agree that even if we gain a 200% speedup by
> rewriting COPY in assembly, it's simply not acceptable.

Understood, and I totally agree.

> Another point may be that Bizgres can have a custom patch for the extra
> speedup, without inflicting the maintenance cost on the community.

We are committed to making Postgres the best DBMS for Business Intelligence.
Bizgres makes it safe for businesses to rely on open source for their
production uses. As far as features go, I think the best way for our
customers is to make sure that Bizgres features are supporting the
PostgreSQL core and vice versa.

- Luke
