Re: multi-line copy (was: Re: COPY Hacks)

From: Mischa Sandberg <mischa(dot)sandberg(at)telus(dot)net>
To: Greg Sabino Mullane <greg(at)turnstep(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: multi-line copy (was: Re: COPY Hacks)
Date: 2005-04-08 04:53:22
Message-ID: 1112936002.42560e420adf7@webmail.telus.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance


Quoting Greg Sabino Mullane <greg(at)turnstep(dot)com>:

> > Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> > I accidentally strung together several \n-terminated input lines,
> > and sent them to the server with a single "putline".
> ...
> > So. Is it a feechur? Worth stress-testing? Could be VERY cool.
>
> As explained elsewhere, not really a feature, more of a side-effect.
> Keep in mind, however, that any network round-trip time saved has to
> be balanced against some additional overhead of constructing the
> combined strings in Perl before sending them over. Most times COPY
> is used to parse a newline-separated file anyway. If you have a slow
> network connection to the database, it *might* be a win, but my
> limited testing shows that it is not an advantage for a "normal"
> connection: I added 1 million rows via COPY using the normal way
> (1 million pg_putline calls), via pg_putline of 1000 rows at a
> time, and via 10,000 rows at a time. They all ran in 22 seconds,
> with no statistical difference between them. (This was the "real" time,
> the system time was actually much lower for the combined calls).
>
> It can't hurt to test things out on your particular system and see
> if it makes a real difference: it certainly does no harm as long as
> you make sure the string you send always *end* in a newline.

Many thanks for digging into it.

For the app I'm working with, the time delay between rows being posted
is /just/ enough to exceed the TCP Nagle delay, so every row goes across
in its own packet :-( Reducing the number of network roundtrips
by a factor of 40 is enough to cut elapsed time in half.
The cost of join("",@FortyRows), which produces a 1-4K string, is what's
negligible in this case.

--
"Dreams come true, not free" -- S.Sondheim, ITW

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Qingqing Zhou 2005-04-08 05:36:40 Re: Compressing WAL
Previous Message Greg Sabino Mullane 2005-04-08 02:03:18 Re: COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)