psql \copy from sends a lot of packets

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: psql \copy from sends a lot of packets
Date: 2021-02-06 22:13:38
Message-ID: 40b2cec0-d0fb-3191-2ae1-9a3fe16a7e48@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I just noticed that if you load a file using psql:

\copy <table> from <local file>

it sends every line as a separate FE/BE protocol CopyData packet. That's
pretty wasteful if the lines are narrow. The overhead of each CopyData
packet is 5 bytes.

To demonstrate, I generated a simple test file with the string "foobar"
repeated 10 million times:

$ perl -le 'for (1..10000000) { print "foobar" }' > /tmp/testdata

and loaded that into a temp table with psql:

create temporary table copytest (t text) on commit delete rows;
\copy copytest from '/tmp/testdata';

I repeated and timed the \copy a few times; it takes about about 3
seconds on my laptop:

postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 3039.625 ms (00:03.040)

Wireshark says that that involved about 120 MB of network traffic. The
size of the file on disk is only 70 MB.

The attached patch modifies psql so that it buffers up 8 kB of data into
each CopyData message, instead of sending one per line. That makes the
operation faster:

postgres=# \copy copytest from '/tmp/testdata';
COPY 10000000
Time: 2490.268 ms (00:02.490)

And wireshark confirms that there's now only a bit over 70 MB of network
traffic.

I'll add this to the next commitfest. There's similar inefficiency in
the server side in COPY TO, but I'll leave that for another patch.

- Heikki

Attachment Content-Type Size
0001-In-psql-copy-from-send-data-to-server-in-larger-chun.patch text/x-patch 4.4 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-02-06 22:23:33 Re: psql \copy from sends a lot of packets
Previous Message Tomas Vondra 2021-02-06 21:39:53 Re: Preserve attstattarget on REINDEX CONCURRENTLY