Re: Binary COPY IN size reduction

From:	Lőrinc Pap <lorinc(at)gradle(dot)com>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Binary COPY IN size reduction
Date:	2020-04-28 12:13:47
Message-ID:	CAMyrAscemUmZxKrYCDPf4HisGbMWSBvESUWpR6r__OL9_rgXFA@mail.gmail.com
Lists:	pgsql-hackers

Thanks for the quick response, Tom!
What about implementing only the first part of my proposal, i.e. BINARY
COPY without the redundant column count & size info?
That would already be a big win - I agree the rest of the proposed changes
would only complicate the usage, but I'd argue that leaving out duplicated
info would even simplify it!

I'll give a better example this time - writing *1.8* million rows with
column types bigint, integer, smallint results in the following COPY IN
payloads:

*20.8MB* - Text protocol
*51.3MB* - Binary protocol
*25.6MB* - Binary, without column size info (proposal)

I.e. this would make the binary protocol almost as small as the text one
(which isn't an unreasonable expectation, I think), while making it easier
to use at the same time.
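
The roughly 2x gap between the two binary figures follows from the per-row overhead of the binary COPY format: a 16-bit field count per tuple plus a 32-bit length word in front of every field. A minimal sketch of that arithmetic, assuming no NULL values and ignoring the fixed file header and trailer; the function names are illustrative only and `compact_row_size` models the proposal, not an existing format:

```python
import struct

# On-wire widths (bytes) of the three column types in the example.
COLUMN_WIDTHS = {"bigint": 8, "integer": 4, "smallint": 2}

FIELD_COUNT_BYTES = 2   # int16 field count at the start of every tuple
FIELD_LENGTH_BYTES = 4  # int32 length word in front of every field


def binary_row_size(widths):
    """Bytes per tuple in the current binary COPY format (no NULLs)."""
    return FIELD_COUNT_BYTES + sum(FIELD_LENGTH_BYTES + w for w in widths)


def compact_row_size(widths):
    """Bytes per tuple if count and length words were omitted (hypothetical)."""
    return sum(widths)


def encode_binary_row(bigint_val, int_val, smallint_val):
    """Pack one (bigint, integer, smallint) tuple the way binary COPY lays it out.

    "!" selects network byte order without padding: field count, then a
    length word followed by the value for each of the three fields.
    """
    return struct.pack("!hiqiiih", 3, 8, bigint_val, 4, int_val, 2, smallint_val)


widths = list(COLUMN_WIDTHS.values())
current = binary_row_size(widths)    # 2 + 3 * 4 + 14 data bytes
proposed = compact_row_size(widths)  # 14 data bytes only
```

With these column types, 14 of every 28 bytes per row are count and length words, which is where the halving comes from. At about 1.8 million rows that works out to roughly 50 MB versus 25 MB, in line with the measured payloads above.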

Thanks for your time,
Lőrinc

On Fri, Apr 24, 2020 at 4:19 PM Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Lőrinc Pap <lorinc(at)gradle(dot)com> writes:
> > We've switched recently from TEXT based COPY to the BINARY one.
> > We've noticed a slight performance increase, mostly because we don't need
> > to escape the content anymore.
> > Unfortunately the binary protocol's output ended up being slightly bigger
> > than the text one (e.g. for one payload it's *373MB* now, was *356MB*
> > before)
> > ...
> > By skipping the column count and sizes for every row, in our example this
> > change would reduce the payload to *332MB* (most of our payload is
> > binary,
> > lightweight structures consisting of numbers only could see a >*2x*
> > decrease in size).
>
> TBH, that amount of gain does not seem to be worth the enormous
> compatibility costs of introducing a new COPY data format. What you
> propose also makes the format a great deal less robust (readers are
> less able to detect errors), which has other costs. I'd vote no.
>
> regards, tom lane
>

--
Lőrinc Pap
Senior Software Engineer
<https://gradle.com/>

In response to

Re: Binary COPY IN size reduction at 2020-04-24 14:19:23 from Tom Lane

Responses

Re: Binary COPY IN size reduction at 2020-04-28 14:41:15 from Stephen Frost
