COPY FROM WITH HEADER skips a tuple every 4 billion tuples

From: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: COPY FROM WITH HEADER skips a tuple every 4 billion tuples
Date: 2018-05-22 14:41:34
Message-ID: CAKJS1f88yh-6wwEfO6QLEEvH3BEugOq2QX1TOja0vCauoynmOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'd been looking over the COPY FROM code tonight when I saw something
pretty scary looking:

/* on input just throw the header line away */
if (cstate->cur_lineno == 0 && cstate->header_line)
{
cstate->cur_lineno++;
if (CopyReadLine(cstate))
return false; /* done */
}

while it might not look too scary by itself, it gets a bit more so
when you learn that the cur_lineno is only 32 bits wide. This will
result in skipping a tuple every time the 32-bit variable wraps back
around to 0 again.

Maybe when this code was written copying > 4 billion rows was just a
far-off dream, but with today's hardware, it really didn't take that
long to see this actually happen for real.

postgres=# create unlogged table t(a int);
CREATE TABLE
Time: 1.339 ms
postgres=# insert into t select 0 from generate_series(1, 4300000000);
INSERT 0 4300000000
Time: 2128367.019 ms (35:28.367)
postgres=# copy t to '/home/ubuntu/t.csv' with (format csv, header);
COPY 4300000000
Time: 2294331.494 ms (38:14.331)
postgres=# truncate t;
TRUNCATE TABLE
Time: 30.367 ms
postgres=# copy t from '/home/ubuntu/t.csv' with (format csv, header);
COPY 4299999999
Time: 2693186.475 ms (44:53.186)

A patch to fix is attached.

(I just made the variable 64-bit)

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

Attachment Content-Type Size
0001-Fix-COPY-FROM-not-to-skip-a-tuple-every-2-32-tuples.patch application/octet-stream 4.7 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2018-05-22 15:11:01 Re: PG11 jit failing on ppc64el
Previous Message Christoph Berg 2018-05-22 14:33:57 PG11 jit failing on ppc64el