Re: COPY FROM performance improvements

From:	"Andrew Dunstan" <andrew(at)dunslane(dot)net>
To:	<llonergan(at)greenplum(dot)com>
Cc:	<agoldshuv(at)greenplum(dot)com>, <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: COPY FROM performance improvements
Date:	2005-06-25 10:45:13
Message-ID:	1688.24.211.165.134.1119696313.squirrel@www.dunslane.net
Lists:	pgsql-patches

Luke Lonergan said:
> I've attached Alon's patch ported to the CVS trunk. It applies cleanly
> and passes the regressions. With fsync=false it is 40% faster loading
> a sample dataset with 15 columns of varied type. It's 19% faster with
> fsync=true.
>
> This patch separates the CopyFrom code into two pieces, the new logic
> for delimited data and the existing logic for CSV and Binary.
>

A few quick comments - I will probably have many more later when I have
time to review this in depth.

1. Postgres uses context diffs for patches, not unified diffs.

2. This comment raises a flag in my mind:

+ * each attribute begins. If a specific attribute is not used for this
+ * COPY command (omitted from the column list), a value of 0 will be
+ * assigned.
+ * For example: for table foo(a,b,c,d,e) and COPY foo(a,b,e)
+ * attr_offsets may look something like this after this routine
+ * returns: [0,20,0,0,55]. That means that column "a" value starts
+ * at byte offset 0, "b" in 20 and "e" in 55, in attr_bytebuf.

Would it not be better to mark missing attributes with something that can't
be a valid offset, like -1?
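To make the ambiguity concrete: in the example above, column "a" legitimately starts at offset 0, so a 0 in attr_offsets cannot by itself distinguish "starts at byte 0" from "not in the column list". A minimal C sketch of the -1 alternative follows, assuming a tab-delimited line with no escapes or NULL markers; `fill_attr_offsets`, `ATTR_NOT_PRESENT` and `NATTS` are hypothetical names, not taken from the patch:

```c
#include <assert.h>
#include <string.h>

#define NATTS 5                /* foo(a,b,c,d,e); hypothetical fixed width */
#define ATTR_NOT_PRESENT (-1)  /* hypothetical sentinel: never a valid offset */

/*
 * Hypothetical sketch, not the patch's code: split one tab-delimited line
 * into attr_bytebuf (each value NUL-terminated) and record where each value
 * starts. attnums holds the 0-based positions of the columns named in the
 * COPY column list, in order. Columns not in the list keep the sentinel, so
 * they can never be confused with a value that starts at byte 0.
 * Assumes attr_bytebuf is large enough; ignores escapes and NULL markers.
 */
static void
fill_attr_offsets(const char *line, const int *attnums, int ncopied,
                  char *attr_bytebuf, int *attr_offsets)
{
    int pos = 0;

    for (int i = 0; i < NATTS; i++)
        attr_offsets[i] = ATTR_NOT_PRESENT;

    for (int k = 0; k < ncopied; k++)
    {
        size_t len = strcspn(line, "\t");   /* bytes up to next delimiter */

        assert(attnums[k] >= 0 && attnums[k] < NATTS);
        attr_offsets[attnums[k]] = pos;
        memcpy(attr_bytebuf + pos, line, len);
        pos += (int) len;
        attr_bytebuf[pos++] = '\0';
        line += len;
        if (*line == '\t')
            line++;                         /* skip the delimiter */
    }
}
```

Called with attnums {0, 1, 4} (that is, COPY foo(a,b,e)) and the line "x\tyy\tzzz", this leaves attr_offsets as [0, 2, -1, -1, 5], so code reading the array can test for the sentinel instead of consulting the column list again.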

3. This comment needs improving:

+/*
+ * Copy FROM file to relation with faster processing.
+ */

4. We should indeed do this for CSV, especially since a lot of the relevant
logic for detecting attribute starts is already there for CSV in
CopyReadLine. I'm prepared to help you do that if necessary, since I'm
guilty of perpetrating that code.
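For the CSV case, the core of attribute-start detection is a single pass that tracks whether the scanner is inside quotes. The following is a rough C sketch of that idea, not CopyReadLine's actual code; `csv_attr_starts` is a hypothetical name, and it assumes well-formed input where the escape character equals the quote character (the CSV default), which lets a doubled quote be treated as two toggles:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical sketch, not CopyReadLine itself: walk one CSV line and record
 * the byte offset at which each attribute starts. A delimiter only ends an
 * attribute when we are outside quotes; a doubled quote inside a quoted
 * field ("") toggles in_quote off and back on, so it needs no separate
 * escape case (valid only when quote == escape).
 * Returns the number of attributes found, or -1 if there are more than
 * max_attrs or the line ends inside an open quote.
 */
static int
csv_attr_starts(const char *line, char delim, char quote,
                int *starts, int max_attrs)
{
    int  natts = 0;
    bool in_quote = false;

    assert(line != NULL && starts != NULL);
    if (max_attrs < 1)
        return -1;
    starts[natts++] = 0;                 /* first attribute starts at 0 */

    for (int i = 0; line[i] != '\0'; i++)
    {
        char c = line[i];

        if (c == quote)
            in_quote = !in_quote;
        else if (c == delim && !in_quote)
        {
            if (natts == max_attrs)
                return -1;
            starts[natts++] = i + 1;     /* next attribute begins after delim */
        }
    }
    return in_quote ? -1 : natts;
}
```

For `"x,y",z` it reports two attributes starting at offsets 0 and 6, because the comma inside the quoted field is skipped. PostgreSQL's CSV mode also accepts a separate ESCAPE character, which this sketch does not handle.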

cheers

andrew

In response to

Re: COPY FROM performance improvements at 2005-06-25 08:20:14 from Luke Lonergan

Responses

Re: COPY FROM performance improvements at 2005-06-25 16:34:43 from Luke Lonergan
Re: COPY FROM performance improvements at 2005-06-25 17:17:21 from Alon Goldshuv
