Quick Links

Re: COPY FROM performance improvements

From:	"Andrew Dunstan" <andrew(at)dunslane(dot)net>
To:	<agoldshuv(at)greenplum(dot)com>
Cc:	<llonergan(at)greenplum(dot)com>, <pgsql-patches(at)postgresql(dot)org>
Subject:	Re: COPY FROM performance improvements
Date:	2005-06-25 20:46:43
Message-ID:	2697.24.211.165.134.1119732403.squirrel@www.dunslane.net
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-patches

Alon Goldshuv said:
> Andrew,
>

>> 4. We should indeed do this for CSV, especially since a lot of the
>> relevant logic for detecting attribute starts is already there for CSV
>> in CopyReadLine. I'm prepared to help you do that if necessary, since
>> I'm guilty of perpetrating that code.
>
> That will be great. My idea is to eventually keep processing for BINARY
> in one routine while text can be in another (CSV+delimited). The reason
> being is that there is a lot in common for the processing of the 2 text
> types. The CopyReadLineBuffered will have to be changed just a little
> to accommodate embedded newlines inside CSV quotes, and the
> CopyReadAttributeCSV would need to change as well.
>

That sounds right.

> Question for you -- you'll probably notice that I made the escape of
> the delimited text format (previously '\\') a constant variable
> "escapec". Reason being - maybe some people would like to choose an
> escape for delimited COPY too, or even disable it. So it's a good base
> to start with. However, I just noticed that CSV uses that exact
> variable name... Sorry about that. Any suggestion on how to rename it
> to something different?
>

I haven't seen any demand for users to be allowed to specify an escape char
in text mode (BTW, that's what the docs call what you have called delimited
mode). Maybe there's a case for it, but I somewhat doubt it.

As for variable names, choose some variant that seems reasonable. If we need
to distinguish we should use csv_escapec and text_escapec. But I'm not sure
we can't use the same variable name for both cases, unless they would have
conflicting scopes.

What I would like to have is a high level description of
. how the new text mode code differs from the old text mode code, and
. which part of the change is responsible for how much performance gain.

Maybe I have missed that in previous discussion, but this change is
sufficiently invasive that I think you owe that to the reviewers.

cheers

andrew

cheers

andrew

In response to

Re: COPY FROM performance improvements at 2005-06-25 17:17:21 from Alon Goldshuv

Responses

Re: COPY FROM performance improvements at 2005-06-25 22:48:53 from Luke Lonergan
Re: COPY FROM performance improvements at 2005-06-25 22:52:29 from Alon Goldshuv

Browse pgsql-patches by date

	From	Date	Subject
Next Message	Petr Jelínek	2005-06-25 21:02:32	limiting connections per user/database
Previous Message	Bruce Momjian	2005-06-25 20:30:38	Re: Add PG version number to NLS files