
Re: Optimizing COPY

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Optimizing COPY
Date:	2008-11-12 16:21:43
Message-ID:	491B0297.5080903@enterprisedb.com
Lists:	pgsql-hackers

Chuck McDevitt wrote:
> What if the block of text is split in the middle of a multibyte character?
> I don't think it is safe to assume raw blocks always end on a character boundary.

Yeah, it's not. I realized that myself after submitting. The generic approach
is to loop with pg_mblen() to find the maximum safe length. For UTF-8,
and probably many other multi-byte encodings as well, we can detect
whether a byte is the first byte of a multi-byte character just by
looking at its few high bits.
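
As a rough illustration of the UTF-8 case, here is a minimal standalone sketch (not PostgreSQL code). The names utf8_seq_len() and utf8_safe_length() are hypothetical. It assumes the block starts on a character boundary and that the input is otherwise valid UTF-8; anything that looks invalid is left for later validation to reject.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: length in bytes of the UTF-8 sequence introduced by
 * lead byte b, or 0 if b cannot start a sequence (for example, because it is
 * a continuation byte of the form 10xxxxxx).
 */
static size_t
utf8_seq_len(unsigned char b)
{
    if (b < 0x80)
        return 1;               /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((b & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((b & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;
}

/*
 * Hypothetical helper: return the largest n <= len such that buf[0..n) does
 * not end in the middle of a multi-byte character. Only the last few bytes
 * are inspected, so the cost does not depend on the block size.
 */
static size_t
utf8_safe_length(const unsigned char *buf, size_t len)
{
    size_t      start;
    size_t      back = 0;
    size_t      expected;

    assert(buf != NULL || len == 0);
    if (len == 0)
        return 0;

    /* Step back over at most three continuation bytes to reach a lead byte. */
    start = len - 1;
    while (start > 0 && back < 3 && (buf[start] & 0xC0) == 0x80)
    {
        start--;
        back++;
    }

    expected = utf8_seq_len(buf[start]);
    if (expected == 0)
        return len;             /* not a lead byte: leave it to validation */

    /* Keep the last character only if all of its bytes are in the block. */
    return (len - start >= expected) ? len : start;
}
```

Any bytes past the returned length would be carried over and prepended to the next raw block before processing it.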

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

Re: Optimizing COPY at 2008-11-12 01:07:36 from Chuck McDevitt
