Re: Optimizing COPY

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Chuck McDevitt <cmcdevitt(at)greenplum(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Optimizing COPY
Date: 2008-11-12 16:21:43
Message-ID: 491B0297.5080903@enterprisedb.com
Lists: pgsql-hackers

Chuck McDevitt wrote:
> What if the block of text is split in the middle of a multibyte character?
> I don't think it is safe to assume raw blocks always end on a character boundary.

Yeah, it's not. I realized that myself after submitting. The generic approach
is to loop with pg_mblen() to find the maximum safe length. For UTF-8,
and probably many other multi-byte encodings as well, we can detect
whether a byte is the first byte of a multi-byte character just by
looking at its few high bits.
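
To illustrate the UTF-8 case, here is a rough standalone sketch, not
taken from any patch. The names utf8_seq_len() and utf8_safe_length()
are made up for this example, and it assumes the buffer is otherwise
valid UTF-8 with sequences of at most 4 bytes. It walks back from the
end of the block over continuation bytes (10xxxxxx) to the last lead
byte, and trims the block if that character is incomplete.

```c
#include <stddef.h>

/*
 * Hypothetical helper: total length of the UTF-8 sequence that starts
 * with byte b, judged from its high bits. Returns 0 for a continuation
 * byte (10xxxxxx) or a byte that cannot start a sequence.
 */
static int
utf8_seq_len(unsigned char b)
{
    if ((b & 0x80) == 0x00)
        return 1;               /* 0xxxxxxx: single-byte (ASCII) */
    if ((b & 0xE0) == 0xC0)
        return 2;               /* 110xxxxx */
    if ((b & 0xF0) == 0xE0)
        return 3;               /* 1110xxxx */
    if ((b & 0xF8) == 0xF0)
        return 4;               /* 11110xxx */
    return 0;
}

/*
 * Hypothetical helper: largest length <= len such that buf[0..length)
 * does not end in the middle of a multi-byte character.
 */
static size_t
utf8_safe_length(const unsigned char *buf, size_t len)
{
    size_t      i = len;

    /* Look at no more than the last 4 bytes of the block. */
    while (i > 0 && len - i < 4)
    {
        int         n;

        i--;
        n = utf8_seq_len(buf[i]);
        if (n == 0)
            continue;           /* continuation byte, keep walking back */

        /* buf[i] starts a character that needs n bytes in total. */
        if (i + (size_t) n <= len)
            return len;         /* last character is complete */
        return i;               /* cut just before the partial character */
    }

    /* No lead byte found near the end; leave it for later validation. */
    return len;
}
```

The bytes cut off the end would have to be carried over and prepended
to the next block. Encodings where the first byte cannot be recognized
this way would still need the pg_mblen() loop from the start of the
block.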

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
