Re: Perform COPY FROM encoding conversions in larger chunks

From: John Naylor <john(dot)naylor(at)enterprisedb(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Perform COPY FROM encoding conversions in larger chunks
Date: 2021-01-27 23:23:38
Message-ID: CAFBsxsEvXTy0UAfPB4dQbQa+7a9tfkSs3=ZMFVsqhNqd9ZzDdQ@mail.gmail.com
Lists: pgsql-hackers

Hi Heikki,

0001 through 0003 are straightforward, and I think they can be committed
now if you like.

0004 is also pretty straightforward. The check you proposed upthread for
pg_upgrade seems like the best solution to make that workable. I'll take a
look at 0005 soon.

I measured the conversions that were rewritten in 0003, and there is indeed
a noticeable speedup, roughly 22% to 24% less elapsed time:

Big5 to EUC-TW:

head 196ms
0001-3 152ms

EUC-TW to Big5:

head 190ms
0001-3 144ms

I've attached the driver function for reference. Example use:

select drive_conversion(
    1000, 'euc_tw'::name, 'big5'::name,
    convert('a few kB of utf8 text here', 'utf8', 'euc_tw')
);

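I also took a look at the test suite, and the only thing to note is a
couple of places where the comment doesn't match the code. The comments
give the lead bytes as 0x8e (SS2) and 0x8f (SS3), but the code uses
hex('0e') and hex('0f'):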

+ -- JIS X 0201: 2-byte encoded chars starting with 0x8e (SS2)
+ byte1 = hex('0e');
+ for byte2 in hex('a1')..hex('df') loop
+   return next b(byte1, byte2);
+ end loop;
+
+ -- JIS X 0212: 3-byte encoded chars, starting with 0x8f (SS3)
+ byte1 = hex('0f');
+ for byte2 in hex('a1')..hex('fe') loop
+   for byte3 in hex('a1')..hex('fe') loop
+     return next b(byte1, byte2, byte3);
+   end loop;
+ end loop;

Not sure if it matters, but I thought I'd mention it anyway.
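
To make the mismatch concrete, here is a minimal sketch in Python. It is
not part of the patch or its test suite. It assumes the comments reflect
the intent, since EUC-JP defines SS2 as 0x8e and SS3 as 0x8f. The function
names are hypothetical and only mirror the loops quoted above.

```python
# EUC-JP single-shift lead bytes, as stated in the quoted comments.
SS2 = 0x8E  # introduces a 2-byte JIS X 0201 character
SS3 = 0x8F  # introduces a 3-byte JIS X 0212 character

# What the quoted code actually passes as the lead byte.
CODE_BYTE1_2BYTE = int("0e", 16)  # 0x0e, not 0x8e
CODE_BYTE1_3BYTE = int("0f", 16)  # 0x0f, not 0x8f


def jisx0201_sequences(lead=SS2):
    """Yield each 2-byte sequence: lead byte, then 0xa1..0xdf."""
    for byte2 in range(0xA1, 0xDF + 1):
        yield bytes([lead, byte2])


def jisx0212_sequences(lead=SS3):
    """Yield each 3-byte sequence: lead byte, then two bytes in 0xa1..0xfe."""
    for byte2 in range(0xA1, 0xFE + 1):
        for byte3 in range(0xA1, 0xFE + 1):
            yield bytes([lead, byte2, byte3])


if __name__ == "__main__":
    # The code's lead bytes differ from the commented ones only in the high bit.
    print(hex(CODE_BYTE1_2BYTE), "vs", hex(SS2))
    print(hex(CODE_BYTE1_3BYTE), "vs", hex(SS3))
    print(len(list(jisx0201_sequences())), "two-byte sequences")
    print(len(list(jisx0212_sequences())), "three-byte sequences")
```

With the lead bytes from the comments, the loops cover 63 two-byte and
8836 three-byte sequences. With 0x0e and 0x0f, the same loops produce
inputs that start with a 7-bit control byte, so the test would exercise a
different code path than the comments describe.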

--
John Naylor
EDB: http://www.enterprisedb.com

Attachment          Content-Type              Size
drive_conversion.c  application/octet-stream  2.2 KB
