| From: | Henson Choi <assam258(at)gmail(dot)com> |
|---|---|
| To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Subject: | Re: Experimenting with wider Unicode storage |
| Date: | 2026-04-16 01:23:32 |
| Message-ID: | CAAAe_zANMo3o280YU96Nt=JK=mq=PfygvgT1GnG=7Wuh+Es1GQ@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Thomas,

Thank you for sharing this very interesting and creative approach.
Encoding is indeed a crucial factor in capacity planning and
performance benchmarking, and I find this direction quite compelling.

I'm currently working on a few other things, so my responses may not
always be quick, but I wanted to let you know I'm genuinely
interested in following this work.

As it happens, I'm currently collaborating with Ishii-san (who, as
you know, is one of the original architects of multibyte/CJK support
in PostgreSQL) on Row Pattern Recognition; that might also be a
thread worth keeping an eye on.

It also strikes me that this is a topic worth considering in the
context of the rapid growth of SNS and AI-generated data. The
pervasive use of emoji — which cannot be represented in legacy
encodings like EUC-KR at all — is in fact accelerating the migration
toward Unicode in Korea and other Asian markets. This makes the
storage efficiency of Unicode for CJK characters an increasingly
practical concern, not just a theoretical one.
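
To make the storage point concrete, here is a small Python sketch. It is not taken from the patches under discussion; the helper name `encoded_size` and the sample strings are illustrative assumptions. It uses Python's built-in codecs to compare the byte length of a Hangul sample and an emoji under EUC-KR, UTF-8, and UTF-16:

```python
from typing import Optional


def encoded_size(text: str, encoding: str) -> Optional[int]:
    """Return the byte length of `text` in `encoding`.

    Returns None when the text cannot be represented in that encoding.
    (Hypothetical helper, for illustration only.)
    """
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


if __name__ == "__main__":
    samples = {"hangul": "한글", "emoji": "😀"}
    # utf-16-le is used so that no byte-order mark is counted.
    for label, sample in samples.items():
        for encoding in ("euc_kr", "utf-8", "utf-16-le"):
            print(label, encoding, encoded_size(sample, encoding))
```

With Python's stock codecs, each Hangul syllable in this sample takes 2 bytes in EUC-KR and UTF-16 but 3 bytes in UTF-8, while the emoji cannot be encoded in EUC-KR at all.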
I'd like to take some time to analyze the current situation around
character encoding in Korea — where both EUC-KR legacy systems and
UTF-8 coexist in complex ways — review the patches you've attached,
and then share some thoughts and feedback.

Best regards,
Henson