Supporting SJIS as a database encoding

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Supporting SJIS as a database encoding
Date: 2016-09-05 07:23:21
Message-ID: 0A3221C70F24FB45833433255569204D1F5E5F9D@G01JPEXMBYT05
Lists: pgsql-hackers


I'd like to propose adding SJIS as a database encoding. You may wonder why SJIS is still necessary in the world of Unicode. The purpose is to achieve comparable performance when migrating legacy database systems from other DBMSs with little modification of applications.

Recently, we failed to migrate a customer's legacy database from DBMS-X to PostgreSQL. The customer wanted to use PostgreSQL, but PostgreSQL couldn't meet the performance requirement.

The system uses DBMS-X with SJIS as the database character set. The main applications are written in embedded SQL and require SJIS in their host variables. The customer insisted that they cannot use UTF8 for the host variables, because that would require large modifications to the applications' character handling. So, with DBMS-X, no character set conversion is necessary between the clients and the server.

On the other hand, PostgreSQL doesn't support SJIS as a database encoding. Therefore, character set conversion from UTF-8 (the database encoding) to SJIS (the client encoding) has to be performed. The batch application runs millions of SELECTs, each of which retrieves more than 100 columns, and many of those columns are of character type.
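To make the per-value work concrete, here is a minimal Python sketch of what converting a fetched row from UTF-8 to SJIS involves. It is only an illustration: PostgreSQL performs this conversion in C inside the server, the helper name utf8_to_sjis is hypothetical, and the sample value and the count of 100 columns are assumptions chosen to resemble the workload described above.

```python
# Illustrative sketch only: the helper name is hypothetical and this is not
# how PostgreSQL implements encoding conversion internally.

def utf8_to_sjis(value: bytes) -> bytes:
    """Transcode one UTF-8 encoded column value to Shift_JIS."""
    return value.decode("utf-8").encode("shift_jis")

# One row of the kind the batch application fetches: many character columns.
# The value and the column count are assumed for illustration.
row_utf8 = ["日本語".encode("utf-8")] * 100

# Every character column of every row has to go through the conversion.
row_sjis = [utf8_to_sjis(col) for col in row_utf8]

# Byte length of one value in UTF-8 and in Shift_JIS.
print(len(row_utf8[0]), len(row_sjis[0]))  # prints "9 6"
```

With millions of SELECTs and more than 100 columns each, this per-value step is repeated a very large number of times, which is where the conversion cost comes from.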

If PostgreSQL supported SJIS, it would match or outperform DBMS-X for these applications. We confirmed this by using psql to run a subset of the batch processing. When the client encoding is SJIS, one FETCH of 10,000 rows took about 500 ms. When the client encoding is UTF8 (the same as the database encoding), the same FETCH took 270 ms.
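For a sense of scale, the two timings above can be turned into a rough per-FETCH and per-row overhead. The short Python calculation below is only a back-of-the-envelope sketch: it assumes the whole gap is attributable to encoding conversion, which the measurement alone does not prove, and "about 500 ms" is itself approximate.

```python
# Figures quoted above; the inputs are approximate, so the results are too.
ROWS = 10_000
sjis_client_ms = 500   # client encoding SJIS, database encoding UTF8
utf8_client_ms = 270   # client encoding UTF8, so no conversion is needed

# Extra time per FETCH when conversion is required.
overhead_ms = sjis_client_ms - utf8_client_ms

# How many times slower the FETCH is with conversion.
slowdown = sjis_client_ms / utf8_client_ms

# Extra time per fetched row, in microseconds.
overhead_us_per_row = overhead_ms * 1000 / ROWS

print(f"extra time per FETCH: {overhead_ms} ms")
print(f"slowdown factor: {slowdown:.2f}x")
print(f"extra time per row: {overhead_us_per_row:.0f} us")
```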

Supporting SJIS may help PostgreSQL regain some attention here in Japan in the context of database migration. BTW, MySQL supports SJIS as a database encoding. PostgreSQL used to be the most popular open source database in Japan, but MySQL is now more popular.

What I'm wondering is why PostgreSQL doesn't support SJIS. Was there any technical difficulty? Is there anything you would be worried about if SJIS were added?

I'd like to write a patch for adding SJIS if there's no strong objection. I'd appreciate it if you could point me to good design information for adding a server encoding (e.g. the URL of the most recent patch that added a new server encoding).

Takayuki Tsunakawa

