Re: Supporting SJIS as a database encoding

From: "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>
To: 'Tom Lane' <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: 'Tatsuo Ishii' <ishii(at)sraoss(dot)co(dot)jp>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Supporting SJIS as a database encoding
Date: 2016-09-06 02:51:13
Message-ID: 0A3221C70F24FB45833433255569204D1F5E6615@G01JPEXMBYT05
Lists: pgsql-hackers

From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
> "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com> writes:
> > Before digging into the problem, could you share your impression on
> > whether PostgreSQL can support SJIS? Would it be hopeless?
>
> I think it's pretty much hopeless. Even if we were willing to make every
> bit of code that looks for '\' and other specific at-risk characters
> multi-byte aware (with attendant speed penalties), we could expect that
> third-party extensions would still contain vulnerable code. More, we could
> expect that new bugs of the same ilk would get introduced all the time.
> Many such bugs would amount to security problems. So the amount of effort
> and vigilance required seems out of proportion to the benefits.

Hmm, this sounds like a death sentence. But as I don't have good knowledge of character set handling yet, I'm not completely convinced that PostgreSQL cannot support SJIS. I wonder why and how other DBMSs support SJIS, and how their implementations differ. Would using multibyte functions like mb... to process characters solve the problem? Isn't the current implementation also blocking support for other character sets that have similar characteristics? I'll study the character set handling...
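
To make the concern concrete for myself, here is a minimal Python sketch of the kind of problem I understand Tom to be describing. It is not PostgreSQL code: the function names are hypothetical, the lead-byte ranges are a simplified assumption about SJIS, and Python's "shift_jis" codec stands in for the server's encoding routines. The second byte of some SJIS characters, such as 表, is 0x5C, the same value as '\', so a byte-wise scan reports a backslash that is not there, while a scan that steps over whole characters does not.

```python
# Illustrative sketch only; not taken from the PostgreSQL source tree.

BACKSLASH = 0x5C


def naive_backslash_positions(data):
    """Byte-wise scan, as encoding-unaware code would do it."""
    return [i for i, b in enumerate(data) if b == BACKSLASH]


def sjis_char_len(lead):
    """Length in bytes of the SJIS character starting at `lead`.

    Simplified assumption: 0x81-0x9F and 0xE0-0xFC start a two-byte
    character; everything else is a single byte.
    """
    if 0x81 <= lead <= 0x9F or 0xE0 <= lead <= 0xFC:
        return 2
    return 1


def aware_backslash_positions(data):
    """Scan that advances one whole character at a time."""
    positions = []
    i = 0
    while i < len(data):
        n = sjis_char_len(data[i])
        if n == 1 and data[i] == BACKSLASH:
            positions.append(i)
        i += n
    return positions


sjis = "表".encode("shift_jis")   # two bytes: 0x95 0x5C
utf8 = "表".encode("utf-8")       # three bytes, all >= 0x80

print(naive_backslash_positions(sjis))   # false positive at offset 1
print(aware_backslash_positions(sjis))   # no backslash found
print(naive_backslash_positions(utf8))   # no backslash found
```

In UTF-8 every byte of a multibyte character has the high bit set, so the byte-wise scan is safe there. With SJIS, every such scan in the server and in third-party extensions would need the character-aware form, which is the effort and speed penalty Tom mentions.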

> Most of the recent discussion about allowed backend encodings has run more
> in the other direction, ie, "why don't we disallow everything but
> UTF8 and get rid of all the infrastructure for multiple backend encodings?".
> I'm not personally in favor of that, but there are very few hackers who
> want to add any more overhead in this area.

Personally, I totally agree. I want non-Unicode character sets to disappear from the world. But real-world business doesn't seem to tolerate the lack of SJIS...

Regards
Takayuki Tsunakawa
