Re: [PGdocs] fix description for handling pf non-ASCII characters

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Peter Smith <smithpb2250(at)gmail(dot)com>
Cc: "Karl O(dot) Pinc" <kop(at)karlpinc(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PGdocs] fix description for handling pf non-ASCII characters
Date: 2023-09-28 01:19:31
Message-ID: 803569.1695863971@sss.pgh.pa.us
Lists: pgsql-hackers

Peter Smith <smithpb2250(at)gmail(dot)com> writes:
> I had in mind something like a SHIFT-JIS encoding where a single
> "character" may include some trail bytes that happen to be in the
> ASCII printable range. AFAIK because the new logic is processing
> bytes, not characters, I thought the end result could be a mix of
> escaped and unescaped bytes for the single SJIS character.

It will not, because ...

> But now looking at PostgreSQL-supported character sets [1] I saw SJIS
> is not supported anyhow. Unfortunately, I am not familiar enough with
> other encodings to know if there is still a chance of similar
> printable ASCII trail bytes so I am fine with whatever wording is
> chosen.

... trailing bytes that could be mistaken for ASCII are precisely
the property that causes us to reject an encoding as not backend-safe.
So this code doesn't need to consider that hazard, and processing the
string byte-by-byte is perfectly OK.
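
To make the argument above concrete, here is a minimal sketch, not the actual backend code. It assumes the escaping rule is "emit printable ASCII bytes (0x20..0x7E) as-is and write every other byte as \xNN"; the function names escape_non_ascii and has_ascii_trail_byte are hypothetical, chosen only for illustration. It uses Python's own codecs to compare Shift-JIS, which is not accepted as a server encoding, with UTF-8 and EUC_JP, which are.

```python
def escape_non_ascii(raw: bytes) -> str:
    """Escape byte-by-byte: keep printable ASCII, write the rest as \\xNN.

    Assumed rule for illustration only; not taken from the PostgreSQL source.
    """
    out = []
    for b in raw:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))          # printable ASCII passes through
        else:
            out.append(f"\\x{b:02x}")   # everything else is hex-escaped
    return "".join(out)


def has_ascii_trail_byte(ch: str, encoding: str) -> bool:
    """True if a multibyte encoding of ch contains a byte below 0x80."""
    encoded = ch.encode(encoding)
    return len(encoded) > 1 and any(b < 0x80 for b in encoded)


if __name__ == "__main__":
    ch = "\u8868"  # a kanji whose Shift-JIS form is 0x95 0x5C (0x5C is '\')

    # Shift-JIS: the second byte looks like ASCII, so a byte-wise pass
    # would escape half of the character and leave the other half alone.
    print(has_ascii_trail_byte(ch, "shift_jis"))    # True
    print(escape_non_ascii(ch.encode("shift_jis")))

    # UTF-8 and EUC_JP: every byte of a multibyte character has the high
    # bit set, so the whole character is escaped consistently.
    print(has_ascii_trail_byte(ch, "utf-8"))        # False
    print(has_ascii_trail_byte(ch, "euc_jp"))       # False
    print(escape_non_ascii(ch.encode("utf-8")))
```

In the Shift-JIS case the output mixes one escaped byte with one raw backslash, which is the hazard Peter described. In the encodings that keep the high bit set on every byte of a multibyte character, that mix cannot occur, so a byte-wise loop never splits a character into escaped and unescaped parts.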

I'd be inclined to keep the text as simple as possible and not focus on
the distinction between bytes and characters.

regards, tom lane
