Re: [PGdocs] fix description for handling pf non-ASCII characters

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: "Karl O(dot) Pinc" <kop(at)karlpinc(dot)com>
Cc: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: [PGdocs] fix description for handling pf non-ASCII characters
Date: 2023-09-28 01:13:40
Message-ID: CAHut+PtjaMUCJMUCJK6x9kSKC4H0zxBucRGQZ6ZAsUJkZ3jGzA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Sep 28, 2023 at 10:30 AM Karl O. Pinc <kop(at)karlpinc(dot)com> wrote:
>
> On Thu, 28 Sep 2023 09:49:03 +1000
> Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> > On Wed, Sep 27, 2023 at 11:59 PM Karl O. Pinc <kop(at)karlpinc(dot)com>
> > wrote:
> > >
> > > On Wed, 27 Sep 2023 12:58:54 +0000
> > > "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
> > >
> > > > > Should the committer be interested, your patch applies cleanly
> > > > > and the docs build as expected.
> > > >
> > > > Yeah, but cfbot accepted the previous version. Did you have
> > > > anything in mind?
> > >
> > > No. I'm letting the committer know everything I've checked
> > > so that they can decide what they want to check.
> > >
> > > > Hmm, what you said looked right. But as Peter pointed out [1], the
> > > > fix seems like too much. So I attached three versions of the patch.
> > > > What do you think? For me, type C is best.
> > > >
> > > > A. A patch which completely follows your comments. The name is
> > > > "v3-0001-...patch". Cfbot tests it.
> > > > B. A patch which completely follows Peter's comments [1]. The
> > > > name is "Peter_v3-....txt".
> > > > C. A patch which follows both comments. Based on
> > > > B, but some comments (Don't use the future tense, "Other
> > > > characters"->"The bytes of other characters"...) were picked. The
> > > > name is "Both_v3-....txt".
> > >
> > > I also like C. Fewer words is better. So long
> > > as nothing is left unsaid, fewer words make for clarity.
> > >
> > > However, in the last hunk, "of other than" does not read well.
> > > Instead of writing
> > > "and the bytes of other than printable ASCII characters"
> > > you want "and the bytes that are not printable ASCII characters".
> > > That would be my suggestion.
> > >
> >
> > I also prefer Option C, but...
> >
> > ~~~
> >
> > + <varname>application_name</varname> value.
> > + The bytes of other characters are replaced with
> > + <link linkend="sql-syntax-strings-escape">C-style escaped hexadecimal
> > + byte values</link>.
> >
> > V
> >
> > + <varname>cluster_name</varname> value.
> > + The bytes of other characters are replaced with
> > + <link linkend="sql-syntax-strings-escape">C-style escaped hexadecimal
> > + byte values</link>.
> >
> > V
> >
> > + <symbol>NAMEDATALEN</symbol> characters and the bytes of other than
> > + printable ASCII characters are replaced with <link
> > + linkend="sql-syntax-strings-escape">C-style escaped hexadecimal byte
> > + values</link>.
> >
> >
> > IIUC all of these 3 places can have exactly the same wording change
> > (e.g. like Karl's last suggestion [1]).
> >
> > SUGGESTION
> > Any bytes that are not printable ASCII characters are replaced with
> > <link linkend="sql-syntax-strings-escape">C-style escaped hexadecimal
> > byte values</link>.
>
> I don't see the utility in having exactly the same phrase everywhere,
> especially since the last hunk is modifying the end of a long
> sentence. (Apologies if I'm misreading what Peter wrote above.)
>
> I like short sentences. So I prefer "The bytes of other characters"
> rather than "Any bytes that are not printable ASCII characters"
> for the first 2 hunks. In context I don't see the need to repeat
> the whole "printable ASCII characters" part that appears in the
> preceding sentence of both hunks. "Other" is clear, IMHO.
>

I had in mind something like a Shift JIS (SJIS) encoding, where a single
"character" may include a trail byte that happens to be in the
printable ASCII range. AFAIK, because the new logic processes
bytes, not characters, the end result could be a mix of
escaped and unescaped bytes for a single SJIS character. In that
context, I felt "The bytes of other characters" was not quite
accurate.

But now, looking at the PostgreSQL-supported character sets [1], I see
that SJIS is not supported as a server encoding anyhow. Unfortunately, I
am not familiar enough with the other encodings to know whether there is
still a chance of similar printable ASCII trail bytes, so I am fine with
whatever wording is chosen.

> But because I like short sentences, I now think that it's a good
> idea to break the long sentence of the last hunk into two.
> Add a period and use Peter's SUGGESTION above as the
> text for the second sentence.
>
> Is this desirable?
>

+1.

======
[1] https://www.postgresql.org/docs/current/multibyte.html

Kind Regards,
Peter Smith.
Fujitsu Australia
