Re: [PGdocs] fix description for handling pf non-ASCII characters

From: "Karl O(dot) Pinc" <kop(at)karlpinc(dot)com>
To: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: 'jian he' <jian(dot)universality(at)gmail(dot)com>, Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: [PGdocs] fix description for handling pf non-ASCII characters
Date: 2023-09-26 17:45:53
Message-ID: 20230926124553.6884c912@slate.karlpinc.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 26 Sep 2023 13:40:26 +0000
"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> wrote:

> Your effort is quite helpful for me.

You're welcome.

> Before replying your comments, I thought I should show the difference
> between versions. Regarding old versions (here PG15 was used),
> non-ASCIIs (like Japanese) are replaced with '?'.
>
> ```
> psql (15.4)
> Type "help" for help.
>
> postgres=# SET application_name TO 'あああ';
> SET
> postgres=# SHOW application_name ;
> application_name
> ------------------
> ?????????
> (1 row)
> ```
>
> As for the HEAD, as my patch said, non-ASCIIs are replaced
> with hexadecimal representations. (Were my terminologies correct?).
>
> ```
> psql (17devel)
> Type "help" for help.
>
> postgres=# SET application_name TO 'あああ';
> SET
> postgres=# SHOW application_name ;
> application_name
> --------------------------------------
> \xe3\x81\x82\xe3\x81\x82\xe3\x81\x82
> (1 row)
> ```

I think you're terminology is correct, but see my
suggestion at bottom.

Never hurts to have output to look at. I noticed here
and when reading the patch that changed the output --
the patch is operating on bytes, not characters per-se.

> > First, it is now best practice in the PG docs to
> > put a line break at the end of each sentence.
> > At least for the sentences on the lines you change.
> > (No need to update the whole document!) Please
> > do this in the next version of your patch. I don't
> > know if this is a requirement for acceptance by
> > a committer, but it won't hurt.
>
> I didn't know that. Could you please tell me if you have a source?

I thought I could find an email but the search is taking
forever. If I find something I'll let you know.
I could even be mis-remembering, but it's a nice
practice regardless.

There are a number of undocumented conventions.
Another is the use of gender neutral language.

The place to discuss doc conventions/styles would
be the pgsql-docs list. (If you felt like
asking there.)

You could try submitting another patch to add various
doc conventions to the official docs at
https://www.postgresql.org/docs/current/docguide-style.html
:-)

> Anyway, I put a line break for each sentences for now.

Thanks.

A related thing that's nice to have is to limit the line
length of the documentation source to 80 characters or less.
79 is probably best. Since the source text around your patch
conforms to this convention you should also.

> IIUC the word " Postgres" cannot be used in the doc.

I think you're right.

> Based on your all comments, I changed as below. How do you think?
>
> ```
> Characters that are not printable ASCII, like
> <literal>\x03</literal>, are replaced with the
> <productname>PostgreSQL</productname> <link
> linkend="sql-syntax-strings-escape">C-style escaped hexadecimal byte
> value</link>. ```

>
> > Since the both of you have looked and confirmed that the
> > actual behavior matches the new documentation I have not
> > done this.
>
> I showed the result again, please see.

Should the committer be interested, your patch applies cleanly
and the docs build as expected.

Also, based on the comments in the
patch which changed the system's behavior, I believe that
your patch updates all the relevant places in the documentation.

> > But, have either of you checked that we're really talking about
> > replacing everything outside the 7-bit ASCII encodings?
> > My reading of the commit referenced in the first email of this
> > thread says that it's everything outside of the _printable_
> > ASCII encodings, ASCII values outside of the range 32 to 127,
> > inclusive.
> >
> > Please check. The docs should probably say "printable ASCII",
> > or "non-printable ASCII", depending. I think the meaning
> > of "printable ASCII" is widely enough known to be 32-127.
> > So "printable" is good enough, right?
>
> For me, "non-printable ASCII" sounds like control characters. So that
> I used the sentence "Characters that are not printable ASCII ... are
> replaced with...". Please tell me if you have better explanation?

Your explanation sounds good to me.

I now think that you should consider another change to your wording.
Instead of starting with "Characters that are not printable ASCII ..."
consider writing "The bytes of the string which are not printable ASCII
...". Notice above that characters (e.g. あ) generate output for
each non-ASCII byte (e.g. \xe3\x81\x82). So my thought is that
the docs should be talking about bytes.

For the last hunk you'd change around "anything". Write:
"... it will be truncated to less than NAMEDATALEN characters and
the bytes of the string which are not printable ASCII characters ...".

Notice that I have also changed "that" to "which" just above.
I _think_ this is better English. See sense 3 of:
https://en.wiktionary.org/wiki/which
See also the first paragraph of:
https://en.wikipedia.org/wiki/Relative_pronoun

If the comments above move you, send another patch. Seems to me
we're close to sending this on to a committer.

Regards,

Karl <kop(at)karlpinc(dot)com>
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2023-09-26 17:49:32 Re: Eager page freeze criteria clarification
Previous Message Ranier Vilela 2023-09-26 17:10:04 Re: Avoid a possible out-of-bounds access (src/backend/optimizer/util/relnode.c)