RE: [PGdocs] fix description for handling pf non-ASCII characters

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: "'Karl O(dot) Pinc'" <kop(at)karlpinc(dot)com>, 'jian he' <jian(dot)universality(at)gmail(dot)com>
Cc: Postgres hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: [PGdocs] fix description for handling pf non-ASCII characters
Date: 2023-09-26 13:40:26
Message-ID: TYAPR01MB586607D53C42E59DBB00FD9AF5C3A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Karl,

Thank you for reviewing! PSA new version.

> I see a few problems with the English and style of the patches
> and am commenting below and have signed up as a reviewer.

Your effort is quite helpful for me.

> At
> commitfest.postgresql.org I have marked the thread
> as needing author attention. Hayato, you will need
> to mark the thread as needing review when you have
> replied to this message.

Sure. I will change the status after posting the patch.

Before replying your comments, I thought I should show the difference between
versions. Regarding old versions (here PG15 was used), non-ASCIIs (like Japanese) are
replaced with '?'.

```
psql (15.4)
Type "help" for help.

postgres=# SET application_name TO 'あああ';
SET
postgres=# SHOW application_name ;
application_name
------------------
?????????
(1 row)
```

As for the HEAD, as my patch said, non-ASCIIs are replaced
with hexadecimal representations. (Were my terminologies correct?).

```
psql (17devel)
Type "help" for help.

postgres=# SET application_name TO 'あああ';
SET
postgres=# SHOW application_name ;
application_name
--------------------------------------
\xe3\x81\x82\xe3\x81\x82\xe3\x81\x82
(1 row)
```

Note that non-printable ASCIIs are also replaced with the same rule.

```
psql (15.4)
Type "help" for help.

postgres=# SET application_name TO E'\x03';
SET
postgres=# SHOW application_name ;
application_name
------------------
?
(1 row)

psql (17devel)
Type "help" for help.

postgres=# SET application_name TO E'\x03';
SET
postgres=# SHOW application_name ;
application_name
------------------
\x03
(1 row)
```

> Now, to reviewing the patch:
>
> First, it is now best practice in the PG docs to
> put a line break at the end of each sentence.
> At least for the sentences on the lines you change.
> (No need to update the whole document!) Please
> do this in the next version of your patch. I don't
> know if this is a requirement for acceptance by
> a committer, but it won't hurt.

I didn't know that. Could you please tell me if you have a source? Anyway,
I put a line break for each sentences for now.

> diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
> index e700782d3c..a4ce99ba4d 100644
> --- a/doc/src/sgml/config.sgml
> +++ b/doc/src/sgml/config.sgml
> @@ -7040,9 +7040,8 @@ local0.* /var/log/postgresql
> The name will be displayed in the
> <structname>pg_stat_activity</structname> view and included in CSV log
> entries. It can also be included in regular log entries via the <xref
> linkend="guc-log-line-prefix"/> parameter.
> - Only printable ASCII characters may be used in the
> - <varname>application_name</varname> value. Other characters
> will be
> - replaced with question marks (<literal>?</literal>).
> + Non-ASCII characters used in the
> <varname>application_name</varname>
> + will be replaced with hexadecimal strings.
> </para>
> </listitem>
> </varlistentry>
>
> Don't use the future tense to describe the system's behavior. Instead
> of "will be" write "are". (Yes, this change would be an improvement
> on the original text. We should fix it while we're working on it
> and our attention is focused.)
>
> It is more accurate, if I understand the issue, to say that characters
> are replaced with hexadecimal _representations_ of the input bytes.
> Finally, it would be good to know what representation we're talking
> about. Perhaps just give the \xhh example and say: the Postgres
> C-style escaped hexadecimal byte value. And hyperlink to
> https://www.postgresql.org/docs/16/sql-syntax-lexical.html#SQL-SYNTAX-ST
> RINGS-ESCAPE
>
> (The docbook would be, depending on text you want to link:
>
> <link linkend="sql-syntax-strings-escape">C-style escaped hexadecimal
> byte value</link>.
>
> I think. You link to id="someidvalue" attribute values.)

IIUC the word " Postgres" cannot be used in the doc.
Based on your all comments, I changed as below. How do you think?

```
Characters that are not printable ASCII, like <literal>\x03</literal>,
are replaced with the <productname>PostgreSQL</productname>
<link linkend="sql-syntax-strings-escape">C-style escaped hexadecimal byte value</link>.
```

> @@ -8037,10 +8036,9 @@ COPY postgres_log FROM
> '/full/path/to/logfile.csv' WITH csv; <para>
> The name can be any string of less
> than <symbol>NAMEDATALEN</symbol> characters (64 characters
> in
> a standard
> - build). Only printable ASCII characters may be used in the
> - <varname>cluster_name</varname> value. Other characters will be
> - replaced with question marks (<literal>?</literal>). No name
> is shown
> - if this parameter is set to the empty string
> <literal>''</literal> (which is
> + build). Non-ASCII characters used in the
> <varname>cluster_name</varname>
> + will be replaced with hexadecimal strings. No name is shown if
> this
> + parameter is set to the empty string <literal>''</literal>
> (which is the default). This parameter can only be set at server start.
> </para>
> </listitem>
>
> Same review as for the first patch hunk.

Fixed like above. You can refer the patch.

> diff --git a/doc/src/sgml/postgres-fdw.sgml
> b/doc/src/sgml/postgres-fdw.sgml index 5062d712e7..98785e87ea 100644
> --- a/doc/src/sgml/postgres-fdw.sgml
> +++ b/doc/src/sgml/postgres-fdw.sgml
> @@ -1067,9 +1067,8 @@ postgres=# SELECT postgres_fdw_disconnect_all();
> of any length and contain even non-ASCII characters. However
> when it's passed to and used as <varname>application_name</varname>
> in a foreign server, note that it will be truncated to less than
> - <symbol>NAMEDATALEN</symbol> characters and anything other than
> - printable ASCII characters will be replaced with question
> - marks (<literal>?</literal>).
> + <symbol>NAMEDATALEN</symbol> characters and non-ASCII
> characters
> will be
> + replaced with hexadecimal strings.
> See <xref linkend="guc-application-name"/> for details.
> </para>
>
> Same review as for the first patch hunk.

Fixed like above.

> Since the both of you have looked and confirmed that the
> actual behavior matches the new documentation I have not
> done this.

I showed the result again, please see.

> But, have either of you checked that we're really talking about
> replacing everything outside the 7-bit ASCII encodings?
> My reading of the commit referenced in the first email of this
> thread says that it's everything outside of the _printable_
> ASCII encodings, ASCII values outside of the range 32 to 127,
> inclusive.
>
> Please check. The docs should probably say "printable ASCII",
> or "non-printable ASCII", depending. I think the meaning
> of "printable ASCII" is widely enough known to be 32-127.
> So "printable" is good enough, right?

For me, "non-printable ASCII" sounds like control characters. So that I used the
sentence "Characters that are not printable ASCII ... are replaced with...".
Please tell me if you have better explanation?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
v3_doc_fix.patch application/octet-stream 3.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2023-09-26 13:42:32 Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower
Previous Message Peter Eisentraut 2023-09-26 13:19:54 Re: [PATCH] Add inline comments to the pg_hba_file_rules view