Re: Cleanup of syntax.sgml

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Joshua Drake <jd(at)commandprompt(dot)com>
Cc: pgsql-docs <pgsql-docs(at)postgresql(dot)org>
Subject: Re: Cleanup of syntax.sgml
Date: 2025-06-20 20:59:55
Message-ID: CAKFQuwYxK-ctDWYpT0VEQJ6Yaz+TcF0CoSiXR=szbgdraqnZ0g@mail.gmail.com
Lists: pgsql-docs

On Fri, Jun 20, 2025 at 12:33 PM Joshua Drake <jd(at)commandprompt(dot)com> wrote:

> To make it more consumable.
>

Overall I'm good with the attempt to trim, and with most of the changes, but
I feel it tries too hard and ends up being too "matter-of-fact"; the
conjunctions that exist make reading a wall of text easier. I agree that
some of them could be removed as being more judgemental than mechanical.

Reviewing this reminds me we are inconsistent regarding "key word" vs.
"keyword".

"We advise users who to read this chapter carefully ..." ? botched surgery
on this one

Not sure I agree with removing the comment regarding "end of the input
stream".

I think I'm ok with leaving token separation unspecified here, especially
since it isn't totally accurate (at least with regard to "special character
symbols", which are often grouped together).

Why leave "(syntactically)" in parentheses? Also, you got rid of the word
"input" in SQL input above but left it here. I think leaving "SQL input
consists of..." is better.

For the examples, I would put "values" on its own line. And I would add a
delete command on the same line as the update command. Then just describe
that.

Select...;
update...; delete...;
insert ...
values ...;
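
To make that layout concrete, here is a rough Python sketch. The SELECT,
UPDATE and INSERT commands are modeled on the section's existing examples;
the DELETE command is made up here purely for illustration. The splitter is
deliberately naive (it would break on a semicolon inside a string constant)
and is not how the server tokenizes input; it only shows that three
arrangements (one command per line, two commands on one line, one command
across two lines) all yield complete commands.

```python
# Four commands laid out as suggested: one per line, two on one line,
# and one split across two lines. The DELETE is a hypothetical addition.
sql_input = """\
SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5; DELETE FROM MY_TABLE WHERE A = 5;
INSERT INTO MY_TABLE
VALUES (3, 'hi there');
"""


def split_commands(text):
    """Naively split SQL input on semicolons and collapse whitespace.

    Assumption: no semicolons appear inside string constants or comments.
    """
    return [" ".join(part.split()) for part in text.split(";") if part.strip()]


commands = split_commands(sql_input)
for command in commands:
    print(command)
```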

I really don't like the re-wording regarding comments.

"But for the <command>UPDATE</command> command always ..." ? another
botched surgery
I'm not sure what the entire paragraph really gives the reader though,
besides a pointer to the reference chapter. It needs more pruning than
given here IMO.

I feel that if we want to enhance clarity about where we differ from the
standard, we should use callouts for those items instead of burying the
information in walls of text. The point about accepting dollar signs in
unquoted identifiers is one example.

- A convention often used is to write key words in upper
+ The recommened convention is to write key words in upper
[recommended needs a d]
Both should be avoided. We can say "It is the convention in this
documentation to write key words in upper case and names in lower case."
Let other places than our syntax reference speak to real-world conventions
besides ours.

Where we introduce "quoted identifiers", link to the description of the
formal syntax; then it's ok to remove discussions of minutiae like
including double quotes in a quoted identifier.

punctuation:
+ Inside the quotes, Unicode characters can be specified in escaped
+ form by writing a backslash followed by the four-digit hexadecimal
+ code point number or[,] alternatively[,] a backslash followed by a plus
+ sign [(+)] followed by a six-digit hexadecimal code point number.
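
As an illustration of the two escape forms described in that sentence, here
is a small Python sketch that decodes them. The function name and the regex
are assumptions made for this example, not anything from the server source.
It handles only the default backslash escape character, treats a doubled
backslash as a literal backslash, and does not attempt surrogate pairs or
error reporting for malformed escapes.

```python
import re

# A backslash followed by: another backslash, "+" and six hex digits,
# or four hex digits.
_ESCAPE = re.compile(r"\\(\\|\+[0-9A-Fa-f]{6}|[0-9A-Fa-f]{4})")


def decode_unicode_escapes(body):
    """Decode \\XXXX and \\+XXXXXX escapes in the body of a U&'...' string."""

    def replace(match):
        token = match.group(1)
        if token == "\\":
            return "\\"  # doubled escape character stands for itself
        return chr(int(token.lstrip("+"), 16))

    return _ESCAPE.sub(replace, body)


# The four-digit and six-digit forms can be mixed in one string.
print(decode_unicode_escapes(r"d\0061t\+000061"))
```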

I've kind of grown fond of "This slightly bizarre behavior"... ;)

+ If you can use Unicode escapes or the alternative Unicode escape syntax,
+ explained in <xref linkend="sql-syntax-strings-uescape"/>; then the server

Prefer the existing. This lacks commas or other ways to make it read
well. Removing "useful" judgement is probably sufficient. Or maybe try a
different approach.

I concur we should remove the discussion regarding the GUCs at this point.

Maybe also include the correct way of writing the U & 'foo' operation in
the ambiguity discussion?

"optional tag of zero or more characters" is redundant. Optional is
sufficient.

But much more concisely:
"""
A dollar-quoted string surrounds the content with user-specified tags of
the form $label$ instead of quotation marks. The label may be the empty
string. For example, here are two different ways...
"""

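The "$label$ ... $label$" shape in that wording, including the empty label,
can be sketched in a few lines of Python. This is only an illustration under
stated assumptions: the function name is hypothetical, and the label pattern
is restricted to ASCII letters, digits and underscores, which is narrower
than the real rule for tags.

```python
import re

# Opening $label$, lazily matched body, then the same $label$ again.
# The label group may match the empty string, which covers plain $$...$$.
# Assumption: ASCII-only labels, for brevity.
_DOLLAR_QUOTED = re.compile(
    r"\$([A-Za-z_][A-Za-z_0-9]*|)\$(.*?)\$\1\$", re.DOTALL
)


def parse_dollar_quoted(text):
    """Return (label, body) for a dollar-quoted string at the start of text."""
    match = _DOLLAR_QUOTED.match(text)
    if match is None:
        raise ValueError("not a dollar-quoted string")
    return match.group(1), match.group(2)


# Two spellings of the same string content, with and without a label.
print(parse_dollar_quoted("$$Dianne's horse$$"))
print(parse_dollar_quoted("$SomeTag$Dianne's horse$SomeTag$"))
```

Nothing inside the body is treated as an escape, so backslashes and single
quotes come back unchanged.
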
- used without needing to be escaped. Indeed, no characters inside
+ used without needing to be escaped. No characters inside
- Here, the sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a
+ The sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a
- <productname>PostgreSQL</productname>. But since the sequence does not match
+ <productname>PostgreSQL</productname>. Since the sequence does not match
Removing the word "Indeed, " isn't an improvement. I get the desire to
remove the "commentary" filler fragments but this one isn't a judgement but
a highlight and seems quite appropriate. Same goes for removing "Here" and
"But" - conjunctions are good.

"Bit-string constants is a string constant with a " plural needs "are",
not "is"

- described below. Note that any leading plus or minus sign is not actually
+ described below. Any leading plus or minus sign is not considered part of
"Note" is also a perfectly fine conjunction, and you haven't claimed your
fixes are to bring things in line with a style guideline, which I don't
think exists at this level of specificity.

- These are some examples of valid non-decimal integer constants:
+ Examples of valid non-decimal integer constants:
Status quo preferred.

Note, the stuff I'm not calling out does seem ok to remove in context.

A comment is removed from the input stream before further syntax
- analysis and is effectively replaced by whitespace.
+ analysis and is replaced by whitespace.

This seems repetitive with an earlier change...also, is a 20 character
comment replaced with 20 spaces? Why whitespace and not "space character"
or "nothing"?

[For example,] If you define a <quote>+</quote> operator -- this is an
example so the conjunction is valid. Though the trailing ", no matter what
yours does." seems unnecessary.

Removing legacy comment regarding 9.5 makes sense.

David J.
