Re: split func.sgml to separated individual sgml files

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: split func.sgml to separated individual sgml files
Date: 2025-06-24 03:34:56
Message-ID: CACJufxH8=BL98wAqcx-xf-fiCzw7-NRfsQT7RdAPrBTb5=-kZw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 20, 2025 at 10:16 AM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> In short, ready to commit (see last paragraph below however), but the committer will need to run the python script at the time of commit on the then-current tree.
>

hi.
more explanation, since the python script seems quite large...

each <sect1 id="functions-XXX"> in doc/src/sgml/func.sgml
corresponds to each individual section in [1].

each <sect1 id="functions-XXX"> within func.sgml is unique.
if you rename one so that there are two <sect1 id="functions-logical">,
the build will error out with something like:
../../Desktop/pg_src/src6/postgres/doc/src/sgml/postgres.sgml:199:
element sect1: validity error : ID functions-logical already defined
see [2] also.
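
A quick way to see the uniqueness constraint without running the doc build is
to count the ids directly. This is only a minimal sketch, not part of the
attached script: the helper name duplicate_sect1_ids and the sample text are
made up for illustration, and it assumes every opening tag has exactly the
form <sect1 id="functions-XXX"> with no extra attributes.

```python
import re
from collections import Counter

# Assumption: the opening tag is literally <sect1 id="functions-XXX">.
SECT1_ID = re.compile(r'<sect1 id="(functions-[^"]+)">')


def duplicate_sect1_ids(sgml_text):
    """Return the sect1 ids that occur more than once, sorted by name."""
    counts = Counter(SECT1_ID.findall(sgml_text))
    return sorted(sect_id for sect_id, n in counts.items() if n > 1)


# Hypothetical sample: functions-logical appears twice.
sample = """\
<sect1 id="functions-logical">
</sect1>
<sect1 id="functions-comparison">
</sect1>
<sect1 id="functions-logical">
</sect1>
"""

print(duplicate_sect1_ids(sample))  # prints ['functions-logical']
```

An empty list means every id is unique, which is the property the splitting
relies on.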

Based on this, we can use the literal string <sect1 id="functions-XXX"> to
perform pattern matching and identify the line numbers that mark the start and
end of each <sect1> section.
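
As an illustration of the idea (not the code from the attached script), the
line-number lookup could look roughly like this. The function name
find_sect1_ranges and the sample lines are hypothetical, and the sketch assumes
that <sect1> blocks are not nested and that the opening and closing tags sit on
separate lines.

```python
import re

# Assumption: opening tags look exactly like <sect1 id="functions-XXX">.
SECT1_OPEN = re.compile(r'<sect1 id="(functions-[^"]+)">')
SECT1_CLOSE = "</sect1>"


def find_sect1_ranges(lines):
    """Return (id, start, end) tuples using 1-based, inclusive line numbers."""
    ranges = []
    current_id = None
    start = None
    for lineno, line in enumerate(lines, start=1):
        match = SECT1_OPEN.search(line)
        if match and current_id is None:
            # Remember where this section begins.
            current_id, start = match.group(1), lineno
        elif SECT1_CLOSE in line and current_id is not None:
            # The closing tag ends the section that is currently open.
            ranges.append((current_id, start, lineno))
            current_id = None
    return ranges


# Hypothetical miniature of func.sgml.
lines = [
    '<chapter id="functions">',
    ' <sect1 id="functions-logical">',
    '  <title>Logical Operators</title>',
    ' </sect1>',
    ' <sect1 id="functions-comparison">',
    ' </sect1>',
    '</chapter>',
]

print(find_sect1_ranges(lines))
# prints [('functions-logical', 2, 4), ('functions-comparison', 5, 6)]
```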

The polished v2 python script uses the following steps to split func.sgml
into several pieces:

0. For each 9.X section listed in [1], create an empty SGML file to hold the
corresponding content.

1. Use the pattern <sect1 id="functions-XXX"> to locate the starting and ending
line number of each section in func.sgml.

2. Copy each content block (<sect1>) from func.sgml

<sect1 id="functions-XXX">
...main content
</sect1>

into the newly created SGML files.

3. Remove the copied content from func.sgml.
4. In func.sgml, insert general entity references [2] to include the newly
created SGML files.
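
To make the steps above concrete, here is a minimal in-memory sketch. It is not
the attached script: the function name split_sections, the "func-" prefix, and
the resulting file and entity names are assumptions made only for this example,
and it works on lists of lines instead of touching files. The (id, start, end)
ranges are taken as given, with 1-based inclusive line numbers.

```python
def split_sections(lines, ranges, entity_prefix="func-"):
    """Split each (id, start, end) range out of lines.

    Returns (new_main, new_files): new_main is the main file with every
    block replaced by an entity reference, and new_files maps a
    hypothetical file name to the lines copied into it.
    """
    new_main = []
    new_files = {}
    cursor = 0  # 0-based index of the next line not yet handled
    for sect_id, start, end in sorted(ranges, key=lambda r: r[1]):
        # Hypothetical naming scheme: functions-logical -> func-logical
        name = entity_prefix + sect_id.replace("functions-", "", 1)
        new_main.extend(lines[cursor:start - 1])          # text before the block
        new_files[name + ".sgml"] = lines[start - 1:end]  # steps 0 and 2
        new_main.append("&" + name + ";")                 # steps 3 and 4
        cursor = end
    new_main.extend(lines[cursor:])                       # text after the last block
    return new_main, new_files


# Hypothetical miniature of func.sgml and its section ranges.
main_lines = [
    '<chapter id="functions">',
    ' <sect1 id="functions-logical">',
    '  <title>Logical Operators</title>',
    ' </sect1>',
    ' <sect1 id="functions-comparison">',
    ' </sect1>',
    '</chapter>',
]
section_ranges = [("functions-logical", 2, 4), ("functions-comparison", 5, 6)]

new_main, new_files = split_sections(main_lines, section_ranges)
print(new_main)
# prints ['<chapter id="functions">', '&func-logical;', '&func-comparison;', '</chapter>']
print(sorted(new_files))
# prints ['func-comparison.sgml', 'func-logical.sgml']
```

Each entity used this way also has to be declared somewhere before the build
can resolve it; presumably that is what the separate filelist.sgml patch is for.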

Because Chapter 9 (Functions and Operators) has the same number of
sections (31) in PG17 and PG18,
v1-0001-split_func_sgml.py will still work just fine.
I did make some minor changes though, hence the attached v2.

----------------------------------------------------
I used the sed --in-place option [3] to modify and truncate the original large
func.sgml file directly.
I also used the sed -n option together with the p command to extract the lines
of func.sgml between line X and line Y, as shown in reference [4].
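
For readers who have not used these sed features from Python, the invocations
could be built roughly as below. This is a sketch under assumptions, not the
calls made by the attached script: the helper names are hypothetical, and using
the d command for the in-place truncation is a guess at one way to do it.

```python
import subprocess


def sed_print_range(path, first, last):
    """argv that prints only lines first..last (1-based, inclusive)."""
    # -n suppresses automatic printing; the p command prints the range.
    return ["sed", "-n", f"{first},{last}p", path]


def sed_delete_range(path, first, last):
    """argv that deletes lines first..last from the file in place."""
    # Assumption: the d command is used to drop the copied block.
    return ["sed", "--in-place", f"{first},{last}d", path]


def extract_range(path, first, last):
    """Run sed and return the extracted lines as a single string."""
    result = subprocess.run(
        sed_print_range(path, first, last),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


print(sed_print_range("func.sgml", 10, 20))
# prints ['sed', '-n', '10,20p', 'func.sgml']
print(sed_delete_range("func.sgml", 10, 20))
# prints ['sed', '--in-place', '10,20d', 'func.sgml']
```

When deleting several ranges in place, going from the bottom of the file
upwards keeps the earlier line numbers valid.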

For the attached files:
first run ``python3 v2-0001-split_func_sgml.py``,
then run ``git apply v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot``
(``git am`` won't work; you need to use ``git apply``).

[1] https://www.postgresql.org/docs/current/functions.html
[2] https://en.wikipedia.org/wiki/Document_type_definition
[3] https://www.gnu.org/software/sed/manual/html_node/Command_002dLine-Options.html#index-_002di
[4] https://www.gnu.org/software/sed/manual/html_node/Common-Commands.html#index-n-_0028next_002dline_0029

Attachment                                            Content-Type              Size
v2-0001-update-filelist.sgml-allfiles.sgml.no-cfbot   application/octet-stream  3.4 KB
v2-0001-split_func_sgml.py                            text/x-python             23.4 KB
