
# Migration of PG's documentation from DocBook 4.5 to DocBook 5.2

The migration from DocBook 4.x to 5.2 is a large step that changes most
of PG's sgml files. The DocBook project supports the migration with some
scripts; see https://docbook.org/docs/howto/howto.html.

However, PG's documentation doesn't meet all prerequisites for using DocBook's
scripts directly. One of them, db4-upgrade.xsl, is therefore slightly modified
(see the comments starting with 'jup'). In addition, some bash, Perl, and sed
commands solve generic and file-specific problems. This work is very specific
to PG's sources. To be able to repeat the migration at any point in time, all
changes are done by scripts.

The scripts are developed for PG versions 13 through 17. Version 12 is
excluded: its 'func.sgml' is a mess.


## Major DocBook changes

- Discontinuation of the DOCTYPE declaration. Instead, there is an XML
  **namespace** which uniquely identifies DocBook tags.
- Discontinuation of DTDs (and XSD schemas) for XML validation. Instead,
  validation is done against a RELAX NG schema.
- Some tag names change (see: https://docbook.org/docs/howto/howto#changes-renamed)
  in order to adopt XML conventions and standards; others are
  removed (see: https://docbook.org/docs/howto/howto#changes-removed).
  The content model of some tags is narrowed down and defined more precisely.
- Some examples (see: https://docbook.org/docs/howto/howto#changes):
```
'id' is now: 'xml:id'

replace 'ulink' by 'link'
# DocBook 4
<ulink url="https://docbook.org">DocBook site</ulink>
# DocBook 5 external URI (similar to HTML anchor 'href')
<link xlink:href="https://docbook.org">DocBook site</link>
# DocBook 5 internal reference (with 'linkend' attribute)
<link linkend="pg_wal">Write-Ahead-Log</link>   or the empty element <xref linkend="pg_wal"/>

# in DocBook 5 ALL elements can directly use 'linkend':
# DocBook 4
<link linkend='dir'><command>DIR</command></link>
# can be changed in DocBook 5 to:
<command linkend='dir'>DIR</command>
```
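
The renamings above can be illustrated with a minimal Python sketch. It is not the
logic of db4-upgrade.xsl: the function name `upgrade` and the sample paragraph are
hypothetical, and only the two changes shown above ('id' to 'xml:id', 'ulink' to
'link') plus the move into the DocBook namespace are handled.

```python
import xml.etree.ElementTree as ET

DB_NS = "http://docbook.org/ns/docbook"
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize DocBook as the default namespace and XLink with the 'xlink' prefix.
ET.register_namespace("", DB_NS)
ET.register_namespace("xlink", XLINK_NS)


def upgrade(elem):
    """Return a DocBook 5 copy of a namespace-less DocBook 4 element (hypothetical helper)."""
    tag = elem.tag
    attrib = dict(elem.attrib)
    if "id" in attrib:
        # 'id' is now 'xml:id'
        attrib[f"{{{XML_NS}}}id"] = attrib.pop("id")
    if tag == "ulink":
        # 'ulink url=...' becomes 'link xlink:href=...'
        tag = "link"
        attrib[f"{{{XLINK_NS}}}href"] = attrib.pop("url")
    new = ET.Element(f"{{{DB_NS}}}{tag}", attrib)
    new.text, new.tail = elem.text, elem.tail
    new.extend([upgrade(child) for child in elem])
    return new


old = ET.fromstring(
    '<para id="intro">See the <ulink url="https://docbook.org">DocBook site</ulink>.</para>'
)
print(ET.tostring(upgrade(old), encoding="unicode"))
```

The real migration has to deal with many more renamed and removed elements; the
sketch only shows the shape of the change.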

## Migration steps

The migration is driven by conv.sh. The script uses three directories: scripts and
other necessary migration files are located in **$ToolDir**, the existing sgml files
in **$FromSgmlDir**, and the migrated ones in **$ToSgmlDir**.

1. Preparation: The git tree of the complete PG source gets copied to a different
   place (**$FromSgmlDir** is one part of it). Hence, we can use 'diff'
   after any intermediate step to check the changes so far.
2. Migration:
   - 2.1 All changes are done in **$ToSgmlDir**.
   - 2.2 Perform some general modifications on every sgml file to make all of them
     XML-conformant.
   - 2.3 Perform individual changes on some sgml files (doRealChanges.sh).
   - 2.4 Perform the standard DocBook migration 4.x -> 5.x.
   - 2.5 Revert the general modifications done in 2.2.
3. Validation: Perform validation against the RELAX NG schema. This is done with
   Jing because the error messages delivered by xmllint are not helpful.
4. Check results by comparing old/new sgml and html files via diff.
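
The 'diff' check after an intermediate step could, for example, look like the
following Python sketch. It only lists the sgml files whose content differs between
two directories. The function name and the temporary directories standing in for
**$FromSgmlDir** and **$ToSgmlDir** are assumptions for illustration; they are not
part of conv.sh.

```python
import filecmp
import tempfile
from pathlib import Path


def changed_sgml_files(from_dir, to_dir):
    """Return the names of *.sgml files that differ between the two directories."""
    names = sorted(p.name for p in Path(from_dir).glob("*.sgml"))
    # shallow=False compares file contents, not just size and timestamps.
    _same, different, errors = filecmp.cmpfiles(from_dir, to_dir, names, shallow=False)
    # 'errors' holds files that could not be compared, e.g. missing in to_dir.
    return sorted(different + errors)


# Demo with two throwaway directories standing in for $FromSgmlDir and $ToSgmlDir.
with tempfile.TemporaryDirectory() as old, tempfile.TemporaryDirectory() as new:
    Path(old, "a.sgml").write_text('<para id="a">same</para>\n')
    Path(new, "a.sgml").write_text('<para id="a">same</para>\n')
    Path(old, "b.sgml").write_text('<para id="b">text</para>\n')
    Path(new, "b.sgml").write_text('<para xml:id="b">text</para>\n')
    changed = changed_sgml_files(old, new)

print(changed)  # prints ['b.sgml']
```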


## Introduction of a new tool

In the past, we used the tool **xmllint** to validate the sgml files against the DocBook
DTD. This worked well. Its validation against a RELAX NG schema also works well as long
as no schema violation occurs. But if an sgml file violates the RELAX NG schema,
the resulting error messages are more confusing than helpful.

Therefore, we should consider introducing another validator. During the migration phase,
we have used **jing** (20181222+dfsg2-6). It's written in Java, it's fast, and its error
messages are very precise. But there are many others: https://relaxng.org/#validators.
Should we switch completely to Jing for validation (Jing is not able to produce
postgres-full.xml)?


### Installation of **jing** on Ubuntu:
```
sudo apt install jing
sudo apt install libavalon-framework-java  # (... possibly more)
export JAVA_HOME="....."                   # adapt to your situation
export JAVA_CMD="$JAVA_HOME"
```
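
With both tools installed, a comparison of their messages could be scripted roughly
as follows. This is a sketch, not part of the migration scripts: the helper names are
hypothetical, the schema path `docbook.rng` is a placeholder, and treating a nonzero
exit status as "invalid" is an assumption about both tools.

```python
import shlex
import shutil
import subprocess


def jing_command(schema, xml_file):
    """Command line for validating one file with Jing against a RELAX NG schema."""
    return ["jing", str(schema), str(xml_file)]


def xmllint_command(schema, xml_file):
    """Equivalent xmllint call; --noout suppresses the serialized document."""
    return ["xmllint", "--noout", "--relaxng", str(schema), str(xml_file)]


def validate(command):
    """Run a validator; return (ok, messages), or None if the tool is not installed."""
    if shutil.which(command[0]) is None:
        return None
    result = subprocess.run(command, capture_output=True, text=True)
    # Assumption: both tools signal validation errors with a nonzero exit status.
    return result.returncode == 0, result.stdout + result.stderr


# Placeholder paths; adapt them to your checkout.
for build in (jing_command, xmllint_command):
    print(shlex.join(build("docbook.rng", "postgres-full.xml")))
```

Calling `validate(...)` on both commands for the same invalid file gives a direct
side-by-side view of the two tools' error messages.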


## Problems

HTML single and multiple pages: Looks ok. But do they REALLY produce identical output?

Raw man pages: They contain additional line breaks. Does it matter?

Postgres.txt: Identical except for a few whitespace differences.

pdf: The generation shows unacceptably bad runtime behavior.
  An intentionally reduced postgres.sgml file (up to
  about 100 pages of output) creates the expected pdf file.
  The pdf problem seems to result from the xslt processing
  in 'http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl'
  which is supposed to produce the 'fo' file.

epub: I'm not able to produce epub, neither for DocBook 4.5 nor for
      DocBook 5.2 files.


## ToDo

- Adaptation of doc/src/sgml/Makefile
- Adaptation of Appendix J: Documentation
- Adaptation of README.link


## Forecast

Entities: We use **character entities** (e.g.: \&mdash;) as well as **parameter entities**
(e.g.: %filelist;). Using character entities instead of hex values or direct
Unicode characters is helpful because it improves the readability of the source for authors.
The use of parameter entities can, theoretically, be replaced by the more XML-conformant
XInclude mechanism. But this isn't possible without major changes in most files:
 - Every xml/sgml file must be XML-conformant; in particular, it needs a single root element.
 - In every xml/sgml file we must re-declare the namespace(s). The reason is that parameter
   entities perform a plain text substitution, whereas xi:include creates trees and combines
   them. When such subtrees are combined, namespaces are, intentionally,
   not inherited. In every file only its own namespaces are known.
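
The namespace point can be demonstrated with Python's standard library, which ships a
small XInclude processor. The following is only a sketch: the file name `intro.xml`,
the in-memory "files" and the helper names are hypothetical. It shows that an included
file without its own namespace declaration ends up outside the DocBook namespace, even
though the including document declares it.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

BOOK = (
    '<book xmlns="http://docbook.org/ns/docbook"'
    ' xmlns:xi="http://www.w3.org/2001/XInclude" version="5.2">'
    '<title>Example</title><xi:include href="intro.xml"/></book>'
)
# Included file with its own root element and namespace declaration.
WITH_NS = '<chapter xmlns="http://docbook.org/ns/docbook"><title>Intro</title></chapter>'
# Same content without the declaration: parsed on its own, it is in no namespace.
WITHOUT_NS = "<chapter><title>Intro</title></chapter>"


def assemble(book_source, files):
    """Parse book_source and resolve xi:include from the in-memory dict 'files'."""
    def loader(href, parse, encoding=None):
        return ET.fromstring(files[href]) if parse == "xml" else files[href]

    root = ET.fromstring(book_source)
    ElementInclude.include(root, loader=loader)
    return root


def chapter_tags(root):
    """Return the tags of all children of root whose local name is 'chapter'."""
    return [child.tag for child in root if child.tag.rpartition("}")[2] == "chapter"]


print(chapter_tags(assemble(BOOK, {"intro.xml": WITH_NS})))
# prints ['{http://docbook.org/ns/docbook}chapter']
print(chapter_tags(assemble(BOOK, {"intro.xml": WITHOUT_NS})))
# prints ['chapter']
```

With parameter entities, the second variant would have been pasted as text into
`<book>` and picked up its namespace; with XInclude it stays a non-DocBook element.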


