Re: Moving documentation to XML

From: Alexander Lakhin <a(dot)lakhin(at)postgrespro(dot)ru>
To: "pgsql-docs(at)postgresql(dot)org" <pgsql-docs(at)postgresql(dot)org>
Subject: Re: Moving documentation to XML
Date: 2015-10-30 13:40:53
Message-ID: 56337365.2080104@postgrespro.ru
Lists: pgsql-docs

Hello, Guillaume.

We plan to use this for the Russian translation, too. We translate the
docs by converting the single XML file to postgres-ru.po with xml2po,
and after translating it we convert it back to XML (which gives us
postgres-ru.xml). Until now we had to perform one more conversion
(postgres-ru.xml -> a set of SGML files).
So now we can get the Russian html/* with:
python xml2po.py -l ru -k -p postgres-ru.po postgres.xml >postgres-ru.xml
xsltproc --stringparam pg.version '9.4.1' stylesheet.xsl postgres-ru.xml

But I had some doubts about the differences between the DSSSL and XSL
output. As I noted previously, there was at least one visible
difference. So I decided to customize the XSL templates to make sure
that the HTML files are generated without loss or corruption.
I thought that comparing the two HTML sources directly would not work,
as they are too different, but maybe we can compare the text generated
from the HTML by lynx, for example.
So I use the following procedure to look for differences:
0. Get the DSSSL-generated HTML files:
make html
1. Extract the text content from the HTML files:
mkdir -p html-text
for f in html/*.html; do
  fn=`basename "$f"`; echo "$fn"
  perl -0pe 's/<B\s*>Note:\s*<\/B\s*>/<h3>Note<\/h3>/g' "$f" |
    perl -0pe 's/><BLOCKQUOTE\s*CLASS="NOTE"/><div/ig' >"/tmp/$fn"
  lynx "/tmp/$fn" --dump >"html-text/$fn"
done
* Some differences are not significant, so it's not worth modifying the
XSL templates to eliminate them. The difference in "Note" placement and
spelling is one of them, so I just filter it out.
2. Rename html to html-o and html-text to html-o-text.
3. Generate the HTML files with XSL (using the modified templates):
rm -r html; xsltproc --stringparam pg.version '9.4.1' stylesheet.xsl
postgres.xml
4. Extract the text content from the HTML files as above.
5. Make sure that the two text dumps are identical:
diff -s -u -b -I '^\s*_\+\s*$' html-o-text/xtypes.html html-text/xtypes.html
* Differences in whitespace and in the length of "____" lines are not
significant either.
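
The per-file check in step 5 could be run over the whole tree with
something like the following. This is only a sketch: the function name
compare_text_dumps is made up for illustration, and it assumes GNU diff
(for \s and \+ in the -I pattern) and the directory names used in the
steps above.

```shell
#!/bin/sh
# Sketch: compare every text dump from the DSSSL build with its XSL
# counterpart, ignoring whitespace changes and "____" ruler lines.
# compare_text_dumps is a hypothetical helper, not part of the patch.
compare_text_dumps() {
    old_dir=$1    # e.g. html-o-text (dumps of the DSSSL output)
    new_dir=$2    # e.g. html-text   (dumps of the XSL output)
    failed=0
    for old in "$old_dir"/*.html; do
        [ -e "$old" ] || continue          # glob matched nothing
        name=$(basename "$old")
        new="$new_dir/$name"
        if [ ! -f "$new" ]; then
            echo "MISSING $name"
            failed=1
        elif ! diff -b -I '^\s*_\+\s*$' "$old" "$new" >/dev/null; then
            echo "DIFFERS $name"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (after steps 1-4):
#   compare_text_dumps html-o-text html-text
```

It prints one line per file that is missing or still differs and
returns non-zero if there was any, so the remaining pages that need XSL
work can be listed in one pass.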

For now, I've managed to get an identical xtypes.html (I tested my XSL
customizations with it), but I think we can eliminate the other most
noticeable (or maybe all) differences the same way.
I can describe the XSL customizations in more detail if needed.

Best regards,
Alexander

P.S. I couldn't post this message as a reply due to an error on the
postgresql.org side.
(<pgsql-docs(at)postgresql(dot)org>: host makus.postgresql.org[174.143.35.229]
said:
550 Message headers fail syntax check (in reply to end of DATA
command))

28.10.2015 14:46, Guillaume Lelarge wrote:
>
> On Oct 26, 2015 6:40 PM, "Alexander Lakhin" <a(dot)lakhin(at)postgrespro(dot)ru>
> wrote:
> >
> ...
> > To make sure that result of the transformation is the same, I've
> > compared original .html's with .html's generated with modified templates.
> > Unfortunately xslt generates random id's, so it's needed to exclude
> > them before comparing. I do that with:
> > for f in */*.html; do sed -e
> > 's/id=\"\(ftn\.\)\?id[a-z][0-9]\+\"/id=\"id\"/g' -i $f ; sed -e
> > 's/href=\"[^#]*#\(ftn\.\)\?id[a-z][0-9]\+\"/href=\"#\"/g' -i $f; done
> >
> >
> > So if it's acceptable way to speed up generation of HTML (and maybe
> > some other formats), what other steps should we take to move away from
> > SGML?
> > If the performance is still not satisfying, please let me know, I'll
> > continue to optimize xslt.
> > Beside performance issues, I can see some difference in results of
> > 'make html' and 'make xslthtml'. For example, see
> > doc/src/sgml/html/spi.html (xslt-generated version doesn't contain the
> > lists of functions).
> >
>
> What you've done is awesome. I can't wait to test it on the French
> translation.
>
> Nice work!
>

Attachment Content-Type Size
xslt-customize.patch text/x-patch 33.5 KB
