Re: Uh-oh: documentation PDF output no longer builds in HEAD

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Uh-oh: documentation PDF output no longer builds in HEAD
Date: 2015-11-10 16:59:00
Message-ID: CABUevEzehq_Aoo2UKiDPSx6sGabK8p7hO+AfA3syCW35AioYUw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 10, 2015 at 1:46 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> I wrote:
> > Curiously though, that gets us down to this:
> > 30615 strings out of 245828
> > 397721 string characters out of 1810780
> > which implies that indeed FlowObjectSetup *is* the cause of most of
> > the strings being entered. I'm not sure how that squares with the
> > observation that there are less than 5000 \pagelabel entries in the
> > postgres-US.aux file. Time for more digging.
>
> Well, after much digging, I've found what seems a workable answer.
> It turns out that the original form of FlowObjectSetup is just
> unbelievably awful when it comes to handling of hyperlink anchors:
> it will put a hyperlink anchor into the PDF for every "flow object",
> that is, everything in the document that could possibly have a link
> to it, whether or not it actually is linked to. And aside from bloating
> the PDF file, it turns out that the hyperlink stuff also consumes some
> control sequence names, which is why we're running out of strings.
>
> There already is logic (probably way older than the hyperlink code)
> in jadetex to avoid generating page-number labels for objects that have
> no cross-references. So what I did to fix this was to piggyback on
> that code: with the attached jadetex.cfg, both a page-number label
> and a hyperlink anchor will be generated for all and only those flow
> objects that have either a page-number reference or a hyperlink reference.
> (We could try to separate those things, but then we'd need two control
> sequence names not one per object for tracking purposes, and anyway many
> objects will have both kinds of reference if they have either.)
>
> This gets us down to ~135000 strings to build HEAD, and not incidentally,
> the resulting PDF is about half the size it was before. I think I've
> also fixed a number of formerly unexplainable broken hyperlinks in the
> PDF; some are still broken, but they were that way before. (It looks
> like <xref> with endterm doesn't work very well in jadetex; all the
> remaining bad links seem to be associated with uses of that.)
>
> Barring objection I'll commit this tomorrow. I'm inclined to back-patch
> it at least into 9.5, maybe further, because I'm afraid we may be closer
> than we realized to exceeding the strings limit in the back branches too.
>

Impressive, indeed.

When you say it's half the size, is that half the size of the preprocessed
PDF, or does it also hold after the stuff we do on the website PDFs using
jpdftweak? IIRC that tweak is only there to deal with the size, and
specifically it deals with "bookmarks", which sounds a lot like this...
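
For readers following along, the rule Tom describes in the quoted text (emit a page-number label and a hyperlink anchor for all and only those flow objects that have either kind of reference) can be modelled roughly as below. This is only an illustrative Python sketch, not jadetex code: the function name, the object IDs, and the two reference lists are hypothetical and stand in for whatever the real macros track.

```python
def anchors_to_emit(flow_objects, page_refs, link_refs):
    """Return the IDs that get both a page label and a hyperlink anchor.

    Hypothetical model: one shared "is referenced" flag per object,
    set if either a page-number reference or a hyperlink points at it.
    """
    referenced = set(page_refs) | set(link_refs)
    # Keep document order; objects nobody points at are skipped entirely.
    return [obj for obj in flow_objects if obj in referenced]


# Made-up example data: five flow objects, only two of them referenced.
flow_objects = ["sect-a", "sect-b", "table-1", "para-17", "fig-2"]
page_refs = ["sect-a"]
link_refs = ["sect-a", "fig-2"]

emitted = anchors_to_emit(flow_objects, page_refs, link_refs)
print(emitted)
print(len(emitted), "of", len(flow_objects), "objects get an anchor")
```

Using a single combined set mirrors the trade-off mentioned in the quoted message: one tracking entry per object instead of two, at the cost of sometimes emitting a label or anchor that only the other kind of reference needed.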

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
