Re: WIP: a way forward on bootstrap data

From: John Naylor <jcnaylor(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, David Fetter <david(at)fetter(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: a way forward on bootstrap data
Date: 2018-01-12 20:54:57
Message-ID: CAJVSVGUNRNWc1b-npcrvF6o5q5em4VO3cD+OK121CT9XoyiONA@mail.gmail.com
Lists: pgsql-hackers

Tom, everyone,
It's getting late in my timezone, but I wanted to give a few quick
answers. I'll follow up tomorrow. Thanks Alvaro for committing my
refactoring of pg_attribute data creation. I think your modifications
are sensible and I'll rebase soon.

On 1/13/18, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> It's not very clear to me what the proposed data format actually is,
> and I don't really want to read several hundred KB worth of patches
> in order to reverse-engineer that information. Nor do I see
> anything in the patch list that obviously looks like it updates
> doc/src/sgml/bki.sgml to explain things.

Alvaro gave a good overview, so I'll just point out a few things.

-Patches 0002 through 0007 represent a complete one-to-one migration
of data entries. I didn't see much in bki.sgml specific to the current
format, so my documentation changes are confined largely to the
README, in patch 0005.
-Patches 0008 and 0009 implement techniques to make the data lines
shorter. My choices are certainly debatable. There is a brief addition
to the README in patch 0008. The abbreviation technique was only used
in three catalogs to demonstrate.
-Patches 0010 and 0011 implement human-readable OID references.
-Patches 0012 and 0013 are cosmetic, but invasive.

> Seems like we would almost need a per-catalog convention on how to lay out
> the entries, or else we're going to end up (over time) with lots of cowboy
> coding leading to entries that look randomly different from the ones
> around them.

If I understand your concern correctly, the convention is enforced by
a script (rewrite_dat.pl). At the very least this would be done at the
same time as pgindent and perltidy. To be sure, because of default
values many entries will look randomly different from the ones around
them regardless. I have a draft patch to load the source data into
tables for viewing, but it's difficult to rebase, so I thought I'd
offer that enhancement later.

> One other question is how we'll verify the conversion. Is there an
> expectation that the .bki file immediately after the conversion will
> be identical to immediately before?

Not identical. First, as part of the base migration, I stripped almost
all double quotes from the data entries since the new Perl hash values
are already single-quoted. (The exception is macros expanded by
initdb.c.) I made genbki.pl add quotes on output to match what
bootscanner.l expects. Where a simple rule made it possible, it also
matches the original .bki. The new .bki will only differ where the
current data has superfluous quotes (e.g., "0", "sql"). Second, if the
optional cosmetic patch 0013 is applied, the individual index and
toast commands will be in a different order.
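
To illustrate the quote-on-output step described above, here is a minimal
sketch in Python. It is not the logic in genbki.pl (which is Perl and may
use a different rule). The function name quote_for_bki and the rule itself
(leave _null_ and plain word-character values bare, quote everything else)
are assumptions made for the example.

```python
import re

# Assumed rule for illustration only: a value made up entirely of
# word characters (letters, digits, underscore) needs no quotes.
_SIMPLE_VALUE = re.compile(r"\w+")


def quote_for_bki(value: str) -> str:
    """Return the value as it would be written out, adding double
    quotes only when the assumed rule says they are needed."""
    if value == "_null_":
        # Treated here as a bare marker, never quoted (assumption).
        return value
    if _SIMPLE_VALUE.fullmatch(value):
        # Simple values such as 0 or sql are emitted without quotes.
        return value
    # Empty strings and values with spaces or punctuation get quoted.
    return '"' + value + '"'


def format_values(values: list[str]) -> str:
    """Join already-stripped data values into one space-separated line."""
    return " ".join(quote_for_bki(v) for v in values)
```

Under this rule, values that carried superfluous quotes in the old data
("0", "sql") come out bare, which is the only kind of place where the old
and new output would differ.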

> Check. Where is it coming from --- I suppose we aren't going to try to
> store this in the existing .h files? What provisions will there be for
> comments?

Yes, they're in ".dat" files. Perl comments (#) on their own line are
supported. I migrated all existing comments from the header files as
part of the conversion. This is scripted, so I can rebase over new
catalog entries that get committed.
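
As a small illustration of the "comments on their own line" convention, the
sketch below separates whole-line # comments from data lines in Python. The
sample text is made up for the example and is not copied from any patch, and
the helper name split_comments is hypothetical.

```python
def split_comments(text: str) -> tuple[list[str], list[str]]:
    """Split text into (comment_lines, data_lines).

    Only lines whose first non-blank character is '#' count as comments;
    a '#' later in a line is left alone, since only whole-line comments
    are described as supported.
    """
    comments: list[str] = []
    data: list[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        if stripped.startswith("#"):
            comments.append(stripped)
        else:
            data.append(stripped)
    return comments, data


# Made-up sample in the general shape of a Perl hash entry.
SAMPLE = """\
# boolean type
{ name => 'bool', descr => 'uses # in text' },
"""

comments, data = split_comments(SAMPLE)
```

Here the first line is classified as a comment and the second as data, even
though the second contains a # inside a quoted value.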

> I think single-letter abbreviations here are a pretty bad space vs
> readability tradeoff, particularly for wider catalogs where it risks
> ambiguity.

Ironically, I got that one from you [1] ;-), but if you have a
different opinion upon seeing concrete, explicit examples, I think
that's to be expected.

--
Now is probably a good time to disclose concerns of my own:
1. MSVC dependency tracking is certainly broken until such time as I
can shave that yak and test.
2. Keeping the oid symbols with the data entries required some
Makefile trickery to make them visible to .c files outside the backend
(patch 0007). It builds fine, but the dependency tracking might have
bugs.

--
[1] https://www.postgresql.org/message-id/15697.1479161432%40sss.pgh.pa.us

Thanks,
John Naylor
