Re: WIP: a way forward on bootstrap data

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: John Naylor <jcnaylor(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: a way forward on bootstrap data
Date: 2018-03-21 15:33:12
Message-ID: 18048.1521646392@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

John Naylor <jcnaylor(at)gmail(dot)com> writes:
> [ v11-bootstrap-data-conversion.tar.gz ]

I've done a round of review work on this, focusing on the Makefile
infrastructure. I found a bunch of problems with parallel builds and
VPATH builds, which are addressed in the attached incremental patch.

The parallel-build issues are a bit of a mess really: it's surprising
we've not had problems there earlier. For instance, catalog/Makefile has

postgres.description: postgres.bki ;
postgres.shdescription: postgres.bki ;
schemapg.h: postgres.bki ;

However, genbki.pl doesn't make any particular guarantee that postgres.bki
will be written sooner than its other output files, which means that in
principle make might think it needs to rebuild these other files on every
subsequent check. That was somewhat harmless given the empty update rule,
but it's not really the right thing. Your patch extended this to make
all the generated headers dependent on postgres.bki, and those are
definitely written before postgres.bki, meaning make *will* think they
are out of date. Worse, it'll also think the symlinks to them are out
of date. So I was running into problems with different parallel make
sub-tasks removing and recreating the symlinks. VPATH builds didn't
work well either, because out-of-date-ness ties into whether make will
accept a file in the source dir as a valid replacement target.

I resolved this mess by setting up a couple of stamp files, which is
a technique we also use elsewhere. bki-stamp is a single file
representing the action of having run genbki.pl, and header-stamp
likewise represents the action of having made the symlinks to the
generated headers. By depending on those rather than individual
files, we avoid questions of exactly what the timestamps on the
individual output files are.

In the attached, I've also done some more work on the README file
and cleaned up a few other little things.

I've not really looked at the MSVC build code at all. Personally,
I'm willing to just commit this (when the time comes) and let the
buildfarm see if the MSVC code works ... but if anyone else wants
to check that part beforehand, please do.

I also have not spent much time yet looking at the end-product .h and .dat
files. I did note a bit of distressing inconsistency in the formatting of
the catalog struct declarations, some of which predates this patch but it
seems like you've introduced more. I think what we ought to standardize
on is a format similar to this in pg_opclass.h:

CATALOG(pg_opclass,2616)
{
/* index access method opclass is for */
Oid opcmethod BKI_LOOKUP(pg_am);

/* name of this opclass */
NameData opcname;

/* namespace of this opclass */
Oid opcnamespace BKI_DEFAULT(PGNSP);

/* opclass owner */
Oid opcowner BKI_DEFAULT(PGUID);

The former convention used in some places, of field descriptions in
same-line comments, clearly won't work anymore if we're sticking
BKI_DEFAULT annotations there. I also don't like the format used in, eg,
pg_aggregate.h of putting field descriptions in a separate comment block
before the struct proper. Bitter experience has shown that there are a
lot of people on this project who won't update comments that are more than
about two lines away from the code they change; so the style in
pg_aggregate.h is just inviting maintenance oversights.

I've got mixed feelings about the whitespace lines between fields. They
seem like they are mostly bulking up the code and we could do without 'em.
On the other hand, pgindent will insist on putting one before any
multi-line field comment, and so that would create inconsistent formatting
if we don't use 'em elsewhere. Thoughts?

Speaking of pgindent, those prettily aligned BKI annotations are a waste
of effort, because when pgindent gets done with the code it will look
like

regproc aggfnoid;
char aggkind BKI_DEFAULT(n);
int16 aggnumdirectargs BKI_DEFAULT(0);
regproc aggtransfn BKI_LOOKUP(pg_proc);
regproc aggfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggcombinefn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggdeserialfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggmtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggminvtransfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);
regproc aggmfinalfn BKI_DEFAULT(-) BKI_LOOKUP(pg_proc);

I'm not sure if there's anything much to be done about this. BKI_DEFAULT
isn't so bad, but additional annotations get unreadable fast. Maybe
BKI_LOOKUP was a bad idea after all, and we should just invent more
Oid-equivalent typedef names.

The attached is just one incremental patch on top of your v11 series.
I couldn't think of an easy way to migrate the changes back into the
most relevant diffs of your series, so I didn't try.

regards, tom lane

Attachment Content-Type Size
v11-bootstrap-data-adjustments-1.patch text/x-diff 24.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeevan Chalke 2018-03-21 15:33:23 Re: [HACKERS] Partition-wise aggregation/grouping
Previous Message Teodor Sigaev 2018-03-21 15:29:18 Re: pgbench randomness initialization