Experimental patch: generating BKI revisited

From:	John Naylor <jcnaylor(at)gmail(dot)com>
To:	pgsql-hackers(at)postgresql(dot)org
Subject:	Experimental patch: generating BKI revisited
Date:	2009-11-04 20:28:11
Message-ID:	4d191a530911041228v621286a7q6a98d9ab8a2ed734@mail.gmail.com
Lists:	pgsql-hackers

Hello everyone,

I was quite intrigued by a discussion that happened this past summer
regarding generation of bootstrap files such as postgres.bki, and the
associated pain points of maintaining the DATA() statements in catalog headers.
It occurred to me that the current system is backwards: instead of generating
the bootstrap files from hard-coded strings contained in various header files,
it seems it would be a cleaner design to generate both from a human-readable
high-level description of the system catalogs. Such a design would have several
advantages:

1. You wouldn't need hand-maintained pg_foo.h files. The struct declarations,
Natts/Annums, and such are predictable enough to be machine-generated. Ditto
for all 6 Schema_pg_foo declarations needed by relcache.c.
2. You wouldn't even need to specify the contents of pg_attribute -- the
bootstrap data for that are completely determined by the column names/types of
the bootstrap tables and some type information in pg_type.
3. With a human-readable format, you could have placeholder strings
representing oid values.
4. Since looking up catalog oids would be easy, we could get rid of
postgres.description, postgres.shdescription by putting those statements into
postgres.bki as well, which would eliminate the need for the
setup_description() function in initdb.c with its hard-coded SQL.

The patch linked to below implements 1-3 of what I've just described.
Unfortunately, as will become apparent, it introduces a dependency on a Perl
module which is inappropriate for Postgres' requirements of portability. So
consider this a research project, not an item for the patch queue. My hope is
that it could still be the basis for a practical solution. If this interests
you, keep reading...

FILE GENERATION

There are 4 scripts, which output distprep targets:

gen_bki.pl -- generates postgres.bki and schemapg.h
gen_header.pl -- generates the pg_foo.h headers
gen_descr.pl -- generates postgres.description and postgres.shdescription
gen_fmgr.pl -- generates fmgroids.h and fmgrtab.c
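
To illustrate why the Natts/Anum macros are predictable enough to be
machine-generated, here is a minimal sketch in Python (the patch's scripts are
Perl; this is not taken from gen_header.pl). The helper name gen_attr_macros is
hypothetical, and the column names are those of pg_tablespace from the example
further below.

```python
def gen_attr_macros(catalog, columns):
    """Return the Natts_/Anum_ #define lines for one catalog.

    Hypothetical helper for illustration; not code from the patch.
    """
    lines = ["#define Natts_%s %d" % (catalog, len(columns))]
    # Attribute numbers are 1-based and follow the declared column order.
    for attnum, name in enumerate(columns, start=1):
        lines.append("#define Anum_%s_%s %d" % (catalog, name, attnum))
    return lines


macros = gen_attr_macros(
    "pg_tablespace", ["spcname", "spcowner", "spclocation", "spcacl"]
)
print("\n".join(macros))
```

Because the macros depend only on the catalog name and the column order, adding
or reordering a column in the catalog description would renumber them
automatically.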

CATALOG DATA FORMAT

The catalog data is represented in the YAML format. The features that led to
this decision are:

1. Ease of reading/editing.
2. Anchors/aliases -- enable human-readable oid values.
3. Merge key -- enables default values.

A simple example using pg_tablespace will illustrate:

pg_tablespace:
    relation_oid: 1213
    relation_define: TableSpaceRelationId
    shared_relation: True
    columns:
        - spcname: name             # tablespace name
        - spcowner: oid             # owner of tablespace
        - spclocation: text         # physical location (VAR LENGTH)
        - spcacl: aclitem[]         # access permissions (VAR LENGTH)
    column_defaults: &pg_tablespace_defaults
        spcowner: *PGUID
        spclocation: '""'
        spcacl: _null_
    data:
        - spcname: pg_default
          oid: &DEFAULTTABLESPACE_OID 1663
          define: DEFAULTTABLESPACE_OID
          <<: *pg_tablespace_defaults
        - spcname: pg_global
          oid: 1664
          define: GLOBALTABLESPACE_OID
          <<: *pg_tablespace_defaults

When the YAML parser loads this into the Perl data structures used by the
scripts, the result looks similar to this when dumped with Data::Dumper:

$catalogs->{pg_tablespace} = {
    'shared_relation' => 'True',
    'relation_oid' => 1213,
    'relation_define' => 'TableSpaceRelationId',
    'columns' => [
        { 'spcname' => 'name' },
        { 'spcowner' => 'oid' },
        { 'spclocation' => 'text' },
        { 'spcacl' => 'aclitem[]' }
    ],
    'data' => [
        {
            'spcname' => 'pg_default',
            'spclocation' => '""',
            'oid' => 1663,
            'spcacl' => '_null_',
            'define' => 'DEFAULTTABLESPACE_OID',
            'spcowner' => 10
        },
        {
            'spcname' => 'pg_global',
            'spclocation' => '""',
            'oid' => 1664,
            'spcacl' => '_null_',
            'define' => 'GLOBALTABLESPACE_OID',
            'spcowner' => 10
        }
    ],
    'column_defaults' => {
        'spclocation' => '""',
        'spcacl' => '_null_',
        'spcowner' => 10
    }
};

Note that the alias *PGUID is expanded to 10 since there was (not shown here) a
corresponding anchor in pg_authid -- "oid: &PGUID 10". Similarly, in any
subsequent catalog that refers to oid 1663 you can instead write
*DEFAULTTABLESPACE_OID. Note also that each data entry is merged with the
column_defaults hash.
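
The effect of the merge key can be sketched without a YAML parser. The
following Python fragment is only an illustration (merge_defaults and the
PGUID constant are hypothetical stand-ins, not part of the patch): each data
entry starts from the defaults, and any key the entry sets explicitly takes
precedence, which is how the YAML merge key behaves.

```python
# Stand-in for the anchor "oid: &PGUID 10" defined in pg_authid.
PGUID = 10

column_defaults = {
    "spcowner": PGUID,
    "spclocation": '""',
    "spcacl": "_null_",
}


def merge_defaults(entry, defaults):
    """Mimic '<<: *defaults': keys set in the entry override the defaults."""
    merged = dict(defaults)
    merged.update(entry)
    return merged


data = [
    {"spcname": "pg_default", "oid": 1663, "define": "DEFAULTTABLESPACE_OID"},
    {"spcname": "pg_global", "oid": 1664, "define": "GLOBALTABLESPACE_OID"},
]

rows = [merge_defaults(entry, column_defaults) for entry in data]
for row in rows:
    print(row)
```

An entry that needs a different owner would simply set spcowner itself, and
the default would be ignored for that row.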

A portion of a more complex example will hopefully motivate this method of data
organization:

Here are the first 5 entries of the current representation of pg_amop:

/* default operators int2 */
DATA(insert ( 1976 21 21 1 95 403 ));
DATA(insert ( 1976 21 21 2 522 403 ));
DATA(insert ( 1976 21 21 3 94 403 ));
DATA(insert ( 1976 21 21 4 524 403 ));
DATA(insert ( 1976 21 21 5 520 403 ));

The YAML representation leaves out half of the columns, to be computed at
compile time by gen_bki.pl, since they are intentional denormalizations. The
remaining columns are self-documenting because of human-readable oids:

- amopfamily: *integer_btree_fam
  amopstrategy: 1
  amopopr: *int2lt_op
- amopfamily: *integer_btree_fam
  amopstrategy: 2
  amopopr: *int2le_op
- amopfamily: *integer_btree_fam
  amopstrategy: 3
  amopopr: *int2eq_op
- amopfamily: *integer_btree_fam
  amopstrategy: 4
  amopopr: *int2ge_op
- amopfamily: *integer_btree_fam
  amopstrategy: 5
  amopopr: *int2gt_op

I think this approach is more readable and less error-prone.
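
As a rough sketch of how the omitted columns could be filled in, the Python
fragment below rebuilds the first DATA row above from the three values kept in
the YAML. It assumes, as an illustration rather than a description of
gen_bki.pl, that the operand types come from the operator's pg_operator entry
and the access method from the operator family's entry. The lookup tables and
function names are hypothetical; the numbers are the ones in the DATA rows.

```python
# Hypothetical lookup tables standing in for pg_operator and pg_opfamily.
operators = {
    "int2lt_op": {"oid": 95, "oprleft": 21, "oprright": 21},
    "int2le_op": {"oid": 522, "oprleft": 21, "oprright": 21},
}
families = {
    "integer_btree_fam": {"oid": 1976, "opfmethod": 403},
}


def amop_row(entry):
    """Expand a reduced pg_amop entry into the full six-column row."""
    fam = families[entry["amopfamily"]]
    opr = operators[entry["amopopr"]]
    return (
        fam["oid"],             # amopfamily
        opr["oprleft"],         # left operand type, copied from the operator
        opr["oprright"],        # right operand type, copied from the operator
        entry["amopstrategy"],  # amopstrategy
        opr["oid"],             # amopopr
        fam["opfmethod"],       # access method, copied from the family
    )


def bki_insert(row):
    """Format a row as a BKI insert statement."""
    return "insert ( %s )" % " ".join(str(v) for v in row)


entry = {
    "amopfamily": "integer_btree_fam",
    "amopstrategy": 1,
    "amopopr": "int2lt_op",
}
print(bki_insert(amop_row(entry)))
```

Deriving the denormalized columns this way means they cannot drift out of sync
with the operator and operator family definitions.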

DEPENDENCIES AND THE REAL WORLD

Parsing YAML into Perl data structures requires the YAML::XS module, which in
turn requires Perl 5.10. Since the generated files are distprep targets, this
would only apply to those who want to build from the repo. Since this is still
an unacceptable dependency, it might be worth it to use the new infrastructure
with a simpler data format that can be parsed with straight Perl or C.

NEW WARTS

Some entries in catalog.yaml are only there to put macros into the generated
header. These are indicated by data entries that contain only a
"define:" value and a "nobki: True" value. If a catalog header used
to contain things like function prototypes, enums, and #include's, these have
been put into 9 new pg_foo_fn.h files which are #include'd into the generated
pg_foo.h file. This is indicated by the presence of an "include:" value. The
number of new files could be reduced by consolidation, but I didn't do that so
that it would be obvious where the definitions come from.

The old mechanism is retained for the declare index and declare toast
statements. That is, they are still retrieved from indexing.h and toasting.h
using regular expressions.
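
For readers unfamiliar with that mechanism, here is a minimal Python sketch of
regex-based extraction. It is not the patch's Perl code: the pattern, the
function name index_to_bki, and the sample line are assumptions, modeled on the
DECLARE_INDEX / DECLARE_UNIQUE_INDEX macro style used in indexing.h.

```python
import re

# Matches e.g. DECLARE_UNIQUE_INDEX(name, oid, on table using btree(...));
DECLARE_INDEX_RE = re.compile(
    r"^DECLARE_(UNIQUE_)?INDEX\(\s*(\w+),\s*(\d+),\s*(.+)\);\s*$"
)


def index_to_bki(line):
    """Turn one index macro line into a BKI statement, or return None."""
    m = DECLARE_INDEX_RE.match(line)
    if m is None:
        return None
    unique = "unique " if m.group(1) else ""
    return "declare %sindex %s %s %s" % (
        unique, m.group(2), m.group(3), m.group(4)
    )


# Illustrative input line; treat the exact text as an assumption.
sample = (
    "DECLARE_UNIQUE_INDEX(pg_tablespace_oid_index, 2697, "
    "on pg_tablespace using btree(oid oid_ops));"
)
print(index_to_bki(sample))
```

Lines that do not match the pattern, such as #define lines in the same header,
are skipped by returning None.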

CAVEATS FOR THE CURIOUS

1. I haven't changed the configure script to test for YAML::XS.
2. I've run make -j2 successfully, but I'm not positive that my changes are
100% correct for parallel make.
3. I don't have ready access to a Windows box with the necessary development
environment, so MSVC is certainly broken.
4. Since there are whitespace inconsistencies in the current headers, you need
this command on the current postgres.bki to diff cleanly with mine:

    sed -i 's/_)$/_ )/' src/backend/catalog/postgres.bki

INFO

The project is located at

http://git.postgresql.org/gitweb?p=users/jnaylor/bki.git;a=summary

Some code snippets and conventions were borrowed from Robert Haas' earlier
efforts. Feedback is appreciated.

John
