Re: Bootstrap DATA is a pita

From: Mark Dilger <hornschnorter(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Caleb Welton <cwelton(at)pivotal(dot)io>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bootstrap DATA is a pita
Date: 2015-12-11 23:30:04
Message-ID: 3A8EFC37-E483-4FA6-A996-881EC29CBA63@gmail.com
Lists: pgsql-hackers


> On Dec 11, 2015, at 3:02 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Mark Dilger <hornschnorter(at)gmail(dot)com> writes:
>>> On Dec 11, 2015, at 2:40 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> Huh? Those files are the definition of that mapping, no? Isn't what
>>> you're proposing circular?
>
>> No, there are far more references to Oids than there are definitions of them.
>
> Well, you're still not being very clear, but I *think* what you're
> proposing is to put a lot more smarts into the script that converts
> the master source files into .bki format. That is, we might have
> "=(int8,int4)" in an entry in the master source file for pg_amop, but
> the script would look up that entry using the source data for pg_type
> and pg_operator, and then emit a simple numeric OID into the .bki file.
> (Presumably, it would know to do this because we'd redefine the
> pg_amop.amopopr column as of regoperator type not plain OID.)
>
> Yeah, that could work, though I'd be a bit concerned about the complexity
> and speed of the script. Still, one doesn't usually rebuild postgres.bki
> many times a day, so speed might not be a big problem.

I am proposing that each of the catalog headers that currently has DATA
lines instead have a COPY-loadable file that contains the same information.
So, for pg_type.h, there would be a pg_type.dat file. All the DATA lines
would be pulled out of pg_type.h, and a corresponding tab-delimited row
would be written to pg_type.dat for each one. Henceforth, if you cloned the git
repository, you'd find no DATA lines in pg_type.h, but would find a pg_type.dat
file in the src/include/catalog directory. Likewise for the other header files.

There would be some script, in SQL or Perl or whatever, that would convert
these .dat files into the .bki file.
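For illustration only, here is a minimal sketch of such a conversion step, in Python rather than Perl. It assumes, hypothetically, that each .dat row carries the row's OID in its first column followed by the remaining column values in catalog order, and it emits simplified BKI `open` / `insert OID = ...` / `close` commands. The real .bki generator also has to handle quoting, null markers and the `create` commands, none of which is modeled here. The function name dat_to_bki is made up.

```python
import csv
import io


def dat_to_bki(catalog: str, dat_text: str) -> str:
    """Turn tab-delimited .dat rows into simplified BKI commands.

    Hypothetical layout: column 0 is the row's OID, the remaining columns
    are the attribute values in catalog column order.
    """
    commands = [f"open {catalog}"]
    # QUOTE_NONE: treat the input like COPY text format, with no CSV quoting.
    reader = csv.reader(io.StringIO(dat_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # ignore blank lines
        oid, values = row[0], row[1:]
        commands.append(f"insert OID = {oid} ( {' '.join(values)} )")
    commands.append(f"close {catalog}")
    return "\n".join(commands)


if __name__ == "__main__":
    # Abbreviated pg_type rows: OID, typname, typlen.
    print(dat_to_bki("pg_type", "16\tbool\t1\n20\tint8\t8\n"))
```

Keeping the converter dumb like this means the .dat files stay directly loadable with COPY, while any name-to-OID smarts can live in a separate pass.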

Now, if we know that pg_type.dat will be processed before pg_proc.dat,
we can replace all the Oids representing datatypes in pg_proc.dat with the
names for those types, given that we already have a name <=> oid
mapping for types.
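A rough sketch of the name-to-OID substitution described above, assuming pg_type rows have already been parsed into dicts. The helper names and the dict-based row representation are hypothetical. The OIDs used in the usage example (bool = 16, int8 = 20, int4 = 23) are the real built-in type OIDs, but the rows themselves are abbreviated.

```python
from typing import Dict, Iterable


def build_type_map(type_rows: Iterable[dict]) -> Dict[str, int]:
    """Name -> OID map built from pg_type rows that were processed earlier."""
    return {row["typname"]: row["oid"] for row in type_rows}


def type_oid(type_map: Dict[str, int], name: str) -> int:
    """Look up one type name, failing loudly on typos or missing types."""
    try:
        return type_map[name]
    except KeyError:
        raise ValueError(f"unknown type name {name!r}") from None


def resolve_type_columns(row: dict, type_map: Dict[str, int],
                         type_columns: Iterable[str]) -> dict:
    """Return a copy of row with type names in type_columns replaced by OIDs."""
    resolved = dict(row)
    for col in type_columns:
        value = resolved[col]
        if isinstance(value, list):  # e.g. an argument-type vector
            resolved[col] = [type_oid(type_map, name) for name in value]
        else:
            resolved[col] = type_oid(type_map, value)
    return resolved


if __name__ == "__main__":
    types = build_type_map([
        {"oid": 16, "typname": "bool"},
        {"oid": 20, "typname": "int8"},
        {"oid": 23, "typname": "int4"},
    ])
    proc = {"proname": "int48eq", "prorettype": "bool",
            "proargtypes": ["int4", "int8"]}
    print(resolve_type_columns(proc, types, ["prorettype", "proargtypes"]))
```

Raising on an unknown name at build time is the point: a misspelled type in pg_proc.dat would be caught by the script instead of silently producing a wrong OID.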

Likewise, if we know that pg_proc.dat will be processed before pg_operator.dat,
we can specify both functions and datatypes by name rather than by Oid
in that file, making it much easier to read. By the time pg_operator.dat is
read, pg_type.dat and pg_proc.dat will already have been read and processed,
so there shouldn't be ambiguity.
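The "process X before Y" rule is essentially a topological sort over the catalogs. The sketch below uses Python's standard graphlib module. The dependency edges are illustrative assumptions that mirror only the references discussed here: in the real catalogs, pg_type also points back at pg_proc through its input/output function columns, which would form a cycle. graphlib reports such a cycle as an error rather than picking an arbitrary order.

```python
from graphlib import CycleError, TopologicalSorter

# Illustrative edges only: each catalog maps to the catalogs whose
# names it refers to, so those must be processed first.
CATALOG_DEPS = {
    "pg_type": set(),
    "pg_proc": {"pg_type"},
    "pg_operator": {"pg_type", "pg_proc"},
    "pg_am": set(),
    "pg_opfamily": {"pg_am"},
    "pg_amop": {"pg_type", "pg_operator", "pg_opfamily", "pg_am"},
}


def processing_order(deps):
    """Return the catalogs in an order where every dependency comes first."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] is the list of nodes forming the cycle.
        raise ValueError(
            f"catalogs reference each other in a cycle: {err.args[1]}"
        ) from None


if __name__ == "__main__":
    print(processing_order(CATALOG_DEPS))
```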

By the time pg_amop.dat is processed, the operators, procs, datatypes,
opfamilies and so forth would already be known. The example I gave upthread
would be easy to parse:

amopfamily amoplefttype amoprighttype amopstrategy amoppurpose amopopr amopmethod amopsortfamily
integer_ops int2 int2 1 search "<" btree 0
integer_ops int2 int2 2 search "<=" btree 0
integer_ops int2 int2 3 search "=" btree 0
integer_ops int2 int2 4 search ">=" btree 0
integer_ops int2 int2 5 search ">" btree 0
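As a sketch of how rows like these might be resolved, assuming the columns are tab-delimited with the header line shown: operator names such as "<" are overloaded, so the lookup key has to include the left and right operand types, and opfamily names are only unique within an access method. The lookup tables are assumed to have been filled while processing the earlier catalogs. The int2 OID (21) and the amoppurpose codes ('s' for search, 'o' for ordering) are real; every other OID below is a made-up placeholder, and resolve_amop is a hypothetical name.

```python
import csv
import io

# Lookup tables assumed to have been filled while pg_type, pg_am,
# pg_opfamily and pg_operator were processed. The int2 OID is real;
# the other OIDs are made-up placeholders.
TYPE_OIDS = {"int2": 21}
AM_OIDS = {"btree": 7001}
OPFAMILY_OIDS = {("btree", "integer_ops"): 8001}  # unique per access method
OPERATOR_OIDS = {
    ("<", "int2", "int2"): 9001,
    ("<=", "int2", "int2"): 9002,
    ("=", "int2", "int2"): 9003,
    (">=", "int2", "int2"): 9004,
    (">", "int2", "int2"): 9005,
}
PURPOSE_CODES = {"search": "s", "ordering": "o"}  # pg_amop.amoppurpose


def resolve_amop(dat_text: str) -> list:
    """Turn name-based pg_amop rows (with a header line) into OID-based rows."""
    resolved = []
    # Default CSV quoting strips the double quotes around operator names.
    for rec in csv.DictReader(io.StringIO(dat_text), delimiter="\t"):
        method = rec["amopmethod"]
        left, right = rec["amoplefttype"], rec["amoprighttype"]
        resolved.append({
            "amopfamily": OPFAMILY_OIDS[(method, rec["amopfamily"])],
            "amoplefttype": TYPE_OIDS[left],
            "amoprighttype": TYPE_OIDS[right],
            "amopstrategy": int(rec["amopstrategy"]),
            "amoppurpose": PURPOSE_CODES[rec["amoppurpose"]],
            # Operator names are overloaded, so the operand types are part
            # of the lookup key; "<" alone would be ambiguous.
            "amopopr": OPERATOR_OIDS[(rec["amopopr"], left, right)],
            "amopmethod": AM_OIDS[method],
            "amopsortfamily": int(rec["amopsortfamily"]),
        })
    return resolved


if __name__ == "__main__":
    header = ("amopfamily\tamoplefttype\tamoprighttype\tamopstrategy\t"
              "amoppurpose\tamopopr\tamopmethod\tamopsortfamily")
    ops = ["<", "<=", "=", ">=", ">"]
    body = "".join(
        f'integer_ops\tint2\tint2\t{n}\tsearch\t"{op}"\tbtree\t0\n'
        for n, op in enumerate(ops, start=1))
    for row in resolve_amop(header + "\n" + body):
        print(row)
```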

And if I came along and defined a new datatype, int384, I could add rows to
this file much more easily, as:

amopfamily amoplefttype amoprighttype amopstrategy amoppurpose amopopr amopmethod amopsortfamily
integer_ops int384 int384 1 search "<" btree 0
integer_ops int384 int384 2 search "<=" btree 0
integer_ops int384 int384 3 search "=" btree 0
integer_ops int384 int384 4 search ">=" btree 0
integer_ops int384 int384 5 search ">" btree 0

I don't see how this creates all that much complication, and I clearly see
how it makes files like pg_operator.{h,dat} and pg_amop.{h,dat} easier to read.

mark
