Re: Bootstrap DATA is a pita

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Mark Dilger <hornschnorter(at)gmail(dot)com>
Cc: Caleb Welton <cwelton(at)pivotal(dot)io>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Bootstrap DATA is a pita
Date: 2015-12-12 18:28:28
Message-ID: 7349.1449944908@sss.pgh.pa.us
Lists: pgsql-hackers

Mark Dilger <hornschnorter(at)gmail(dot)com> writes:
>> On Dec 11, 2015, at 2:54 PM, Caleb Welton <cwelton(at)pivotal(dot)io> wrote:
>> Compare:
>> CREATE FUNCTION lo_export(oid, text) RETURNS integer LANGUAGE internal STRICT AS 'lo_export' WITH (OID=765);
>>
>> DATA(insert OID = 765 ( lo_export PGNSP PGUID 12 1 0 0 0 f f f f t f v u 2 0 23 "26 25" _null_ _null_ _null_ _null_ _null_ lo_export _null_ _null_ _null_ ));

> I would like to hear more about this idea. Are you proposing that we use something
> like the above CREATE FUNCTION format to express what is currently being expressed
> with DATA statements?

Yes, that sort of idea has been kicked around some already; see the
archives.

> That is an interesting idea, though I don't know what exactly
> that would look like. If you want to forward this idea, I'd be eager to hear your thoughts.
> If not, I'll try to make progress with my idea of tab delimited files and such (or really,
> Alvaro's idea of csv files that I only slightly corrupted).

Personally I would like to see both approaches explored. Installing as
much as we can via SQL commands is attractive for a number of reasons;
but there is going to be an irreducible minimum amount of stuff that
has to be inserted by something close to the current bootstrapping
process. (And I'm not convinced that that "minimum amount" is going
to be very small...) So it's not impossible that we'd end up accepting
*both* types of patches, one to do more in the post-bootstrap SQL world
and one to make the bootstrap data notation less cumbersome. In any
case it would be useful to push both approaches forward some more before
we make any decisions between them.

BTW, there's another thing I'd like to see improved in this area, which is
a problem already but will get a lot worse if we push more work into the
post-bootstrap phase of initdb. Namely, the post-bootstrap phase is
both inefficient and impossible to debug. If you've ever had a failure
there, you'll have seen that the backend spits out an entire SQL script
and says there's an error in it somewhere; that's because it gets the
whole per-stage script as one submission. (Try introducing a syntax error
somewhere in information_schema.sql, and you'll see what I mean.)
Breaking the stage scripts down further would help, but that is
unattractive because each one requires a fresh backend startup/shutdown,
including a full checkpoint. I'd like to see things rejiggered so that
there's only one post-bootstrap standalone backend session that performs
all the steps, but initdb feeds it just one SQL command at a time so that
errors are better localized. That should both speed up initdb noticeably
and make debugging easier.
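
To illustrate the "one session, one command at a time" idea, here is a
minimal sketch. It is not initdb code: it uses Python's built-in sqlite3
module as a stand-in for the standalone backend, and the names
split_statements and run_stage are hypothetical. The point is only that
feeding statements individually to a single long-lived session lets the
driver report exactly which command failed, rather than dumping the whole
script.

```python
import sqlite3


def split_statements(script):
    """Yield complete SQL statements from a script, one at a time.

    Assumption: statements end with a semicolon at the end of a line.
    sqlite3.complete_statement() only checks tokens, not syntax validity.
    """
    buf = ""
    for line in script.splitlines(keepends=True):
        buf += line
        if sqlite3.complete_statement(buf):
            yield buf.strip()
            buf = ""
    if buf.strip():
        # Trailing text without a terminating semicolon.
        yield buf.strip()


def run_stage(conn, script):
    """Run a script statement by statement in one session.

    Returns None on success, or (statement_number, statement_text,
    error_message) for the first statement that fails.
    """
    for number, stmt in enumerate(split_statements(script), start=1):
        try:
            conn.execute(stmt)
        except sqlite3.Error as exc:
            return number, stmt, str(exc)
    return None


# A stage script with a deliberate typo in the third statement.
script = """
CREATE TABLE t (a integer);
INSERT INTO t VALUES (1);
INSERT INTO t VALUSE (2);
INSERT INTO t VALUES (3);
"""

conn = sqlite3.connect(":memory:")  # one session for the whole stage
failure = run_stage(conn, script)
if failure is not None:
    number, stmt, message = failure
    print(f"statement {number} failed: {message}")
    print(stmt)
```

Because the session stays open across statements, there is no per-script
startup/shutdown cost, and the error report names a single statement. How
initdb would actually delimit commands for the backend is left open here.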

regards, tom lane
