Re: exposing pg_controldata and pg_config as functions

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Joe Conway <mail(at)joeconway(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Peter Eisentraut <peter_e(at)gmx(dot)net>, Josh Berkus <josh(dot)berkus(at)pgexperts(dot)com>
Subject: Re: exposing pg_controldata and pg_config as functions
Date: 2016-02-28 13:16:14
Message-ID: CAB7nPqQsg+iPq5Ma349FvuOqutaTbpNAO=sNcCAu92awOraQjw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Feb 28, 2016 at 9:33 AM, Joe Conway <mail(at)joeconway(dot)com> wrote:
> On 02/21/2016 05:30 AM, Michael Paquier wrote:
>> Looking again at this thread I guess that this is consensus, based on
>> the proposal from Josh and seeing no other ideas around. Another idea
>> would be to group all the fields that into a single function
>> pg_control_data().
>
> I think a single function would be ridiculously wide. I like the four
> separate functions better if we're going to do it this way at all.
>
>> + <indexterm>
>> + <primary>pg_checkpoint_state</primary>
>> + </indexterm>
>> + <para>
>> + <function>pg_checkpoint_state</> returns a record containing
>> + checkpoint_location, prior_location, redo_location, redo_wal_file,
>> + timeline_id, prev_timeline_id, full_page_writes, next_xid, next_oid,
>> + next_multixact_id, next_multi_offset, oldest_xid, oldest_xid_dbid,
>> + oldest_active_xid, oldest_multi_xid, oldest_multi_dbid,
>> + oldest_commit_ts_xid, newest_commit_ts_xid, and checkpoint_time.
>> + </para>
>> This is bit unreadable. The only entry in the documentation that
>> adopts a similar style is pg_stat_file, and with six fields that feels
>> as being enough. I would suggest using a table instead with the type
>> of the field and its name.
>
> Ok, changed to your suggestion.
>
>
>> Regarding the naming of the functions, I think that it would be good
>> to get something consistent with the concept of those being "Control
>> Data functions" by having them share the same prefix, say pg_control_
>> - pg_control_checkpoint
>> - pg_control_init
>> - pg_control_system
>> - pg_control_recovery
>
> No issues -- changed.
>
>> + snprintf (controldata_name, CONTROLDATANAME_LEN,
>> + "%s:", controldata[i].name);
>> Nitpick: extra space.
>
> I didn't understand this comment but it is moot now anyway...
>
>> +static const char *const controldata_names[] =
>> +{
>> + gettext_noop("pg_control version number"),
>> + gettext_noop("Catalog version number"),
>> + gettext_noop("Database system identifier"),
>> Is this complication really necessary? Those identifiers are used only
>> in the frontend and the footprint of this patch on pg_controldata is
>> really large. What I think we should do is have in src/common the
>> following set of routines that work directly on ControlFileData:
>> - checkControlFile, to perform basic sanity checks on the control file
>> (CRC, see for example pg_rewind.c)
>> - getControlFile(dataDir), that simply returns a palloc'd
>> ControlFileData to the caller after looking at global/pg_control.
>> pg_rewind could perhaps make use of the one to check the control file
>> CRC, to fetch ControlFileData there is some parallel logic for the
>> source server if it is either remote or local so it would be better to
>> not use getControlFile in this case.
>
> I agree with the assessment that much of what had been moved based on
> the original pg_controladata() SRF no longer needs to move. This version
> only puts get_controlfile() into src/common, since that is the bit that
> is still shared. If checkControlFile() or something similar is useful
> for pg_rewind or some other extension, I'd say that should be a separate
> patch.
>
> Oh, and the entire thing is now rebased against a git pull from a few
> hours ago. I moved this to the upcoming commitfest too, although I think
> it is pretty well ready to go.

Thanks for the updated version.

+ Returns information about current controldata file state.
s/controldata/control data?

+ <tgroup cols="2">
+ <thead>
+ <row>
+ <entry>Column Name</entry>
+ <entry>Data Type</entry>
+ </row>
+ </thead>
+
Having a description of each field would be nice.

+ * pg_controldata.c
+ * Expose select pg_controldata output, except via SQL functions
I am not much getting the meaning of this sentence. What about the following:
"Set of routines exposing the contents of the control data file in a
set of SQL functions".

+ /* Populate the values and null arrays */
+ values[0] = LSNGetDatum(ControlFile->checkPoint);
+ nulls[0] = false;
+
+ values[1] = LSNGetDatum(ControlFile->prevCheckPoint);
+ nulls[1] = false;
Instead of setting each flag of nulls[] one by one, just calling once
MemSet would be fine. Any method is fine though.

+ /* get a copy of the control file */
+ ControlFile = get_controlfile(DataDir, progname);
Some whitespaces here. git diff is showing in red here.

+ if (ControlFile->pg_control_version % 65536 == 0 &&
+ ControlFile->pg_control_version / 65536 != 0)
+ elog(ERROR, _("byte ordering mismatch"));
You may want to put this check directly in get_controlfile(). it is
repeated 4 times in the backend, and has an equivalent in
pg_controldata.

our @pgcommonallfiles = qw(
- config_info.c exec.c pg_lzcompress.c pgfnames.c psprintf.c
+ config_info.c controldata_utils.c exec.c pg_lzcompress.c pgfnames.c
relpath.c rmtree.c string.c username.c wait_error.c);
"psprintf.c" has been removed from this list. This causes the build
with MSVC to fail.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2016-02-28 13:41:46 Re: [REVIEW] In-core regression tests for replication, cascading, archiving, PITR, etc.
Previous Message Konstantin Knizhnik 2016-02-28 12:14:14 Re: The plan for FDW-based sharding