Re: proposal: possibility to read dumped table's name from file

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2021-09-17 11:18:18
Message-ID: FF63A2C8-B20D-49A3-BA0B-21669D255DA1@yesql.se
Lists: pgsql-hackers

As there have been a lot of differing opinions raised in this thread, I re-read
it and tried to summarize the discussion so far, to figure out where we agree
(and disagree) and on what, before we get deep into the technicalities wrt
the current patch. If anyone feels I've misrepresented them below then I
sincerely do apologize. If I missed a relevant viewpoint I also apologize;
I've tried to objectively represent the thread.

I proposed JSON in [0], which is where the format discussion to some extent
started; Justin and Pavel had up until that point discussed the format by
refining the original proposal.

In [1] Surafel Temesgen brought up --exclude-database from pg_dumpall and
--no-comments, and argued for them being handled by this patch. This was
objected to on the grounds that pg_dumpall is out of scope, and that
all-or-nothing switches are not applicable in a filter option.

Stephen objected to both the proposed format and the suggestion of JSON in [2],
and argued for a more holistic configuration file approach. TOML was suggested.
Dean then +1'd the config file approach in [3].

In [4] Tom supported the idea of a more generic config file, and remarked that
the proposed filter for table names only makes sense when the number of exclude
patterns is large enough that we might hit other problems in pg_dump.
Further, in [5] Tom commented that a format with established quoting
conventions would buy us not having to invent our own to cope with complicated
relation names.

The fact that JSON doesn't support comments is brought up in a few emails and
is a very valid point, as the need for comments regardless of format is brought
up as well.

Tomas Vondra in [6] wanted the object filter to be a separate file from a config
file, and argued for a simpler format for these lists (while still supporting
multiple object types).

Alvaro agreed with Tomas on [+-] OBJTYPE OBJIDENT in [7] and Tom extended the
proposal to use [include/exclude] keywords in [8] in order to support more than
just excluding and including. Regardless of stance on format, the use of
keywords instead of [+-] is a rare point of consensus in this thread.

Stephen and I have also expressed concern in various parts of the thread
that inventing our own format rather than using something with existing broad
library support will end up with third parties (like pgAdmin et al.) having to
all write their own generators and parsers.

A main concern among most (all?) participants of the thread, regardless of
format supported, is that quoting is hard and must be done right for all object
names postgres supports (including any not currently in scope by this patch).

Below is an attempt at summarizing and grouping the proposals so far into the
set of ideas presented:

A) A keyword+object based format to invoke with a switch to essentially
allow for more filters than the commandline can handle and nothing more.
After a set of revisions, the current proposal is:
[include|exclude] [<objtype>] [<objident>]

B) A format similar to (A) which can also be used for pg_dump configuration

C) The format in (A), or a close variant thereof, with the intention of it
being included in/referred to from a future configuration file of currently
unknown format. One reference being a .gitignore type file.

D) An existing format (JSON and TOML have been suggested, with JSON
being dismissed due to lack of comment support) which has quoting
conventions that support postgres' object names and which can be used to
define a full pg_dump configuration file syntax.
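
To make the shape of (A) concrete, here is a minimal sketch of reading one line
of such a filter file. It is not the parser from the patch: the names
FilterRule, strip_comment and parse_filter_line are hypothetical, and it
assumes that all three fields are required, that '#' starts a comment, and that
a '#' inside a double-quoted identifier is part of the name.

```python
import re
from typing import NamedTuple, Optional


class FilterRule(NamedTuple):
    action: str    # "include" or "exclude"
    objtype: str   # e.g. "table" or "schema"
    objident: str  # object name or pattern, quotes kept verbatim


# Assumed grammar: keyword, object type, then the rest of the line.
_RULE_RE = re.compile(r"^(include|exclude)\s+(\w+)\s+(.+)$")


def strip_comment(line: str) -> str:
    """Drop a trailing '#' comment, ignoring '#' inside double quotes."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            # A doubled quote ("") toggles twice, so the state stays correct.
            in_quotes = not in_quotes
        elif ch == "#" and not in_quotes:
            return line[:i]
    return line


def parse_filter_line(line: str) -> Optional[FilterRule]:
    """Parse one filter line; return None for blank or comment-only lines."""
    text = strip_comment(line).strip()
    if not text:
        return None
    match = _RULE_RE.match(text)
    if match is None:
        raise ValueError(f"invalid filter line: {line!r}")
    return FilterRule(match.group(1), match.group(2), match.group(3))
```

Even this small sketch has to decide how comments interact with quoted names,
which is the kind of rule an established format would already define.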

For B), C) and D) there is implicit consensus in the thread that we don't need
to implement the full configuration file as of this patch, merely that it
*must* be possible to do so later without painting ourselves into a corner.

At this point it seems to me that B) and C) have the broadest support. Can the
C) option represent the compromise between a "simple" format for object
filtering and a more structured format for configuration? Are there other
options?

Thoughts?

--
Daniel Gustafsson https://vmware.com/

[0] https://postgr.es/m/F6674FF0-5800-4AED-9DC7-13C475707241@yesql.se
[1] https://postgr.es/m/CALAY4q9u30L7oGhbsfY3dPECQ8SrYa8YO=H-xOn5xWUeiEneeg@mail.gmail.com
[2] https://postgr.es/m/20201110200904.GU16415@tamriel.snowman.net
[3] https://postgr.es/m/CAEZATCVKMG7+b+_5tNwrNZ-aNDBy3=FMRNea2bO9O4qGcEvSTg@mail.gmail.com
[4] https://postgr.es/m/502641.1606334432@sss.pgh.pa.us
[5] https://postgr.es/m/619671.1606406538@sss.pgh.pa.us
[6] https://postgr.es/m/cb545d78-2dae-8d27-f062-822a07ca56cf@enterprisedb.com
[7] https://postgr.es/m/202107122259.n6o5uwb5erza@alvherre.pgsql
[8] https://postgr.es/m/3183720.1626131795@sss.pgh.pa.us
