Re: pg_dump new feature: exporting functions only. Bad or good idea ?

From: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
To: Laetitia Avrot <laetitia(dot)avrot(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ryan Lambert <ryan(at)rustprooflabs(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, ahsan hadi <ahsan(dot)hadi(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: pg_dump new feature: exporting functions only. Bad or good idea ?
Date: 2022-03-26 17:00:00
Message-ID: CAKFQuwZA4ch0ajw4xpHFiTCaPFwk+G-bSn6bhcAS1sT_coPg8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Mar 26, 2022 at 1:53 AM Laetitia Avrot <laetitia(dot)avrot(at)gmail(dot)com>
wrote:

> Hello all,
>
> Le sam. 26 mars 2022 à 01:13, Michael Paquier <michael(at)paquier(dot)xyz> a
> écrit :
>
>> On Fri, Mar 25, 2022 at 10:09:33PM +0100, Daniel Gustafsson wrote:
>> > Agreed. In this case it seems that adding --exclude-extension would
>> make sense
>> > to keep conistency. I took a quick stab at doing so with the attached
>> while
>> > we're here.
>>
>> src/test/modules/test_pg_dump would be the best place for the addition
>> of a couple of tests with this new switch. Better to check as well
>> what happens when a command collides with --extension and
>> --exclude-extension.
>>
>> printf(_(" -e, --extension=PATTERN dump the specified
>> extension(s) only\n"));
>> + printf(_(" --exclude-extension=PATTERN do NOT dump the specified
>> extension(s)\n"));
>> Shouldn't this be listed closer to --exclude-table-data in the --help
>> output?
>>
>
> I think it's time to sum up what we want to do:
>
> - We'd like to use switches to export objects according to a pattern.
> - For each object type we will have an --object=PATTERN flag and a
> --exclude-object=PATTERN
> - Having a short flag for each of the long flags is not mandatory
> - The object types that pg_dump can select so far are:
> - table (already written)
> - schema (already written)
> - extension (half-written, --exclude-extension not written)
> - routine (TBD ASAP). Routine flag operates on stored functions,
> stored procedures, aggregate functions, and window functions.
> - By default, pg_dump does not export system objects but we found out that
> we could use --table='pg_catalog.*' to export them. This is a bug and will
> be fixed. pg_dump won't have the ability to export any system object
> anymore. Should the fix belong to that patch or do I need to create a
> separate patch? (Seems to me it should be separated)
>
> If everyone is ok with the points above, I'll write both patches.
>
>
That looks correct.

I would say we should make the --table change and the --exclude-extension
change as separate commits.

Michael's question brought up a point that we should address. I do not
think having these (now) 4 pairs of options presented strictly
alphabetically in the documentation is a good choice and we should deviate
from that convention here for something more user-friendly, and to reduce
the repetitiveness that comes from having basically what could be one pair
of options actually implemented as 3 pairs. My initial approach would be
to move them all to a subsection after the --help parameter and before the
section header for -d. That section would be presented something like:

"""
These options allow for fine-grained control of which user objects are
produced by the dump (system objects are never dumped). If no inclusion
options are specified all objects are dumped except those that are
explicitly excluded. If even one inclusion option is specified then only
those objects selected for inclusion, and not excluded, will appear in the
dump.

These options can appear multiple times within a single pg_dump command
line. For each of these there is a mandatory pattern value, so the actual
option looks like, e.g., --table='public.*', which will select all
relations in the public schema. See (Patterns). When using wildcards, be
careful to quote the pattern if needed to prevent the shell from expanding
the wildcards.

When using these options, dependencies of the selected objects are not
automatically dumped, thus making such a dump potentially unsuitable for
restoration into a clean database.

This subset of options control which schemas to select objects from for an
otherwise normal dump.

--schema / -n
--exclude-schema / -N

The following subset specifies which non-schema objects to include. These
are added to the objects that end up selected due to their presence in a
schema. Specifically, the --exclude-schema option is ignored while
processing these options.

--table / -t
Considers all relations, not just tables. i.e., views, materialized
views, foreign tables, and sequences.
--routine
Considers functions, procedures, aggregates, window functions
--extension / -e
Considers extensions

When dumping data, only local table data is dumped by default. Specific
table data can be excluded using the --exclude-table-data option.

Specifying a foreign server using --include-foreign-data will cause related
foreign table data to also be dumped.

The following subset specifies which objects to exclude. An object that
matches one of these patterns will never be dumped.

--exclude-table / -T
--exclude-routine
--exclude-extension

The following options control the dumping of large objects:

-b
--blobs
Include large objects in the dump. This is the default behavior except when
--schema, --table, or --schema-only is specified. The -b switch is
therefore only useful to add large objects to dumps where a specific schema
or table has been requested. Note that blobs are considered data and
therefore will be included when --data-only is used, but not when
--schema-only is.

-B
--no-blobs
Exclude large objects in the dump.

When both -b and -B are given, the behavior is to output large objects,
when data is being dumped, see the -b documentation.
"""

I've kept the blob wording as-is but moved it here since it seems to fit
the criteria of the section. I don't necessarily believe the wording is
the best possible. In particular the mention of --schema-only should
simply be removed. The comment that blobs are data implies they are not
dumped during --schema-only. Then, --schema and --table should be
consolidated to refer to the "inclusion options" listed immediately above.
I don't know why we didn't just error if the user specified both -b and -B
at the same time. Is it too late to change that decision?

I'm tempted to move the three section-related options to the top of this as
well. Ordering --data-only and --schema-only alphabetically also doesn't
make sense to me, nor does keeping them away from the --section option.

--no-privileges is separated from the other --no-[property] options (e.g.,
--no-comments) due to its having an -X short option and everything with a
short option being listed before things without...I would move it, and
probably place all of the "-no-[property] stuff, as-is, in a subsection
with a descriptive leading paragraph. The blob options probably belong
here as well.

Yes, this is a fairly radical suggestion. But combined with some xrefs
from a table of contents at the top of the page I do believe it would make
for a better user experience. We already do this for the database
connection parameters section of the documentation and the same rationale
exists for taking the remaining large list of options and categorizing them
into meaningful units.

David J.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-03-26 17:15:50 Re: [PATCH] Expose port->authn_id to extensions and triggers
Previous Message Tom Lane 2022-03-26 16:37:44 Re: Pointer subtraction with a null pointer