Re: proposal: possibility to read dumped table's name from file

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2021-09-21 06:50:31
Message-ID: CAFj8pRDFwyGmyefihxnk4ZCVedPp=Gs3H=CxbPj2-hBguvmoXw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson <daniel(at)yesql(dot)se>
napsal:

> > Will do a closer review on the patch shortly.
>
> Had a read through, and tested, the latest posted version today:
>
> + Read objects filters from the specified file. Specify "-" to read from
> + stdin. Lines of this file must have the following format:
> I think this should be <filename>-</filename> and <literal>STDIN</literal>
> to
> match the rest of the docs.
>
>
> + <para>
> + With the following filter file, the dump would include table
> + <literal>mytable1</literal> and data from foreign tables of
> + <literal>some_foreign_server</literal> foreign server, but
> exclude data
> + from table <literal>mytable2</literal>.
> +<programlisting>
> +include table mytable1
> +include foreign_data some_foreign_server
> +exclude table mytable2
> +</programlisting>
> + </para>
> This example is highlighting the issue I've previously raised with the
> UX/doc
> of this feature. The "exclude table mytable2" is totally pointless in the
> above since the exact match of "mytable1" will remove all other objects.
> What
> we should be doing instead is use the pattern matching aspect along the
> lines
> of the below:
>
> include table mytable*
> exclude table mytable2
>
> + The <option>--filter</option> option works just like the other
> + options to include or exclude tables, schemas, table data, or
> foreign
> This should refer to the actual options by name to make it clear which we
> are
> talking about.
>

fixed

>
> + printf(_(" --filter=FILENAME dump objects and data
> based on the filter expressions\n"
> + " from the filter
> file\n"));
> Before we settle on --filter I think we need to conclude whether this file
> is
> intended to be included from a config file, or used on it's own. If we
> gow tih
> the former then we might not want a separate option for just --filter.
>

I prefer to separate two files. Although there is some intersection, I
think it is good to have two simple separate files for two really different
tasks.
It does filtering, and it should be controlled by option "--filter". When
the implementation will be changed, then this option can be changed too.
Filtering is just a pg_dump related feature. Revision of client application
configuration is a much more generic task, and if we mix it to one, we can
be
in a trap. It can be hard to find one good format for large script
generated content, and possibly hand written structured content. For
practical
reasons it can be good to have two files too. Filters and configurations
can have different life cycles.

>
> + if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the
> pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed.
> The is
> not a hot-path, we can afford the string comparison in case of errors.
> Having
> the string comparison done inline here will improve readability saving the
> reading from jumping to another function to see what it does.
>

I agree that this is not a hot-path, just I don't feel well if I need to
make a zero end string just for comparison pg_strcasecmp. Current design
reduces malloc/free cycles. It is used in more places, when Postgres parses
strings - SQL parser, plpgsql parser. I am not sure about the benefits and
costs - pg_strcasecmp can be more readable, but for any keyword I have to
call pstrdup and pfree. Is it necessary? My opinion in this part is not too
strong - it is a minor issue, maybe I have a little bit different feelings
about benefits and costs in this specific case, and if you really think the
benefits of rewriting are higher, I'll do it.

>
> + initStringInfo(&line);
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of
> pg_dump
> does?
>

The StringInfo is used because I use the pg_get_line_buf function, and this
function uses this API.

>
> +typedef struct
> I think these should be at the top of the file with the other typedefs.
>
>
done

>
> When testing strange object names, I was unable to express this name in
> the filter file:
>
> $ ./bin/psql
> psql (15devel)
> Type "help" for help.
>
> danielg=# create table "
> danielg"# t
> danielg"# t
> danielg"# " (a integer);
> CREATE TABLE
> danielg=# select relname from pg_class order by oid desc limit 1;
> relname
> ---------
> +
> t +
> t +
>
> (1 row)
>
>
Good catch - I had badly placed pg_strip_crlf function, fixed and regress
tests enhanced

Please check assigned patch

Regards

Pavel

> --
> Daniel Gustafsson https://vmware.com/
>
>

Attachment Content-Type Size
pg_dump-filteropt-20210921.patch text/x-patch 18.4 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Antonin Houska 2021-09-21 07:02:02 Re: POC: Cleaning up orphaned files using undo logs
Previous Message Gavin Flower 2021-09-21 06:49:36 Re: Release 14 Schedule