Re: proposal: possibility to read dumped table's name from file

From: Daniel Gustafsson <daniel(at)yesql(dot)se>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2021-09-21 12:37:55
Message-ID: 60D482DD-2F63-4627-B154-D6C46A7DAEBE@yesql.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On 21 Sep 2021, at 08:50, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> po 20. 9. 2021 v 14:10 odesílatel Daniel Gustafsson <daniel(at)yesql(dot)se <mailto:daniel(at)yesql(dot)se>> napsal:

> + printf(_(" --filter=FILENAME dump objects and data based on the filter expressions\n"
> + " from the filter file\n"));
> Before we settle on --filter I think we need to conclude whether this file is
> intended to be included from a config file, or used on it's own. If we gow tih
> the former then we might not want a separate option for just --filter.
>
> I prefer to separate two files. Although there is some intersection, I think it is good to have two simple separate files for two really different tasks.
> It does filtering, and it should be controlled by option "--filter". When the implementation will be changed, then this option can be changed too.
> Filtering is just a pg_dump related feature. Revision of client application configuration is a much more generic task, and if we mix it to one, we can be
> in a trap. It can be hard to find one good format for large script generated content, and possibly hand written structured content. For practical
> reasons it can be good to have two files too. Filters and configurations can have different life cycles.

I'm not convinced that we can/should change or remove a commandline parameter
in a coming version when there might be scripts expecting it to work in a
specific way. Having a --filter as well as a --config where the configfile can
refer to the filterfile also passed via --filter sounds like problem waiting to
happen, so I think we need to settle how we want to interact with this file
before anything goes in.

Any thoughts from those in the thread who have had strong opinions on config
files etc?

> + if (filter_is_keyword(keyword, size, "include"))
> I would prefer if this function call was replaced by just the pg_strcasecmp()
> call in filter_is_keyword() and the strlen optimization there removed. The is
> not a hot-path, we can afford the string comparison in case of errors. Having
> the string comparison done inline here will improve readability saving the
> reading from jumping to another function to see what it does.
>
> I agree that this is not a hot-path, just I don't feel well if I need to make a zero end string just for comparison pg_strcasecmp. Current design reduces malloc/free cycles. It is used in more places, when Postgres parses strings - SQL parser, plpgsql parser. I am not sure about the benefits and costs - pg_strcasecmp can be more readable, but for any keyword I have to call pstrdup and pfree. Is it necessary? My opinion in this part is not too strong - it is a minor issue, maybe I have a little bit different feelings about benefits and costs in this specific case, and if you really think the benefits of rewriting are higher, I'll do it

Sorry, I typoed my response. What I meant was to move the pg_strncasecmp call
inline and not do the strlen check, to save readers from jumping around. So
basically end up with the below in read_filter_item():

+ /* Now we expect sequence of two keywords */
+ if (pg_strncasecmp(keyword, "include", size) == 0)
+ *is_include = true;

> + initStringInfo(&line);
> Why is this using a StringInfo rather than a PQExpBuffer as the rest of pg_dump
> does?
>
> The StringInfo is used because I use the pg_get_line_buf function, and this function uses this API.

Ah, of course.

A few other comments from another pass over this:

+ exit_nicely(-1);
Why -1? pg_dump (and all other binaries) exits with 1 on IMO even more serious
errors so I think this should use 1 as well.

+ if (!pg_get_line_buf(fstate->fp, line))
+ {
+ if (ferror(fstate->fp))
+ fatal("could not read from file \"%s\": %m", fstate->filename);
+
+ exit_invalid_filter_format(fstate,"unexpected end of file");
+ }
In the ferror() case this codepath isn't running fclose() on the file pointer
(unless stdin) which we do elsewhere, so this should use pg_log_error and
exit_nicely instead.

+ pg_log_fatal("could not read from file \"%s\": %m", fstate->filename);
Based on how other errors are treated in pg_dump I think this should be
downgraded to a pg_log_error.

The above comments are fixed in the attached, as well as a pass over the docs
and extended tests to actually test matching a foreign server. What do think
about this version? I'm still not convinced that there aren't more quoting
bugs in the parser, but I've left that intact for now.

--
Daniel Gustafsson https://vmware.com/

Attachment Content-Type Size
0001-Implement-filter-for-pg_dump.patch application/octet-stream 20.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message torikoshia 2021-09-21 12:43:14 Re: EXPLAIN(VERBOSE) to CTE with SEARCH BREADTH FIRST fails
Previous Message Greg Nancarrow 2021-09-21 12:35:30 Re: Added schema level support for publication.