Re: proposal: possibility to read dumped table's name from file

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2021-09-15 17:31:13
Message-ID: CAFj8pRAZbCvM3v8jp5wkPpZHJvAopWpbF=+v+84Pns=qWtvEcg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi

po 13. 9. 2021 v 15:01 odesílatel Daniel Gustafsson <daniel(at)yesql(dot)se>
napsal:

> > On 28 Jul 2021, at 09:28, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> > út 13. 7. 2021 v 1:16 odesílatel Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us <mailto:
> tgl(at)sss(dot)pgh(dot)pa(dot)us>> napsal:
>
> > Hence I suggest
> >
> > include table PATTERN
> > exclude table PATTERN
> >
> > which ends up being the above but with words not [+-].
>
> One issue with this syntax is that the include keyword can be quite
> misleading
> as it's semantic interpretion of "include table t" can be different from
> "--table=t". The former is less clear about the fact that it means
> "exclude
> all other tables than " then the latter. It can be solved with
> documentation,
> but I think that needs be to be made clearer.
>

I invite any documentation enhancing and fixing

>
> > Here is an updated implementation of filter's file, that implements
> syntax proposed by you.
>
> While it's not the format I would prefer, it does allow for most (all?) use
> cases expressed in this thread with ample armtwisting applied so let's go
> ahead
> from this point and see if we can agree on it (or a version of it).
>
> A few notes on the patch after a first pass over it:
>
> +(include|exclude)[table|schema|foreign_data|data] <replaceable
> class="parameter">objectname</replaceable>
> Lacks whitespace between keyword and object type. Also, since these are
> mandatory parameters, shouldn't they be within '{..}' ?
>
> yes, fixed

>
> + /* skip initial white spaces */
> + while (isblank(*ptr))
> + ptr += 1;
> We don't trust isblank() as of 3fd5faed5 due to portability concerns, this
> should probably use a version of the pg_isblank() we already have (and
> possibly
> move that to src/common/string.c as there now are more consumers).
>
>
I rewrote this part, and I don't use function isblank ever

>
> +static bool
> +isblank_line(const char *line)
> This could be replaced with a single call to strspn() as we already do for
> parsing the TOC file.
>
>
> + /* when first char is hash, ignore whole line */
> + if (*str == '#')
> + continue;
> I think we should strip leading whitespace before this to allow
> commentlines to
> start with whitespace, it's easy enough and will make life easier for
> users.
>

now, the comments can be used as first non blank char or after filter

>
>
> + pg_log_error("invalid format of filter file \"%s\": %s",
> + filename,
> + message);
> +
> + fprintf(stderr, "%d: %s\n", lineno, line);
> Can't we just include the lineno in the error logging and skip dumping the
> offending line? Fast-forwarding the pointer to print the offending part is
> less useful than a context marker, and in some cases suboptimal. With this
> coding, if a pattern is omitted for example the below error message is
> given:
>
>
pg_dump: error: invalid format of filter file "filter.txt": missing
> object name
> 1:
>
> The errormessage and the linenumber in the file should be enough for the
> user
> to figure out what to fix.
>

I did it like you proposed, but still, I think the content can be useful.
More times you read dynamically generated files, or you read data from
stdin, and in complex environments it can be hard regenerate new content
for debugging.

>
> + if (keyword && is_keyword(keyword, size, "table"))
> + objecttype = 't';
> Should this use an enum, or at least a struct translation the literal
> keyword
> to the internal representation? Magic constants without explicit
> connection to
> their token counterparts can easily be cause of bugs.
>
>
fixed

> If I create a table called "a\nb" and try to dump it I get an error in
> parsing
> the file. Isn't this supposed to work?
> $ cat filter.txt
> include table "a
> b"
> $ ./bin/pg_dump --filter=filter.txt
> pg_dump: error: invalid format of filter file "filter.txt": unexpected
> chars after object name
> 2:

probably there was some issue, because it should work. I tested a new
version and this is tested in new regress tests. Please, check

>
> Did you consider implementing this in Bison to abstract some of the messier
> parsing logic?
>

Initially not, but now, when I am thinking about it, I don't think so Bison
helps. The syntax of the filter file is nicely linear. Now, the code of the
parser is a little bit larger than minimalistic, but it is due to nicer
error's messages. The raw implementation in Bison raised just "syntax
error" and positions. I did code refactoring, and now the scanning, parsing
and processing are divided into separated routines. Parsing related code
has 90 lines. In this case, I don't think using a parser grammar file can
carry any benefit. grammar is more readable, sure, but we need to include
bison, we need to handle errors, and if we want to raise more helpful
errors than just "syntax error", then the code will be longer.

please, check attached patch

Regards

Pavel

> --
> Daniel Gustafsson https://vmware.com/
>
>

Attachment Content-Type Size
pg_dump-filteropt-20210915.patch text/x-patch 17.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Julien Rouhaud 2021-09-15 17:44:52 Re: Hook for extensible parsing.
Previous Message Julien Rouhaud 2021-09-15 17:23:42 Re: Hook for extensible parsing.