Re: proposal: possibility to read dumped table's name from file

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Justin Pryzby <pryzby(at)telsasoft(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2022-11-12 20:35:59
Message-ID: CAFj8pRBmN1AGm7wBk8279_mf-vCd6vaiY+tJ=dOLrn9p1LAL8w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

pá 11. 11. 2022 v 9:11 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com>
napsal:

> Hi,
>
> On Sat, Nov 05, 2022 at 08:54:57PM +0100, Pavel Stehule wrote:
> >
> > pá 4. 11. 2022 v 14:28 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com>
> > napsal:
> >
> > > > But one thing I noticed is that "optarg" looks wrong here:
> > > >
> > > > simple_string_list_append(&opts->triggerNames, optarg);
> > >
> > > Ah indeed, good catch! Maybe there should be an explicit test for
> every
> > > (include|exclude) / objtype combination? It would be a bit verbose
> (and
> > > possibly hard to maintain).
> > >
> >
> > yes - pg_restore is not well covered by tests, fixed
> >
> > I found another issue. The pg_restore requires a full signature of the
> > function and it is pretty sensitive on white spaces (pg_restore).
>
> Argh, indeed. It's a good thing to have expanded the regression tests :)
>
> > I made a
> > mistake when I partially parsed patterns like SQL identifiers. It can
> work
> > for simple cases, but when I parse the function's signature it stops
> > working. So I rewrote the parsing pattern part. Now, I just read an input
> > string and I try to reduce spaces. Still multiline identifiers are
> > supported. Against the previous method of pattern parsing, I needed to
> > change just one regress test - now I am not able to detect garbage after
> > pattern :-/.
>
> I'm not sure it's really problematic. It looks POLA-violation compatible
> with
> regular pg_dump options, for instance:
>
> $ echo "include table t1()" | pg_dump --filter - | ag CREATE
> CREATE TABLE public.t1 (
>
> $ pg_dump -t "t1()" | ag CREATE
> CREATE TABLE public.t1 (
>
> $ echo "include table t1()blabla" | pg_dump --filter - | ag CREATE
> pg_dump: error: no matching tables were found
>
> $ pg_dump -t "t1()blabla" | ag CREATE
> pg_dump: error: no matching tables were found
>
> I don't think the file parsing code should try to be smart about checking
> the
> found patterns.
>
> > * function's filtering doesn't support schema - when the name of function
> > is specified with schema, then the function is not found
>
> Ah I didn't know that. Indeed it only expect a non-qualified identifier,
> and
> would restore any function that matches the name (and arguments), possibly
> multiple ones if there are variants in different schema. That's unrelated
> to
> this patch though.
>
> > * the function has to be specified with an argument type list - the
> > separator has to be exactly ", " string. Without space or with one space
> > more, the filtering doesn't work (new implementation of pattern parsing
> > reduces white spaces sensitivity). This is not a bug, but it is not well
> > documented.
>
> Agreed.
>
> > attached updated patch
>
> It looks overall good to me! I just have a few minor nitpicking
> complaints:
>
> - you removed the pg_strip_clrf() calls and declared everything as "const
> char
> *", so there's no need to explicitly cast the filter_get_keyword()
> arguments
> anymore
>

removed

>
> Note also that the code now relies on the fact that there are some non-zero
> bytes after a pattern to know that no errors happened. It's not a problem
> as
> you should find an EOF marker anyway if CLRF were stripped.
>

I am not sure if I understand this note well?

>
> + * Following routines are called from pg_dump, pg_dumpall and pg_restore.
> + * Unfortunately, implementation of exit_nicely in pg_dump and pg_restore
> + * is different from implementation of this routine in pg_dumpall. So
> instead
> + * of directly calling exit_nicely we have to return some error flag (in
> this
> + * case NULL), and exit_nicelly will be executed from caller's routine.
>
> Slight improvement:
> [...]
> Unfortunately, the implementation of exit_nicely in pg_dump and pg_restore
> is
> different from the one in pg_dumpall, so instead of...
>
> + * read_pattern - reads an pattern from input. The pattern can be mix of
> + * single line or multi line subpatterns. Single line subpattern starts
> first
> + * non white space char, and ending last non space char on line or by char
> + * '#'. The white spaces inside are removed (around char ".()"), or
> reformated
> + * around char ',' or reduced (the multiple spaces are replaced by one).
> + * Multiline subpattern starts by double quote and ending by this char
> too.
> + * The escape rules are same like for SQL quoted literal.
> + *
> + * Routine signalizes error by returning NULL. Otherwise returns pointer
> + * to next char after last processed char in input string.
>
>
> typo: reads "a" pattern from input...
>

fixed

>
> I don't fully understand the part about subpatterns, but is that necessary
> to
> describe it? Simply saying that any valid and possibly-quoted identifier
> can
> be parsed should make it clear that identifiers containing \n characters
> should
> work too. Maybe also just mention that whitespaces are removed and special
> care is taken to output routines in exactly the same way calling code will
> expect it (that is comma-and-single-space type delimiter).
>

In this case I hit the limits of my English language skills.

I rewrote this comment, but it needs more care. Please, can you look at it?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2022-11-12 20:37:03 Re: proposal: possibility to read dumped table's name from file
Previous Message Jose Arthur Benetasso Villanova 2022-11-12 19:43:33 Re: Fix order of checking ICU options in initdb and create database