Re: proposal: possibility to read dumped table's name from file

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2022-11-03 21:22:15
Message-ID: CAFj8pRAyyXBFiD9U6Gpo7M++Jq33B_L1gWJEnGJu=WcMk+McQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

čt 3. 11. 2022 v 5:09 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com> napsal:

> Hi,
>
> On Wed, Oct 26, 2022 at 06:26:26AM +0200, Pavel Stehule wrote:
> >
> > út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com>
> > napsal:
> >
> > >
> > > I'm wondering if psql's parse_identifier() could be exported and reused
> > > here
> > > rather than creating yet another version.
> > >
> >
> > I looked there, and I don't think this parser is usable for this purpose.
> > It is very sensitive on white spaces, and doesn't support multi-lines.
> It
> > is designed for support readline tab complete, it is designed for
> > simplicity not for correctness.
>
> Ah, sorry I should have checked more thoroughly. I guess it's ok to have
> another identifier parser for the include file then, as this new one
> wouldn't
> really fit the tab-completion use case.
>
> > > also, is there any reason why this function doesn't call exit_nicely in
> > > case of
> > > error rather than letting each caller do it without any other cleanup?
> > >
> >
> > It is commented few lines up
> >
> > /*
> > * Following routines are called from pg_dump, pg_dumpall and pg_restore.
> > * Unfortunatelly, implementation of exit_nicely in pg_dump and
> pg_restore
> > * is different from implementation of this rutine in pg_dumpall. So
> instead
> > * direct calling exit_nicely we have to return some error flag (in this
> > * case NULL), and exit_nicelly will be executed from caller's routine.
> > */
>
> Oh right, I totally missed it sorry about that!
>
> About the new version, I didn't find any problem with the feature itself so
> it's a good thing!
>
> I still have a few comments about the patch. First, about the behavior:
>
> - is that ok to have just "data" pattern instead of "table_data" or
> something
> like that, since it's supposed to match --exclude-table-data option?
>

done

>
> - the error message are sometimes not super helpful. For instance:
>
> $ echo "include data t1" | pg_dump --filter -
> pg_dump: error: invalid format of filter on line 1: include filter is not
> allowed for this type of object
>
> It would be nice if the error message mentioned "data" rather than a
> generic
> "this type of object". Also, maybe we should quote "include" to outline
> that
> we found this keyword?
>
>
done

> About the patch itself:
> filter.c:
>
> +#include "postgres_fe.h"
> +
> +#include "filter.h"
> +
> +#include "common/logging.h"
>
> the filter.h inclusion should be done with the rest of the includes, in
> alphabetical order.
>
>
done

> +#define is_keyword_str(cstr, str, bytes) \
> + ((strlen(cstr) == bytes) && (pg_strncasecmp(cstr, str, bytes) ==
> 0))
>
> nit: our guidline is to protect macro arguments with parenthesis. Some
> arguments can be evaluated multiple times but I don't think it's worth
> adding a
> comment for that.
>
>
done

> + * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
> + * is different from implementation of this rutine in pg_dumpall. So
> instead
> + * direct calling exit_nicely we have to return some error flag (in this
>
> typos: s/Unfortunatelly/Unfortunately/ and s/rutine/routine/
> Also, it would probably be better to say "instead of directly calling..."
>
>
done

> +static const char *
> +filter_object_type_name(FilterObjectType fot)
> +{
> + switch (fot)
> + {
> + case FILTER_OBJECT_TYPE_NONE:
> + return "comment or empty line";
> +[...]
> + }
> +
> + return "unknown object type";
> +}
>
> I'm wondering if we should add a pg_unreachable() there, some compilers
> might
> complain otherwise. See CreateDestReceiver() for instance for similar
> pattern.
>

done

>
> + * Emit error message "invalid format of filter file ..."
> + *
> + * This is mostly a convenience routine to avoid duplicating file closing
> code
> + * in multiple callsites.
> + */
> +void
> +log_invalid_filter_format(FilterStateData *fstate, char *message)
>
> nit: invalid format *in* filter file...?
>

changed

>
> +void
> +log_unsupported_filter_object_type(FilterStateData *fstate,
> +
> const char *appname,
> +
> FilterObjectType fot)
> +{
> + PQExpBuffer str = createPQExpBuffer();
> +
> + printfPQExpBuffer(str,
> + "The application \"%s\" doesn't
> support filter for object type \"%s\".",
>
> nit: there shouldn't be uppercase in error messages, especially since this
> will
> be appended to another message by log_invalid_filter_format. I would just
> just
> drop "The application" entirely for brevity.
>

changed

>
> +/*
> + * Release allocated resources for filter
> + */
> +void
> +filter_free(FilterStateData *fstate)
>
> nit: Release allocated resources for *the given* filter?
>

changed

>
> + * Search for keywords (limited to ascii alphabetic characters) in
> + * the passed in line buffer. Returns NULL, when the buffer is empty or
> first
> + * char is not alpha. The length of the found keyword is returned in the
> size
> + * parameter.
> + */
> +static const char *
> +filter_get_keyword(const char **line, int *size)
> +{
> + [...]
> + if (isascii(*ptr) && isalpha(*ptr))
> + {
> + result = ptr++;
> +
> + while (isascii(*ptr) && (isalpha(*ptr) || *ptr == '_'))
> + ptr++;
>
> Is there any reason to test isascii()? isalpha() should already cover
> that and
> should be cheaper to test anyway.
>

changed. I wanted to limit keyword's char just for basic ascii alphabets,
but the benefit probably is not too strong, and the real effect can be
messy, so I removed isascii test

>
> Also nit: "Returns NULL when the buffer..." (unnecessary comma), and the
> '_'
> char is also allowed.
>

done

>
> +filter_read_item(FilterStateData *fstate,
> + bool *is_include,
> + char **objname,
> + FilterObjectType *objtype)
> +{
> + Assert(!fstate->is_error);
> +
> + if (pg_get_line_buf(fstate->fp, &fstate->linebuff))
> + {
> + char *str = fstate->linebuff.data;
> + const char *keyword;
> + int size;
> +
> + fstate->lineno++;
> +
> + (void) pg_strip_crlf(str);
> +
> + /* Skip initial white spaces */
> + while (isspace(*str))
> + str++;
> +[...]
> + keyword = filter_get_keyword((const char **) &str,
> &size);
>
> Is there any interest with the initial pg_strip_crlf? AFAICT all the rest
> of
> the code will ignore such caracters using isspace() so it wouldn't change
> anything.
>

I think reading multiline identifiers is a little bit easier, because I
don't need to check the ending \n and \r
When I read multiline identifiers, I cannot ignore white spaces.

> Dropping both pg_strip_crlf() would allow you to declare str as const
> rather
> than doing it in function calls. It would require to add const qualifiers
> in a
> few other places, but it seems like an improvement, as for instance right
> now
> filter_get_pattern is free to rewrite the str (because it's also calling
> pg_strip_crlf, but there's no guarantee that it doesn't do anything else).
>
> +/*
> + * filter_get_pattern - Read an object identifier pattern from the buffer
> + *
> + * Parses an object identifier pattern from the passed in buffer and sets
> + * objname to a string with object identifier pattern. Returns pointer to
> the
> + * first character after the pattern. Returns NULL on error.
> + */
> +static char *
> +filter_get_pattern(FilterStateData *fstate,
>
> nit: suggestion to reword the comment, maybe something like
>
> /*
> * filter_get_pattern - Identify an object identifier pattern
> *
> * Try to parse an object identifier pattern from the passed buffer. If
> one is
> * found, it sets objname to a string with the object identifier pattern
> and
> * returns a pointer to the first byte after the found pattern. Otherwise
> NULL
> * is returned.
> */
>
>
replaced

> +bool
> +filter_read_item(FilterStateData *fstate,
>
> Another suggestion for comment rewrite:
>
> /*-------------------
> * filter_read_item - Read command/type/pattern triplet from a filter file
> *
> * This will parse one filter item from the filter file, and while it is a
> * row based format a pattern may span more than one line due to how object
> * names can be constructed. The expected format of the filter file is:
> *
> * <command> <object_type> <pattern>
> *
> * command can be "include" or "exclude"
> * object_type can one of: "table", "schema", "foreign_data", "data",
> * "database", "function", "trigger" or "index"
> * pattern can be any possibly-quoted and possibly-qualified identifier.
> It
> * follows the same rules as other object include and exclude functions so
> it
> * can also use wildcards.
> *
> * Returns true when one filter item was successfully read and parsed.
> When
> * object name contains \n chars, then more than one line from input file
> can
> * be processed. Returns false when the filter file reaches EOF. In case
> of
> * error, the function will emit an appropriate error message before
> returning
> * false.
> */
>
>
replaced, thank you for the text

> Note also that your original comment said:
> + * In case of
> + * errors, the function wont return but will exit with an appropriate
> error
> + * message.
>
> But AFAICS that's not the case: it will indeed log an appropriate error
> message
> but will return false. I'm assuming that the comment was outdated as the
> calling code handles it just fine, so I just modified the comment.
>

yes

>
> filter.h:
>
> +#ifndef FILTER_H
> +#define FILTER_H
> +#include "c.h"
>
> It's definitely not ok to include .ch in frontend code. But AFAICS just
> removing it doesn't cause any problem. Note also that there should be an
> empty
> line after the #define FILTER_H per usual coding style.
>

fixed - it looks so it was some garbage

updated patch attached

big thanks for these comments and tips

Regards

Pavel

Attachment Content-Type Size
pg_dump-filter-20221103.patch text/x-patch 50.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2022-11-03 21:49:17 Re: Use array as object (src/fe_utils/parallel_slot.c)
Previous Message Daniel Gustafsson 2022-11-03 20:47:35 GUC for temporarily disabling event triggers