Re: proposal: possibility to read dumped table's name from file

From: Julien Rouhaud <rjuju123(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Andrew Dunstan <andrew(at)dunslane(dot)net>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2022-11-03 04:09:36
Message-ID: 20221103040936.3otdcg53pabro3cu@jrouhaud
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Wed, Oct 26, 2022 at 06:26:26AM +0200, Pavel Stehule wrote:
>
> út 18. 10. 2022 v 11:33 odesílatel Julien Rouhaud <rjuju123(at)gmail(dot)com>
> napsal:
>
> >
> > I'm wondering if psql's parse_identifier() could be exported and reused
> > here
> > rather than creating yet another version.
> >
>
> I looked there, and I don't think this parser is usable for this purpose.
> It is very sensitive on white spaces, and doesn't support multi-lines. It
> is designed for support readline tab complete, it is designed for
> simplicity not for correctness.

Ah, sorry I should have checked more thoroughly. I guess it's ok to have
another identifier parser for the include file then, as this new one wouldn't
really fit the tab-completion use case.

> > also, is there any reason why this function doesn't call exit_nicely in
> > case of
> > error rather than letting each caller do it without any other cleanup?
> >
>
> It is commented few lines up
>
> /*
> * Following routines are called from pg_dump, pg_dumpall and pg_restore.
> * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
> * is different from implementation of this rutine in pg_dumpall. So instead
> * direct calling exit_nicely we have to return some error flag (in this
> * case NULL), and exit_nicelly will be executed from caller's routine.
> */

Oh right, I totally missed it sorry about that!

About the new version, I didn't find any problem with the feature itself so
it's a good thing!

I still have a few comments about the patch. First, about the behavior:

- is that ok to have just "data" pattern instead of "table_data" or something
like that, since it's supposed to match --exclude-table-data option?

- the error message are sometimes not super helpful. For instance:

$ echo "include data t1" | pg_dump --filter -
pg_dump: error: invalid format of filter on line 1: include filter is not allowed for this type of object

It would be nice if the error message mentioned "data" rather than a generic
"this type of object". Also, maybe we should quote "include" to outline that
we found this keyword?

About the patch itself:
filter.c:

+#include "postgres_fe.h"
+
+#include "filter.h"
+
+#include "common/logging.h"

the filter.h inclusion should be done with the rest of the includes, in
alphabetical order.

+#define is_keyword_str(cstr, str, bytes) \
+ ((strlen(cstr) == bytes) && (pg_strncasecmp(cstr, str, bytes) == 0))

nit: our guidline is to protect macro arguments with parenthesis. Some
arguments can be evaluated multiple times but I don't think it's worth adding a
comment for that.

+ * Unfortunatelly, implementation of exit_nicely in pg_dump and pg_restore
+ * is different from implementation of this rutine in pg_dumpall. So instead
+ * direct calling exit_nicely we have to return some error flag (in this

typos: s/Unfortunatelly/Unfortunately/ and s/rutine/routine/
Also, it would probably be better to say "instead of directly calling..."

+static const char *
+filter_object_type_name(FilterObjectType fot)
+{
+ switch (fot)
+ {
+ case FILTER_OBJECT_TYPE_NONE:
+ return "comment or empty line";
+[...]
+ }
+
+ return "unknown object type";
+}

I'm wondering if we should add a pg_unreachable() there, some compilers might
complain otherwise. See CreateDestReceiver() for instance for similar pattern.

+ * Emit error message "invalid format of filter file ..."
+ *
+ * This is mostly a convenience routine to avoid duplicating file closing code
+ * in multiple callsites.
+ */
+void
+log_invalid_filter_format(FilterStateData *fstate, char *message)

nit: invalid format *in* filter file...?

+void
+log_unsupported_filter_object_type(FilterStateData *fstate,
+ const char *appname,
+ FilterObjectType fot)
+{
+ PQExpBuffer str = createPQExpBuffer();
+
+ printfPQExpBuffer(str,
+ "The application \"%s\" doesn't support filter for object type \"%s\".",

nit: there shouldn't be uppercase in error messages, especially since this will
be appended to another message by log_invalid_filter_format. I would just just
drop "The application" entirely for brevity.

+/*
+ * Release allocated resources for filter
+ */
+void
+filter_free(FilterStateData *fstate)

nit: Release allocated resources for *the given* filter?

+ * Search for keywords (limited to ascii alphabetic characters) in
+ * the passed in line buffer. Returns NULL, when the buffer is empty or first
+ * char is not alpha. The length of the found keyword is returned in the size
+ * parameter.
+ */
+static const char *
+filter_get_keyword(const char **line, int *size)
+{
+ [...]
+ if (isascii(*ptr) && isalpha(*ptr))
+ {
+ result = ptr++;
+
+ while (isascii(*ptr) && (isalpha(*ptr) || *ptr == '_'))
+ ptr++;

Is there any reason to test isascii()? isalpha() should already cover that and
should be cheaper to test anyway.

Also nit: "Returns NULL when the buffer..." (unnecessary comma), and the '_'
char is also allowed.

+filter_read_item(FilterStateData *fstate,
+ bool *is_include,
+ char **objname,
+ FilterObjectType *objtype)
+{
+ Assert(!fstate->is_error);
+
+ if (pg_get_line_buf(fstate->fp, &fstate->linebuff))
+ {
+ char *str = fstate->linebuff.data;
+ const char *keyword;
+ int size;
+
+ fstate->lineno++;
+
+ (void) pg_strip_crlf(str);
+
+ /* Skip initial white spaces */
+ while (isspace(*str))
+ str++;
+[...]
+ keyword = filter_get_keyword((const char **) &str, &size);

Is there any interest with the initial pg_strip_crlf? AFAICT all the rest of
the code will ignore such caracters using isspace() so it wouldn't change
anything.

Dropping both pg_strip_crlf() would allow you to declare str as const rather
than doing it in function calls. It would require to add const qualifiers in a
few other places, but it seems like an improvement, as for instance right now
filter_get_pattern is free to rewrite the str (because it's also calling
pg_strip_crlf, but there's no guarantee that it doesn't do anything else).

+/*
+ * filter_get_pattern - Read an object identifier pattern from the buffer
+ *
+ * Parses an object identifier pattern from the passed in buffer and sets
+ * objname to a string with object identifier pattern. Returns pointer to the
+ * first character after the pattern. Returns NULL on error.
+ */
+static char *
+filter_get_pattern(FilterStateData *fstate,

nit: suggestion to reword the comment, maybe something like

/*
* filter_get_pattern - Identify an object identifier pattern
*
* Try to parse an object identifier pattern from the passed buffer. If one is
* found, it sets objname to a string with the object identifier pattern and
* returns a pointer to the first byte after the found pattern. Otherwise NULL
* is returned.
*/

+bool
+filter_read_item(FilterStateData *fstate,

Another suggestion for comment rewrite:

/*-------------------
* filter_read_item - Read command/type/pattern triplet from a filter file
*
* This will parse one filter item from the filter file, and while it is a
* row based format a pattern may span more than one line due to how object
* names can be constructed. The expected format of the filter file is:
*
* <command> <object_type> <pattern>
*
* command can be "include" or "exclude"
* object_type can one of: "table", "schema", "foreign_data", "data",
* "database", "function", "trigger" or "index"
* pattern can be any possibly-quoted and possibly-qualified identifier. It
* follows the same rules as other object include and exclude functions so it
* can also use wildcards.
*
* Returns true when one filter item was successfully read and parsed. When
* object name contains \n chars, then more than one line from input file can
* be processed. Returns false when the filter file reaches EOF. In case of
* error, the function will emit an appropriate error message before returning
* false.
*/

Note also that your original comment said:
+ * In case of
+ * errors, the function wont return but will exit with an appropriate error
+ * message.

But AFAICS that's not the case: it will indeed log an appropriate error message
but will return false. I'm assuming that the comment was outdated as the
calling code handles it just fine, so I just modified the comment.

filter.h:

+#ifndef FILTER_H
+#define FILTER_H
+#include "c.h"

It's definitely not ok to include .ch in frontend code. But AFAICS just
removing it doesn't cause any problem. Note also that there should be an empty
line after the #define FILTER_H per usual coding style.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2022-11-03 04:11:46 Re: [BUG] Logical replica crash if there was an error in a function.
Previous Message Ian Lawrence Barwick 2022-11-03 02:33:57 Re: Commit fest 2022-11