Re: proposal: possibility to read dumped table's name from file

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: proposal: possibility to read dumped table's name from file
Date: 2020-07-05 20:08:09
Message-ID: CAFj8pRCsZuKRRdqZoYYo_wW-YjpWGA_ie9nhwJRd9E+GmsShrQ@mail.gmail.com
Lists: pgsql-hackers

st 1. 7. 2020 v 23:24 odesílatel Justin Pryzby <pryzby(at)telsasoft(dot)com>
napsal:

> On Thu, Jun 11, 2020 at 09:36:18AM +0200, Pavel Stehule wrote:
> > st 10. 6. 2020 v 0:30 odesílatel Justin Pryzby <pryzby(at)telsasoft(dot)com>
> napsal:
> > > > + /* ignore empty rows */
> > > > + if (*line != '\0')
> > >
> > > Maybe: if line=='\0': continue
> > > We should also support comments.
>
> Comment support is still missing but easily added :)
>
> I tried this patch and it works for my purposes.
>
> Also, your getline is dynamically re-allocating lines of arbitrary length.
> Possibly that's not needed. We'll typically read "+t schema.relname",
> which is
> 132 chars. Maybe it's sufficient to do
> char buf[1024];
> fgets(buf);
> if strchr(buf, '\n') == NULL: error();
> ret = pstrdup(buf);
>

63 bytes is the maximum effective identifier size, but it is not the
maximum size of an identifier as written. Very probably a 1024-byte buffer
would be enough for everybody, but I do not want to introduce a new magic
limit, especially when the dynamic implementation is not hard.

A table name can be very long: sometimes table names are stored at full
length in external systems, and it would not be practical to require
truncating them in the filter file.

The dynamic approach is quite efficient here, because a buffer that has
been enlarged once is reused for the following rows, so realloc() should be
rare. So when I have to choose between two implementations of similar
complexity, I prefer the dynamic code without hardcoded limits. This
dynamic behaviour has practically no overhead.

> In any case, you could have getline return a char* and (rather than
> following
> GNU) no need to take char**, int* parameters to conflate inputs and
> outputs.
>

No, it has a specific benefit: it eliminates the repeated short malloc/free
cycle. When some line is longer, the buffer (and its recorded capacity) is
enlarged, and subsequent rows of the same or smaller size need no
realloc().

> I realized that --filter has an advantage over the previous implementation
> (with multiple --exclude-* and --include-*) in that it's possible to use
> stdin
> for includes *and* excludes.
>

Yes, it looks like the better choice.

> By chance, I had the opportunity yesterday to re-use with rsync a regex
> that
> I'd previously been using with pg_dump and grep. What this patch calls
> "--filter" in rsync is called "--filter-from". rsync's --filter-from
> rejects
> filters of length longer than max filename, so I had to split it up into
> multiple lines instead of using regex alternation ("|"). This option is a
> close parallel in pg_dump.
>

We can discuss the option name; maybe "--filter-from" is better than just
"--filter".

Regards

Pavel

>
> --
> Justin
>
