Re: proposal: possibility to read dumped table's name from file

From:	Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To:	Stephen Frost <sfrost(at)snowman(dot)net>
Cc:	Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Surafel Temesgen <surafel3000(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, vignesh C <vignesh21(at)gmail(dot)com>
Subject:	Re: proposal: possibility to read dumped table's name from file
Date:	2021-07-14 05:00:26
Message-ID:	CAFj8pRBjVcTeqD6P9ihc6uMySCzCpNegCOgeZ=gCk0PQVdzGyA@mail.gmail.com
Lists:	pgsql-hackers

> You're right- no one followed up on that. Instead, one group continues
> to push for 'simple' and to just accept what's been proposed, while
> another group counters that we should be looking at the broader design
> question and work towards a solution which will work for us down the
> road, and not just right now.
>
> One thing remains clear- there's no consensus here.
>

I think there is some misunderstanding about the target of this patch, and
I am afraid there cannot be consensus, because people are talking about two
very different features. It is not possible to squeeze them into one thing.
I am afraid that cannot work.

1. The main target of this patch is to solve the problem of pg_dump's
command line becoming too long when there are a lot of objects to dump. You
need to call pg_dump only once to ensure the dump runs in one transaction.
And sometimes it is not possible to use wildcards effectively, because the
state of the objects is in different databases. Raising the limit on the
command line length is not secure, and there are other production issues.
In this case you need a very simple format, simply because you want to use
pg_dump in a pipe. This format should be line oriented, and usually it will
contain just "dump this table, dump a second table". Nothing else. Nobody
will read this format, nobody will edit it. Because the main platform for
this format is probably the UNIX shell, the format should be simple. I
really don't see any joy in generating JSON and parsing JSON later. This
data will be processed locally. This is a single-purpose format, and it is
not designed for holding configuration. For this purpose a complex format
has no advantage. Parsing JSON or other formats on the pg_dump side is not
a problem, but it is pretty hard to generate valid JSON from a bash script.
For a UNIX shell we need the simplest possible format. Theoretically this
format (this file) can hold any pg_dump option, but for the usual streaming
processing only filter options will be there. Originally this feature had
the name "filter file". There are a lot of examples of successful filter
file formats in the UNIX world, and I think nobody doubts their sense and
usability. Probably there is a consensus that filter files are not config
files.

The format of the filter file can look like "+d tablename" or "include data
tablename". If we find a consensus that the filter file is a good thing,
then the format design and implementation is easy work. It is not a problem
to add comment lines.
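
To make the shape of such a file concrete, here is a minimal sketch of a
reader for the long form ("include data tablename"). It is only an
illustration, not the parser from the patch: the "exclude" action, the "#"
comment marker, and the names FilterRule and parse_filter_lines are all
assumptions made up for this example.

```python
from typing import Iterable, List, NamedTuple


class FilterRule(NamedTuple):
    action: str  # "include" or "exclude" ("exclude" is an assumption)
    kind: str    # object kind as written in the file, e.g. "data"
    name: str    # object name, taken verbatim from the rest of the line


def parse_filter_lines(lines: Iterable[str]) -> List[FilterRule]:
    """Parse '<action> <kind> <name>' lines; skip blank lines and '#' comments."""
    rules: List[FilterRule] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):  # hypothetical comment syntax
            continue
        parts = line.split(None, 2)  # at most three whitespace-separated fields
        if len(parts) != 3 or parts[0] not in ("include", "exclude"):
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        rules.append(FilterRule(*parts))
    return rules
```

A producer only has to print one such line per object, which is trivial
from a shell pipeline, and the reader needs nothing beyond splitting on
whitespace.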

2. It is true that there is only a small step from a filter file to an
option file. I rewrote this patch in this direction. The advantage is
universality: it can support any option without the need to modify related
code. This format is still not difficult for producers, and it is simple to
parse. Now the format should be defined by the command line format: "-t
tablename" or "--table tablename" or "table tablename". There can be issues
related to the different parsers in the shell and in the implemented code,
but they can be solved. It is not a problem to introduce comment lines. The
big advantage is simplicity of usage and simplicity of implementation;
moreover, the implementation is generic.
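
As a rough sketch of the option file idea, the following turns lines in any
of the three spellings into one argument vector that a normal option parser
could consume. It is not the code from the patch. The function name
option_lines_to_argv is hypothetical, and using Python's shlex module for
shell-like quoting and "#" comments is just one assumed way to handle the
parser mismatch mentioned above.

```python
import shlex
from typing import Iterable, List


def option_lines_to_argv(lines: Iterable[str]) -> List[str]:
    """Turn option-file lines into an argv-style list of strings."""
    argv: List[str] = []
    for raw in lines:
        # shlex applies POSIX-shell-like quoting rules; comments=True drops
        # everything after an unquoted '#' (an assumed comment syntax).
        tokens = shlex.split(raw, comments=True)
        if not tokens:
            continue  # blank or comment-only line
        if not tokens[0].startswith("-"):
            # Bare long option name, e.g. "table foo" becomes "--table foo".
            tokens[0] = "--" + tokens[0]
        argv.extend(tokens)
    return argv
```

Because the result has the same shape as a real command line, no per-option
code is needed on the reading side, which is the generic property described
above.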

3. But from an option file it is just a small step to a config file. I can
imagine somebody wanting to store a typical configuration (and the usual
options) for psql, pg_dump, pg_restore, pgAdmin, ... somewhere. Config
files are very different creatures from filter files. Although they can be
generated, they are usually edited by hand and can be very complex. There
can be shared parts for all applications, specific sections for psql, and
specific sections for every database. Config files can be brutally complex.
A simple text format is not good for this purpose. And some people prefer
YAML, while some people hate that format. Other people prefer XML or JSON
or anything else. Sometimes the complexity of config files gets too big,
and people prefer startup scripting.

Although there is an intersection between filter files and config files,
I see very big differences in usage. Filter files are usually temporary,
generated and not shared. Config files are persistent, usually manually
modified and can be shared. The requirements are different, and the
solutions should be different too. I don't propose any config file related
features, and my proposal doesn't block the introduction of a config file
in any format in the future. I think these features are very different,
and should be implemented differently. A filter file or option file would
make a pretty ugly config file, and a config file would make a pretty
impractical filter file.

So can we talk about the implementation of a filter file or option file?
And can we talk about the implementation of config files in a separate
topic? Without that, I am afraid there is no possibility of finding an
agreement and moving forward.

Regards

Pavel

> Thanks,
>
> Stephen
>

In response to

Re: proposal: possibility to read dumped table's name from file at 2021-07-14 00:18:35 from Stephen Frost
