From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
Cc: | Mahendra Singh Thalor <mahi6run(at)gmail(dot)com>, Álvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, jian he <jian(dot)universality(at)gmail(dot)com>, Srinath Reddy <srinath2133(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Re: Non-text mode for pg_dumpall |
Date: | 2025-07-22 00:53:39 |
Message-ID: | 20250722005339.ca.nmisch@google.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Mon, Jul 21, 2025 at 04:41:03PM -0400, Andrew Dunstan wrote:
> On 2025-07-17 Th 6:18 AM, Mahendra Singh Thalor wrote
> > > > > > > --- a/src/bin/pg_dump/pg_restore.c
> > > > > > > +++ b/src/bin/pg_dump/pg_restore.c
> > > > > > > +/*
> > > > > > > + * read_one_statement
> > > > > > > + *
> > > > > > > + * This will start reading from passed file pointer using fgetc and read till
> > > > > > > + * semicolon(sql statement terminator for global.dat file)
> > > > > > > + *
> > > > > > > + * EOF is returned if end-of-file input is seen; time to shut down.
> > > > > > What makes it okay to use this particular subset of SQL lexing?
> > > > > To support complex syntax, we used this code from another file.
> > > > I'm hearing that you copied this code from somewhere. Running
> > > > "git grep 'time to shut down'" suggests you copied it from
> > > > InteractiveBackend(). Is that right? I do see other similarities between
> > > > read_one_statement() and InteractiveBackend().
> > > >
> > > > Copying InteractiveBackend() provides negligible assurance that this is the
> > > > right subset of SQL lexing. Only single-user mode uses InteractiveBackend().
> > > > Single-user mode survives mostly as a last resort for recovering from having
> > > > reached xidStopLimit, is rarely used, and only superusers write queries to it.
> > > Yes, we copied this from InteractiveBackend to read statements from
> > > global.dat file.
>
> Maybe we should ensure that identifiers with CR or LF are turned into
> Unicode quoted identifiers, so each SQL statement would always only occupy
> one line.
Interesting. That might work.
> Or just reject role and tablespace names with CR or LF altogether,
> just as we do for database names.
There are other ways to get multi-line statements. Non-exhaustive list:
- pg_db_role_setting.setconfig
- pg_shdescription.description
- pg_shseclabel.label
- pg_tablespace.spcoptions (if we add a text option in the future)
I think this decision about lexing also ties to other unfinished open item
work of aligning "pg_dumpall -Fd;pg_restore [options]" behavior with "pg_dump
-Fd;pg_restore [options]". "pg_restore --no-privileges" should not restore
pg_tablespace.spcacl, and "pg_restore --no-comments" should not emit COMMENT
statements.
I suspect this is going to end with a structured dump like we use on the
pg_dump (per-database) side. It's not an accident that v17 pg_restore doesn't
lex text files to do its job. pg_dumpall deals with a more-limited set of
statements than pg_dump deals with, but they're not _that much_ more limited.
I won't veto a lexing-based approach if it gets the behaviors right, but I
don't have high hopes for it getting the behaviors right and staying that way.
(I almost said "pg_restore --no-owner" should not restore
pg_tablespace.spcowner, but v17 "pg_dumpall --no-owner" does restore it. One
could argue for or against aligning $SUBJECT behavior w/ v17's mistake there.)
From | Date | Subject | |
---|---|---|---|
Next Message | Michael Paquier | 2025-07-22 00:55:41 | Re: Improve error reporting in 027_stream_regress test |
Previous Message | Corey Huinker | 2025-07-22 00:16:12 | Re: teach pg_upgrade to handle in-place tablespaces |