Re: use pg_get_functiondef() in pg_dump

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: use pg_get_functiondef() in pg_dump
Date: 2020-08-18 17:03:03
Message-ID: 1260763.1597770183@sss.pgh.pa.us
Lists: pgsql-hackers

Stephen Frost <sfrost(at)snowman(dot)net> writes:
> So.. the code that's in pg_dump today works to go from "whatever the
> connected server's version is" to "whatever the version is of the
> pg_dump command itself". If we had the code in libpgcommon, and
> functions in the backend to get at it along with psql having that code,
> you could then, using the code we have today, go from a bunch of
> 'source' versions to 'target' version of either the version of the psql
> command, or that of the server.

At this point, I think I need a high-power telescope even to see the
goalposts :-(

If we actually want to do something like this, we need a plan, not just
some handwaving. Let's start by enumerating the concerns that would
have to be solved. I can think of:

* Execution context. Stephen seems to be envisioning code that could be
compiled into the backend, not just the frontend, but is that really worth
the trouble? Could we share such code across FE/BE at all (it'd certainly
be a far more ambitious exercise in common code than we've done to date)?
What's the backend version actually doing, issuing queries over SPI?
(I suppose if you were rigid about that, it could offer a guarantee
that the results match your snapshot, which is pretty attractive.)

* Global vs. per-object activity. pg_dump likes to query the entire state
of the database to start with, and then follow up by grabbing additional
details about objects it's going to dump. That's not an operating mode
that most other clients would want, but if for no other reason than
performance, I don't think we can walk away from it for pg_dump ---
indeed, I think pg_dump probably needs to be fixed to do less per-object
querying, not more. Meanwhile, applications such as psql \d would only
want to investigate one object at a time. What design can we create that
will handle that? If there is persistent state involved, what in the
world does that mean for the case of a backend-side library?
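Very roughly, the two access patterns look something like this (illustrative
queries only, not the exact ones either tool issues):

-- pg_dump-ish: sweep the whole catalog up front
SELECT p.oid, p.proname, n.nspname
FROM pg_proc p
     JOIN pg_namespace n ON n.oid = p.pronamespace;

-- psql-\df-ish: fetch one object's details on demand
SELECT pg_get_functiondef(p.oid)
FROM pg_proc p
WHERE p.oid = 'pg_catalog.pg_sleep(float8)'::regprocedure;

A shared facility would have to serve both patterns without making either
one unreasonably slow.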

* Context in which the output is valid. Target server version was already
mentioned, but a quick examination of pg_dump output scripts will remind
you that there are a bunch more assumptions:

SET statement_timeout = 0;
SET lock_timeout = 0;
SET idle_in_transaction_session_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET check_function_bodies = false;
SET xmloption = content;
SET client_min_messages = warning;
SET row_security = off;

not to mention special hackery for object ownership and tablespaces.
Some of these things probably don't matter for other use-cases, but
others definitely do. In particular, I really doubt that psql and
other clients would find it acceptable to force search_path to a
particular thing. Which brings us to

* Security. How robust do the output commands need to be, and
what will we have to do that pg_dump doesn't need to?
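As one concrete instance of the search_path and robustness issues: the
output of the existing deparse functions already depends on the caller's
search_path, since names visible in the path come out unqualified and
everything else gets schema-qualified. For example (same arbitrary view
as above):

SET search_path = public, pg_catalog;
SELECT pg_get_viewdef('pg_catalog.pg_stat_activity'::regclass);
-- references to pg_catalog objects come out unqualified

SELECT pg_catalog.set_config('search_path', '', false);
SELECT pg_get_viewdef('pg_catalog.pg_stat_activity'::regclass);
-- now every name is schema-qualified, which is what makes pg_dump's
-- output robust against a hostile or surprising search_path

psql and friends presumably want the first behavior, while a dump wants
the second, so any shared interface has to let the caller choose.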

* No doubt there are some other topics I didn't think of.

This certainly would be attractive if we had it, but the task
seems dauntingly large. It's not going to happen without some
fairly serious investment of time.

regards, tom lane
