Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: "Drouvot, Bertrand" <bdrouvot(at)amazon(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jason Harvey <jason(at)reddit(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, pgsql-hackers(at)postgresql(dot)org, "Tharakan, Robins" <tharar(at)amazon(dot)com>
Subject: Re: [UNVERIFIED SENDER] Re: pg_upgrade can result in early wraparound on databases with high transaction load
Date: 2021-07-27 02:39:04
Message-ID: 20210727023904.GA19774@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


This patch has been applied back to 9.6 and will appear in the next
minor release.

---------------------------------------------------------------------------

On Tue, May 18, 2021 at 01:26:38PM +0200, Drouvot, Bertrand wrote:
> Hi,
>
> On 5/4/21 10:17 AM, Drouvot, Bertrand wrote:
>
>
> Hi,
>
> On 4/24/21 3:00 AM, Andres Freund wrote:
>
> Hi,
>
> On 2021-04-23 19:28:27 -0500, Justin Pryzby wrote:
>
> This (combination of) thread(s) seems relevant.
>
> Subject: pg_upgrade failing for 200+ million Large Objects
> https://www.postgresql.org/message-id/flat/12601596dbbc4c01b86b4ac4d2bd4d48%40EX13D05UWC001.ant.amazon.com
> https://www.postgresql.org/message-id/flat/a9f9376f1c3343a6bb319dce294e20ac%40EX13D05UWC001.ant.amazon.com
> https://www.postgresql.org/message-id/flat/cc089cc3-fc43-9904-fdba-d830d8222145%40enterprisedb.com#3eec85391c6076a4913e96a86fece75e
>
> Huh. Thanks for digging these up.
>
>
>
> Allows the user to provide a constant via pg_upgrade command-line, that
> overrides the 2 billion constant in pg_resetxlog [1] thereby increasing the
> (window of) Transaction IDs available for pg_upgrade to complete.
>
> That seems the entirely the wrong approach to me, buying further into
> the broken idea of inventing random wrong values for oldestXid.
>
> We drive important things like the emergency xid limits off oldestXid. On
> databases with tables that are older than ~147million xids (i.e. not even
> affected by the default autovacuum_freeze_max_age) the current constant leads
> to setting the oldestXid to a value *in the future*/wrapped around. Any
> different different constant (or pg_upgrade parameter) will do that too in
> other scenarios.
>
> As far as I can tell there is precisely *no* correct behaviour here other than
> exactly copying the oldestXid limit from the source database.
>
>
> Please find attached a patch proposal doing so: it adds a new (- u)
> parameter to pg_resetwal that allows to specify the oldest unfrozen XID to
> set.
> Then this new parameter is being used in pg_upgrade to copy the source
> Latest checkpoint's oldestXID.
>
> Questions:
>
> □ Should we keep the old behavior in case -x is being used without -u?
> (The proposed patch does not set an arbitrary oldestXID anymore in case
> -x is used.)
> □ Also shouldn't we ensure that the xid provided with -x or -u is >=
> FirstNormalTransactionId (Currently the only check is that it is # 0)?
>
>
> Copy/pasting Andres feedback (Thanks Andres for this feedback) on those
> questions from another thread [1].
>
> > I was also wondering if:
> >
> > * We should keep the old behavior in case pg_resetwal -x is being used
> > without -u?
 (The proposed patch does not set an arbitrary oldestXID
> > anymore in 
case -x is used)
>
> Andres: I don't think we should. I don't see anything in the old behaviour
> worth
> maintaining.
>
> > * We should ensure that the xid provided with -x or -u is
> > >=
FirstNormalTransactionId (Currently the only check is that it is
> > # 0)?
>
> Andres: Applying TransactionIdIsNormal() seems like a good idea.
>
> => I am attaching a new version that makes use of TransactionIdIsNormal()
> checks.
>
> Andres: I think it's important to verify that the xid provided with -x is
> within a reasonable range of the oldest xid.
>
> => What do you mean by "a reasonable range"?
>
> Thanks
>
> Bertrand
>
> [1]: https://www.postgresql.org/message-id/
> 20210517185646.pwe4klaufwmdhe2a%40alap3.anarazel.de
>
>
>
>

> src/bin/pg_resetwal/pg_resetwal.c | 65 ++++++++++++++++++++++-----------------
> src/bin/pg_upgrade/controldata.c | 17 +++++++++-
> src/bin/pg_upgrade/pg_upgrade.c | 6 ++++
> src/bin/pg_upgrade/pg_upgrade.h | 1 +
> 4 files changed, 60 insertions(+), 29 deletions(-)
> diff --git a/src/bin/pg_resetwal/pg_resetwal.c b/src/bin/pg_resetwal/pg_resetwal.c
> index 805dafef07..5e864760ed 100644
> --- a/src/bin/pg_resetwal/pg_resetwal.c
> +++ b/src/bin/pg_resetwal/pg_resetwal.c
> @@ -65,6 +65,7 @@ static bool guessed = false; /* T if we had to guess at any values */
> static const char *progname;
> static uint32 set_xid_epoch = (uint32) -1;
> static TransactionId set_xid = 0;
> +static TransactionId set_oldest_unfrozen_xid = 0;
> static TransactionId set_oldest_commit_ts_xid = 0;
> static TransactionId set_newest_commit_ts_xid = 0;
> static Oid set_oid = 0;
> @@ -102,6 +103,7 @@ main(int argc, char *argv[])
> {"next-oid", required_argument, NULL, 'o'},
> {"multixact-offset", required_argument, NULL, 'O'},
> {"next-transaction-id", required_argument, NULL, 'x'},
> + {"oldest-transaction-id", required_argument, NULL, 'u'},
> {"wal-segsize", required_argument, NULL, 1},
> {NULL, 0, NULL, 0}
> };
> @@ -135,7 +137,7 @@ main(int argc, char *argv[])
> }
>
>
> - while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:", long_options, NULL)) != -1)
> + while ((c = getopt_long(argc, argv, "c:D:e:fl:m:no:O:x:u:", long_options, NULL)) != -1)
> {
> switch (c)
> {
> @@ -176,9 +178,24 @@ main(int argc, char *argv[])
> fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
> exit(1);
> }
> - if (set_xid == 0)
> + if (!TransactionIdIsNormal(set_xid))
> {
> - pg_log_error("transaction ID (-x) must not be 0");
> + pg_log_error("transaction ID (-x) must be greater or equal to %u", FirstNormalTransactionId);
> + exit(1);
> + }
> + break;
> +
> + case 'u':
> + set_oldest_unfrozen_xid = strtoul(optarg, &endptr, 0);
> + if (endptr == optarg || *endptr != '\0')
> + {
> + pg_log_error("invalid argument for option %s", "-u");
> + fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
> + exit(1);
> + }
> + if (!TransactionIdIsNormal(set_oldest_unfrozen_xid))
> + {
> + pg_log_error("oldest unfrozen transaction ID (-u) must be greater or equal to %u", FirstNormalTransactionId);
> exit(1);
> }
> break;
> @@ -429,21 +446,12 @@ main(int argc, char *argv[])
> XidFromFullTransactionId(ControlFile.checkPointCopy.nextXid));
>
> if (set_xid != 0)
> - {
> ControlFile.checkPointCopy.nextXid =
> FullTransactionIdFromEpochAndXid(EpochFromFullTransactionId(ControlFile.checkPointCopy.nextXid),
> set_xid);
>
> - /*
> - * For the moment, just set oldestXid to a value that will force
> - * immediate autovacuum-for-wraparound. It's not clear whether adding
> - * user control of this is useful, so let's just do something that's
> - * reasonably safe. The magic constant here corresponds to the
> - * maximum allowed value of autovacuum_freeze_max_age.
> - */
> - ControlFile.checkPointCopy.oldestXid = set_xid - 2000000000;
> - if (ControlFile.checkPointCopy.oldestXid < FirstNormalTransactionId)
> - ControlFile.checkPointCopy.oldestXid += FirstNormalTransactionId;
> + if (set_oldest_unfrozen_xid != 0) {
> + ControlFile.checkPointCopy.oldestXid = set_oldest_unfrozen_xid;
> ControlFile.checkPointCopy.oldestXidDB = InvalidOid;
> }
>
> @@ -1209,20 +1217,21 @@ usage(void)
> printf(_("Usage:\n %s [OPTION]... DATADIR\n\n"), progname);
> printf(_("Options:\n"));
> printf(_(" -c, --commit-timestamp-ids=XID,XID\n"
> - " set oldest and newest transactions bearing\n"
> - " commit timestamp (zero means no change)\n"));
> - printf(_(" [-D, --pgdata=]DATADIR data directory\n"));
> - printf(_(" -e, --epoch=XIDEPOCH set next transaction ID epoch\n"));
> - printf(_(" -f, --force force update to be done\n"));
> - printf(_(" -l, --next-wal-file=WALFILE set minimum starting location for new WAL\n"));
> - printf(_(" -m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID\n"));
> - printf(_(" -n, --dry-run no update, just show what would be done\n"));
> - printf(_(" -o, --next-oid=OID set next OID\n"));
> - printf(_(" -O, --multixact-offset=OFFSET set next multitransaction offset\n"));
> - printf(_(" -V, --version output version information, then exit\n"));
> - printf(_(" -x, --next-transaction-id=XID set next transaction ID\n"));
> - printf(_(" --wal-segsize=SIZE size of WAL segments, in megabytes\n"));
> - printf(_(" -?, --help show this help, then exit\n"));
> + " set oldest and newest transactions bearing\n"
> + " commit timestamp (zero means no change)\n"));
> + printf(_(" [-D, --pgdata=]DATADIR data directory\n"));
> + printf(_(" -e, --epoch=XIDEPOCH set next transaction ID epoch\n"));
> + printf(_(" -f, --force force update to be done\n"));
> + printf(_(" -l, --next-wal-file=WALFILE set minimum starting location for new WAL\n"));
> + printf(_(" -m, --multixact-ids=MXID,MXID set next and oldest multitransaction ID\n"));
> + printf(_(" -n, --dry-run no update, just show what would be done\n"));
> + printf(_(" -o, --next-oid=OID set next OID\n"));
> + printf(_(" -O, --multixact-offset=OFFSET set next multitransaction offset\n"));
> + printf(_(" -u, --oldest-transaction-id=XID set oldest unfrozen transaction ID\n"));
> + printf(_(" -V, --version output version information, then exit\n"));
> + printf(_(" -x, --next-transaction-id=XID set next transaction ID\n"));
> + printf(_(" --wal-segsize=SIZE size of WAL segments, in megabytes\n"));
> + printf(_(" -?, --help show this help, then exit\n"));
> printf(_("\nReport bugs to <%s>.\n"), PACKAGE_BUGREPORT);
> printf(_("%s home page: <%s>\n"), PACKAGE_NAME, PACKAGE_URL);
> }
> diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
> index 4f647cdf33..a4b6375403 100644
> --- a/src/bin/pg_upgrade/controldata.c
> +++ b/src/bin/pg_upgrade/controldata.c
> @@ -44,6 +44,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
> bool got_oid = false;
> bool got_multi = false;
> bool got_oldestmulti = false;
> + bool got_oldestxid = false;
> bool got_mxoff = false;
> bool got_nextxlogfile = false;
> bool got_float8_pass_by_value = false;
> @@ -312,6 +313,17 @@ get_control_data(ClusterInfo *cluster, bool live_check)
> cluster->controldata.chkpnt_nxtmulti = str2uint(p);
> got_multi = true;
> }
> + else if ((p = strstr(bufin, "Latest checkpoint's oldestXID:")) != NULL)
> + {
> + p = strchr(p, ':');
> +
> + if (p == NULL || strlen(p) <= 1)
> + pg_fatal("%d: controldata retrieval problem\n", __LINE__);
> +
> + p++; /* remove ':' char */
> + cluster->controldata.chkpnt_oldstxid = str2uint(p);
> + got_oldestxid = true;
> + }
> else if ((p = strstr(bufin, "Latest checkpoint's oldestMultiXid:")) != NULL)
> {
> p = strchr(p, ':');
> @@ -544,7 +556,7 @@ get_control_data(ClusterInfo *cluster, bool live_check)
>
> /* verify that we got all the mandatory pg_control data */
> if (!got_xid || !got_oid ||
> - !got_multi ||
> + !got_multi || !got_oldestxid ||
> (!got_oldestmulti &&
> cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER) ||
> !got_mxoff || (!live_check && !got_nextxlogfile) ||
> @@ -575,6 +587,9 @@ get_control_data(ClusterInfo *cluster, bool live_check)
> cluster->controldata.cat_ver >= MULTIXACT_FORMATCHANGE_CAT_VER)
> pg_log(PG_REPORT, " latest checkpoint oldest MultiXactId\n");
>
> + if (!got_oldestxid)
> + pg_log(PG_REPORT, " latest checkpoint oldestXID\n");
> +
> if (!got_mxoff)
> pg_log(PG_REPORT, " latest checkpoint next MultiXactOffset\n");
>
> diff --git a/src/bin/pg_upgrade/pg_upgrade.c b/src/bin/pg_upgrade/pg_upgrade.c
> index e23b8ca88d..950ff24980 100644
> --- a/src/bin/pg_upgrade/pg_upgrade.c
> +++ b/src/bin/pg_upgrade/pg_upgrade.c
> @@ -473,6 +473,12 @@ copy_xact_xlog_xid(void)
> "\"%s/pg_resetwal\" -f -x %u \"%s\"",
> new_cluster.bindir, old_cluster.controldata.chkpnt_nxtxid,
> new_cluster.pgdata);
> + check_ok();
> + prep_status("Setting oldest XID for new cluster");
> + exec_prog(UTILITY_LOG_FILE, NULL, true, true,
> + "\"%s/pg_resetwal\" -f -u %u \"%s\"",
> + new_cluster.bindir, old_cluster.controldata.chkpnt_oldstxid,
> + new_cluster.pgdata);
> exec_prog(UTILITY_LOG_FILE, NULL, true, true,
> "\"%s/pg_resetwal\" -f -e %u \"%s\"",
> new_cluster.bindir, old_cluster.controldata.chkpnt_nxtepoch,
> diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
> index a5f71c5294..dd0204902c 100644
> --- a/src/bin/pg_upgrade/pg_upgrade.h
> +++ b/src/bin/pg_upgrade/pg_upgrade.h
> @@ -207,6 +207,7 @@ typedef struct
> uint32 chkpnt_nxtmulti;
> uint32 chkpnt_nxtmxoff;
> uint32 chkpnt_oldstMulti;
> + uint32 chkpnt_oldstxid;
> uint32 align;
> uint32 blocksz;
> uint32 largesz;

--
Bruce Momjian <bruce(at)momjian(dot)us> https://momjian.us
EDB https://enterprisedb.com

If only the physical world exists, free will is an illusion.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Vijaykumar Jain 2021-07-27 06:19:25 Re: pg-audit extension
Previous Message Ben Madin 2021-07-27 01:43:45 Re: pg_restore (fromuser -> touser)

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2021-07-27 02:39:17 Re: visibility map corruption
Previous Message Amit Langote 2021-07-27 02:32:36 Re: Allow batched insert during cross-partition updates