From: | Noah Misch <noah(at)leadboat(dot)com> |
---|---|
To: | Jason Harvey <jason(at)reddit(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: pg_upgrade can result in early wraparound on databases with high transaction load |
Date: | 2019-06-15 18:37:59 |
Message-ID: | 20190615183759.GB239428@rfd.leadboat.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general pgsql-hackers |
On Tue, May 21, 2019 at 03:23:00PM -0700, Peter Geoghegan wrote:
> On Mon, May 20, 2019 at 3:10 AM Jason Harvey <jason(at)reddit(dot)com> wrote:
> > This week I upgraded one of my large(2.8TB), high-volume databases from 9 to 11. The upgrade itself went fine. About two days later, we unexpectedly hit transaction ID wraparound. What was perplexing about this was that the age of our oldest `datfrozenxid` was only 1.2 billion - far away from where I'd expect a wraparound. Curiously, the wraparound error referred to a mysterious database of `OID 0`:
> >
> > UPDATE ERROR: database is not accepting commands to avoid wraparound data loss in database with OID 0
That's bad.
> > We were able to recover after a few hours by greatly speeding up our vacuum on our largest table.
For what it's worth, a quicker workaround is to VACUUM FREEZE any database,
however small. That forces a vac_truncate_clog(), which recomputes the wrap
point from pg_database.datfrozenxid values. This demonstrates the workaround:
--- a/src/bin/pg_upgrade/test.sh
+++ b/src/bin/pg_upgrade/test.sh
@@ -248,7 +248,10 @@ case $testhost in
esac
pg_dumpall --no-sync -f "$temp_root"/dump2.sql || pg_dumpall2_status=$?
+pg_controldata "${PGDATA}"
+vacuumdb -F template1
pg_ctl -m fast stop
+pg_controldata "${PGDATA}"
if [ -n "$pg_dumpall2_status" ]; then
echo "pg_dumpall of post-upgrade database cluster failed"
> > In a followup investigation I uncovered the reason we hit the wraparound so early, and also the cause of the mysterious OID 0 message. When pg_upgrade executes, it calls pg_resetwal to set the next transaction ID. Within pg_resetwal is the following code: https://github.com/postgres/postgres/blob/6cd404b344f7e27f4d64555bb133f18a758fe851/src/bin/pg_resetwal/pg_resetwal.c#L440-L450
pg_upgrade should set oldestXID to the same value as the source cluster or set
it like vac_truncate_clog() would set it. Today's scheme is usually too
pessimistic, but it can be too optimistic if the source cluster was on the
bring of wrap. Thanks for the report.
From | Date | Subject | |
---|---|---|---|
Next Message | Peter Geoghegan | 2019-06-16 00:35:52 | Re: checkpoints taking much longer than expected |
Previous Message | Pavel Stehule | 2019-06-15 14:37:19 | Re: how to concat/concat_ws all fields without braces |
From | Date | Subject | |
---|---|---|---|
Next Message | Noah Misch | 2019-06-15 18:47:39 | Re: Do we expect tests to work with default_transaction_isolation=serializable |
Previous Message | Noah Misch | 2019-06-15 18:07:32 | Re: Dead encoding conversion functions |