Re: [GENERAL] pg_upgrade problem

From: daveg <daveg(at)sonic(dot)net>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: hubert depesz lubaczewski <depesz(at)depesz(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] pg_upgrade problem
Date: 2011-09-06 00:16:50
Message-ID: 20110906001650.GI24583@sonic.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers


Sorry I missed your reply, catching up now.

On Wed, Aug 31, 2011 at 09:56:59PM -0400, Bruce Momjian wrote:
> daveg wrote:
> > On Mon, Aug 29, 2011 at 07:49:24PM +0200, hubert depesz lubaczewski wrote:
> > > On Mon, Aug 29, 2011 at 06:54:41PM +0200, hubert depesz lubaczewski wrote:
> > > vacuumdb: vacuuming of database "etsy_v2" failed: ERROR: could not access status of transaction 3429738606
> > > DETAIL: Could not open file "pg_clog/0CC6": No such file or directory.
> > >
> > > Interestingly.
> > >
> > > In old dir there is pg_clog directory with files:
> > > 0AC0 .. 0DAF (including 0CC6, size 262144)
> > > but new pg_clog has only:
> > > 0D2F .. 0DB0
> > >
> > > File content - nearly all files that exist in both places are the same, with exception of 2 newest ones in new datadir:
> > > 3c5122f3e80851735c19522065a2d12a 0DAF
> > > 8651fc2b9fa3d27cfb5b496165cead68 0DB0
> > >
> > > 0DB0 doesn't exist in old, and 0DAF has different md5sum: 7d48996c762d6a10f8eda88ae766c5dd
...
> > I had this same thing happen this Saturday just past and my client had to
> > restore the whole 2+ TB instance from the previous days pg_dumps.
...
> > After running pg_upgrade apparently successfully and analyzeing all the

Update: reviewing the logs I see some of the analyzes hit the "could not
access status of transaction" error too.

> > tables we restarted the production workload and started getting errors:
> >
> > 2011-08-27 04:18:34.015 12337 c06 postgres ERROR: could not access status of transaction 2923961093
> > 2011-08-27 04:18:34.015 12337 c06 postgres DETAIL: Could not open file "pg_clog/0AE4": No such file or directory.
> > 2011-08-27 04:18:34.015 12337 c06 postgres STATEMENT: analyze public.b_pxx;
> >
> > On examination the pg_clog directory contained on two files timestamped
> > after the startup of the new cluster with 9.0.4. Other hosts that upgraded
> > successfully had numerous files in pg_clog dating back a few days. So it
> > appears that all the clog files went missing during the upgrade somehow.
> > a
> > This happened upgrading from 8.4.7 to 9.0.4, with a brief session in between
> > at 8.4.8. We have upgraded several hosts to 9.0.4 successfully previously.
>
> I have posted this fix to the hackers email list, but I found it only
> affected old 8.3 servers, not old 8.4.X, so I am confused by your bug
> report.
>
> I have tested 8.4.X to 9.0.4 and found pg_upgrade preserves toast
> relfrozenxids properly in that case.
>
> Can you tell me what table is showing this error? Does it happen during
> vacuum? Can you run a vacuum verbose to see what it is throwing the
> error on? Thanks.

This was upgrading from 8.4.8 to 9.0.4. I don't have the running cluster
anymore, but I do have tar.gz archives of it and could probably find
2TB free somewhere to restore it to if there is something useful to extract.

However, I don't think this was toast related. Most of our rows are short and
have only int, float, and short text columns. These errors hit over 60
different tables mostly during the analyzes we ran immediately after the
upgrade. It also hit during select, insert and delete statements. We did not
run the db more than a few minutes as the damage was so extensive.

As far as I can tell pg_upgrade never copied any pg_clog files from the
old cluster to the new cluster. I wish I had detected that before running
the remove_old_cluster.sh script.

-dg

--
David Gould daveg(at)sonic(dot)net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Bruce Momjian 2011-09-06 00:19:21 Re: [GENERAL] pg_upgrade problem
Previous Message Tomas Vondra 2011-09-06 00:15:34 Re: Query runs in 335ms; function in 100,239ms : date problem?

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2011-09-06 00:19:21 Re: [GENERAL] pg_upgrade problem
Previous Message David Fetter 2011-09-06 00:05:10 Re: Couple document fixes