Re: Errors on missing pg_subtrans/ files with 9.3

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: J Smith <dark(dot)panda+lists(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Errors on missing pg_subtrans/ files with 9.3
Date: 2013-11-15 20:21:57
Message-ID: CA+TgmoYX-gRKoVjD42tpk5PfYmoHeey8qRoFUAjJ=EC-jPUXuA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 13, 2013 at 12:29 PM, J Smith <dark(dot)panda+lists(at)gmail(dot)com> wrote:
> Looks like we got another set of errors overnight. Here's the log file
> from the errors. (Log file scrubbed slightly to remove private data,
> but still representative of the problem I believe.)
>
> Nov 13 05:34:34 dev postgres[6084]: [4-1] user=dev,db=dev ERROR:
> could not access status of transaction 6337381
> Nov 13 05:34:34 dev postgres[6084]: [4-2] user=dev,db=dev DETAIL:
> Could not open file "pg_subtrans/0060": No such file or directory.
> Nov 13 05:34:34 dev postgres[6084]: [4-3] user=dev,db=dev CONTEXT:
> SQL statement "SELECT 1 FROM ONLY "typhon"."collection_batches" x
> WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
> Nov 13 05:34:34 dev postgres[6084]: [4-4] user=dev,db=dev STATEMENT:
> update listings set deactivated_at=$1 where id=$2 and lock_version=$3
> Nov 13 05:34:34 dev postgres[6076]: [4-1] user=dev,db=dev ERROR:
> could not access status of transaction 6337381
> Nov 13 05:34:34 dev postgres[6076]: [4-2] user=dev,db=dev DETAIL:
> Could not open file "pg_subtrans/0060": No such file or directory.
> Nov 13 05:34:34 dev postgres[6076]: [4-3] user=dev,db=dev CONTEXT:
> SQL statement "SELECT 1 FROM ONLY "typhon"."collection_batches" x
> WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
> Nov 13 05:34:34 dev postgres[6076]: [4-4] user=dev,db=dev STATEMENT:
> update listings set deactivated_at=$1 where id=$2 and lock_version=$3
> Nov 13 05:34:34 dev postgres[6087]: [4-1] user=dev,db=dev ERROR:
> could not access status of transaction 6337381
> Nov 13 05:34:34 dev postgres[6087]: [4-2] user=dev,db=dev DETAIL:
> Could not open file "pg_subtrans/0060": No such file or directory.
> Nov 13 05:34:34 dev postgres[6087]: [4-3] user=dev,db=dev CONTEXT:
> SQL statement "SELECT 1 FROM ONLY "typhon"."collection_batches" x
> WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
> Nov 13 05:34:34 dev postgres[6087]: [4-4] user=dev,db=dev STATEMENT:
> update listings set deactivated_at=$1 where id=$2 and lock_version=$3
> Nov 13 05:34:34 dev postgres[6086]: [4-1] user=dev,db=dev ERROR:
> could not access status of transaction 6337381
> Nov 13 05:34:34 dev postgres[6086]: [4-2] user=dev,db=dev DETAIL:
> Could not open file "pg_subtrans/0060": No such file or directory.
> Nov 13 05:34:34 dev postgres[6086]: [4-3] user=dev,db=dev CONTEXT:
> SQL statement "SELECT 1 FROM ONLY "typhon"."collection_batches" x
> WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
> Nov 13 05:34:34 dev postgres[6086]: [4-4] user=dev,db=dev STATEMENT:
> update listings set deactivated_at=$1 where id=$2 and lock_version=$3
> Nov 13 05:34:34 dev postgres[6088]: [4-1] user=dev,db=dev ERROR:
> could not access status of transaction 6337381
> Nov 13 05:34:34 dev postgres[6088]: [4-2] user=dev,db=dev DETAIL:
> Could not open file "pg_subtrans/0060": No such file or directory.
> Nov 13 05:34:34 dev postgres[6088]: [4-3] user=dev,db=dev CONTEXT:
> SQL statement "SELECT 1 FROM ONLY "typhon"."collection_batches" x
> WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
> Nov 13 05:34:34 dev postgres[6088]: [4-4] user=dev,db=dev STATEMENT:
> update listings set deactivated_at=$1 where id=$2 and lock_version=$3
> Nov 13 05:34:34 dev postgres[6085]: [4-1] user=dev,db=dev ERROR:
> could not access status of transaction 6337381
> Nov 13 05:34:34 dev postgres[6085]: [4-2] user=dev,db=dev DETAIL:
> Could not open file "pg_subtrans/0060": No such file or directory.
> Nov 13 05:34:34 dev postgres[6085]: [4-3] user=dev,db=dev CONTEXT:
> SQL statement "SELECT 1 FROM ONLY "typhon"."collection_batches" x
> WHERE "id" OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x"
> Nov 13 05:34:34 dev postgres[6085]: [4-4] user=dev,db=dev STATEMENT:
> update listings set deactivated_at=$1 where id=$2 and lock_version=$3
>
> Several processes all seemed to hit the problem at the same moment,
> and all of them refer to the same transaction ID. Again, the file
> pg_subtrans/0060 doesn't exist, and the only file that does exist is
> pg_subtrans/005A which appears to be a zeroed-out file 245760 bytes in
> length.
>
> Still don't have a clue as to how I can reproduce the problem. It
> seems that in all cases the error occurred during either an UPDATE to
> a table_X or an INSERT to table_Y. In all cases, the error occurred in
> a manner identical to those shown in the log above, the only
> difference being either an UPDATE on table_X or an INSERT on table_Y.
>
> Not sure what direction I should head to now. Perhaps some aggressive
> logging would help, so we can see the queries surrounding the
> problems? I could reconfigure things to capture all statements and set
> up monit or something to send an alert when the problem resurfaces,
> for instance.
>
> Cheers all.

I think what would help the most is if you could arrange to obtain a
stack backtrace at the point when the error is thrown. Maybe put a
long sleep call in just before the error happens, and when it gets
stuck there, attach gdb and run bt full.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Merlin Moncure 2013-11-15 20:25:24 Re: additional json functionality
Previous Message Peter Eisentraut 2013-11-15 20:14:41 Re: Extra functionality to createuser