Re: Cache lookup failure for index during pg_dump

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bob Lunney <bob_lunney(at)yahoo(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Cache lookup failure for index during pg_dump
Date: 2010-02-20 00:39:51
Message-ID: 14781.1266626391@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Bob Lunney <bob_lunney(at)yahoo(dot)com> writes:
> I have a job that summarizes some data from a table, truncates the table, drops the indexes, writes the summarized data back into the table, then recreates the indexes. The operations above are not in a single transaction, but separate statements executed by a script. Easy, runs great, has for years.
> Recently the job takes a little longer to run and is still going when the database dump starts. That's when I started getting this:

> ERROR: cache lookup failed for index 70424

> My questions are: can making DDL changes during a dump cause this error? Are the queries used by pg_dump transactionally consistent, i.e. do they run in a transaction and get a single view of the database system catalogs? Other than finer coordination of jobs, how can this situation be avoided?

It's a bit messy. pg_dump runs in a serializable transaction, so it
sees a consistent snapshot of the database including system catalogs.
However, it relies in part on various specialized backend functions like
pg_get_indexdef(), and those things tend to run on SnapshotNow time, ie
they look at the currently committed state. So it is possible to get
this type of error if someone performs DDL changes while a dump is
happening: pg_dump sees index 70424 still listed in the catalogs,
so it asks about it, and the backend says "there is no such index",
which there isn't anymore because somebody dropped it since pg_dump's
transaction started.

The window for this sort of thing isn't very large, because the first
thing pg_dump does is acquire AccessShareLock on every table it intends
to dump, and past that point it won't be possible for anyone to modify
the table's DDL. But it can happen.

The right fix for this is to make all those inquiry functions use the
calling query's snapshot; but duplicating a lot of backend
infrastructure is going to be a major pain in the rear, so the
discussion has kind of petered out every time it's come up in the past.

In practice, because pg_dump does lock out DDL changes for the bulk of
its run, it's not a great idea to be scheduling DDL-changing jobs during
your dumps anyhow. Most of the time they'll just get blocked till the
dump finishes, and if they touch more than one table it's not at all
unlikely for them to end up deadlocked against pg_dump's locks. A fix
for the snapshot-timing problem wouldn't do a thing for that problem.

So in short, the path of least resistance is to reschedule your dumps.
Or reconsider whether you really need to drop and recreate those indexes
--- could you use REINDEX instead?

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Euler Taveira de Oliveira 2010-02-20 01:50:35 Re: Cache lookup failure for index during pg_dump
Previous Message Aris Setyawan 2010-02-19 22:51:32 Re: BUG #5015: MySQL migration wizard does not start