Re: Cache lookup failure for index during pg_dump

From: Boszormenyi Zoltan <zb(at)cybertec(dot)at>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Bob Lunney <bob_lunney(at)yahoo(dot)com>, pgsql-bugs(at)postgresql(dot)org, Hans-Juergen Schoenig <hs(at)cybertec(dot)at>
Subject: Re: Cache lookup failure for index during pg_dump
Date: 2010-05-27 11:34:40
Message-ID: 4BFE58D0.2020307@cybertec.at
Lists: pgsql-bugs

Tom Lane wrote:
> Bob Lunney <bob_lunney(at)yahoo(dot)com> writes:
>
>> I have a job that summarizes some data from a table, truncates the table, drops the indexes, writes the summarized data back into the table, then recreates the indexes. The operations above are not in a single transaction, but separate statements executed by a script. Easy, runs great, has for years.
>> Recently the job takes a little longer to run and is still going when the database dump starts. That's when I started getting this:
>>
>
>
>> ERROR: cache lookup failed for index 70424
>>
>
>
>> My questions are: can making DDL changes during a dump cause this error? Are the queries used by pg_dump transactionally consistent, i.e. do they run in a transaction and get a single view of the database system catalogs? Other than finer coordination of jobs, how can this situation be avoided?
>>
>
> It's a bit messy. pg_dump runs in a serializable transaction, so it
> sees a consistent snapshot of the database including system catalogs.
> However, it relies in part on various specialized backend functions like
> pg_get_indexdef(), and those things tend to run on SnapshotNow time, ie
> they look at the currently committed state. So it is possible to get
> this type of error if someone performs DDL changes while a dump is
> happening: pg_dump sees index 70424 still listed in the catalogs,
> so it asks about it, and the backend says "there is no such index",
> which there isn't anymore because somebody dropped it since pg_dump's
> transaction started.
>

I looked at ruleutils.c and now I am a little confused, because
different functions use different subsystems to get their info.
E.g.:

syscache is used by e.g.:
pg_get_indexdef
pg_get_constraintdef
pg_get_userbyid
pg_get_functiondef

direct systable_beginscan(SnapshotNow) is used by:
pg_get_triggerdef
pg_get_serial_sequence

SPI is used by:
pg_get_ruledef
pg_get_viewdef

SPI uses the same snapshot as the calling query, right?
So, in a serializable transaction, these two functions return
their info consistently.
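
For example, a pg_get_ruledef()-style lookup through SPI behaves like
this (a simplified, untested sketch; the helper name is made up, and
the real code in ruleutils.c prepares and caches plans instead of
building a query string):

#include "postgres.h"
#include "executor/spi.h"

/* hypothetical helper: is this rewrite rule visible to our snapshot? */
static bool
rule_visible_to_my_snapshot(Oid ruleoid)
{
    char    query[128];
    bool    found;

    snprintf(query, sizeof(query),
             "SELECT 1 FROM pg_catalog.pg_rewrite WHERE oid = %u",
             ruleoid);

    if (SPI_connect() != SPI_OK_CONNECT)
        elog(ERROR, "SPI_connect failed");

    /*
     * read_only = true: the query runs with the calling query's
     * snapshot, so inside pg_dump's serializable transaction it sees
     * the same catalog state as pg_dump's own queries.
     */
    if (SPI_execute(query, true, 1) != SPI_OK_SELECT)
        elog(ERROR, "SPI_execute failed");

    found = (SPI_processed > 0);

    SPI_finish();
    return found;
}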

The syscache doesn't contain old info, only the latest committed
state. This is where the "cache lookup failed for index" problem
comes from, but other objects, e.g. dropped functions or usernames,
may produce a similar error message.
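
This is roughly the pattern in pg_get_indexdef_worker() (condensed
here into a made-up helper; the exact SearchSysCache call spelling
differs a bit between versions):

#include "postgres.h"
#include "access/htup.h"
#include "catalog/pg_index.h"
#include "utils/syscache.h"

/* hypothetical condensation of the syscache lookup in ruleutils.c */
static Form_pg_index
index_form_from_syscache(Oid indexrelid, HeapTuple *ht_idx)
{
    /*
     * The syscache works on SnapshotNow time: it returns only the
     * latest committed state, regardless of the caller's transaction
     * snapshot.
     */
    *ht_idx = SearchSysCache(INDEXRELID,
                             ObjectIdGetDatum(indexrelid),
                             0, 0, 0);
    if (!HeapTupleIsValid(*ht_idx))
        elog(ERROR, "cache lookup failed for index %u", indexrelid);

    /* caller must ReleaseSysCache(*ht_idx) when done with the Form */
    return (Form_pg_index) GETSTRUCT(*ht_idx);
}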

Looking at tqual.c and snapmgr.c, using
systable_beginscan() with GetTransactionSnapshot()
instead of SearchSysCache() would solve the problem for
pg_dump. Why would we have to duplicate these functions?
It seems (at least for pg_get_ruledef and pg_get_indexdef)
that they are only used by pg_dump and psql's
backslash commands. We would lose a little performance
by not using the syscache but gain a lot of correctness.
Am I missing something?
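
Something like this untested sketch is what I have in mind (the
helper name is made up; error handling and tuple copying omitted):

#include "postgres.h"
#include "access/genam.h"
#include "access/heapam.h"
#include "access/htup.h"
#include "access/skey.h"
#include "catalog/indexing.h"
#include "catalog/pg_index.h"
#include "storage/lock.h"
#include "utils/fmgroids.h"
#include "utils/rel.h"
#include "utils/snapmgr.h"

/*
 * hypothetical replacement for the syscache lookup: scan pg_index
 * through its unique index, but with the transaction snapshot instead
 * of SnapshotNow, so pg_dump's serializable transaction still sees an
 * index that was dropped after the transaction started.
 */
static HeapTuple
index_tuple_with_xact_snapshot(Oid indexrelid,
                               Relation *indrel, SysScanDesc *scan)
{
    ScanKeyData skey;

    *indrel = heap_open(IndexRelationId, AccessShareLock);

    ScanKeyInit(&skey,
                Anum_pg_index_indexrelid,
                BTEqualStrategyNumber, F_OIDEQ,
                ObjectIdGetDatum(indexrelid));

    *scan = systable_beginscan(*indrel, IndexRelidIndexId, true,
                               GetTransactionSnapshot(), 1, &skey);

    /*
     * NULL here would now really mean "no such index as of our
     * snapshot". The caller uses the tuple, then calls
     * systable_endscan(*scan) and heap_close(*indrel, AccessShareLock).
     */
    return systable_getnext(*scan);
}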

Best regards,
Zoltán Böszörményi

--
Bible has answers for everything. Proof:
"But let your communication be, Yea, yea; Nay, nay: for whatsoever is more
than these cometh of evil." (Matthew 5:37) - basics of digital technology.
"May your kingdom come" - superficial description of plate tectonics

----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/
