Re: pgstattuple triggered checkpoint failure and database outage?

From: Stuart Bishop <stuart(at)stuartbishop(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: pgstattuple triggered checkpoint failure and database outage?
Date: 2009-03-31 04:26:24
Message-ID: 6bc73d4c0903302126q235cdc25j1eda3086b264f259@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers

On Tue, Mar 31, 2009 at 11:10 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Stuart Bishop <stuart(at)stuartbishop(dot)net> writes:
>> On Tue, Mar 31, 2009 at 8:59 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> What's the actual size of that relation now?  Is it growing rapidly?
>>> (I'm trying to figure out whether those writes *should* have succeeded,
>>> or whether the block numbers were corrupt in memory.)
>
>> I can't seem to find a file on disk named 11088101 or an entry in pg_class where relfilenode = 11088101.
>
>> Are the allocated table oids always increasing? If so, I can pretty much guarantee that the missing relation was a temporary table or the index on the temporary table. It had a single integer column and maybe 50million rows.
>
> The OIDs increase till they wrap around, so what this sounds like is a
> problem with somebody fetching temporary-table blocks into shared memory
> (where they should never be), and then things going wrong after the
> owning backend drops the temp table (without having cleared out shared
> buffers, which it won't do because it doesn't think it needs to).  Can
> you say what was the exact command(s) you were using with pgstattuple?

Get a list of everything I'm interested in:

SELECT pg_class.oid, nspname, relname
FROM pg_class,pg_namespace
WHERE relnamespace = pg_namespace.oid
AND relkind IN ('r', 't', 'i') -- normal table, toast, index
AND nspname IN ('public', '_sl')
ORDER BY nspname, relname

For every result, get the pgstattuple information I'm interested in
for my reports:

SELECT
table_len,
pg_size_pretty(table_len),
dead_tuple_len,
pg_size_pretty(dead_tuple_len),
free_space,
pg_size_pretty(free_space)
FROM pgstattuple(%(oid)s)

I might be passing objects into pgstattuple that it can't handle - the
system exploded before I could tune the table selection criteria. I
notice that my query will have asked pgstattuple for information
about the temporary table. Which appears to work when testing locally,
but I'm not engineering any race conditions.

--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Stuart Bishop 2009-03-31 04:29:21 Re: [GENERAL] pgstattuple triggered checkpoint failure and database outage?
Previous Message David E. Wheeler 2009-03-31 04:21:29 Re: [HACKERS] string_to_array with empty input

Browse pgsql-hackers by date

  From Date Subject
Next Message Stuart Bishop 2009-03-31 04:29:21 Re: [GENERAL] pgstattuple triggered checkpoint failure and database outage?
Previous Message Tom Lane 2009-03-31 04:22:55 Re: can't load plpython