From: | "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com> |
---|---|
To: | "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Hackers <pgsql-hackers(at)postgresql(dot)org>, "Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> |
Subject: | Re: the un-vacuumable table |
Date: | 2008-07-04 05:57:36 |
Message-ID: | 5a0a9d6f0807032257l7217d1efx79453e06407774f3@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Jul 3, 2008 at 3:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "Andrew Hammond" <andrew(dot)george(dot)hammond(at)gmail(dot)com> writes:
>> On Thu, Jul 3, 2008 at 2:35 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The whole thing is pretty mystifying, especially the ENOSPC write
>>> failure on what seems like it couldn't have been a full disk.
>
>> Yes, I've passed along the task of explaining why PG thought the disk
>> was full to the sysadmin responsible for the box. I'll post the answer
>> here, when and if we have one.
>
> I just noticed something even more mystifying: you said that the ENOSPC
> error occurred once a day during vacuuming.

Actually, the ENOSPC happened once. After that first error, we got

    vacuumdb: vacuuming of database "adecndb" failed: ERROR: failed to
    re-find parent key in "ledgerdetail_2008_03_idx2" for deletion target
    page 64767

repeatedly.

> That doesn't make any
> sense, because a write error would leave the shared buffer still marked
> dirty, and so the next checkpoint would try to write it again. If
> there's a persistent write error on a particular block, you should see
> it being complained of at least once per checkpoint interval.
>
> If you didn't see that, it suggests that the ENOSPC was transient,
> which isn't unreasonable --- but why would it recur for the exact
> same block each night?
>
> Have you looked into the machine's kernel log to see if there is any
> evidence of low-level distress (hardware or filesystem level)? I'm
> wondering if ENOSPC is being reported because it is the closest
> available errno code, but the real problem is something different than
> the error message text suggests. Other than the errno the symptoms
> all look quite a bit like a bad-sector problem ...
I will pass this along to the sysadmin in charge of this box.
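
To make the checkpoint argument quoted above concrete, here is a toy model of it. It is only a sketch, not PostgreSQL's buffer manager: the `Buffer` class, the `checkpoint` and `make_writer` helpers, and block number 42 are all hypothetical names chosen for illustration. The one assumption carried over from the thread is that a failed write leaves the buffer dirty, so every later checkpoint retries it. The model also simplifies by letting a checkpoint continue past a failed write and collect the failures.

```python
import errno


class Buffer:
    """Toy stand-in for a shared buffer holding one modified block."""

    def __init__(self, block_no):
        self.block_no = block_no
        self.dirty = True  # modified in memory, not yet on disk


def checkpoint(buffers, write_block):
    """Try to flush every dirty buffer; return (block_no, errno) per failure.

    A buffer is marked clean only after its write succeeds, so a failed
    write is retried by the next checkpoint.
    """
    failures = []
    for buf in buffers:
        if not buf.dirty:
            continue
        try:
            write_block(buf.block_no)
        except OSError as exc:
            failures.append((buf.block_no, exc.errno))
        else:
            buf.dirty = False
    return failures


def make_writer(bad_block, fail_times):
    """Return a fake write function that fails `fail_times` times on one block."""
    remaining = {"n": fail_times}

    def write_block(block_no):
        if block_no == bad_block and remaining["n"] > 0:
            remaining["n"] -= 1
            raise OSError(errno.ENOSPC, "No space left on device")

    return write_block


def run_checkpoints(count, fail_times, bad_block=42):
    """Run `count` checkpoints over two dirty buffers and collect failures."""
    buffers = [Buffer(1), Buffer(bad_block)]
    writer = make_writer(bad_block, fail_times)
    return [checkpoint(buffers, writer) for _ in range(count)]


if __name__ == "__main__":
    print("persistent:", run_checkpoints(3, fail_times=3))
    print("transient: ", run_checkpoints(3, fail_times=1))
```

In this model a persistent fault is reported at every checkpoint, while a transient one is reported once and then clears because the retry succeeds. The single ENOSPC reported here looks like the transient case.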