Re: 8.3.0: vacuum full analyze: "invalid memory alloc request size"

From: Tomas Szepe <szepe(at)pinerecords(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: 8.3.0: vacuum full analyze: "invalid memory alloc request size"
Date: 2008-02-10 09:53:17
Message-ID: 20080210095317.GC27757@louise.pinerecords.com
Lists: pgsql-bugs

> >> Hmm, please see if you can get a stack trace from that (set the
> >> breakpoint at errfinish()). You might want to use vacuum verbose
> >> first so that you can figure out which individual table is causing it.
>
> > Ok, I recompiled CVS head with --enable-debug and with --enable-cassert
> > and hit the following assert on "vacuum full verbose analyze":
> > [etc]
>
> It seems fairly clear that both of these symptoms mean that
> empty_end_pages got to be larger than fraged_pages->num_pages.
> In the first case the Assert catches that directly, but with
> asserts disabled the code just allows num_pages to go negative
> and then the space calculation in vac_update_fsm goes nuts.
>
> So the question is, how could that happen? There are only three places
> where empty_end_pages is incremented, and the first two definitely add
> the page to fraged_pages as well. What I'm thinking is you must have
> had a few pages where notup was true but do_frag didn't get set, and
> it's not quite clear how that could be. It seems most likely that the
> page contained only LP_DEAD tuples but didn't have free space large
> enough to get it put into the fraged_pages list. But the only place
> that would mark tuples LP_DEAD is pruneheap.c, and it should have
> done a PageRepairFragmentation() after doing so.
>
> Do you perhaps have a ridiculously low fillfactor attached to
> the system catalogs?

I don't know. How do I tell? My config is quite ordinary as far
as I can see (postgresql.conf.gz attached). The db was created only
4 days ago and populated from an 8.2.6 dump. There have been no power
failures, freezes or even postmaster shutdowns, and in fact no signs
of any problem whatsoever. There are about 300,000 queries a day.

> The fix should probably be to force pages to be put in fraged_pages
> if notup is true, but first I want to understand exactly how it got
> into this state --- there may be something else going on here.

Well, I've never hacked on PostgreSQL, so probably all I can do
is tar/rzip the whole data dir and put it up for download somewhere
(there is no sensitive data).

Thanks for your help,
--
Tomas Szepe <szepe(at)pinerecords(dot)com>
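
Note on the quoted analysis: the failure mode described above (a signed
page count going negative and then feeding a size calculation) can be
illustrated with a small standalone sketch. This is not the PostgreSQL
source. The struct, the function names and the exact arithmetic are
hypothetical and chosen only to show how a negative count can turn into
an oversized allocation request. The 0x3fffffff cap mirrors PostgreSQL's
MaxAllocSize, but it is used here as an assumed illustrative limit.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-page free-space record.
 * This is NOT a PostgreSQL type; it only gives sizeof() something to measure. */
typedef struct {
    unsigned int blkno;   /* page number */
    size_t       avail;   /* free bytes on that page */
} PageSpaceSketch;

/* Assumed cap on a single allocation request (1 GB - 1), for illustration. */
#define SKETCH_MAX_ALLOC ((size_t) 0x3fffffff)

/* Pages left in the list after dropping the empty pages at the end.
 * With no assertion guarding it, the result may be negative when
 * empty_end_pages exceeds num_pages. */
static int remaining_pages(int num_pages, int empty_end_pages)
{
    return num_pages - empty_end_pages;
}

/* Bytes needed for an array of nentries records. A negative count is
 * converted to size_t first, so it wraps to a very large unsigned value. */
static size_t request_size(int nentries)
{
    return (size_t) nentries * sizeof(PageSpaceSketch);
}

/* Returns 1 if the request is within the assumed cap, 0 otherwise. */
static int request_is_valid(size_t size)
{
    return size <= SKETCH_MAX_ALLOC;
}
```

With 3 pages in the list and 5 empty end pages, remaining_pages() yields
-2, and request_size(-2) is far beyond the cap, so the request would be
rejected as invalid. With 5 pages and 3 empty end pages the request is
small and passes.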

Attachment Content-Type Size
postgresql.conf.gz application/x-gunzip 5.3 KB
