Re: Hard limit on WAL space used (because PANIC sucks)

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Jim Nasby <jim(at)nasby(dot)net>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Peter Geoghegan <pg(at)heroku(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2014-01-23 12:56:49
Message-ID: CA+U5nMKaZGHofGY6O=ZUvz_V+n=ooh3CmO4cQhy=2dKrcuNiRg@mail.gmail.com
Lists: pgsql-hackers

On 23 January 2014 01:19, Jim Nasby <jim(at)nasby(dot)net> wrote:
> On 1/21/14, 6:46 PM, Andres Freund wrote:
>>
>> On 2014-01-21 16:34:45 -0800, Peter Geoghegan wrote:
>>>
>>> On Tue, Jan 21, 2014 at 3:43 PM, Andres Freund <andres(at)2ndquadrant(dot)com>
>>> wrote:
>>>>
>>>> I personally think this isn't worth complicating the code for.
>>>
>>> You're probably right. However, I don't see why the bar has to be very
>>> high when we're considering the trade-off between taking some
>>> emergency precaution against having a PANIC shutdown, and an assured
>>> PANIC shutdown.
>>
>> Well, the problem is that the tradeoff would very likely include making
>> already complex code even more complex. None of the proposals, even the
>> one just decreasing the likelihood of a PANIC, look like they'd end up
>> being simple implementation-wise.
>> And that additional complexity would hurt robustness and prevent things
>> I find much more important than this.
>
>
> If we're not looking for perfection, what's wrong with Peter's idea of a
> ballast file? Presumably the check to see if that file still exists would be
> cheap so we can do that before entering the appropriate critical section.
>
> There's still a small chance that we'd end up panicking, but it's better than
> today. I'd argue that even if it doesn't work for CoW filesystems it'd still
> be a win.
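A minimal sketch of the ballast-file idea as Jim describes it, written in Python for brevity rather than as PostgreSQL C code. It assumes a single pre-allocated file whose existence is checked cheaply before entering a critical section, and which is deleted on ENOSPC outside a critical section. Every name here (`create_ballast`, `ballast_present`, `release_ballast`, `before_critical_section`, `handle_enospc`, `WalSpaceLow`) is hypothetical, and the exception merely stands in for raising an ordinary ERROR instead of a PANIC.

```python
import errno
import os
import tempfile

# Hypothetical helpers illustrating the "ballast file" idea; none of these
# names exist in PostgreSQL.

CHUNK = 1 << 20  # write the ballast in 1 MiB chunks


class WalSpaceLow(Exception):
    """Stands in for raising an ordinary ERROR instead of a PANIC."""


def create_ballast(path: str, size_bytes: int) -> None:
    """Reserve space by writing real zero bytes (not a sparse file).

    On copy-on-write filesystems, deleting this file later may not free
    the space, which is the caveat raised in the thread.
    """
    zeros = bytes(CHUNK)
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            n = min(remaining, CHUNK)
            f.write(zeros[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def ballast_present(path: str) -> bool:
    """Cheap existence check, meant to run before a critical section."""
    return os.path.exists(path)


def release_ballast(path: str) -> bool:
    """Delete the ballast; return False if it was already gone."""
    try:
        os.unlink(path)
        return True
    except FileNotFoundError:
        return False


def before_critical_section(path: str) -> None:
    """Refuse to start WAL-writing work once the ballast has been used up,
    so the failure is an ordinary error rather than a PANIC inside the
    critical section."""
    if not ballast_present(path):
        raise WalSpaceLow("WAL ballast already released")


def handle_enospc(path: str, err: OSError) -> None:
    """Outside a critical section: on ENOSPC, release the ballast to free
    some space and report an ordinary error; re-raise anything else."""
    if err.errno != errno.ENOSPC:
        raise err
    release_ballast(path)
    raise WalSpaceLow("disk full; ballast released") from err


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        ballast = os.path.join(d, "wal_ballast")
        create_ballast(ballast, 4 * CHUNK)
        before_critical_section(ballast)  # no error: ballast still present
        try:
            handle_enospc(ballast, OSError(errno.ENOSPC, "No space left on device"))
        except WalSpaceLow as e:
            print("error instead of PANIC:", e)
        print("ballast present:", ballast_present(ballast))
```

As the thread notes, this only narrows the window: a backend that has already passed the check can still hit ENOSPC inside its critical section.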

I grant that it does sound simple enough for a partial stopgap.

My concern is that it provides only a short delay before the eventual
disk-full situation, which it doesn't actually prevent.
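A back-of-the-envelope sketch of why the delay is short: the extra time a released ballast buys is roughly its size divided by the rate at which WAL is being written, assuming nothing else consumes the freed space and no old WAL is removed in the meantime. The function name and the figures below are illustrative only, not measurements.

```python
def ballast_headroom_seconds(ballast_bytes: int, wal_bytes_per_sec: float) -> float:
    """Extra time a released ballast buys before the disk is full again,
    assuming no other writer uses the freed space and no WAL is recycled."""
    if wal_bytes_per_sec <= 0:
        raise ValueError("WAL write rate must be positive")
    return ballast_bytes / wal_bytes_per_sec


# Illustrative figures only: a 1 GiB ballast drained at 50 MiB/s of WAL.
print(ballast_headroom_seconds(1 << 30, 50 * (1 << 20)))  # 20.48 seconds
```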

IMHO the main issue now is how we clear down old WAL files. We need to
perform a checkpoint to do that - and as has been pointed out in
relation to my proposal, we cannot complete that because of locks that
will be held for some time when we do eventually lock up.

That issue is not solved by having a ballast file(s).

IMHO we need to resolve the deadlock inherent in the
disk-full/WAL lock-up/checkpoint situation. My view is that it can be
solved in a similar way to the way the buffer pin deadlock was
resolved for Hot Standby.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
