Re: crash with assertions and WAL_DEBUG

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: crash with assertions and WAL_DEBUG
Date: 2014-06-23 10:07:34
Message-ID: 20140623100734.GF3968@awork2.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2014-06-23 12:58:19 +0300, Heikki Linnakangas wrote:
> On 06/21/2014 01:58 PM, Heikki Linnakangas wrote:
> >It's a bit difficult to attach the mark to the palloc calls, as neither
> >the WAL_DEBUG or LWLOCK_STATS code is calling palloc directly, but
> >marking specific MemoryContexts as sanctioned ought to work. I'll take a
> >stab at that.
>
> I came up with the attached patch. It adds a function called
> MemoryContextAllowInCriticalSection(), which can be used to exempt specific
> memory contexts from the assertion. The following contexts are exempted:
>
> * ErrorContext
> * MdCxt, which is used in checkpointer to absorb fsync requests. (the
> checkpointer process as a whole is no longer exempt)
> * The temporary StringInfos used in WAL_DEBUG (a new memory "WAL Debug"
> context is now created for them)
> * LWLock stats hash table (a new "LWLock stats" context is created for it)
>
> Barring objections, I'll commit this to master, and remove the assertion
> from REL9_4_STABLE.

Sounds generally sane. Some comments:

> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
> index abc5682..e141a28 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -60,6 +60,7 @@
> #include "storage/spin.h"
> #include "utils/builtins.h"
> #include "utils/guc.h"
> +#include "utils/memutils.h"
> #include "utils/ps_status.h"
> #include "utils/relmapper.h"
> #include "utils/snapmgr.h"
> @@ -1258,6 +1259,25 @@ begin:;
> if (XLOG_DEBUG)
> {
> StringInfoData buf;
> + static MemoryContext walDebugCxt = NULL;
> + MemoryContext oldCxt;
> +
> + /*
> + * Allocations within a critical section are normally not allowed,
> + * because allocation failure would lead to a PANIC. But this is just
> + * debugging code that no-one is going to enable in production, so we
> + * don't care. Use a memory context that's exempt from the rule.
> + */
> + if (walDebugCxt == NULL)
> + {
> + walDebugCxt = AllocSetContextCreate(TopMemoryContext,
> + "WAL Debug",
> + ALLOCSET_DEFAULT_MINSIZE,
> + ALLOCSET_DEFAULT_INITSIZE,
> + ALLOCSET_DEFAULT_MAXSIZE);
> + MemoryContextAllowInCriticalSection(walDebugCxt, true);
> + }
> + oldCxt = MemoryContextSwitchTo(walDebugCxt);

This will only work though if the first XLogInsert() isn't called from a
critical section. I'm not sure it's a good idea to rely on that.

> diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
> index 3c1c81a..4264373 100644
> --- a/src/backend/storage/smgr/md.c
> +++ b/src/backend/storage/smgr/md.c
> @@ -219,6 +219,16 @@ mdinit(void)
> &hash_ctl,
> HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
> pendingUnlinks = NIL;
> +
> + /*
> + * XXX: The checkpointer needs to add entries to the pending ops
> + * table when absorbing fsync requests. That is done within a critical
> + * section. It means that there's a theoretical possibility that you
> + * run out of memory while absorbing fsync requests, which leads to
> + * a PANIC. Fortunately the hash table is small so that's unlikely to
> + * happen in practice.
> + */
> + MemoryContextAllowInCriticalSection(MdCxt, true);
> }
> }

Isn't that allowing a bit too much? We e.g. shouldn't allow
_fdvec_alloc() within a crritical section. Might make sense to create a
child context for it.

Greetings,

Andres Freund

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2014-06-23 10:15:01 Use a signal to trigger a memory context dump?
Previous Message Heikki Linnakangas 2014-06-23 09:58:19 Re: crash with assertions and WAL_DEBUG