Re: dsa_allocate() faliure

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Jakub Glapa <jakub(dot)glapa(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: dsa_allocate() faliure
Date: 2018-12-02 22:45:00
Message-ID: CAEepm=1-Lo+98n7s1jXftEO2BhxFbpKSbPEhNiFkOooxe+ZBWg@mail.gmail.com
Lists: pgsql-hackers pgsql-performance

On Sat, Dec 1, 2018 at 9:46 AM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> elog(FATAL,
> "dsa_allocate could not find %zu free pages", npages);
> + abort()

If anyone can reproduce this problem with a debugger, it'd be
interesting to see the output of dsa_dump(area) and
FreePageManagerDump(segment_map->fpm). This error condition means
that get_best_segment() selected a segment from a segment bin that
holds segments whose minimum number of contiguous free pages is >= the
requested number npages, but then FreePageManagerGet() found that the
segment didn't have npages of contiguous free memory after all when it
consulted the segment's btree of free space. Possible explanations
include: the segment bin lists are somehow messed up; the FPM in the
segment was corrupted by someone scribbling on free pages (which hold
the btree); the btree was corrupted by an incorrect sequence of
allocate/free calls (for example double frees, or allocating from one
area and freeing to another); or freepage.c fails to track the size of
its largest contiguous free run correctly.
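
To make the two-step failure concrete, here is a toy model in C. It is
only a sketch under stated assumptions: ToySegment, TOY_PAGES and every
toy_* function are hypothetical names invented for illustration, and
none of this is the code in dsa.c or freepage.c. It shows a segment
being chosen on the strength of a cached "largest run" value, and the
allocation then failing because the real free space is smaller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGES 8             /* hypothetical segment size, in pages */

/* Hypothetical toy segment; not the real DSA or FPM structures. */
typedef struct ToySegment
{
    bool   page_is_free[TOY_PAGES]; /* ground truth, one flag per page */
    size_t advertised_largest;      /* cached value used to pick a segment */
} ToySegment;

/* Length of the longest run of consecutive free pages (ground truth). */
static size_t
toy_actual_largest_run(const ToySegment *seg)
{
    size_t best = 0;
    size_t run = 0;

    for (size_t i = 0; i < TOY_PAGES; i++)
    {
        run = seg->page_is_free[i] ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}

/* Step 1: segment selection trusts the cached value. */
static bool
toy_segment_selected(const ToySegment *seg, size_t npages)
{
    return seg->advertised_largest >= npages;
}

/* Step 2: the allocation itself needs a real run of npages. */
static bool
toy_pages_available(const ToySegment *seg, size_t npages)
{
    return toy_actual_largest_run(seg) >= npages;
}

/* The "can't happen" state: selected in step 1, short in step 2. */
static bool
toy_cant_happen(const ToySegment *seg, size_t npages)
{
    assert(npages > 0);
    return toy_segment_selected(seg, npages) &&
        !toy_pages_available(seg, npages);
}
```

Any of the explanations listed above would show up in this model as
advertised_largest drifting away from what toy_actual_largest_run()
computes.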

There is a macro FPM_EXTRA_ASSERTS that can be defined to double-check
the largest contiguous page tracking. I have also been wondering
about a debug mode that would mprotect(PROT_READ) free pages when they
aren't being modified to detect unexpected writes, which should work
on systems that have 4k pages.
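
A rough sketch of the mprotect() idea, assuming a POSIX system with
anonymous mmap(). The toy_* helpers are hypothetical and are not part
of PostgreSQL; a real debug mode would have to work on the DSA
segments themselves, and since mprotect() only changes the calling
process's mapping, each backend would need to protect its own. The
probe forks a child so that the expected fault doesn't take down the
caller.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static size_t
toy_page_size(void)
{
    return (size_t) sysconf(_SC_PAGESIZE);
}

/* Map one anonymous, writable page; returns NULL on failure. */
static void *
toy_map_page(void)
{
    void *p = mmap(NULL, toy_page_size(), PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    return p == MAP_FAILED ? NULL : p;
}

/* Make a free page read-only while nobody should be modifying it. */
static int
toy_protect_page(void *page)
{
    assert(page != NULL);
    return mprotect(page, toy_page_size(), PROT_READ);
}

/* Make the page writable again before a legitimate update. */
static int
toy_unprotect_page(void *page)
{
    assert(page != NULL);
    return mprotect(page, toy_page_size(), PROT_READ | PROT_WRITE);
}

/*
 * Probe a write in a forked child.  Returns 1 if the write killed the
 * child with a signal, 0 if the write succeeded, -1 on error.
 */
static int
toy_write_faults(void *page)
{
    int   status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
    {
        *(volatile char *) page = 1;    /* faults if page is read-only */
        _exit(0);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? 1 : 0;
}
```

With this, an unexpected write to a protected free page is reported at
the moment it happens, rather than later when the btree is found to be
inconsistent.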

One thing I noticed is that it is failing on a "large" allocation,
where we go straight to the btree of 4k pages. The equivalent code
where we allocate a superblock for "small" allocations doesn't report
the same kind of FATAL this-can't-happen error; it just fails the
allocation via the regular error path without explanation. I also
spotted a path that doesn't respect the DSA_ALLOC_NO_OOM flag (you get
a null pointer instead of an error). I should fix those
inconsistencies (draft patch attached), but those are incidental
problems AFAIK.
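
For illustration only, the behaviour that every failure path would
ideally share can be written as a small decision function. This is a
sketch, not the attached patch: TOY_ALLOC_NO_OOM, TOY_INVALID_POINTER,
ToyOutcome and toy_finish_allocation() are hypothetical stand-ins, and
the real code would raise an error through the usual ereport/elog
machinery rather than return an enum.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ALLOC_NO_OOM    0x01            /* stand-in for DSA_ALLOC_NO_OOM */
#define TOY_INVALID_POINTER ((size_t) 0)    /* stand-in for a null pointer */

typedef enum ToyOutcome
{
    TOY_OK,                 /* allocation succeeded */
    TOY_RETURNED_INVALID,   /* caller opted in to a null result */
    TOY_RAISED_ERROR        /* caller did not opt in, so report an error */
} ToyOutcome;

/*
 * Decide what the caller should see once an allocation attempt has
 * produced "result".  A null result is only handed back when the
 * caller passed the no-OOM flag; otherwise the failure is an error.
 */
static ToyOutcome
toy_finish_allocation(size_t result, int flags)
{
    assert((flags & ~TOY_ALLOC_NO_OOM) == 0);   /* only one toy flag exists */

    if (result != TOY_INVALID_POINTER)
        return TOY_OK;
    if (flags & TOY_ALLOC_NO_OOM)
        return TOY_RETURNED_INVALID;
    return TOY_RAISED_ERROR;
}
```

The inconsistent path described above corresponds to getting
TOY_RETURNED_INVALID when the flag was not set.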

--
Thomas Munro
http://www.enterprisedb.com

Attachment: fix-dsa-area-handling.patch (application/octet-stream, 1.4 KB)