Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Masahiko Sawada <masahiko(dot)sawada(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Using Valgrind to detect faulty buffer accesses (no pin or buffer content lock held)
Date: 2020-04-25 01:37:44
Message-ID: CAH2-WzkLgyN3zBvRZ1pkNJThC=xi_0gpWRUb_45eexLH1+k2_Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I recently expressed an interest in using Valgrind memcheck to detect
access to pages whose buffers do not have a pin held in the backend,
or do not have a buffer lock held (the latter check makes sense for
pages owned by index access methods). I came up with a quick and dirty
patch, that I confirmed found a bug in nbtree VACUUM that I spotted
randomly:

https://postgr.es/m/CAH2-Wz=WRu6NMWtit2weDnuGxdsWeNyFygeBP_zZ2Sso0YAGFg@mail.gmail.com

(This is a bug in commit 857f9c36cda.)

Alvaro wrote a similar patch back in 2015, that I'd forgotten about
but was reminded of today:

https://postgr.es/m/20150723195349.GW5596@postgresql.org

I took his version (which was better than my rough original) and
rebased it -- that's attached as the first patch. The second patch is
something that takes the general idea further by having nbtree mark
pages whose buffers lack a buffer lock (that may or may not have a
buffer pin) as NOACCESS in a similar way.

This second patch detected two more bugs in nbtree page deletion by
running the regression tests with Valgrind memcheck. These additional
bugs are probably of lower severity than the first one, since we at
least have a buffer pin (we just don't have buffer locks). All three
bugs are very similar, though: they all involve dereferencing a
pointer to the special area of a page at a point where the underlying
buffer is no longer safe to access.

The final two patches fix the two newly discovered bugs -- I don't
have a fix for the first bug yet, since that one is more complicated
(and probably more serious). The regression tests run with Valgrind
will complain about all three bugs if you just apply the first two
patches (though you only need the first patch to see a complaint about
the first, more serious bug when the tests are run).

--
Peter Geoghegan

Attachment Content-Type Size
v1-0001-Add-Valgrind-buffer-access-instrumentation.patch application/octet-stream 2.3 KB
v1-0002-Add-nbtree-Valgrind-buffer-lock-checks.patch application/octet-stream 13.4 KB
v1-0003-Fix-second-page-deletion-bug.patch application/octet-stream 1.7 KB
v1-0004-Fix-third-page-deletion-bug.patch application/octet-stream 812 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-04-25 03:52:32 Re: psql - pager support - using invisible chars for signalling end of report
Previous Message James Coleman 2020-04-25 01:22:34 Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays