Valgrind Memcheck support

From: Noah Misch <noah(at)leadboat(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Valgrind Memcheck support
Date: 2013-06-09 21:25:59
Message-ID: 20130609212559.GB491289@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Valgrind's Memcheck tool[1] is handy for finding bugs, but our use of a custom
allocator limits its ability to detect problems in unmodified PostgreSQL.
During the 9.1 beta cycle, I found some bugs[2] with a rough patch adding
instrumentation to aset.c and mcxt.c such that Memcheck understood our
allocator. I've passed that patch around to a few people over time, and I've
now removed the roughness such that it's ready for upstream. In hopes of
making things clearer in the commit history, I've split out a preliminary
refactoring patch from the main patch and attached each separately.

Besides the aset.c/mcxt.c instrumentation, this patch adds explicit checks for
undefined memory to PageAddItem() and printtup(); this has caught C-language
functions that fabricate a Datum without initializing all bits. The inclusion
of all this is controlled by a pg_config_manual.h setting. The patch also
adds a "suppression file" that directs Valgrind to silences nominal errors we
don't plan to fix.

To smoke-test the instrumentation, I used "make installcheck" runs on x86_64
GNU/Linux and ppc64 GNU/Linux. This turned up various new and newly-detected
memory bugs, which I will discuss in a separate thread. With those fixed,
"make installcheck" has a clean report (in my one particular configuration).
I designed the suppression file to work across platforms; I specifically
anticipated eventual use on x86_64 Darwin and x86_64 FreeBSD. Valgrind 3.8.1
quickly crashed when running PostgreSQL on Darwin; I did not dig further.

Since aset.c and mcxt.c contain the hottest code paths in the backend, I
verified that a !USE_VALGRIND, non-assert build produces the same code before
and after the patch. Testing that revealed the need to move up the
AllocSizeIsValid() check in repalloc(), though I don't understand why GCC
reacted that way.

Peter Geoghegan and Korry Douglas provided valuable feedback on earlier
versions of this code.

Possible future directions:

- Test "make installcheck-world". When I last did this in past years, contrib did
trigger some errors.

- Test recovery, such as by running a streaming replica under Memcheck while
the primary runs "make installcheck-world".

- Test newer compilers and higher optimization levels. I used GCC 4.2 at -O1.
A brief look at -O2 results showed a new error that I have not studied. GCC
4.8 at -O3 might show still more due to increasingly-aggressive assumptions.

- A buildfarm member running its installcheck steps this way.

- Memcheck has support for detecting leaks. I have not explored that side at
all, always passing --leak-check=no. We could add support for freeing
"everything" at process exit, thereby making the leak detection meaningful.

Brief notes for folks reproducing my approach: I typically start the
Memcheck-hosted postgres like this:

valgrind --leak-check=no --gen-suppressions=all \
--suppressions=src/tools/valgrind.supp --time-stamp=yes \
--log-file=$HOME/pg-valgrind/%p.log postgres

If that detected an error on which I desired more detail, I would rerun a
smaller test case with "--track-origins=yes --read-var-info=yes". That slows
things noticeably but gives more-specific messaging. When even that left the
situation unclear, I would temporarily hack allocChunkLimit so every palloc()
turned into a malloc().

I strongly advise installing the latest-available Valgrind, particularly
because recent releases suffer far less of a performance drop processing the
instrumentation added by this patch. A "make installcheck" run takes 273
minutes under Vaglrind 3.6.0 but just 27 minutes under Valgrind 3.8.1.

Thanks,
nm

[1] http://valgrind.org/docs/manual/mc-manual.html
[2] http://www.postgresql.org/message-id/20110312133224.GA7833@tornado.gateway.2wire.net

--
Noah Misch
EnterpriseDB http://www.enterprisedb.com

Attachment Content-Type Size
aset-debug-refactor-v1.patch text/plain 8.7 KB
valgrind-hooks-201306-v1.patch text/plain 26.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2013-06-09 21:29:32 9.3 crop of memory errors
Previous Message Andrew Dunstan 2013-06-09 20:55:02 postgres_fdw regression tests order dependency