Cache-flush stress testing

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Cache-flush stress testing
Date: 2006-01-19 22:03:20
Message-ID: 9648.1137708200@sss.pgh.pa.us

I've completed a round of stress testing the system for vulnerabilities
to unexpected cache flush events (relcache, catcache, or typcache
entries disappearing while in use). I'm pleased to report that the 8.1
branch now passes all available regression tests (main, contrib, pl)
with CLOBBER_CACHE_ALWAYS defined as per the attached patch.
I have not had the patience to run a full regression cycle with
CLOBBER_CACHE_RECURSIVELY (I estimate that would take over a week on the
fastest machine I have) but I have gotten through the first dozen or so
tests, and I doubt that completing the full set would find anything not
found by CLOBBER_CACHE_ALWAYS.

HEAD is still broken pending resolution of the lookup_rowtype_tupdesc()
business. 8.0 should be OK but I haven't actually tested it.

I'm still bothered by the likelihood that there are cache-flush bugs in
code paths that are not exercised by the regression tests. The
CLOBBER_CACHE patch is far too slow to consider enabling on any regular
basis, but the cheaper alternative of throwing in cache flushes at random
intervals, as in the test program I posted here:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php
doesn't seem to provide very good test coverage. Has anyone got any ideas
about better ways to locate such bugs?

regards, tom lane

Index: inval.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/cache/inval.c,v
retrieving revision 1.74
diff -c -r1.74 inval.c
*** inval.c 22 Nov 2005 18:17:24 -0000 1.74
--- inval.c 19 Jan 2006 21:47:07 -0000
***************
*** 625,630 ****
--- 625,660 ----
  {
      ReceiveSharedInvalidMessages(LocalExecuteInvalidationMessage,
                                   InvalidateSystemCaches);
+
+     /*
+      * Test code to force cache flushes anytime a flush could happen.
+      *
+      * If used with CLOBBER_FREED_MEMORY, CLOBBER_CACHE_ALWAYS provides a
+      * fairly thorough test that the system contains no cache-flush hazards.
+      * However, it also makes the system unbelievably slow --- the regression
+      * tests take about 100 times longer than normal.
+      *
+      * If you're a glutton for punishment, try CLOBBER_CACHE_RECURSIVELY.
+      * This slows things by at least a factor of 10000, so I wouldn't suggest
+      * trying to run the entire regression tests that way.  It's useful to
+      * try a few simple tests, to make sure that cache reload isn't subject
+      * to internal cache-flush hazards, but after you've done a few thousand
+      * recursive reloads it's unlikely you'll learn more.
+      */
+ #if defined(CLOBBER_CACHE_ALWAYS)
+     {
+         static bool in_recursion = false;
+
+         if (!in_recursion)
+         {
+             in_recursion = true;
+             InvalidateSystemCaches();
+             in_recursion = false;
+         }
+     }
+ #elif defined(CLOBBER_CACHE_RECURSIVELY)
+     InvalidateSystemCaches();
+ #endif
  }

/*
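
The difference between the two modes in the patch above can be seen in a
small standalone simulation. This is only a sketch, not PostgreSQL code:
the functions sim_accept_invalidation_messages() and
sim_invalidate_system_caches(), the SimResult struct, and the depth limit
SIM_MAX_RELOAD_DEPTH are all hypothetical stand-ins. It assumes that a
forced flush triggers a cache reload that reaches the accept-invalidation
point again, and it caps that nesting artificially so the simulation
terminates.

```c
#include <stdbool.h>

/* Hypothetical cap so the simulated reload chain terminates. */
#define SIM_MAX_RELOAD_DEPTH 3

typedef struct
{
    int flushes;    /* number of simulated full cache flushes */
    int max_depth;  /* deepest nesting of flush-within-reload */
} SimResult;

static SimResult sim_state;
static int sim_depth = 0;

static void sim_accept_invalidation_messages(bool recursive);

/*
 * Stand-in for a flush followed by a reload.  The reload is assumed to
 * reach the accept-invalidation point again until the cap is hit.
 */
static void
sim_invalidate_system_caches(bool recursive)
{
    sim_state.flushes++;
    sim_depth++;
    if (sim_depth > sim_state.max_depth)
        sim_state.max_depth = sim_depth;
    if (sim_depth < SIM_MAX_RELOAD_DEPTH)
        sim_accept_invalidation_messages(recursive);
    sim_depth--;
}

/* Mirrors the shape of the two #if branches in the patch. */
static void
sim_accept_invalidation_messages(bool recursive)
{
    static bool in_recursion = false;

    if (recursive)
    {
        /* CLOBBER_CACHE_RECURSIVELY shape: no guard, flush every time */
        sim_invalidate_system_caches(recursive);
    }
    else if (!in_recursion)
    {
        /* CLOBBER_CACHE_ALWAYS shape: flush only at the outermost call */
        in_recursion = true;
        sim_invalidate_system_caches(recursive);
        in_recursion = false;
    }
}

/* Run one top-level accept call and report what happened. */
static SimResult
simulate_accept(bool recursive)
{
    sim_state.flushes = 0;
    sim_state.max_depth = 0;
    sim_depth = 0;
    sim_accept_invalidation_messages(recursive);
    return sim_state;
}
```

Calling simulate_accept(false) models the guarded mode, where the flush
requested during the reload is suppressed; simulate_accept(true) models the
unguarded mode, where every nested reload is flushed again until the
artificial cap stops it. The multiplication of flushes per top-level call is
the reason the recursive mode is so much slower.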
