| From: | Reini Urban <rurban(at)x-ray(dot)at> | 
|---|---|
| To: | pgsql-hackers-win32(at)postgresql(dot)org | 
| Subject: | Re: Can someone verify CVS tip on Win32? | 
| Date: | 2004-11-18 12:09:17 | 
| Message-ID: | 419C90ED.70706@x-ray.at | 
| Lists: | pgsql-hackers-win32 | 
Tom Lane wrote:
> Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> 
>>Tom Lane wrote:
>>
>>>Hmm ... I have a theory about it, but I'm not sure how to reproduce the
>>>problem.  How many databases have you created in the installation that
>>>the contrib installcheck is running against?  
> 
> 
>>Just what make installcheck / make contrib installcheck  runs.
> 
> OK.  I still haven't been able to reproduce it, but the place where it
> is failing is consistent with my theory, which is:
> 
> 1. CREATE DATABASE creates a pg_database row for "regression" that is
> the last or nearly last row that will fit into block 0 of pg_database.
> It then flushes this block to disk to ensure that new backends can see
> the row in GetRawDatabaseInfo.
> 
> 2. pg_regress.sh then does several ALTER DATABASE operations.  These
> will mark the original row dead and make a new row.  At the end of this,
> I hypothesize that the live copy of the "regression" row is in
> pg_database block 1, not block 0.  And it's not been flushed to disk,
> because ALTER DATABASE fails to do that.
> 
> 3. (Here's the hard-to-reproduce part.)  Assume that something causes
> block 0, but not block 1, of pg_database to be flushed from shared
> buffers to disk.
> 
> 4. Now, an incoming backend will see the original pg_database row for
> "regression" as committed dead, so it'll ignore it.  It can't see the
> live row because that's not been flushed to disk; it's only in shared
> buffers.  Ergo, GetRawDatabaseInfo fails.
> 
> The problem goes away as soon as a checkpoint happens, but it's still
> possible for the regression tests to fail this way.
> 
> A reasonable theory about step 3 is that the bgwriter chooses to write
> out block 0 at just the right time.  This would happen infrequently
> enough to explain why we've not seen this reported before.
> 
> This theory explains why the failure consistently happens at the same
> place in the test sequence, and why that place is machine-architecture
> dependent: it can only happen when a certain number of pg_database rows
> have been created and deleted, and the magic number depends on the
> machine MAXALIGN value because that affects the size of the rows.
> 
> The fix of course is that ALTER DATABASE must flush pg_database to disk,
> just as RENAME does.
This also explains my strange regression problems on cygwin. Thanks for 
the change. Everything looks much easier now.
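
For anyone who wants to see the failure sequence from the quoted theory in miniature, here is a rough sketch in Python. It is only a toy model, not PostgreSQL code: the class `ToyCatalog`, its methods, and the block numbers are all made up for illustration, and row visibility is reduced to a single `live` flag. The one assumption taken from the theory is that a new backend looks at the on-disk copy of pg_database rather than at shared buffers.

```python
from dataclasses import dataclass


@dataclass
class Row:
    name: str
    live: bool  # stand-in for real tuple visibility (hypothetical simplification)


class ToyCatalog:
    """Toy pg_database: each block exists once in 'shared buffers' and once on 'disk'."""

    def __init__(self, n_blocks):
        self.buffers = [[] for _ in range(n_blocks)]
        self.disk = [[] for _ in range(n_blocks)]

    def flush(self, block):
        # Copy one block from shared buffers to disk.
        self.disk[block] = [Row(r.name, r.live) for r in self.buffers[block]]

    def create_database(self, name, block):
        # Step 1: the new row is written and its block is flushed right away.
        self.buffers[block].append(Row(name, True))
        self.flush(block)

    def alter_database(self, name, old_block, new_block, flush=False):
        # Step 2: the old row is marked dead and a new row is made,
        # possibly in a different block. Without flush=True nothing hits disk.
        for r in self.buffers[old_block]:
            if r.name == name and r.live:
                r.live = False
        self.buffers[new_block].append(Row(name, True))
        if flush:
            self.flush(old_block)
            self.flush(new_block)

    def lookup_from_disk(self, name):
        # Step 4: an incoming backend only sees what is on disk.
        return any(r.name == name and r.live for blk in self.disk for r in blk)


cat = ToyCatalog(n_blocks=2)
cat.create_database("regression", block=0)
cat.alter_database("regression", old_block=0, new_block=1)
cat.flush(0)                               # step 3: only block 0 is written out
print(cat.lookup_from_disk("regression"))  # False: dead row on disk, live row only in buffers
cat.flush(1)                               # what a checkpoint would eventually do
print(cat.lookup_from_disk("regression"))  # True
```

Calling `alter_database(..., flush=True)` in this toy corresponds to the fix described above: the lookup then succeeds no matter which block is written out first.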
-- 
Reini Urban
http://xarch.tu-graz.ac.at/home/rurban/