From: | Heikki Linnakangas <heikki(at)enterprisedb(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Unhelpful debug tools on OS X :-( |
Date: | 2007-04-17 20:55:11 |
Message-ID: | 4625342F.40907@enterprisedb.com |
Lists: | pgsql-hackers |
Tom Lane wrote:
> Heikki Linnakangas <heikki(at)enterprisedb(dot)com> writes:
>> Tom Lane wrote:
>>> Any suggestions how to extract some info out of this?
>
>> Does OS X have the catchsegv tool?
>
> No, but I suddenly remembered about CrashReporter, and sure enough it's
> catching these crashes:
>
> Exception: EXC_BAD_ACCESS (0x0001)
> Codes: KERN_PROTECTION_FAILURE (0x0002) at 0x00000010
>
> Thread 0 Crashed:
> 0 postmaster 0x001af4ef smgrextend + 12 (smgr.c:485)
> 1 postmaster 0x00029044 end_heap_rewrite + 208 (rewriteheap.c:278)
> 2 postmaster 0x000bdc22 cluster_rel + 850 (cluster.c:806)
> 3 postmaster 0x000be119 cluster + 160 (cluster.c:220)
> 4 postmaster 0x001b74a8 PortalRunUtility + 233 (palloc.h:84)
> 5 postmaster 0x001b7784 PortalRunMulti + 237 (pquery.c:1271)
> 6 postmaster 0x001b80ae PortalRun + 918 (pquery.c:813)
> 7 postmaster 0x001b2afd exec_simple_query + 656 (postgres.c:965)
> 8 postmaster 0x001b4b0c PostgresMain + 5628 (postgres.c:3507)
> 9 postmaster 0x00183973 ServerLoop + 2828 (postmaster.c:2614)
> 10 postmaster 0x00184b1e PostmasterMain + 2794 (postmaster.c:972)
> 11 postmaster 0x00130f8e main + 1236 (main.c:188)
> 12 postmaster 0x00001e86 _start + 216
> 13 postmaster 0x00001dad start + 41
>
> So it looks like this has got something to do with the MVCC-safe cluster
> changes, which is not too surprising considering it started happening
> around about then. Off to have a look ...
I've been looking at the code for a few minutes as well, but haven't
found an explanation for that yet.
But I did notice that we're not fsyncing the newly written relation like
we should. There's a comment in raw_heap_insert:
/*
* Now write the page. We say isTemp = true even if it's not a
* temp table, because there's no need for smgr to schedule an
* fsync for this write; we'll do it ourselves before committing.
*/
smgrextend(state->rs_new_rel->rd_smgr, state->rs_blockno,
(char *) page, true);
That's copy-pasted from tablecmds.c. But unlike in tablecmds.c,
end_heap_rewrite only fsyncs the new file if we're not WAL-logging.
Proposed fix:
Index: src/backend/access/heap/rewriteheap.c
===================================================================
RCS file: /home/hlinnaka/pgcvsrepository/pgsql/src/backend/access/heap/rewriteheap.c,v
retrieving revision 1.1
diff -c -r1.1 rewriteheap.c
*** src/backend/access/heap/rewriteheap.c 8 Apr 2007 01:26:27 -0000 1.1
--- src/backend/access/heap/rewriteheap.c 17 Apr 2007 20:50:05 -0000
***************
*** 272,282 ****
}
/*
! * If not WAL-logging, must fsync before commit. We use heap_sync
! * to ensure that the toast table gets fsync'd too.
*/
! if (!state->rs_use_wal)
! heap_sync(state->rs_new_rel);
/* Deleting the context frees everything */
MemoryContextDelete(state->rs_cxt);
--- 272,284 ----
}
/*
! * Must fsync before commit, even if we've WAL-logged the changes,
! * because we've written pages outside the buffer manager. See comments
! * in copy_relation_data in commands/tablecmds.c for more information.
! *
! * We use heap_sync to ensure that the toast table gets fsync'd too.
*/
! heap_sync(state->rs_new_rel);
/* Deleting the context frees everything */
MemoryContextDelete(state->rs_cxt);
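To illustrate the reasoning behind the patch: a page written directly to the file, bypassing the buffer manager, is not tracked as dirty anywhere, so a later checkpoint has no way of knowing it must be flushed. The writer therefore has to fsync the file itself before reporting success. Below is a minimal standalone sketch of that pattern in plain POSIX C. It is not PostgreSQL code; the function name write_page_and_sync and the constant DEMO_PAGE_SIZE are hypothetical and only stand in for the smgrextend-then-heap_sync sequence discussed above.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical page size for this sketch; it merely stands in for BLCKSZ. */
#define DEMO_PAGE_SIZE 8192

/*
 * Append one page to the file at 'path' with a direct write(), then
 * fsync() it before returning. Nothing else knows this page is dirty,
 * so the caller must not treat the data as durable until this succeeds.
 *
 * Returns 0 on success, -1 on any failure.
 */
int
write_page_and_sync(const char *path, const char *page)
{
    int     fd;
    ssize_t written;

    fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
    if (fd < 0)
        return -1;

    /* Direct write: the data may sit in the OS cache for a while. */
    written = write(fd, page, DEMO_PAGE_SIZE);
    if (written != DEMO_PAGE_SIZE)
    {
        close(fd);
        return -1;
    }

    /* Force the page to stable storage before "commit". */
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }

    return close(fd);
}
```

The point of the sketch is only the ordering: the fsync happens unconditionally after the direct write, regardless of whether the change was also logged elsewhere.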
BTW: In tablecmds.c the new relation is fsynced with smgrimmedsync, not
heap_sync. What about the toast table? It goes through shared buffers as
usual, right?
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com