From: | Fujii Masao <masao(dot)fujii(at)gmail(dot)com> |
---|---|
To: | Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com> |
Cc: | Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock) |
Date: | 2012-02-16 11:31:04 |
Message-ID: | CAHGQGwGzUWvkWk7w0W3O79uaHrTXowZ6ou58E2df2K+5JqMRZg@mail.gmail.com |
Lists: | pgsql-hackers |
On Thu, Feb 16, 2012 at 6:15 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Thu, Feb 16, 2012 at 5:02 AM, Heikki Linnakangas
> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>> On 15.02.2012 18:52, Fujii Masao wrote:
>>>
>>> On Thu, Feb 16, 2012 at 1:01 AM, Heikki Linnakangas
>>> <heikki(dot)linnakangas(at)enterprisedb(dot)com> wrote:
>>>>
>>>> Are you still seeing this failure with the latest patch I posted
>>>>
>>>> (http://archives.postgresql.org/message-id/4F38F5E5.8050203@enterprisedb.com)?
>>>
>>>
>>> Yes. Just to be safe, I again applied the latest patch to HEAD,
>>> compiled that and tried
>>> the same test. Then unfortunately I got the same failure again.
>>
>>
>> Ok.
>>
>>> I ran configure with '--enable-debug' '--enable-cassert'
>>> 'CPPFLAGS=-DWAL_DEBUG',
>>> and make with the -j 2 option.
>>>
>>> When I ran the test with wal_debug = on, I got the following assertion
>>> failure.
>>>
>>> LOG: INSERT @ 0/17B3F90: prev 0/17B3F10; xid 998; len 31: Heap -
>>> insert: rel 1663/12277/16384; tid 0/197
>>> STATEMENT: create table t (i int); insert into t
>>> values(generate_series(1,10000)); delete from t
>>> LOG: INSERT @ 0/17B3FD0: prev 0/17B3F50; xid 998; len 31: Heap -
>>> insert: rel 1663/12277/16384; tid 0/198
>>> STATEMENT: create table t (i int); insert into t
>>> values(generate_series(1,10000)); delete from t
>>> TRAP: FailedAssertion("!(((bool) (((void*)(&(target->tid)) != ((void
>>> *)0))&& ((&(target->tid))->ip_posid != 0))))", File: "heapam.c",
>>> Line: 5578)
>>> LOG: xlog bg flush request 0/17B4000; write 0/17A6000; flush 0/179D5C0
>>> LOG: xlog bg flush request 0/17B4000; write 0/17B0000; flush 0/17B0000
>>> LOG: server process (PID 16806) was terminated by signal 6: Abort trap
>>>
>>> This might be related to the original problem which Jeff and I saw.
>>
>>
>> That's strange. I made a fresh checkout, too, and applied the patch, but
>> still can't reproduce. I used the attached script to test it.
>>
>> It's surprising that the crash happens when the records are inserted, not at
>> recovery. I don't see anything obviously wrong there, so could you please
>> take a look around in gdb and see if you can get a clue what's going on?
>> What's the stack trace?
>
> According to the above log messages, one strange thing is that the location
> of the first WAL record (i.e., 0/17B3F90) does not match the prev location
> recorded in the following WAL record (i.e., 0/17B3F50). Is this intentional?
>
> BTW, when I ran the test on my Ubuntu machine, I could not reproduce the
> problem. I could reproduce it only on MacOS.
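
The mismatch described in the quoted text can be checked mechanically. The following is a minimal standalone sketch, not PostgreSQL code: parse_lsn and prev_link_ok are hypothetical helper names, and it assumes only that locations are printed as two hexadecimal halves separated by a slash, as in the wal_debug lines above.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a location printed as "hi/lo" in hexadecimal, e.g. "0/17B3F90".
 * Returns 1 on success and stores the combined 64-bit value in *out.
 */
static int parse_lsn(const char *s, uint64_t *out)
{
    unsigned int hi, lo;

    if (sscanf(s, "%X/%X", &hi, &lo) != 2)
        return 0;
    *out = ((uint64_t) hi << 32) | lo;
    return 1;
}

/*
 * Returns 1 if the prev pointer logged for the later record equals the
 * location logged for the earlier record, 0 otherwise (or on parse error).
 */
static int prev_link_ok(const char *earlier_loc, const char *later_prev)
{
    uint64_t loc, prev;

    if (!parse_lsn(earlier_loc, &loc) || !parse_lsn(later_prev, &prev))
        return 0;
    return loc == prev;
}
```

Applied to the two log lines above, prev_link_ok("0/17B3F90", "0/17B3F50") reports a mismatch: the second record's prev pointer does not point at the first record's location.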

+    nextslot = Insert->nextslot;
+    if (NextSlotNo(nextslot) == lastslot)
+    {
+        /*
+         * Oops, we've "caught our tail" and the oldest slot is still in use.
+         * Have to wait for it to become vacant.
+         */
+        SpinLockRelease(&Insert->insertpos_lck);
+        WaitForXLogInsertionSlotToBecomeFree();
+        goto retry;
+    }
+    myslot = &XLogCtl->XLogInsertSlots[nextslot];
+    nextslot = NextSlotNo(nextslot);

In the code above, nextslot can reach NumXLogInsertSlots, which I guess is a
bug, since nextslot is used as an index into XLogCtl->XLogInsertSlots.
When I applied a quick fix and ran the test, I could no longer reproduce the
problem. I'm not sure whether this is really the cause of the problem, though.
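
As an illustration of the kind of off-by-one that would let a ring index reach the slot count, here is a minimal standalone sketch. It is not taken from the patch: NUM_SLOTS, next_slot_wrapped, next_slot_late_wrap and is_valid_slot are hypothetical names, and the patch's actual NextSlotNo definition is not shown in this message.

```c
/* Hypothetical slot count; stands in for NumXLogInsertSlots. */
#define NUM_SLOTS 8

/* Valid indexes for an array of NUM_SLOTS entries are 0 .. NUM_SLOTS - 1. */
static int is_valid_slot(int slot)
{
    return slot >= 0 && slot < NUM_SLOTS;
}

/* Advance a ring index, wrapping to 0 right after the last valid slot. */
static int next_slot_wrapped(int slot)
{
    return (slot + 1 == NUM_SLOTS) ? 0 : slot + 1;
}

/*
 * Off-by-one variant shown for comparison: it compares against NUM_SLOTS
 * instead of NUM_SLOTS - 1, so it wraps one step too late and can return
 * NUM_SLOTS itself, which is not a valid index.
 */
static int next_slot_late_wrap(int slot)
{
    return (slot == NUM_SLOTS) ? 0 : slot + 1;
}
```

With the late-wrapping variant, advancing from the last valid slot yields NUM_SLOTS, and using that value as an array index would read past the end of the slot array. Whether the patch has exactly this shape is an assumption; the sketch only shows how such an index could arise.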
Regards,
--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center