Re: Re: [COMMITTERS] pgsql: Modify the isolation tester so that multiple sessions can wait.

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Mikael Kjellström <mikael(dot)kjellstrom(at)gmail(dot)com>
Subject: Re: Re: [COMMITTERS] pgsql: Modify the isolation tester so that multiple sessions can wait.
Date: 2016-04-27 21:54:55
Message-ID: CAEepm=3c6anZDXi+XXDiRpY6qYNSDDVQOK_q12qocPA0x1YMLg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Thu, Apr 28, 2016 at 8:46 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Thu, Apr 28, 2016 at 7:02 AM, Alvaro Herrera
> <alvherre(at)2ndquadrant(dot)com> wrote:
>> Robert Haas wrote:
>>
>>> That looks like malloc() returned NULL. I noticed when writing that
>>> patch that isolationtester has never had any checks for malloc
>>> returning NULL, which is bad, and probably worth fixing, but I didn't
>>> choose to stop and fix it at that time.
>>
>> I didn't actually check closely but I wondered whether the pointer
>> arithmetic is actually correct. Note that the memcpy length is zero.
>> I doubt malloc returning null is the problem; how could it happen
>> exactly at the same spot every time the suite has run?
>>
>>> I don't know off-hand why you see that problem starting at this commit
>>> and not before, or why it doesn't show up on other machines.
>>
>> Perhaps it's only a problem for OpenBSD's libc and not for glibc which
>> is the most common. The only other openbsd machine in buildfarm doesn't
>> run the isolation tests.
>
> Also happens on OpenBSD 5.8. Isn't this a classic case where memmove
> is called for? Replacing the memcpy at line 617 with memmove makes
> the tests run successfully, but at first glance the other two
> instances of memcpy in run_permutation should also be changed to
> memmove, no?

That abort at memcpy.c:65 is indeed a check for overlapping regions:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/lib/libc/string/memcpy.c?rev=1.2&content-type=text/x-cvsweb-markup

That leaves the question of why the backtrace shows arguments that
wouldn't trigger it (for example memcpy exits early for length == 0).
But it appears that those stack arguments are mangled by the time gdb
dumps them, like this:

#include <string.h>
int main(int argc, char *argv[])
{
char buffer[] = "hello world";
memcpy(&buffer[0], &buffer[1], 2);
return 0;
}

#0 0x00000bce27eab90a in kill () at <stdin>:2
#1 0x00000bce27ee5b19 in abort () at /usr/src/lib/libc/stdlib/abort.c:53
#2 0x00000bce27ebcde8 in memcpy (dst0=0xf7ec5, src0=0x6, length=0) at
/usr/src/lib/libc/string/memcpy.c:65
#3 0x00000bcb5fc00c7a in main (argc=1, argv=0x7f7fffff6818) at foo.c:6

Suggested patch attached.

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
memmove.patch application/octet-stream 1.3 KB

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Tom Lane 2016-04-27 21:55:39 pgsql: Clean up parsing of synchronous_standby_names GUC variable.
Previous Message Thomas Munro 2016-04-27 20:46:49 Re: Re: [COMMITTERS] pgsql: Modify the isolation tester so that multiple sessions can wait.

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-04-27 22:05:26 Re: Support for N synchronous standby servers - take 2
Previous Message David Steele 2016-04-27 21:47:14 Re: WIP: Covering + unique indexes.