From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com> |
Cc: | Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com> |
Subject: | Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev) |
Date: | 2019-08-26 01:44:14 |
Message-ID: | 20190826014414.GC7201@telsasoft.com |
Lists: | pgsql-hackers |
On Mon, Aug 26, 2019 at 01:09:19PM +1200, Thomas Munro wrote:
> On Sun, Aug 25, 2019 at 3:15 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > I was reminded of this issue from last year, which also appeared to
> > involve BufFileClose() and a double-free:
> >
> > https://postgr.es/m/87y3hmee19.fsf@news-spur.riddles.org.uk
> >
> > That was a BufFile that was under the control of a tuplestore, so it
> > was similar to but different from your case. I suspect it's related.
>
> Hmm. tuplestore.c follows the same coding pattern as nodeHashjoin.c:
> it always nukes its pointer after calling BufFileClose(), so it
> shouldn't be capable of calling it twice for the same pointer, unless
> we have two copies of that pointer somehow.
>
> Merlin's reported a double-free apparently in ExecHashJoin(), not
> ExecHashJoinNewBatch() like this report. Unfortunately that tells us
> very little.
>
> On Sun, Aug 25, 2019 at 2:25 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
> > #4 0x00000039ff678dd0 in _int_free (av=0x39ff98e120, p=0x1d40b090, have_lock=0) at malloc.c:4846
> > #5 0x00000000006269e5 in ExecHashJoinNewBatch (pstate=0x2771218) at nodeHashjoin.c:1058
>
> Can you reproduce this or was it a one-off crash?
The query was one of our large reports, and this job runs every 15min against
recently-loaded data; in the immediate case, between
2019-08-24 08:00:00 and 2019-08-24 09:00:00.
I can rerun it fine, and I ran it in a loop for a while last night with no
issues.
time psql ts -f tmp/sql-2019-08-24.1 |wc
5416 779356 9793941
Since it was asked in the other thread Peter mentioned:
ts=# SHOW work_mem;
work_mem | 128MB
ts=# SHOW shared_buffers ;
shared_buffers | 1536MB
> might be some obscure path somewhere, possibly through a custom
> operator or suchlike, that leaves us in a strange memory context, or
> something like that? But then I feel like we'd have received
> reproducible reports and a test case by now.
No custom operator in sight. Just NATURAL JOIN on integers, and WHERE on
timestamp, some plpgsql and int[].
Justin