From: | Justin Pryzby <pryzby(at)telsasoft(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Cc: | Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Tim Bishop <tim(at)inroads(dot)ai>, Christoph Berg <myon(at)debian(dot)org>, Bernhard Übelacker <bernhardu(at)mailbox(dot)org> |
Subject: | debian bugrept involving fast default crash in pg11.7 |
Date: | 2020-03-28 22:30:52 |
Message-ID: | 20200328223052.GK20103@telsasoft.com |
Lists: | pgsql-hackers |
I happened across this bug report, which seems to have just enough information
to be interesting.
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=953204
|Version: 11.7-0+deb10u1
|2020-03-05 16:55:55.511 UTC [515] LOG: background worker "parallel worker" (PID 884) was terminated by signal 11: Segmentation fault
|2020-03-05 16:55:55.511 UTC [515] DETAIL: Failed process was running:
|SELECT distinct student_prob.student_id, student_prob.score, student_name, v_capacity_score.capacity
|FROM data JOIN model on model.id = 2 AND data_stage(data) = model.target_begin_field_id
|JOIN student_prob ON data.crm_id = student_prob.student_id AND model.id = student_prob.model_id AND (student_prob.additional_aid < 1)
|LEFT JOIN v_capacity_score ON data.crm_id = v_capacity_score.student_id AND student_prob.model_id = v_capacity_score.model_id
|WHERE data.term_code = '202090' AND student_prob.score > 0
|ORDER BY student_prob.score DESC, student_name
|LIMIT 100 OFFSET 100 ;
Tim: it'd be nice to get more information, if and when possible:
- "explain" plan for that query;
- \d for the tables involved: constraints, inheritance, defaults;
- corefile or backtrace; it looks like there are two different crashes (maybe the same problem), so both would be useful;
- Can you reproduce the crash if you "SET max_parallel_workers_per_gather=0" ?
- Do you know if it crashed under v11.6 ?
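In case it saves some typing, here is a rough sketch of one way to collect most of the above in one go. It only prints a psql script, it does not run anything by itself. The relation names and the query are copied from the log above; the function name diag_sql and the database name in the usage note are made up for illustration. Note that the last statement really executes the query, so it could crash a backend again.

```shell
#!/bin/sh
# Sketch only: prints a psql script that gathers the details requested above.
# Relation names and the query text are taken from the crash log; the function
# name diag_sql is hypothetical.

QUERY="SELECT distinct student_prob.student_id, student_prob.score, student_name, v_capacity_score.capacity
FROM data JOIN model on model.id = 2 AND data_stage(data) = model.target_begin_field_id
JOIN student_prob ON data.crm_id = student_prob.student_id AND model.id = student_prob.model_id AND (student_prob.additional_aid < 1)
LEFT JOIN v_capacity_score ON data.crm_id = v_capacity_score.student_id AND student_prob.model_id = v_capacity_score.model_id
WHERE data.term_code = '202090' AND student_prob.score > 0
ORDER BY student_prob.score DESC, student_name
LIMIT 100 OFFSET 100"

diag_sql() {
    cat <<EOF
\set ECHO all
-- Definitions of the relations in the query: constraints, inheritance, defaults
\d+ data
\d+ model
\d+ student_prob
\d+ v_capacity_score

-- Plan only: plain EXPLAIN does not execute the query
EXPLAIN $QUERY;

-- Now run the query for real with parallel query disabled.
-- WARNING: this executes the statement and may crash the backend.
SET max_parallel_workers_per_gather = 0;
$QUERY;
EOF
}
```

Something like `diag_sql | psql -X -d yourdb > diag.txt 2>&1` (with yourdb replaced by the real database name) should then capture the output in one file.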
If anyone wants to hack on the .deb:
https://packages.debian.org/buster/amd64/postgresql-11/download and (I couldn't find the dbg package anywhere else)
https://snapshot.debian.org/package/postgresql-11/11.7-0%2Bdeb10u1/#postgresql-11-dbgsym_11.7-0:2b:deb10u1
$ mkdir pg11
$ cd pg11
$ wget -q http://security.debian.org/debian-security/pool/updates/main/p/postgresql-11/postgresql-11_11.7-0+deb10u1_amd64.deb
$ ar x ./postgresql-11_11.7-0+deb10u1_amd64.deb
$ tar xf ./data.tar.xz
$ # (download the dbgsym .deb from the snapshot.debian.org link above first)
$ ar x postgresql-11-dbgsym_11.7-0+deb10u1_amd64.deb
$ tar xf data.tar.xz
$ gdb usr/lib/postgresql/11/bin/postgres
(gdb) set debug-file-directory usr/lib/debug/
(gdb) file usr/lib/postgresql/11/bin/postmaster
(gdb) info target
If I repeat the process Bernhard used (thanks for that) on the first crash in
libc6, I get:
(gdb) find /b 0x0000000000022320, 0x000000000016839b, 0xf9, 0x20, 0x77, 0x1f, 0xc5, 0xfd, 0x74, 0x0f, 0xc5, 0xfd, 0xd7, 0xc1, 0x85, 0xc0, 0x0f, 0x85, 0xdf, 0x00, 0x00, 0x00, 0x48, 0x83, 0xc7, 0x20, 0x83, 0xe1, 0x1f, 0x48, 0x83, 0xe7, 0xe0, 0xeb, 0x36, 0x66, 0x90, 0x83, 0xe1, 0x1f, 0x48, 0x83, 0xe7, 0xe0, 0xc5, 0xfd, 0x74, 0x0f, 0xc5, 0xfd, 0xd7, 0xc1, 0xd3, 0xf8, 0x85, 0xc0, 0x74, 0x1b, 0xf3, 0x0f, 0xbc, 0xc0, 0x48, 0x01, 0xf8, 0x48
0x15c17d <__strlen_avx2+13>
warning: Unable to access 1631 bytes of target memory at 0x167d3d, halting search.
1 pattern found.
I'm tentatively guessing that heap_modify_tuple() is involved, since it calls
getmissingattr() and (probably) fill_val(). It looks like some data structure
may be corrupted, which crashed two parallel workers: one in
fill_val()/strlen() and one in heap_deform_tuple()/getmissingattr(). Maybe
something is not initialized in the parallel worker, or there is a
use-after-free? I'll stop guessing.
Justin