I tried to condense the use case further to make it easier to find the root cause.
The call involved here was recently modified. Could it be a regression?
@Tom Lane: As you reviewed that change, could you please have another look at the details?
Please find further hints below; I hope they help in getting to the root cause.
While the issue appears frequently when I run the task from a cron job, it crashes much more sporadically when I run it manually.
Executing the full query from pgAdmin crashed postgres. Subsequent runs seem to work fine.
Today I tried to execute smaller parts of the query.
The following query caused a segfault:
SELECT count(1) FROM planet_osm_ways WHERE ARRAY['motorway','trunk','primary','secondary','tertiary'] && tags;
postgres: segfault at 557f561cddbc ip 0000557f5450bc60 sp 00007ffe79c8a6c0 error 4 in postgres[557f541bc000+64d000]
The instruction pointer is at least similar to the initial crash.
This is the table queried:
 Column |  Type  | Modifiers
--------+--------+-----------
 id     | bigint | not null
 nodes  | bigint | not null
 tags   | text   |
Indexes:
    "planet_osm_ways_pkey" PRIMARY KEY, btree (id)
    "planet_osm_ways_nodes" gin (nodes) WITH (fastupdate=off)
I checked: all tags arrays have a cardinality of at least 2.
The table planet_osm_ways is updated every few minutes by parallel tasks. Could it be some glitch in the table update?
Executing the same query again works later on:
gis=> SELECT count(1) FROM planet_osm_ways WHERE ARRAY['motorway','trunk','primary','secondary','tertiary'] && tags;
Even after waiting a bit longer, I was not able to reproduce the issue, but it usually comes back if I wait long enough.
The initial call-stack showed the crash here:
segfault ip 0000560103865c60 error 4 in postgres
#0 deconstruct_array (array=array@entry=0x564008a6bde8,
    elmtype=elmtype@entry=25, elmlen=elmlen@entry=-1,
    elmbyval=elmbyval@entry=0 '\000', elmalign=elmalign@entry=105 'i',
    elemsp=elemsp@entry=0x7ffc5a33d570,
At that point the code only performs a read at the array address plus an offset:
p = att_addlength_pointer(p, elmlen, p);
p = (cur_offset) + VARSIZE_ANY(attptr)
with VARSIZE_ANY doing a read like this:
((((varattrib_1b *) (PTR))->va_header) == 0x01)
Crashing here would mean that either the array pointer itself is off, or p has advanced too far and the va_header read lands beyond the end of the array.
Does the address sound reasonable? I am not that familiar with the virtual address space layout involved here. It is quite close to the instruction pointer address.
Since I am doing an array overlap operation and the call stack passes through the following function, the crashing query could well be the one above.
#2 0x0000564007ada7dd in arrayoverlap (fcinfo=0x564008a5d6e8)
If it would provide further details, I could try to get a core dump for such crashes as well once I have a bit more time. For now, my assumption is that it is the same root cause.
Please let me know what other details might help in getting an idea of what breaks here.