Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Merlin Moncure <mmoncure(at)gmail(dot)com>
Subject: Re: pg11.5: ExecHashJoinNewBatch: glibc detected...double free or corruption (!prev)
Date: 2019-08-26 02:54:21
Message-ID: 20190826025421.GD7201@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm not sure but maybe this is useful ?

|(gdb) p VfdCache[2397]
|$9 = {fd = -1, fdstate = 0, resowner = 0x24f93e0, nextFree = 2393, lruMoreRecently = 0, lruLessRecently = 2360, seekPos = 73016512, fileSize = 0, fileName = 0x0, fileFlags = 2, fileMode = 384}

Knowing this report, very possibly this was a tempfile, possible a parallel fileset.
I don't see parallel workers in the query plan, but I do have external sort:

-> Sort (cost=20801.16..20801.60 rows=175 width=628) (actual time=39150.619..40730.793 rows=2002440 loops=1)
Sort Key: data_cell.sect_id
Sort Method: external sort Disk: 1613224kB
-> Hash Join (cost=20649.46..20794.64 rows=175 width=628) (actual time=736.734..2423.020 rows=2002440 loops=1)
Hash Cond: (eric_enodeb_cell_201908.start_time = data_cell.period)

Note, we run report with enable_nestloop=off (say what you will). And the
report runs within an transaction which we roll back. I suspect that's maybe
relevant for cleaning up files.

Ah, it's more than 24h old but saved the logfile with crash marker, so I found:

sudo zgrep 26188 /var/log/postgresql/crash-postgresql-2019-08-24_121600.log.gz
< 2019-08-24 12:16:17.037 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.5", size 336726940
< 2019-08-24 12:16:17.038 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.54", size 6354
< 2019-08-24 12:16:20.081 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.6", size 270914376
< 2019-08-24 12:16:20.083 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.39", size 7024
< 2019-08-24 12:16:23.464 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.7", size 213332328
< 2019-08-24 12:16:23.465 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.37", size 5984
...
< 2019-08-24 12:17:42.966 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.125", size 6534
< 2019-08-24 12:17:43.035 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.97", size 72828152
< 2019-08-24 12:17:43.036 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.112", size 6474
< 2019-08-24 12:17:43.114 CDT telsasoft >LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp26188.90", size 73016512
< 2019-08-24 12:17:50.109 CDT >LOG: server process (PID 26188) was terminated by signal 6: Aborted

sudo zgrep -cFw 'pgsql_tmp26188' /var/log/postgresql/crash-postgresql-2019-08-24_121600.log.gz
=> 116

sudo zgrep -Fw 'pgsql_tmp26188' /var/log/postgresql/crash-postgresql-2019-08-24_121600.log.gz |awk '{s+=$NF}END{print s/1024^3}'
9.86781

FWIW:

log_temp_files | 0
stats_temp_directory | pg_stat_tmp
temp_file_limit | -1
temp_tablespaces | """"""""""""""

Apparently, the last is garbage from our upgrade script, but in any case it
went to PGDATA.

Justin

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Moon, Insung 2019-08-26 03:08:22 Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Previous Message Vik Fearing 2019-08-26 02:43:52 Re: Outputting Standard SQL