Re: segmentation fault when cassert enabled

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org, <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: segmentation fault when cassert enabled
Date: 2019-11-05 16:29:18
Message-ID: 20191105172918.3e32a446@firost
Lists: pgsql-hackers

On Fri, 25 Oct 2019 12:28:38 -0400
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com> writes:
> > When investigating for the bug reported in thread "logical replication -
> > negative bitmapset member not allowed", I found a way to seg fault
> > postgresql only when cassert is enabled.
> > ...
> > I haven't had time to dig further yet. However, I don't understand why this
> > crash is triggered when cassert is enabled.
>
> Most likely, it's not so much assertions that provoke the crash as
> CLOBBER_FREED_MEMORY, ie the actual problem here is use of already-freed
> memory.

Thank you. Indeed, enabling CLOBBER_FREED_MEMORY on its own is enough to
trigger the segfault.

In fact, valgrind reports it as a use of an uninitialised value, whether or not
CLOBBER_FREED_MEMORY is defined:

Conditional jump or move depends on uninitialised value(s)
at 0x43F410: slot_modify_cstrings (worker.c:398)
by 0x43FBE9: apply_handle_update (worker.c:744)
by 0x440088: apply_dispatch (worker.c:968)
by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
by 0x440CD0: ApplyWorkerMain (worker.c:1733)
by 0x411C34: StartBackgroundWorker (bgworker.c:834)
by 0x41EA24: do_start_bgworker (postmaster.c:5763)
by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
by 0x41F562: sigusr1_handler (postmaster.c:5161)
by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
by 0x4B31FF6: select (select.c:41)
by 0x41FDDE: ServerLoop (postmaster.c:1668)
Uninitialised value was created by a heap allocation
at 0x5C579B: palloc (mcxt.c:949)
by 0x437116: logicalrep_rel_open (relation.c:270)
by 0x43FA8F: apply_handle_update (worker.c:684)
by 0x440088: apply_dispatch (worker.c:968)
by 0x4405D7: LogicalRepApplyLoop (worker.c:1175)
by 0x440CD0: ApplyWorkerMain (worker.c:1733)
by 0x411C34: StartBackgroundWorker (bgworker.c:834)
by 0x41EA24: do_start_bgworker (postmaster.c:5763)
by 0x41EB6F: maybe_start_bgworkers (postmaster.c:5976)
by 0x41F562: sigusr1_handler (postmaster.c:5161)
by 0x48A072F: ??? (in /lib/x86_64-linux-gnu/libpthread-2.28.so)
by 0x4B31FF6: select (select.c:41)

My best guess so far is that logicalrep_relmap_invalidate_cb is not called after
the DDL on the subscriber, so the relmap cache is not invalidated. We then end up
with slot->tts_tupleDescriptor->natts greater than rel->remoterel->natts in
slot_store_cstrings, leading to the overflow on attrmap and the SIGSEGV.

I haven't followed this path yet.
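
To make the suspected failure mode concrete, here is a minimal standalone
sketch. It is not the worker.c or relation.c code: CachedAttrMap and the
cached_attrmap_* helpers are hypothetical names, and malloc stands in for
palloc. It only shows a map sized for an old column count, plus the kind of
check that would catch a lookup made with a newer, larger column count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for PostgreSQL's AttrNumber (a 16-bit integer). */
typedef int16_t AttrNumber;

/* Hypothetical cache entry: remembers how many entries were allocated. */
typedef struct CachedAttrMap
{
    int         natts;      /* column count when the cache entry was built */
    AttrNumber *attrmap;    /* natts entries */
} CachedAttrMap;

/* Build a map for natts columns; returns NULL on allocation failure. */
CachedAttrMap *
cached_attrmap_build(int natts)
{
    CachedAttrMap *map;

    assert(natts >= 0);

    map = malloc(sizeof(*map));
    if (map == NULL)
        return NULL;

    map->natts = natts;
    map->attrmap = malloc((size_t) natts * sizeof(AttrNumber));
    if (natts > 0 && map->attrmap == NULL)
    {
        free(map);
        return NULL;
    }

    /* Identity mapping, for illustration only. */
    for (int i = 0; i < natts; i++)
        map->attrmap[i] = (AttrNumber) i;

    return map;
}

/*
 * Returns 1 if indexing attrmap with 0..slot_natts-1 would run past the
 * end of the allocation, i.e. the cached entry is too small for the
 * current tuple descriptor.
 */
int
cached_attrmap_is_stale(const CachedAttrMap *map, int slot_natts)
{
    return slot_natts > map->natts;
}

void
cached_attrmap_free(CachedAttrMap *map)
{
    free(map->attrmap);
    free(map);
}
```

With a map built for 2 columns, a descriptor with 3 columns is flagged as
stale; without such a check, reading attrmap[2] would go past the allocation.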

By the way, I noticed attrmap is declared as AttrNumber * in struct
LogicalRepRelMapEntry, AttrNumber being typedef'd as an int16. However, attrmap
is allocated based on sizeof(int) in logicalrep_rel_open:

    entry->attrmap = palloc(desc->natts * sizeof(int));

It doesn't look like a major problem; it just allocates more memory than
needed.
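
As a rough illustration of the size difference, the standalone sketch below
compares the two computations. The typedef mirrors AttrNumber as described
above; the two helper functions are hypothetical names used only for this
example and are not part of PostgreSQL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's AttrNumber (typedef'd as an int16). */
typedef int16_t AttrNumber;

/* Bytes requested when sizing by sizeof(int), as in the line quoted above. */
size_t
attrmap_bytes_with_int(int natts)
{
    assert(natts >= 0);
    return (size_t) natts * sizeof(int);
}

/* Bytes needed when sizing by the element type of the array. */
size_t
attrmap_bytes_with_attrnumber(int natts)
{
    assert(natts >= 0);
    return (size_t) natts * sizeof(AttrNumber);
}
```

On a platform where int is 4 bytes, the first computation requests twice as
much memory as the second. The extra space is wasted but harmless, since the
array is only ever indexed as AttrNumber.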

Regards,
