Re: skink's test_decoding failures in 9.4 branch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: skink's test_decoding failures in 9.4 branch
Date: 2016-07-20 17:01:59
Message-ID: 20160720170159.i7ctvb3i6ut3aref@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2016-07-20 12:45:04 -0400, Tom Lane wrote:
> I wrote:
> > I've still had no luck reproducing it here, though.

Same here so far.

> Hah --- I take that back. On about the fourth or fifth trial:

Interesting.

> ==00:00:00:34.291 21525== Invalid read of size 1
> ==00:00:00:34.291 21525== at 0x4A08DEC: memcpy (mc_replace_strmem.c:882)
> ==00:00:00:34.291 21525== by 0x66FA54: DecodeXLogTuple (decode.c:899)
> ==00:00:00:34.291 21525== by 0x670561: LogicalDecodingProcessRecord (decode.c:711)
> ==00:00:00:34.291 21525== by 0x671BC3: pg_logical_slot_get_changes_guts (logicalfuncs.c:440)
> ==00:00:00:34.291 21525== by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
> ==00:00:00:34.291 21525== by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
> ==00:00:00:34.291 21525== by 0x5C170D: ExecScan (execScan.c:82)
> ==00:00:00:34.291 21525== by 0x5BA007: ExecProcNode (execProcnode.c:426)
> ==00:00:00:34.291 21525== by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
> ==00:00:00:34.291 21525== by 0x6BFE36: PortalRunSelect (pquery.c:942)
> ==00:00:00:34.291 21525== by 0x6C11EF: PortalRun (pquery.c:786)
> ==00:00:00:34.291 21525== by 0x6BD7E3: exec_simple_query (postgres.c:1072)
> ==00:00:00:34.291 21525== Address 0xe5311d6 is 6 bytes after a block of size 8,192 alloc'd
> ==00:00:00:34.291 21525== at 0x4A06A2E: malloc (vg_replace_malloc.c:270)
> ==00:00:00:34.291 21525== by 0x4ED399: XLogReaderAllocate (xlogreader.c:83)
> ==00:00:00:34.291 21525== by 0x6710B3: StartupDecodingContext (logical.c:161)
> ==00:00:00:34.291 21525== by 0x671303: CreateDecodingContext (logical.c:413)
> ==00:00:00:34.291 21525== by 0x671AF7: pg_logical_slot_get_changes_guts (logicalfuncs.c:394)
> ==00:00:00:34.291 21525== by 0x5C0B6B: ExecMakeTableFunctionResult (execQual.c:2196)
> ==00:00:00:34.291 21525== by 0x5D4131: FunctionNext (nodeFunctionscan.c:95)
> ==00:00:00:34.291 21525== by 0x5C170D: ExecScan (execScan.c:82)
> ==00:00:00:34.291 21525== by 0x5BA007: ExecProcNode (execProcnode.c:426)
> ==00:00:00:34.291 21525== by 0x5B8A61: standard_ExecutorRun (execMain.c:1490)
> ==00:00:00:34.291 21525== by 0x6BFE36: PortalRunSelect (pquery.c:942)
> ==00:00:00:34.291 21525== by 0x6C11EF: PortalRun (pquery.c:786)
> ==00:00:00:34.291 21525==

> This is rather interesting because I do not recall that any of skink's
> failures have shown an access more than 1 byte past the end of the buffer.
>
> Any suggestions how to debug this?

I guess the best options are either having valgrind's gdb server stop on
the error, or adding some asserts that check the size. I can look into
it, but likely not today.

Regards,

Andres
