RE: Forget close an open relation in ReorderBufferProcessTXN()

From: "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>
To: 'Amit Langote' <amitlangote09(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Japin Li <japinli(at)hotmail(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Forget close an open relation in ReorderBufferProcessTXN()
Date: 2021-05-21 07:26:32
Message-ID: OSBPR01MB48880D3B760074587D4D2424ED299@OSBPR01MB4888.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Friday, May 21, 2021 3:55 PM I wrote:
> On Thursday, May 20, 2021 9:59 PM Amit Langote
> <amitlangote09(at)gmail(dot)com> wrote:
> > Here are updated/divided patches.
> Thanks for your updates.
>
> However, I've detected segmentation faults caused by the patch, which can
> occur while running 100_bugs.pl in src/test/subscription.
> This happens in more than one out of ten runs.
>
> This looks like a timing issue, and it was already introduced in v3.
> I also applied v5 to HEAD and reproduced this failure, whereas unpatched HEAD
> doesn't reproduce it, even when I ran 100_bugs.pl 200 times in a tight loop.
> I used commit 4f586fe2 as the base for all checks. The logs below are from v3.
>
> * The failure message from the TAP test:
>
> # Postmaster PID for node "twoways" is 5015
> Waiting for replication conn testsub's replay_lsn to pass pg_current_wal_lsn() on twoways
> # poll_query_until timed out executing this query:
> # SELECT pg_current_wal_lsn() <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'testsub';
> # expecting this output:
> # t
> # last actual query output:
> #
> # with stderr:
> # psql: error: connection to server on socket "/tmp/cs8dhFOtZZ/.s.PGSQL.59345" failed: No such file or directory
> #       Is the server running locally and accepting connections on that socket?
> timed out waiting for catchup at t/100_bugs.pl line 148.
>
>
> The failure produces a core file, and its backtrace is below.
> My first guess at the cause is that, between getting an entry from
> hash_search() in get_rel_sync_entry() and setting its map with
> convert_tuples_by_name() in maybe_send_schema(), an invalidation
> message arrives and tries to free descs in the entry that haven't been set yet?
Sorry, this guess was not accurate at all.
Please ignore it: the descs are freed only when entry->map is set,
so an entry whose map hasn't been built yet has nothing to free.
Sorry for the noise.
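To make the retraction's reasoning concrete, here is a minimal, self-contained C model. It assumes that, in the patch under discussion, the invalidation callback reaches the tuple descriptors only through entry->map and frees nothing when the map is NULL. The type and function names (DescModel, MapModel, SyncEntryModel, build_map, invalidate_entry) are hypothetical stand-ins, not the actual pgoutput.c code or PostgreSQL's TupleDesc/TupleConversionMap APIs.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a tuple descriptor. */
typedef struct DescModel { int natts; } DescModel;

/* Hypothetical stand-in for a conversion map that owns two descs. */
typedef struct MapModel {
    DescModel *indesc;
    DescModel *outdesc;
} MapModel;

/* Hypothetical stand-in for a relation sync cache entry. */
typedef struct SyncEntryModel {
    int schema_sent;
    MapModel *map;   /* NULL until the map has been built */
} SyncEntryModel;

/* Counts how many descs the invalidation path has freed. */
static int descs_freed = 0;

static void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL)
        abort();
    return p;
}

/* Models building the map (the maybe_send_schema() step in the email). */
static void build_map(SyncEntryModel *entry, int natts) {
    assert(entry->map == NULL);  /* never overwrite a live map */
    MapModel *m = xmalloc(sizeof *m);
    m->indesc = xmalloc(sizeof *m->indesc);
    m->outdesc = xmalloc(sizeof *m->outdesc);
    m->indesc->natts = natts;
    m->outdesc->natts = natts;
    entry->map = m;
    entry->schema_sent = 1;
}

/* Models the invalidation callback: descs are reachable only via entry->map,
 * so an entry whose map is still NULL has nothing to free. */
static void invalidate_entry(SyncEntryModel *entry) {
    entry->schema_sent = 0;
    if (entry->map != NULL) {
        free(entry->map->indesc);
        free(entry->map->outdesc);
        free(entry->map);
        descs_freed += 2;
    }
    entry->map = NULL;
}
```

Under this assumption, an invalidation that arrives before build_map() runs is a no-op for the descs, which is why the earlier guess cannot explain the crash.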

Best Regards,
Takamichi Osumi
