Re: src/test/subscription/t/002_types.pl hanging on particular environment

From: Andres Freund <andres(at)anarazel(dot)de>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: src/test/subscription/t/002_types.pl hanging on particular environment
Date: 2017-09-18 10:18:10
Message-ID: 20170918101810.l5irpe2bdb6xkksi@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2017-09-18 21:57:04 +1200, Thomas Munro wrote:
> The subscription tests 002_types.pl sometimes hangs for a while and
> then times out when run on a Travis CI build VM running Ubuntu Trusty
> if --enable-coverage is used.

Yea, I saw that too.

> I guess it might be a timing/race
> problem because I can't think of any mechanism specific to coverage
> instrumentation except that it slows stuff down, but I don't know.
> The only other thing I can think of is that perhaps the instrumented
> code is writing something to stdout or stderr and that's finding its
> way into some protocol stream and confusing things, but I can't
> reproduce this on any of my usual development machines. I haven't
> tried that exact operating system. Maybe it's a bug in the toolchain,
> but that's an Ubuntu LTS release so I assume other people use it
> (though I don't see it in the buildfarm).

I've run this locally through a number of iterations with coverage
enabled, I think I reproduced it once, but unfortunately I'd continued
because I was working on something else at that moment.

It might be worthwhile to play around with replacing the or die in
my $synced_query =
"SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT IN ('s', 'r');";
$node_subscriber->poll_query_until('postgres', $synced_query)
or die "Timed out while waiting for subscriber to synchronize data";
with something like
or diag($node_subscriber->safe_psql('postgres', 'SELECT * FROM pg_subscription_rel')
just to know where to go from here.

> WARNING: terminating connection because of crash of another server process
> DETAIL: The postmaster has commanded this server process to roll
> back the current transaction and exit, because another server process
> exited abnormally and possibly corrupted shared memory.
> HINT: In a moment you should be able to reconnect to the database
> and repeat your command.
>
> As far as I know these are misleading, it's really just an immediate
> shutdown. There is no core file, so I don't believe anything actually
> crashed.

I was about to complain about these, for entirely unrelated reasons. I
think it's a bad idea - and there's a couple complains on the lists too,
to emit these warnings. It's not entirely trivial to fix though :(

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2017-09-18 10:25:04 Re: [PATCH] Generic type subscripting
Previous Message Thomas Munro 2017-09-18 09:57:04 src/test/subscription/t/002_types.pl hanging on particular environment