Re: test_shm_mq failing on mips*

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Christoph Berg <cb(at)df7cb(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Dave Page <dave(dot)page(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, CM Team <cm(at)enterprisedb(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, bernd(dot)helmle(at)credativ(dot)de
Subject: Re: test_shm_mq failing on mips*
Date: 2014-12-02 15:12:29
Message-ID: CA+Tgmoa9HE=1GObU7MKuGLDBjBNSNRW0bDE4P3JA=P=9mqGgqw@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 25, 2014 at 10:42 AM, Christoph Berg <cb(at)df7cb(dot)de> wrote:
> Re: To Robert Haas 2014-11-24 <20141124200824(dot)GA22662(at)msg(dot)df7cb(dot)de>
>> > Does it fail every time when run on a machine where it fails sometimes?
>>
>> So far there's a consistent host -> fail-or-not mapping, but most
>> mips/mipsel build hosts have seen only one attempt so far which
>> actually came so far to actually run the shm_mq test.
>
> I got the build rescheduled on the same machine and it's hanging
> again.
>
>> Atm I don't have access to the boxes where it was failing (the builds
>> succeed on the mips(el) porter hosts available to Debian developers).
>> I'll see if I can arrange access there and run a test.
>
> Julien Cristau was so kind to poke into the hanging processes. The
> build has been stuck now for about 4h, while that postgres backend has
> only consumed 4s of CPU time (according to plain "ps"). The currently
> executing query is:
>
> SELECT test_shm_mq_pipelined(16384, (select string_agg(chr(32+(random()*95)::int), '') from generate_series(1,270000)), 200, 3);
>
> (Waiting f, active, backend_start 6s older than xact_start, xact_start
> same as query_start, state_change 19µs newer, all 4h old)

I can't tell from this exactly what's going wrong. Questions:

1. Are there any background worker processes running at the same time?
If so, how many? We'd expect to see 3.
2. Can we get a printout of the following variables in stack frame 3
(test_shm_mq_pipelined)? send_count, loop_count, *outqh, *inqh,
inqh->mqh_queue[0], outqh->mqh_queue[0]
3. What does a backtrace of each background worker process look like?
If they are stuck inside copy_messages(), can you print *outqh, *inqh,
inqh->mqh_queue[0], outqh->mqh_queue[0] from that stack frame?

Sorry for the hassle; I just don't have a better idea how to debug this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
