Re: BUG #10123: Weird entries in pg_stat_activity

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #10123: Weird entries in pg_stat_activity
Date: 2014-05-25 09:40:36
Message-ID: CAK-MWwSVCAX9m4S4C948gZ-DfKFLpho=fEoq1AJ8F17tzCTvmg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Apr 25, 2014 at 11:04 AM, Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

>
>
>
> On Fri, Apr 25, 2014 at 4:38 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> maxim(dot)boguk(at)gmail(dot)com writes:
>> > During debug of the weird nagios database alerts about long
>> transactions I
>> > found that sometime pg_stat_activity contains very weird entries with
>> > xact_start NOT NULL and state='idle'...
>> > ...
>> > What's more: i tested that problem on another databases - and found they
>> > have same effect (so it doesn't look like a problem local to that single
>> > exact database).
>>
>> So can you provide a self-contained test case for this? I tried to
>> reproduce it here without much success, and I don't see a code path
>> other than PREPARE TRANSACTION that could lead to this.
>>
>> regards, tom lane
>>
>
> I going to work on test case over weekend.
> Initial quick and dirty attempts to reproduce outside of complicated
> production environments provided no results yet.
>
> What I almost sure now that reproducing of the bug requires a mix of very
> active connections to the database (with a lot of requests per second) with
> semi-idle connections.
> And that transient problem affects only these semi-idle connections.
>
> I will provide an update when I get more information.
>
> Kind Regards,
> Maksym
>

Unfortunately even if problem can be seen in at least 10 different project
and db-servers, creating reproducible test case had not been successful.
So I decided work with gdb on the real database.

At this moment I dig only to the pgstat.c :
pgstat_read_current_status(void) function.

gdb:

set print pretty on
b pgstat.c:2796 if ((localentry->st_procpid>0) &&
(localentry->st_xact_start_timestamp>0) &&
(localentry->st_state==STATE_IDLE))
cont

2796 line in 9.3.4 source is pgstat_read_current_status function just after
memcpy(localentry, (char *) beentry,
sizeof(PgBackendStatus));

After short time running:
select pg_sleep(0.01); select now() as ts,now()-xact_start as tx_age,
now()-state_change as change_age,* from pg_stat_activity where state='idle'
and xact_start is not null;
in loop in process under gdb I got conditional breakpoint fired with the
next results:

(gdb) cont
Continuing.

Breakpoint 1, pgstat_read_current_status () at
/tmp/buildd/postgresql-9.3-9.3.4/build/../src/backend/postmaster/pgstat.c:2796
2796 strcpy(localappname, (char *)
beentry->st_appname);
(gdb) p *localentry
$1 = {
st_changecount = 1222944,
st_procpid = 16215,
st_proc_start_timestamp = 454324143169490,
st_xact_start_timestamp = 454324206165363,
st_activity_start_timestamp = 454324206165363,
st_state_start_timestamp = 454324206165650,
st_databaseid = 16423,
st_userid = 25078444,
st_clientaddr = {
addr = {
ss_family = 2,
__ss_align = 0,
__ss_padding = '\000' <repeats 111 times>
},
salen = 16
},
st_clienthostname = 0x7fac4344fbe0 "",
st_waiting = 0 '\000',
st_state = STATE_IDLE,
st_appname = 0x7fac624d8ff0 "",
st_activity = 0x7fac624e4bb0 "SELECT
\"last_host_hour_accumulation_flag\".\"id\",
\"last_host_hour_accumulation_flag\".\"day\",
\"last_host_hour_accumulation_flag\".\"hour\",
\"last_host_hour_accumulation_flag\".\"type_id\", \"last_host_hour_accum"...
}

Please note that the:
st_xact_start_timestamp = 454324206165363,
st_activity_start_timestamp = 454324206165363,

and in the same time:
st_state = STATE_IDLE

On the next iteration after 0.01 second st_proc_start_timestamp and
st_activity_start_timestamp returned to 0 again.

Now I have no good idea how to perform the future debug because
BackendStatusArray is shared memory structure with every backend writing
information into it (so I can't set a conditional breakpoint in specific
process)
Ideas welcome.

Kind Regards,
Maksym

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2014-05-25 17:42:48 Re: BUG #10436: Can't compile 9.4 with this mingw64-w32 gcc 4.8.1, 4.8.2, 4.8.3
Previous Message Paragon Corporation 2014-05-25 02:57:52 Re: BUG #10436: Can't compile 9.4 with this mingw64-w32 gcc 4.8.1, 4.8.2, 4.8.3