Re: BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Marko Tiikkaja <marko(at)joh(dot)to>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #14830: Missed NOTIFications, PostgreSQL 9.1.24
Date: 2017-10-09 15:15:10
Message-ID: 15920.1507562110@sss.pgh.pa.us
Lists: pgsql-bugs

Marko Tiikkaja <marko(at)joh(dot)to> writes:
> After running it for a few days I start getting logged messages such as:

> out of order notification Q_97882353: 97882353 != 97882349 + 1 (prefix Q)
> out of order notification F_97947433: 97947433 != 97947429 + 1 (prefix F)
> out of order notification F_97947439: 97947439 != 97947436 + 1 (prefix F)

> I did it on both 9.1.24 and 9.6.5 and they both exhibit the same behavior:
> it takes days to get into this state, but then notifications are missed all
> the time. I currently have both systems in this state, so any idea what to
> look at to try and debug this further?

You might try gdb'ing the recipient and stepping through
asyncQueueProcessPageEntries to see what happens. Are the missing
entries present in the queue but it decides to ignore them for some
reason, or are they just not there?

An interesting black-box test might be to do this with two receiver
processes and see if they miss identical sets of messages. That
would be a different way of triangulating on question number 1,
which is whether the sender or the recipient is at fault.
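As a rough sketch of how the two-receiver comparison could be scored, assume each receiver logs the payloads it gets in arrival order, and that payloads look like the ones in the report above (a prefix, an underscore, and a counter that should increase by one per prefix). The function names here are hypothetical, and the payload format is inferred only from the quoted log lines:

```python
def parse_payload(payload):
    """Split a payload such as 'Q_97882353' into ('Q', 97882353)."""
    prefix, _, number = payload.partition("_")
    return prefix, int(number)


def missing_ids(payloads):
    """Return the (prefix, id) pairs skipped in one receiver's stream.

    Assumes ids within a prefix are meant to arrive as n, n+1, n+2, ...
    """
    last_seen = {}
    missing = set()
    for payload in payloads:
        prefix, number = parse_payload(payload)
        previous = last_seen.get(prefix)
        if previous is not None and number > previous + 1:
            # Everything strictly between the two ids was never delivered.
            missing.update((prefix, n) for n in range(previous + 1, number))
        # Keep the highest id seen so a late arrival does not reset the gap check.
        last_seen[prefix] = number if previous is None else max(previous, number)
    return missing


def compare_receivers(payloads_a, payloads_b):
    """Classify gaps as shared by both receivers or specific to one."""
    missing_a = missing_ids(payloads_a)
    missing_b = missing_ids(payloads_b)
    return {
        "both": missing_a & missing_b,
        "only_a": missing_a - missing_b,
        "only_b": missing_b - missing_a,
    }
```

If nearly everything lands in "both", that would point toward the sending side or the shared queue; if the gaps mostly differ between the two receivers, the recipient side looks more suspect. Both receivers would need to start listening before the first counted notification for the comparison to mean much.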

I wonder whether the long ramp-up time indicates that you have to
wrap around some counter somewhere before things go south. Although
the only obvious candidate is wrapping the pg_notify SLRU queue,
and I'd think that would have happened many times already.

regards, tom lane
