Re: LISTEN denial of service with aborted transaction

From: Matt Newell <newellm(at)blur(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, it(at)blur(dot)com
Subject: Re: LISTEN denial of service with aborted transaction
Date: 2015-09-30 02:37:38
Message-ID: 4705384.iNuYeV9kCG@obsidian
Lists: pgsql-hackers

On Tuesday, September 29, 2015 09:58:35 PM Tom Lane wrote:
> Matt Newell <newellm(at)blur(dot)com> writes:
> > On Monday, September 28, 2015 07:22:26 PM Tom Lane wrote:
> >> Possibly. sinval catchup notification works like that, so you could maybe
> >> invent a similar mechanism for notify. Beware that we've had to fix
> >> performance issues with sinval catchup; you don't want to release a whole
> >> bunch of processes to do work at the same time.
> >
> > I'll have to think about this more, but I don't envision it causing such
> > a scenario.
> > The extra processing that will happen at signaling time would have been
> > done anyway when the aborted transaction is finally rolled back; the only
> > extra work is copying the relevant notifications to local memory.
>
> Hm. An idle-in-transaction listening backend will eventually cause a more
> serious form of denial of service: once the pg_notify SLRU area fills up
> to its maximum of 8GB, no more new messages can be inserted, and every
> transaction that tries to do a NOTIFY will abort. However, that's a
> documented hazard and we've not really had complaints about it. In any
> case, trying to fix that by having such a backend absorb messages into
> local memory doesn't seem like a great idea: if it sits for long enough,
> its local memory usage will bloat. Eventually you'd have a failure
> anyway.
>
> > Regardless it might not be worthwhile since my patch for #1 seems to
> > address the symptom that bit me.
>
> I looked at this patch for a bit and found it kind of ugly, particularly
> the business about "we allow one process at a time to advance
> firstUncommitted by using advancingFirstUncommitted as a mutex".
> That doesn't sound terribly safe or performant.
>
It can be implemented without that, but then you'll have multiple backends
trying to do the same work. That might not be an issue at all, but I couldn't
tell at first glance what the extra calls to TransactionIdIsInProgress would
cost. Since no extra locks are taken, I figured ensuring the work is done only
once is good for performance.

If the cluster has only one database generating notifies, then there is
practically no extra work with the patch.

As far as safety is concerned, I think the worst case is that behavior returns
to what it is now, where firstUncommitted ends up tracking QUEUE_TAIL.

I originally used a boolean, then realized I could get some extra safety by
using the backendId, so that the mutex would be released automatically if the
backend dies.
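
To illustrate the idea (a sketch only; the helper name is made up for this
email, and exactly where it slots in relative to the existing AsyncQueueLock
acquisitions is glossed over):

/*
 * advancingFirstUncommitted holds the BackendId of whichever backend is
 * currently advancing firstUncommitted, or InvalidBackendId when nobody is.
 */
typedef struct AsyncQueueControl
{
    /* ... existing fields ... */
    QueuePosition firstUncommitted;          /* oldest possibly-uncommitted entry */
    BackendId     advancingFirstUncommitted; /* InvalidBackendId when free */
} AsyncQueueControl;

/*
 * Try to claim the right to advance firstUncommitted.  Because the field
 * records a BackendId rather than a plain flag, a claim left behind by a
 * backend that died can be recognized and cleared as part of that backend's
 * queue cleanup, so the "mutex" can't be leaked permanently.
 */
static bool
TryClaimFirstUncommittedAdvance(void)
{
    bool        claimed = false;

    LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE);
    if (asyncQueueControl->advancingFirstUncommitted == InvalidBackendId)
    {
        asyncQueueControl->advancingFirstUncommitted = MyBackendId;
        claimed = true;
    }
    LWLockRelease(AsyncQueueLock);

    return claimed;
}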

> After further thought, I wonder whether instead of having an incoming
> listener initialize its "pos" to QUEUE_TAIL, we couldn't have it
> initialize to the maximum "pos" among existing listeners (with a floor of
> QUEUE_TAIL of course). If any other listener has advanced over a given
> message X, then X was committed (or aborted) earlier and the new listener
> is not required to return it; so it should be all right to take that
> listener's "pos" as a starting point.

That's a great idea and I will try it. It does need to be the maximum
position of existing listeners *with matching db oid*, since
asyncQueueReadAllNotifications doesn't check transaction status if the db oid
doesn't match its own. Another advantage of this approach is that a queued
notify from an open transaction in one database won't incur additional cost on
listens coming from other databases, whereas with my patch such a situation
would prevent firstUncommitted from advancing.

>
> The minimum-code-change way to do that would be to compute the max pos
> the hard way while Exec_ListenPreCommit holds the AsyncQueueLock, which
> it would now have to do in exclusive mode. That's a little bit annoying
> but it's surely not much worse than what SignalBackends has to do after
> every notification.

Yeah, I think it's going to be a win regardless. We could also skip the whole
thing if QUEUE_HEAD - QUEUE_TAIL is under a fixed amount, which would probably
cover a lot of use cases.
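
Roughly what I have in mind for Exec_ListenPreCommit (sketch only; it assumes
the per-backend queue entries also record the database oid, via a hypothetical
QUEUE_BACKEND_DBOID macro, plus a QUEUE_POS_MAX counterpart to the existing
QUEUE_POS_MIN):

    QueuePosition   max;
    int             i;

    /*
     * A fast path could go here: when QUEUE_HEAD and QUEUE_TAIL are within
     * some small fixed distance, just start from QUEUE_TAIL as we do today,
     * since scanning a nearly-empty queue is cheap anyway.
     */

    LWLockAcquire(AsyncQueueLock, LW_EXCLUSIVE);    /* was LW_SHARED */

    /*
     * Start from the furthest-ahead listener in our own database.  Anything
     * that listener has already advanced over was committed or aborted
     * before we started listening, so we aren't required to deliver it.
     * Falls back to QUEUE_TAIL when no other backend in this database is
     * listening.
     */
    max = QUEUE_TAIL;
    for (i = 1; i <= MaxBackends; i++)
    {
        if (QUEUE_BACKEND_PID(i) != InvalidPid &&
            QUEUE_BACKEND_DBOID(i) == MyDatabaseId)
            max = QUEUE_POS_MAX(max, QUEUE_BACKEND_POS(i));
    }

    QUEUE_BACKEND_POS(MyBackendId) = max;
    QUEUE_BACKEND_PID(MyBackendId) = MyProcPid;
    LWLockRelease(AsyncQueueLock);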

>
> Alternatively, we could try to maintain a shared pointer that is always
> equal to the frontmost backend's "pos". But I think that would require
> more locking during queue reading than exists now; so it's far from clear
> it would be a win, especially in systems where listening sessions last for
> a reasonable amount of time (so that Exec_ListenPreCommit is infrequent).

And that would be even more complicated, since it would have to be tracked per
db oid.

Matt Newell
