Re: fixing LISTEN/NOTIFY

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Neil Conway <neilc(at)samurai(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: fixing LISTEN/NOTIFY
Date: 2005-10-06 13:53:36
Message-ID: Pine.OSF.4.61.0510061605580.76160@kosh.hut.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

First of all, I'd like to give a name to the thing that a backend listens
to. The code talks about "listening to a relation", but as the
comments there point out, the name doesn't actually identify a relation.
I'm going to call it a topic from now on.

I'd like to point out that we don't really have a notification queue,
but a notification flag for each topic. We're not constrained by the
number of notifications, but by the number of topics.

It might make sense to change the semantics so that we never lose a
notification, if we're going to implement NOTIFY 'msg', but that's another
discussion.

I've been thinking about the options for shmem data structure. Replacing
pg_listener in the straightforward way would give us an array:

struct {
char topicname[NAMEDATALEN];
int listenerpid;
int notification;
} listenertable[max_subscriptions];

Where max_subscriptions is the maximum number of active LISTENs.

If we're ready to take the performance hit, we can exploit the fact that
it's signal extra backends, as long as those backends can figure out that
it was a false alarm. In fact, we can always signal all backends.

Exploiting that, we could have:

struct {
char topicname[NAMEDATALEN];
int subscriptions; /* number of active subscriptions for this topic */
int notification_counter; /* increase by 1 on each NOTIFY */
} listenertable[max_topics]

NOTIFY increases the notification_counter by one, and signals all
backends. Every backend keeps a private list of
(topicname, notification_counter) pairs for the topics it's subscribed
to, in addition to the shared memory table. The signal handler compares
the notification_counter in the private list and in the listenertable. If
they don't match, notify the client. If they match, it was a false alarm.

The shmem requirement of this is
max_topics * (NAMEDATALEN + sizeof(int) * 2)

If we're not ready to take the performance hit, we can add the list of
backends to shmem:

struct {
char topicname[NAMEDATALEN];
int subscriptions; /* number of active subscriptions for this topic */
int notification_counter; /* increase by 1 on each NOTIFY */
int subscribers[max_backends];
} listenertable[max_topics]

and only signal those backends that are in the subscribers array.

The shmem requirement of this is
max_topics * (NAMEDATALEN + sizeof(int) * (2 + max_backends))

We can also do a tradeoff between shmem usage and unnecessary signals:

struct {
char topicname[NAMEDATALEN];
int subscriptions; /* number of active subscriptions for this topic */
int notification_counter; /* increase by 1 on each NOTIFY */
int subscriber_cache[cache_size];
} listenertable[max_topics]

Where cache_size can be any number
max_topics * (NAMEDATALEN + sizeof(int) * (2 + cache_size))

Where cache_size can be anything between 0 and max_backends. If the cache
gets full, NOTIFY signals all backends. Otherwise, only those that are in
the cache.

If max_topics is large, a hash table should be used instead of an array of
structs.

Now that I think of it, using the notification_counter, we *can* guarantee
that no notification is lost. The signal handler just needs to notify the
client (shmem notification_counter) - (private notification_counter)
times.

- Heikki

On Thu, 6 Oct 2005, Neil Conway wrote:

> Applications that frequently use LISTEN/NOTIFY can suffer from
> performance problems because of the MVCC bloat created by frequent
> insertions into pg_listener. A solution to this has been suggested in
> the past: rewrite LISTEN/NOTIFY to use shared memory rather than system
> catalogs.
>
> The problem is that there is a static amount of shared memory and a
> potentially unbounded number of notifications, so we can run out of
> memory. There are two ways to solve this: we can do as sinval does and
> clear the shared memory queue, then effectively issue a NOTIFY ALL that
> awakens all listeners. I don't like this behaviour: it seems ugly to
> expose an implementation detail (static sizing of shared memory) to
> applications. While a lot of applications are only using LISTEN/NOTIFY
> for cache invalidation (and so spurious notifications are just a
> performance hit), this behaviour still seems unfortunate to me. Using
> NOTIFY ALL also makes NOTIFY 'msg' far less useful, which is a feature
> several users have asked for in the past.
>
> I think it would be better to either fail the NOTIFY when there is not
> enough shared memory to add a new notification to the queue, or have the
> NOTIFY block until shared memory does become available (applications
> could of course implement the latter on top of the former by using
> savepoints and a loop, either on the client-side or in PL/PgSQL). I
> guess we could add an option to NOTIFY to specify how to handle
> failures.
>
> A related question is when to add the notification to the shared memory
> queue. We don't want the notification to fire until the NOTIFY's
> transaction commits, so one alternative would be to delay appending to
> the queue until transaction-commit time. However, that would mean we
> wouldn't notice NOTIFY failure until the end of the transaction, or else
> that we would block waiting for free space during the transaction-commit
> process. I think it would be better to add an entry to shared memory
> during the NOTIFY itself, and stamp that entry with the NOTIFY's
> toplevel XID. Other backends can read that the notification immediately
> (and once all the backends have seen it, the notification can be removed
> from the queue). Each backend can use the XID to determine when to
> "fire" the notification (and if the notifying backend rolls back, they
> can just discard the notification). This scheme is more expensive when
> the notifying transaction rolls back, but I don't think that is the
> common case.
>
> Comments? (I'm still thinking about how to organize the shared memory
> queue, and whether any of the sinval stuff can be reused...)
>
> -Neil
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message smile khmer 2005-10-06 14:06:59 PG function call
Previous Message Alvaro Herrera 2005-10-06 13:49:04 Re: fixing LISTEN/NOTIFY