Re: NOTIFY performance

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Artur Zając <azajac(at)ang(dot)com(dot)pl>, pgsql-performance(at)postgresql(dot)org
Subject: Re: NOTIFY performance
Date: 2012-08-31 20:22:59
Message-ID: 18884.1346444579@sss.pgh.pa.us
Lists: pgsql-performance

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:
> I wonder if we should be trying to drop duplicates at all. I think that
> doing that made a lot more sense before payloads existed.

Perhaps, but we have a lot of history to be backwards-compatible with.

> The docs said that the system "can" drop duplicates, so making it no
> longer do so would be backwards compatible.

Maybe compatible from a language-lawyerly point of view, but the
performance characteristics would be hugely different - and since this
complaint is entirely about performance, I don't think it's fair to
ignore that. We'd be screwing people who've depended on the historical
behavior to accommodate people who expect something that never worked
well before to start working well.

The case that I'm specifically worried about is rules and triggers that
issue NOTIFY without worrying about generating lots of duplicates when
many rows are updated in one command.

> Maybe drop duplicates where the payload was the empty string, but keep
> them otherwise?

Maybe, but that seems pretty weird/unpredictable. (In particular, if
you have a mixed workload with some of both types of notify, you lose
twice: some of the inserts will need to scan the list, so that cost
is still quadratic, but you still have a huge event list to dump into
the queue when the time comes.)

I seem to recall that we discussed the idea of checking only the last N
notifies for duplicates, for some reasonably small N (somewhere between
10 and 100 perhaps). That would prevent the quadratic behavior and yet
also eliminate dups in most of the situations where it would matter.
Any N>1 would require a more complicated data structure than is there
now, but it doesn't seem that hard.
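
To make the "last N" idea concrete, here is a minimal standalone sketch. It is
not the code in async.c: the NotifyList array, the DEDUP_WINDOW constant, and
the queue_notify / recent_duplicate names are all hypothetical, and it uses
plain malloc rather than backend memory contexts. It only illustrates that
each insert compares against at most N earlier entries, so the total cost is
O(N * events) rather than quadratic in the number of events.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical window size; the mail only suggests "between 10 and 100". */
#define DEDUP_WINDOW 16

/* One pending notification (illustrative, not the backend's struct). */
typedef struct PendingNotify
{
    char *channel;
    char *payload;
} PendingNotify;

/* Growable array of pending notifications, in arrival order. */
typedef struct NotifyList
{
    PendingNotify *items;
    size_t count;
    size_t capacity;
} NotifyList;

static char *
copy_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);

    if (p != NULL)
        memcpy(p, s, len);
    return p;
}

/* True if one of the last DEDUP_WINDOW entries has the same channel and payload. */
static bool
recent_duplicate(const NotifyList *list, const char *channel, const char *payload)
{
    size_t start = list->count > DEDUP_WINDOW ? list->count - DEDUP_WINDOW : 0;

    /* Scan newest to oldest; never look further back than the window. */
    for (size_t i = list->count; i > start; i--)
    {
        const PendingNotify *n = &list->items[i - 1];

        if (strcmp(n->channel, channel) == 0 && strcmp(n->payload, payload) == 0)
            return true;
    }
    return false;
}

/* Returns true if appended, false if dropped as a recent duplicate
 * (or if memory allocation failed). */
static bool
queue_notify(NotifyList *list, const char *channel, const char *payload)
{
    if (recent_duplicate(list, channel, payload))
        return false;

    if (list->count == list->capacity)
    {
        size_t newcap = list->capacity ? list->capacity * 2 : 8;
        PendingNotify *grown = realloc(list->items, newcap * sizeof(*grown));

        if (grown == NULL)
            return false;
        list->items = grown;
        list->capacity = newcap;
    }

    char *c = copy_string(channel);
    char *p = copy_string(payload);

    if (c == NULL || p == NULL)
    {
        free(c);
        free(p);
        return false;
    }
    list->items[list->count].channel = c;
    list->items[list->count].payload = p;
    list->count++;
    return true;
}
```

With this shape, a trigger that fires the same NOTIFY for every row of a large
UPDATE still collapses to a single entry, because each repeat matches the entry
just before it. A duplicate that is separated from its twin by more than
DEDUP_WINDOW distinct events is queued again, which is the trade-off being
proposed.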

The other thing we'd need to find out is whether that's the only problem
for generating bazillions of notify events per transaction. It won't
help to hack AsyncExistsPendingNotify if dropping the events into the
queue is still too expensive. I am worried about the overall processing
cost here, consumers and producers both.

regards, tom lane
