Re: proposal: make NOTIFY list de-duplication optional

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Filip Rembiałkowski <filip(dot)rembialkowski(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>, Catalin Iacob <iacobcatalin(at)gmail(dot)com>, David Steele <david(at)pgmasters(dot)net>
Subject: Re: proposal: make NOTIFY list de-duplication optional
Date: 2019-07-27 00:20:06
Message-ID: 17822.1564186806@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

=?UTF-8?Q?Filip_Rembia=C5=82kowski?= <filip(dot)rembialkowski(at)gmail(dot)com> writes:
> Still no hash table fallback is implemented, so this is *not* a
> performance improvement. Only a little more flexibility.

I think that we'd probably be better off fixing the root performance issue
than adding semantic complexity to bypass it. Especially since, if you
don't de-duplicate, that's going to cost you when it comes time to
actually send the notifications, receive them, forward them to clients,
and process them in the clients.

Admittedly, if the app *knows* that it's generating non-duplicate events,
maybe it'd be all right to skip checking that. But if we can make the
check cheap, I'd just as soon keep it.

Accordingly, I looked into making a hash table when there are more than
a small number of notifications pending, and attached is a lightly-tested
version of that. This seems to be more or less similar speed to the
existing code for up to 100 or so distinct notifies, but it soon pulls
away above that.

A point that needs discussion is that this patch, unlike the existing
code, *does* de-duplicate fully: any events generated by a subtransaction
that duplicate events already emitted by a parent will get removed when
the subxact is merged to its parent. I did this partly because we have
to expend O(N) work to merge N subtransaction notifies in any case,
now that we have to make new hashtable entries in the parent xact;
so the old excuse that subxact-end processing is really cheap no
longer applies. Also because the Assert(!found) assertions in the
attached hash coding fall over if we cheat on this. If we really want
to maintain exactly the old semantics here, we could relax the hashtable
code to just ignore duplicate entries. But, per the argument above,
de-duplication is a good thing so I'm inclined to keep it like this.

I also noticed that as things stand, it costs us two or three pallocs to
construct a Notification event. It wouldn't be terribly hard to reduce
that to one palloc, and maybe it'd be worthwhile if we're thinking that
transactions with many many notifies are a case worth optimizing.
But I didn't do that here; it seems like a separable patch.

I also thought for awhile about not having the hashtable as an auxiliary
data structure, but making it the main data structure. We could preserve
the required notification ordering information by threading the live
hashtable entries into an slist, say. However, this would greatly
increase the overhead for transactions with just one or a few distinct
NOTIFY events, since we'd have to set up the hashtable at the first one.
I think that's a common enough case that we shouldn't de-optimize it.
A smaller objection is that such a data structure would absolutely commit
us to de-duplication semantics, whereas the list plus separate hashtable
can cope with not de-duping if someone persuades us that's sane.

Thoughts?

regards, tom lane

Attachment Content-Type Size
detect-dup-notifies-by-hashing-1.patch text/x-diff 12.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2019-07-27 00:24:16 Re: TopoSort() fix
Previous Message Andres Freund 2019-07-26 23:48:35 Re: TopoSort() fix