Re: bad COPY performance with NOTIFY in a trigger

From: Filip Rembiałkowski <filip(dot)rembialkowski(at)gmail(dot)com>
To: Merlin Moncure <mmoncure(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, postgres performance list <pgsql-performance(at)postgresql(dot)org>
Subject: Re: bad COPY performance with NOTIFY in a trigger
Date: 2016-02-06 12:03:53
Message-ID: CAP_rwwk7revDKQVyP--e4ph9fhGkzvJ8ZSBoN+hctzFkL7VL6Q@mail.gmail.com
Lists: pgsql-performance

Thanks for the feedback.

This patch is my first and obvious approach.

@Merlin, I'm not sure I get your idea. Do you mean we should:
- keep the previous behaviour as obligatory (that is, automatic
de-duplication of queued messages by channel+payload), and
- instead of the trivial sequential search, use some kind of
faster lookup?

I'm not sure if this statement in async.c is carved in stone:

* Duplicate notifications from the same transaction are sent out as one
* notification only. This is done to save work when for example a trigger
* on a 2 million row table fires a notification for each row that has been
* changed. If the application needs to receive every single notification
* that has been sent, it can easily add some unique string into the extra
* payload parameter.

1) the "work-saving" is disputable in some cases - as the COPY
benchmark shows, the de-duplication itself can cost more than it saves.

2) the idea to "add some unique string" is OK logic-wise, but not OK
performance-wise: every queued notification is still compared against
the whole pending list, and unique payloads force each scan to run to
the end - O(n^2) overall.

The current duplicate check is a sequential search over the pending-notifications list:
https://github.com/filiprem/postgres/blob/master/src/backend/commands/async.c#L2139

I'm not smart enough to devise a faster lookup algorithm off the top
of my head - probably you guys can give some advice.

Again, my rationale is this: the de-duplication can burn a lot of CPU
for nothing. I was hoping to use NOTIFY/LISTEN as a superfast
notification mechanism - superfast regardless of whether you insert
100, 10k or 1M rows.

On Fri, Feb 5, 2016 at 8:52 PM, Merlin Moncure <mmoncure(at)gmail(dot)com> wrote:
> On Fri, Feb 5, 2016 at 9:33 AM, Filip Rembiałkowski
> <filip(dot)rembialkowski(at)gmail(dot)com> wrote:
>> patch submitted on -hackers list.
>> http://www.postgresql.org/message-id/CAP_rwwn2z0gPOn8GuQ3qDVS5+HgEcG2EzEOyiJtcA=vpDEhoCg@mail.gmail.com
>>
>> results after the patch:
>>
>> trigger= BEGIN RETURN NULL; END
>> rows=40000
>> 228ms COPY test.tab FROM '/tmp/test.dat'
>> 205ms COPY test.tab FROM '/tmp/test.dat'
>> rows=80000
>> 494ms COPY test.tab FROM '/tmp/test.dat'
>> 395ms COPY test.tab FROM '/tmp/test.dat'
>> rows=120000
>> 678ms COPY test.tab FROM '/tmp/test.dat'
>> 652ms COPY test.tab FROM '/tmp/test.dat'
>> rows=160000
>> 956ms COPY test.tab FROM '/tmp/test.dat'
>> 822ms COPY test.tab FROM '/tmp/test.dat'
>> rows=200000
>> 1184ms COPY test.tab FROM '/tmp/test.dat'
>> 1072ms COPY test.tab FROM '/tmp/test.dat'
>> trigger= BEGIN PERFORM pg_notify('test',NEW.id::text); RETURN NULL; END
>> rows=40000
>> 440ms COPY test.tab FROM '/tmp/test.dat'
>> 406ms COPY test.tab FROM '/tmp/test.dat'
>> rows=80000
>> 887ms COPY test.tab FROM '/tmp/test.dat'
>> 769ms COPY test.tab FROM '/tmp/test.dat'
>> rows=120000
>> 1346ms COPY test.tab FROM '/tmp/test.dat'
>> 1171ms COPY test.tab FROM '/tmp/test.dat'
>> rows=160000
>> 1710ms COPY test.tab FROM '/tmp/test.dat'
>> 1709ms COPY test.tab FROM '/tmp/test.dat'
>> rows=200000
>> 2189ms COPY test.tab FROM '/tmp/test.dat'
>> 2206ms COPY test.tab FROM '/tmp/test.dat'
>
> I'm not so sure that this is a great idea. Generally, we tend to
> discourage GUCs that control behavior at the SQL level. Are you 100%
> certain that there is no path to optimizing this case without changing
> behavior?
>
> merlin
