Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2018-12-19 09:58:58
Lists: pgsql-hackers

Hi Tomas,

> I'm a bit confused by the changes to TAP tests. Per the patch summary,
> some .pl files get renamed (not sure why), a new one is added, etc.

I added a new TAP test case, added the streaming=true option to the old
stream_* tests, and incremented the streaming test numbers (+2) because of
the collision between / and /. At least in the previous
version of the patch they were under the same numbers. Nothing special,
but for simplicity, please find my new TAP test attached separately.

> So
> I've instead enabled streaming subscriptions in all tests, which with
> this patch produces two failures:
> Test Summary Report
> -------------------
> t/ (Wstat: 7424 Tests: 1 Failed: 0)
> Non-zero exit status: 29
> Parse errors: Bad plan. You planned 7 tests but ran 1.
> t/ (Wstat: 256 Tests: 2 Failed: 1)
> Failed test: 2
> Non-zero exit status: 1
> So yeah, there's more stuff to fix. But I can't directly apply your
> fixes because the updated patches are somewhat different.

The fixes should apply cleanly to the previous version of your patch. Also,
I am not sure that it is a good idea to simply enable streaming
subscriptions in all tests (e.g. the pre-streaming-patch t/),
since then they no longer exercise the non-streaming code.

>>> Interesting. Any idea where does the extra overhead in this particular
>>> case come from? It's hard to deduce that from the single flame graph,
>>> when I don't have anything to compare it with (i.e. the flame graph for
>>> the "normal" case).
>> I guess that bottleneck is in disk operations. You can check
>> logical_repl_worker_new_perf.svg flame graph: disk reads (~9%) and
>> writes (~26%) take around 35% of CPU time in summary. To compare,
>> please, see attached flame graph for the following transaction:
>> INSERT INTO large_text
>> SELECT (SELECT string_agg('x', ',')
>> FROM generate_series(1, 2000)) FROM generate_series(1, 1000000);
>> Execution Time: 44519.816 ms
>> Time: 98333,642 ms (01:38,334)
>> where disk IO is only ~7-8% in total. So we get very roughly the same
>> ~x4-5 performance drop here. JFYI, I am using a machine with SSD for tests.
>> Therefore, probably you may write changes on receiver in bigger chunks,
>> not each change separately.
> Possibly, I/O is certainly a possible culprit, although we should be
> using buffered I/O and there certainly are not any fsyncs here. So I'm
> not sure why would it be cheaper to do the writes in batches.
> BTW does this mean you see the overhead on the apply side? Or are you
> running this on a single machine, and it's difficult to decide?

I run this on a single machine, but the walsender and the worker each
utilize almost 100% of a CPU all the time, and on the apply side I/O
syscalls take about 1/3 of CPU time. I am still not sure, but for
me this result links the performance drop to problems on the receiver
side.

Writing in batches was just a hypothesis. To validate it, I ran a test
with a large transaction consisting of a smaller number of wide rows.
That test does not exhibit any significant performance drop, even
though it was streamed too. So the hypothesis seems to be valid. Anyway,
I do not have any other reasonable ideas besides that right now.
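
To make the batching hypothesis above a bit more concrete, here is a
minimal standalone sketch in Python. It is not code from the patch and
does not model the apply worker; it only counts how many write() calls
are issued when many small serialized changes are written one by one
versus accumulated into larger chunks. The names BatchedChangeWriter,
write_changes and BATCH_SIZE, as well as the 64 kB chunk size, are
assumptions made purely for illustration.

```python
import os
import tempfile

BATCH_SIZE = 64 * 1024  # hypothetical chunk size, not a value from the patch


class BatchedChangeWriter:
    """Accumulate serialized changes and write them out in larger chunks."""

    def __init__(self, fd, batch_size=BATCH_SIZE):
        self.fd = fd
        self.batch_size = batch_size
        self.pending = bytearray()
        self.write_calls = 0  # number of os.write() calls issued so far

    def add_change(self, change: bytes) -> None:
        # Flush first if this change would overflow the current batch.
        if self.pending and len(self.pending) + len(change) > self.batch_size:
            self.flush()
        self.pending += change
        # A full (or oversized) batch is written out immediately.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        view = memoryview(self.pending)
        while view:
            # os.write() may write fewer bytes than requested, so loop.
            written = os.write(self.fd, view)
            self.write_calls += 1
            view = view[written:]
        self.pending = bytearray()


def write_changes(changes, batch_size):
    """Write all changes to a temporary file.

    Returns (number of write() calls, resulting file size in bytes).
    A batch_size of 1 degenerates into one write per change.
    """
    with tempfile.TemporaryFile() as f:
        writer = BatchedChangeWriter(f.fileno(), batch_size)
        for change in changes:
            writer.add_change(change)
        writer.flush()
        return writer.write_calls, os.fstat(f.fileno()).st_size
```

Calling write_changes() with batch_size=1 and then with BATCH_SIZE on
the same list of small changes gives the same file contents with far
fewer write() calls in the batched case. Whether that translates into
a measurable gain for the apply worker is exactly what remains to be
checked.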


Alexey Kondratov

Postgres Professional
Russian Postgres Company

Attachment: Content-Type application/x-perl, Size 3.7 KB
