Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Alexey Kondratov <a(dot)kondratov(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2019-10-22 19:02:38
Message-ID: a9ed9172-63be-34b1-e60b-08d7c7deaed8@postgrespro.ru
Lists: pgsql-hackers

On 22.10.2019 20:22, Tomas Vondra wrote:
> On Tue, Oct 22, 2019 at 11:01:48AM +0530, Dilip Kumar wrote:
>> On Tue, Oct 22, 2019 at 10:46 AM Amit Kapila
>> <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>   In general, yours and Alexy's test results
>>> show that there is merit by having workers applying such transactions.
>>>   OTOH, as noted above [1], we are also worried about the performance
>>> of Rollbacks if we follow that approach.  I am not sure how much we
>>> need to worry about Rollbacks if commits are faster, but can we think
>>> of recording the changes in memory and only write to a file if the
>>> changes are above a certain threshold?  I think that might help saving
>>> I/O in many cases.  I am not very sure if we do that how much
>>> additional workers can help, but they might still help.  I think we
>>> need to do some tests and experiments to figure out what is the best
>>> approach?  What do you think?
>> I agree with the point.  I think we might need to do some small
>> changes and test to see what could be the best method to handle the
>> streamed changes at the subscriber end.
>>
>>>
>>> Tomas, Alexey, do you have any thoughts on this matter?  I think it is
>>> important that we figure out the way to proceed in this patch.
>>>
>>> [1] -
>>> https://www.postgresql.org/message-id/b25ce80e-f536-78c8-d5c8-a5df3e230785%40postgrespro.ru
>>>
>>
>
> I think the patch should do the simplest thing possible, i.e. what it
> does today. Otherwise we'll never get it committed.
>

I have to agree with Tomas that keeping things as simple as possible
should be the main priority right now. Otherwise, the entire patch set
will sit through the next release cycle without being committed even
partially. At the same time, it solves an important problem from my
perspective: by streaming large transactions it moves the I/O overhead
from the primary to the replica, which is a nice feature to have, I guess.

Later it would be possible to replace the logical apply worker with a
bgworker pool in a separate patch, if we decide that this is a viable
solution. Anyway, regarding Amit's questions:

- I doubt that maintaining a separate buffer on the apply side before
spilling to disk would help much. We already have the ReorderBuffer with
the logical_work_mem limit, and if we exceeded that limit on the sender
side, then most probably we will exceed it on the apply side as well,
unless this new buffer is made significantly larger than
logical_work_mem in order to keep multiple transactions open.
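To make the threshold idea concrete, here is a minimal sketch of an
apply-side buffer that keeps changes in memory and starts spilling to a
temporary file only once a size threshold is crossed. All names and the
structure are hypothetical illustrations, not actual patch code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical per-transaction apply buffer: changes stay in memory
 * until their total size exceeds spill_threshold, after which this
 * transaction's changes go to a spill file instead.
 */
typedef struct ApplyBuffer
{
    char   *data;
    size_t  used;
    size_t  capacity;
    size_t  spill_threshold;    /* could be derived from logical_work_mem */
    FILE   *spill_file;         /* NULL until the first spill */
} ApplyBuffer;

static void
buffer_add_change(ApplyBuffer *buf, const char *change, size_t len)
{
    if (buf->spill_file == NULL &&
        buf->used + len > buf->spill_threshold)
    {
        /*
         * Crossed the threshold: open a spill file and dump what we have
         * accumulated so far, so that subsequent changes of this
         * transaction go straight to disk.
         */
        buf->spill_file = tmpfile();
        fwrite(buf->data, 1, buf->used, buf->spill_file);
        buf->used = 0;
    }

    if (buf->spill_file != NULL)
        fwrite(change, 1, len, buf->spill_file);
    else
    {
        /* Still below the threshold: keep the change in memory. */
        if (buf->used + len > buf->capacity)
        {
            buf->capacity = (buf->used + len) * 2;
            buf->data = realloc(buf->data, buf->capacity);
        }
        memcpy(buf->data + buf->used, change, len);
        buf->used += len;
    }
}
```

With something like this, small streamed transactions would never touch
disk on the subscriber, which is the I/O saving Amit is after; the open
question above is how often real workloads would actually stay below the
threshold.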

- I still think that we should optimize the database for commits, not
rollbacks. The bgworker pool is dramatically slower for a rollback-only
load, though at least twice as fast for a commit-only one. I do not know
how it will perform under a real-life load, but this drawback may be
unacceptable for a general-purpose database like Postgres.

- Tomas' implementation of streaming with spilling does not have this
bias between commits and aborts. However, it shows a noticeable
performance drop (~5x slower than master [1]) for a large transaction
consisting of many small rows, although it is not an order of magnitude
slower.

Another thing is that about a year ago I found some problems with
MVCC/visibility and fixed them [1]. If I understand correctly, Tomas
adapted some of those fixes into his patch set, but I think that this
part should be carefully reviewed again. I would be glad to check it,
but right now I am a bit confused by all the patch set variants in the
thread. Which one is the latest? Is it still dependent on 2PC decoding?

[1]
https://www.postgresql.org/message-id/flat/40c38758-04b5-74f4-c963-cf300f9e5dff%40postgrespro.ru#98d06fefc88122385dacb2f03f7c30f7

Thanks for moving this patch forward!

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company
