Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-12-01 06:08:00
Message-ID: CAFiTN-t1fN-fedz7mWwMmZUdPo6ZC5wnDT4=6UiZoj5kb64WEg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, Nov 30, 2020 at 3:14 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> >
> > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> > > Thanks, I have pushed the last patch. Let's wait for a day or so to
> > > see the buildfarm reports
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14
> > failed the new 015_stream.pl test with the subscriber looping like this:
> >
> > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication apply worker for subscription "tap_sub" has started
> > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply worker for subscription "tap_sub" has started
> > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker "logical replication worker" (PID 13959252) exited with exit code 1
> > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > ...
> >
> > What happened there?
> >
>
> What is going on here is that the expected streaming file is missing.
> Normally, the first time we send a stream of changes (some percentage
> of transaction changes) we create the streaming file, and then in
> respective streams we just keep on writing in that file the changes we
> receive from the publisher, and on commit, we read that file and apply
> all the changes.
>
> The above kind of error can happen due to the following reasons: (a)
> the first time we sent the stream and created the file and that got
> removed before the second stream reached the subscriber. (b) from the
> publisher-side, we never sent the indication that it is the first
> stream and the subscriber directly tries to open the file thinking it
> is already there.
>
> Now, the publisher and subscriber log doesn't directly indicate any of
> the above problems but I have some observations.
>
> The subscriber log indicates that before the apply worker exits due to
> an error the new apply worker gets started. We delete the
> streaming-related temporary files on proc_exit, so one possibility
> could have been that the new apply worker has created the streaming
> file which the old apply worker has removed but that is not possible
> because we always create these temp-files by having procid in the
> path.

Yeah, and I have tried to test on this line, basically, after the
streaming has started I have set the binary=on. Now using gdb I have
made the worker wait before it deletes the temp file and meanwhile the
new worker started and it worked properly as expected.

> The other thing I observed in the code is that we can mark the
> transaction as streamed (via ReorderBufferTruncateTxn) if we try to
> stream a transaction that has no changes the first time we try to
> stream the transaction. This would lead to symptom (b) because the
> second-time when there are more changes we would stream the changes as
> it is not the first time. However, this shouldn't happen because we
> never pick-up a transaction to stream which has no changes. I can try
> to fix the code here such that we don't mark the transaction as
> streamed unless we have streamed at least one change but I don't see
> how it is related to this particular test failure.

Yeah, this can be improved but as you mentioned that we never select
an empty transaction for streaming so this case should not occur. I
will perform some testing/review around this and report.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2020-12-01 06:10:13 Re: Allow CLUSTER, VACUUM FULL and REINDEX to change tablespace on the fly
Previous Message Michael Paquier 2020-12-01 06:01:33 Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted