Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Noah Misch <noah(at)leadboat(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-12-02 08:20:25
Message-ID: CAA4eK1+z07HvtULZXxkcvvBios8ZeKkRMuDoUZvAHvVcvSjAew@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 2, 2020 at 1:20 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
>
> On Tue, Dec 1, 2020 at 11:38 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Mon, Nov 30, 2020 at 6:49 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > > On Mon, Nov 30, 2020 at 3:14 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > >
> > > > On Mon, Sep 07, 2020 at 12:00:41PM +0530, Amit Kapila wrote:
> > > > > Thanks, I have pushed the last patch. Let's wait for a day or so to
> > > > > see the buildfarm reports
> > > >
> > > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2020-09-08%2006%3A24%3A14
> > > > failed the new 015_stream.pl test with the subscriber looping like this:
> > > >
> > > > 2020-09-08 11:22:49.848 UTC [13959252:1] LOG: logical replication apply worker for subscription "tap_sub" has started
> > > > 2020-09-08 11:22:54.045 UTC [13959252:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > > 2020-09-08 11:22:54.055 UTC [7602182:1] LOG: logical replication apply worker for subscription "tap_sub" has started
> > > > 2020-09-08 11:22:54.101 UTC [31785284:4] LOG: background worker "logical replication worker" (PID 13959252) exited with exit code 1
> > > > 2020-09-08 11:23:01.142 UTC [7602182:2] ERROR: could not open temporary file "16393-510.changes.0" from BufFile "16393-510.changes": No such file or directory
> > > > ...
> > > >
> > > > What happened there?
> > > >
> > >
> > > What is going on here is that the expected streaming file is missing.
> > > Normally, the first time we send a stream of changes (some percentage
> > > of transaction changes) we create the streaming file, and then in
> > > respective streams we just keep on writing in that file the changes we
> > > receive from the publisher, and on commit, we read that file and apply
> > > all the changes.
> > >
> > > The above kind of error can happen due to the following reasons: (a)
> > > the first time we sent the stream and created the file and that got
> > > removed before the second stream reached the subscriber. (b) from the
> > > publisher-side, we never sent the indication that it is the first
> > > stream and the subscriber directly tries to open the file thinking it
> > > is already there.
> > >
>
> I have executed "make check" in the loop with only this file. I have
> repeated it 5000 times but no failure, I am wondering shall we try to
> execute in the same machine in a loop where it failed once?
>

Yes, that might help. Noah, would it be possible for you to try that
out, and if it failed then probably get the stack trace of subscriber?
If we are able to reproduce it then we can add elogs in functions
SharedFileSetInit, BufFileCreateShared, BufFileOpenShared, and
SharedFileSetDeleteAll to print the paths to see if we are sometimes
unintentionally removing some files. I have checked the code and there
doesn't appear to be any such problems but I might be missing
something.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message tsunakawa.takay@fujitsu.com 2020-12-02 08:21:42 [bug fix] ALTER TABLE SET LOGGED/UNLOGGED on a partitioned table does nothing silently
Previous Message Dilip Kumar 2020-12-02 07:50:40 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions