Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions
Date: 2020-06-02 14:22:48
Message-ID: CAFiTN-ufHPL5VAk=BW5abxWq2OVv55niuNnc7Kjh5tcCPJmQYQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 2, 2020 at 4:56 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, Jun 2, 2020 at 3:59 PM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> >
> > On Thu, May 28, 2020 at 5:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >
> > >
> > > Also, what if the changes file size overflows "OS file size limit"?
> > > If we agree that the above are problems then do you think we should
> > > explore using BufFile interface (see storage/file/buffile.c) to avoid
> > > all such problems?
> >
> > I also think that the file size is a problem. I think we can use
> > BufFile with some modifications. We cannot use BufFileCreateTemp,
> > for a few reasons:
> > 1) files get deleted on close, but we have to open/close on every
> > stream start/stop.
> > 2) even if we try to avoid closing, we need to keep the BufFile
> > pointers around (which carry an 8KB buffer per file) because there
> > is no option to pass the file name.
> >
> > I think for our use case BufFileCreateShared is more suitable. I
> > think we need some modifications so that we can use these APIs
> > without a SharedFileSet. Otherwise, we would unnecessarily need to
> > create a SharedFileSet for each transaction and also maintain it in
> > an xid array or xid hash until transaction commit/abort. So I
> > suggest the following modifications to the shared fileset API so
> > that we can use it conveniently:
> > 1. ChooseTablespace(const SharedFileSet *fileset, const char *name)
> > if fileset is NULL then select DEFAULTTABLESPACE_OID
> > 2. SharedFileSetPath(char *path, SharedFileSet *fileset, Oid tablespace)
> > If fileset is NULL then in the directory path we can use MyProcPid or
> > something instead of fileset->creator_pid.
> >
>
> Hmm, I find these modifications a bit ad-hoc. So, not sure if it is
> better than the patch maintains sharedfileset information.

I think we might do something better here, maybe by supplying a
function pointer or so, but maintaining a SharedFileSet, which carries
a tablespace/mutex we don't need at all for our purpose, also doesn't
sound very appealing. Let me see whether I can come up with some clean
way of avoiding the need for a SharedFileSet; if not, maybe we can go
with the shared fileset idea.

> > 3. Pass some parameter to BufFileOpenShared, so that it can open the
> > file in RW mode instead of read-only mode.
> >
>
> This seems okay.
>
> >
> > > 2.
> > > apply_handle_stream_abort()
> > > {
> > > ..
> > > + /* discard the subxacts added later */
> > > + nsubxacts = subidx;
> > > +
> > > + /* write the updated subxact list */
> > > + subxact_info_write(MyLogicalRepWorker->subid, xid);
> > > ..
> > > }
> > >
> > > Here, if subxacts becomes zero, then also subxact_info_write will
> > > create a new file and write checksum.
> >
> > How will it create a new file? In fact, it will write nsubxacts as 0
> > in the existing file, and I think we need to do that so that on the
> > next open we will know that nsubxacts is 0.
> >
> > > I think subxact_info_write
> > > should have a check for nsubxacts > 0 before writing to the file.
> >
> > But even if nsubxacts becomes 0 we want to write the file, so that
> > we can overwrite the previous info.
> >
>
> Can't we just remove the file for such a case?

But as of now, we expect the file to exist if it is not a first-time
stream start. Currently that is simple: if it is not the first
segment, we always expect the file to exist and raise an error
otherwise. If we remove the file when nsubxacts drops to 0, then we
will need to handle multiple cases.

a) subxact_info_read needs to handle the missing-file case, because
the file may not exist either because there was no subxact in the last
stream or because it was deleted when nsubxacts became 0.
b) subxact_info_write has multiple cases as well: if nsubxacts was
already 0 then we can avoid writing the file, but if it becomes 0 now
we need to remove the file.

Let me think more on that.

>
> apply_handle_stream_abort()
> {
> ..
> + /* XXX optimize the search by bsearch on sorted data */
> + for (i = nsubxacts; i > 0; i--)
> + {
> + if (subxacts[i - 1].xid == subxid)
> + {
> + subidx = (i - 1);
> + found = true;
> + break;
> + }
> + }
> +
> + /*
> + * If it's an empty sub-transaction then we will not find the subxid
> + * here so just free the memory and return.
> + */
> + if (!found)
> + {
> + /* Free the subxacts memory */
> + if (subxacts)
> + pfree(subxacts);
> +
> + subxacts = NULL;
> + subxact_last = InvalidTransactionId;
> + nsubxacts = 0;
> + nsubxacts_max = 0;
> +
> + return;
> + }
> ..
> }
>
> I have one question regarding the above code. Isn't it possible that
> a particular subtransaction id doesn't have any changes of its own
> while others do? For ex. cases like below:
>
> postgres=# begin;
> BEGIN
> postgres=*# insert into t1 values(1);
> INSERT 0 1
> postgres=*# savepoint s1;
> SAVEPOINT
> postgres=*# savepoint s2;
> SAVEPOINT
> postgres=*# insert into t1 values(2);
> INSERT 0 1
> postgres=*# insert into t1 values(3);
> INSERT 0 1
> postgres=*# Rollback to savepoint s1;
> ROLLBACK
> postgres=*# commit;
>
> Here, we have performed Rollback to savepoint s1, which doesn't have
> any change of its own. I think this would have been handled, but I
> just wanted to confirm.

But internally, that will send an abort for s2 first, and for that we
will find the xid and truncate; later it will send an abort for s1,
which we will not find, so we do nothing. Anyway, I will test it and
let you know.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
