Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-16 05:01:39
Message-ID: CAA4eK1JKPVUDTCnJwuv=btJHhpccmRXA=rQt=N-xVRcsGwfiJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, Jul 15, 2019 at 9:56 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Sat, Jul 13, 2019 at 6:26 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > I think then I am missing something because what I am talking about is
> > below code in rbt_insert:
>
> What you're saying here is that, with an rbtree, an exact match will
> result in a merging of requests which we don't want, so we have to
> make them always unique. That's fine, but even if you used a binary
> heap where it wouldn't be absolutely required that you break the ties,
> you'd still want to think at least a little bit about what behavior is
> best in case of a tie, just from the point of view of making the
> system efficient.
>

Okay.

> > I think it would be better to use start_urec_ptr because XID can be
> > non-unique in our case. As I explained in one of the emails above [1]
> > that we register the requests for logged and unlogged relations
> > separately, so XID can be non-unique.
>
> Yeah. I didn't understand that explanation. It seems to me that one
> of the fundamental design questions for this system is whether we
> should allow there to be an unbounded number of transactions that are
> pending undo application, or whether it's OK to enforce a hard limit.
> Either way, there should certainly be pressure applied to try to keep
> the number low, like forcing undo application into the foreground when
> a backlog is accumulating, but the question is what to do when that's
> insufficient. My original idea was that we should not have a hard
> limit, in which case the shared memory data on what is pending might
> be incomplete, in which case we would need the discard workers to
> discover transactions needing undo and add them to the shared memory
> data structures, and if those structures are full, then we'd just skip
> adding those details and rediscover those transactions again at some
> future point.
>
> But, my understanding of the current design being implemented is that
> there is a hard limit on the number of transactions that can be
> pending undo and the in-memory data structures are sized accordingly.
>

Yes, that is correct.

> In such a system, we cannot rely on the discard worker(s) to
> (re)discover transactions that need undo, because if there can be
> transactions that need undo that we don't know about, then we can't
> enforce a hard limit correctly.
>

I have responded in the email above about this point.

> The exception, I suppose, is that
> after a crash, we'll need to scan all the undo logs and figure out
> which transactions are pending, but that doesn't preclude using a
> single queue entry covering both the logged and the unlogged portion
> of a transaction that has written undo of both kinds. We've got to
> scan all of the undo logs before we allow any new undo-using
> transactions to start, and so we can create one fully-up-to-date entry
> that reflects the data for both persistence levels before any
> concurrent activity happens.
>

It is correct that no new undo using transaction can start, but
nothing prevents undo launcher to start the undo workers to process
the already registered requests which can lead to some concurrent
activity.

> I am wondering (and would love to hear other opinions on) the question
> of which kind of design we ought to be pursuing, but it's got to be
> one or the other, not something in the middle.
>

I agree that it should not be in the middle. It is possible that I am
missing or misunderstanding something here, but AFAIU, the current
design, and implementation allows us to maintain the pending undo
state in-memory.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2019-07-16 05:05:55 Re: pg_receivewal documentation
Previous Message Justin Pryzby 2019-07-16 05:01:24 Re: make \d pg_toast.foo show its indices ; and, \d toast show its main table ; and \d relkind=I show its partitions (and tablespace)