Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-19 04:10:02
Message-ID: CAA4eK1+Ya2zSwa-wQgog0U_oD9vcvFrR7gPb=KTcsYegkgjr6Q@mail.gmail.com
Lists: pgsql-hackers

On Fri, Jul 19, 2019 at 12:28 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Wed, Jul 17, 2019 at 2:13 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > > Anyway, if you don't like this solution, propose something else. It's
> > > impossible to correctly implement a hard limit unless the number of
> > > aborted-but-not-yet-undone transactions is bounded to (HARD_LIMIT -
> > > ENTRIES_THAT_WOULD_BE_ADDED_AFTER_RECOVERY_IF_THE_SYSTEM_CRASHED_NOW).
> > > If there are 100 transactions each bound to 2 undo logs, and you
> > > crash, you will need to (as you have it designed now) add another 200
> > > transactions to the hash table upon recovery, and that will make you
> > > exceed the hard limit unless you were at least 200 transactions below
> > > the limit before the crash. Have you handled that somehow? If so,
> > > how?
> >
> > Yeah, we have handled it by reserving the space of MaxBackends. It is
> > UndoRollbackHashTableSize() - MaxBackends. There is a bug in the
> > current patch which is that it should reserve space for 2 *
> > MaxBackends so that after recovery, we are safe, but that can be
> > fixed.
>
> One of us is REALLY confused here. Nothing you do in
> UndoRollbackHashTableSize() can possibly fix the problem that I'm
> talking about. Suppose the system gets to a point where all of the
> rollback hash table entries are in use - there are some entries that
> are used because work was pushed into the background, and then there
> are other entries that are present because those transactions are
> being rolled back in the foreground.
>

We are doing exactly what you suggest in the last line of your next
paragraph: "stop the transaction from writing undo when the hash
table is already too full." So we will never face the problems
related to repeated crash recovery. The definition of "too full" is
that we stop allowing new undo-writing transactions once the hash
table already has entries equivalent to (UndoRollbackHashTableSize()
- MaxBackends). Does this make sense?
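
To make the arithmetic concrete, here is a rough standalone sketch of
that admission check. UndoRollbackHashTableSize() is from the patch
and MaxBackends is the server global, but the stub values and the
function name CanPushUndoRequest are mine for illustration only:

#include <stdbool.h>

static int MaxBackends = 100;        /* stub for the server global */

static int
UndoRollbackHashTableSize(void)
{
    return 1024;                     /* stub; the patch computes this */
}

/*
 * Admit a new undo-writing transaction only while MaxBackends slots
 * remain in reserve (as noted upthread, this likely needs to become
 * 2 * MaxBackends), so that the requests added back during crash
 * recovery are guaranteed to fit in the hash table.
 */
static bool
CanPushUndoRequest(int entries_in_use)
{
    return entries_in_use < UndoRollbackHashTableSize() - MaxBackends;
}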

> Now at this point you crash. Now
> when you start up, all the hash table entries, including the reserved
> ones, are already in use before any running transactions start. Now
> if you allow transactions to start before some of the rollbacks
> complete, you have got big problems. The system might crash again,
> and if it does, when it restarts, the total amount of outstanding
> requests will no longer fit in the hash table, which was the whole
> premise of this design.
>
> Maybe that doesn't make sense, so think about it this way.
>

All of what you are saying makes sense, and I understand the problem
you are describing, but we have thought about the same thing and have
the algorithm/code in place that won't allow such situations.

>
> > Another related thing is that to update the existing entry for queues,
> > we need to delete and re-insert the entry after we find the request in
> > a different log category. Again it depends if we point queue entries
> > to hash table, then we might not have this additional work but that
> > has its own set of complexities.
>
> I don't follow this. If you have a hash table where the key is XID,
> there is no need to delete and reinsert anything just because you
> discover that the XID has not only permanent undo but also unlogged
> undo, or something of that sort.
>

The total size of the undo to be processed changes if we later find
undo for the same transaction in another category (permanent or
unlogged). Because the size queue is ordered by that size, the
entry's position in the queue must change too, which is why we delete
and re-insert it.
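
For illustration, a rough standalone sketch of that
delete-and-re-insert step (all the names here, SizeQueueEntry and the
size_queue_* functions, are mine for the example, not from the
patch):

#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QUEUE_CAPACITY 64

typedef struct SizeQueueEntry
{
    uint32_t    xid;            /* transaction whose undo is pending */
    uint64_t    undo_size;      /* total undo bytes, all categories */
} SizeQueueEntry;

static SizeQueueEntry queue[QUEUE_CAPACITY];
static int  queue_len = 0;

/* Insert, keeping the array ordered by undo_size, largest first. */
static void
size_queue_insert(SizeQueueEntry e)
{
    int     i;

    assert(queue_len < QUEUE_CAPACITY);
    i = queue_len++;
    while (i > 0 && queue[i - 1].undo_size < e.undo_size)
    {
        queue[i] = queue[i - 1];
        i--;
    }
    queue[i] = e;
}

/* Remove the entry for the given xid, if present. */
static void
size_queue_delete(uint32_t xid)
{
    int     i;

    for (i = 0; i < queue_len; i++)
    {
        if (queue[i].xid == xid)
        {
            memmove(&queue[i], &queue[i + 1],
                    (queue_len - i - 1) * sizeof(SizeQueueEntry));
            queue_len--;
            return;
        }
    }
}

/*
 * When undo in another log category is discovered, the entry's total
 * size changes, so its ordered position must change as well: delete
 * the old entry and re-insert it with the new size.
 */
static void
size_queue_update_size(uint32_t xid, uint64_t new_total_size)
{
    SizeQueueEntry e = {xid, new_total_size};

    size_queue_delete(xid);
    size_queue_insert(e);
}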

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
