Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-13 10:25:51
Message-ID: CAA4eK1K_eDXnkYWTeFbXzHMamMLfdxnJBB3wLE5c_6idERXMpg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 12, 2019 at 7:08 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Jul 12, 2019 at 5:40 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> > Apart from this, the duplicate key (ex. for size queues the size of
> > two requests can be same) handling might need some work. Basically,
> > either special combine function needs to be written (not sure yet what
> > we should do there) or we always need to ensure that the key is unique
> > like (size + start_urec_ptr). If the size is the same, then we can
> > decide based on start_urec_ptr.
>
> I think that this problem is somewhat independent of whether we use an
> rbtree or a binaryheap or some other data structure.
>

I think then I am missing something because what I am talking about is
below code in rbt_insert:
rbt_insert()
{
..
cmp = rbt->comparator(data, current, rbt->arg);
if (cmp == 0)
{
/*
* Found node with given key. Apply combiner.
*/
rbt->combiner(current, data, rbt->arg);
*isNew = false;
return current;
}
..
}

If you see, here it doesn't add the duplicate key in the tree and that
is not the case with binary_heap as far as I can understand.

> I would be
> inclined to use XID as a tiebreak for the size queue, so that it's
> more likely to match the ordering of the XID queue, but if that's
> inconvenient, then some other arbitrary value like start_urec_ptr
> should be fine.
>

I think it would be better to use start_urec_ptr because XID can be
non-unique in our case. As I explained in one of the emails above [1]
that we register the requests for logged and unlogged relations
separately, so XID can be non-unique.

> >
> > I think even if we currently go with a binary heap, it will be
> > possible to change it to rbtree later, but I am fine either way.
>
> Well, I don't see much point in revising all of this logic twice. We
> should pick the way we want it to work and make it work that way.
>

Yeah, I agree. So, I am assuming here that as you have discussed this
idea with Andres offlist, he is on board with changing it as he has
originally suggested using binary_heap. Andres, do let us know if you
think differently here. It would be good if anyone else following the
thread can also weigh in.

[1] - https://www.postgresql.org/message-id/CAA4eK1LEKyPZD5Dy4j1u2smUUyMzxgC2YLj8E%2BaJpsvG7sVJYA%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Munro 2019-07-13 11:17:00 Re: refactoring - share str2*int64 functions
Previous Message Jose Luis Tallon 2019-07-13 10:00:48 Re: [PATCH] Implement uuid_version()