Re: POC: Cleaning up orphaned files using undo logs

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-12 13:37:46
Message-ID: CA+TgmoZfPzo7-Wf0Gi+ZxjTan2Mvy5ip4sPJK6rueH4axW5o0g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 12, 2019 at 5:40 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> I am not sure but don't we need to retain the color of z as well?

I believe that would be very wrong. If you recolor an internal node,
you'll break the constant-black-height invariant.

> Apart from this, the duplicate key (ex. for size queues the size of
> two requests can be same) handling might need some work. Basically,
> either special combine function needs to be written (not sure yet what
> we should do there) or we always need to ensure that the key is unique
> like (size + start_urec_ptr). If the size is the same, then we can
> decide based on start_urec_ptr.

I think that this problem is somewhat independent of whether we use an
rbtree or a binaryheap or some other data structure. I would be
inclined to use XID as a tiebreak for the size queue, so that it's
more likely to match the ordering of the XID queue, but if that's
inconvenient, then some other arbitrary value like start_urec_ptr
should be fine.

> I think we can go by changing the implementation to rbtree by doing
> some enhancements instead of the binary heap or alternatively, we can
> use one of the two ideas suggested by you in the email above [1] to
> simplify the code and keep using the binary heap for now. Especially,
> I like the below one.
> "2. However, I don't think we should have a separate request object
> for each queue anyway. We should insert pointers to the same objects
> in all the relevant queue (either size + XID, or else error). So
> instead of having three sets of objects, one for each queue, we'd just
> have one set of objects and point to them with as many as two
> pointers.
> We'd therefore need LESS memory than we're using today, because we
> wouldn't have separate arrays for XID, size, and error queue
> elements."
>
> I think even if we currently go with a binary heap, it will be
> possible to change it to rbtree later, but I am fine either way.

Well, I don't see much point in revising all of this logic twice. We
should pick the way we want it to work and make it work that way.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2019-07-12 13:47:02 Re: rebased background worker reimplementation prototype
Previous Message Fabien COELHO 2019-07-12 13:10:26 Re: proposal - patch: psql - sort_by_size