Re: POC: Cleaning up orphaned files using undo logs

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, Kuntal Ghosh <kuntalghosh(dot)2007(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: POC: Cleaning up orphaned files using undo logs
Date: 2019-07-18 05:45:05
Message-ID: CAA4eK1+1Vf0tS=hT+rxUz02JvLPqbDSqwrCjpW8N-K8tFPdTzA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 17, 2019 at 3:37 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> Hi,
>
> On 2019-07-13 15:55:51 +0530, Amit Kapila wrote:
> > On Fri, Jul 12, 2019 at 7:08 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > > I think even if we currently go with a binary heap, it will be
> > > > possible to change it to rbtree later, but I am fine either way.
> > >
> > > Well, I don't see much point in revising all of this logic twice. We
> > > should pick the way we want it to work and make it work that way.
> > >
> >
> > Yeah, I agree. So, I am assuming here that as you have discussed this
> > idea with Andres offlist, he is on board with changing it as he has
> > originally suggested using binary_heap. Andres, do let us know if you
> > think differently here. It would be good if anyone else following the
> > thread can also weigh in.
>
> Yes, I think using an rbtree makes sense.
>

Okay.

> I'm not yet sure whether we'd want the rbtree nodes being pointed to
> directly by the hashtable, or whether we'd want one indirection.
>
> e.g. either something like:
>
>
> typedef struct UndoWorkerQueue
> {
> /* priority ordered tree */
> RBTree *tree;
> ....
> }
>

I think we also need the size of rbtree (aka how many nodes/undo
requests it has) to know whether we can add more. This information is
available in binary heap, but here I think we need to track it in
UndoWorkerQueue. Basically, at each enqueue/dequeue, we need to
increment/decrement the same.

> typedef struct UndoWorkerQueueEntry
> {
> RBTNode tree_node;
>
> /*
> * Reference hashtable via key, not pointers, entries might be
> * moved.
> */
> RollbackHashKey rollback_key
> ...
> } UndoWorkerQueueEntry;
>

In UndoWorkerQueueEntry, we might also want to include some other info
like dbid, request_size, next_retry_at, err_occurred_at so that while
accessing queue entry in comparator functions or other times, we don't
always need to perform hash table search. OTOH, we can do hash_search
as well, but may be code-wise it will be better to keep additional
information.

Another thing is we need some freelist/array for
UndoWorkerQueueEntries equivalent to size of three queues?

> typedef struct RollbackHashEntry
> {
> ...
> UndoWorkerQueueEntry *queue_memb_size;
> UndoWorkerQueueEntry *queue_memb_age;
> UndoWorkerQueueEntry *queue_memb_error;
> }
>
> and call rbt_delete() for any non-NULL queue_memb_* whenever an entry is
> dequeued via one of the queues (after setting the one already dequeued
> from to NULL, of course). Which requires - as Robert mentioned - that
> rbtree pointers remain stable after insertions.
>

Right.

BTW, do you have any preference for using dynahash or simplehash for
RollbackHashTable?

>
> Alternatively we can have a more complicated arrangement without the
> "stable pointer" requirement (which'd also similarly work for a binary
> heap):
>
>
> I think the first approach is clearly preferrable from a simplicity POV,
> but the second approach would be a bit more generic (applicable to heap
> as well) and wouldn't require adjusting the rbtree code.
>

+1 for the first approach, the second one appears to be quite
complicated as compared to first.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2019-07-18 05:53:34 Re: partition routing layering in nodeModifyTable.c
Previous Message Michael Paquier 2019-07-18 05:38:34 Re: Intermittent pg_ctl failures on Windows