Re: Reviewing freeze map code

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reviewing freeze map code
Date: 2016-06-21 04:59:53
Message-ID: CAA4eK1J94FQJ8cEVdHmS2aDnPj99kGomScf+QjQLKBZ+6yaHbA@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jun 21, 2016 at 9:08 AM, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>
> On Tue, Jun 21, 2016 at 3:29 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Tue, Jun 21, 2016 at 1:03 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> >> Well, I think generally nobody seriously looked at actually refactoring
> >> heap_update(), even though that'd be a good idea. But in this instance,
> >> the problem seems relatively fundamental:
> >>
> >> We need to lock the origin page, to do visibility checks, etc. Then we
> >> need to figure out the target page. Even disregarding toasting - which
> >> we could be doing earlier with some refactoring - we're going to have to
> >> release the page level lock, to lock them in ascending order. Otherwise
> >> we'll risk kinda likely deadlocks.
> >
> > Can we consider to use some strategy to avoid deadlocks without releasing
> > the lock on old page? Consider if we could have a mechanism such that
> > RelationGetBufferForTuple() will ensure that it always returns a new buffer
> > which has targetblock greater than the old block (on which we already held a
> > lock). I think here tricky part is whether we can get anything like that
> > from FSM. Also, there could be cases where we need to extend the heap when
> > there were pages in heap with space available, but we have ignored them
> > because their block number is smaller than the block number on which we have
> > lock.
>
> Doesn't that mean that over time, given a workload that does only or
> mostly updates, your records tend to migrate further and further away
> from the start of the file, leaving a growing unusable space at the
> beginning, until you eventually need to CLUSTER/VACUUM FULL?
>

With an update-mostly workload and a properly configured fillfactor, the
new tuple version should, in many cases, fit in the same page as the old
tuple. Why would records always migrate further away? They should be able
to use the space freed by other updates in intermediate pages. I think
there could be some space-wise impact, but the freed-up space will
eventually be used.
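
For illustration only, here is a minimal C sketch of the rule discussed
above: pick a target block strictly above the old block, and extend the
heap when none qualifies. ToyFsm and toy_target_above() are hypothetical
names, and the flat array is an assumption standing in for the real FSM;
whether the real FSM can answer such a query is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same name PostgreSQL uses, redefined here so the sketch stands alone. */
typedef uint32_t BlockNumber;

/*
 * Hypothetical stand-in for the free space map: free_bytes[i] is the
 * free space recorded for block i. The real FSM is not a flat array.
 */
typedef struct ToyFsm
{
    const size_t *free_bytes;
    BlockNumber   nblocks;
} ToyFsm;

/*
 * Return the lowest block strictly above old_block that has at least
 * `needed` free bytes. If there is none, return fsm->nblocks, which the
 * caller reads as "extend the relation by one block".
 *
 * Blocks at or below old_block are never considered, so a caller that
 * already holds the lock on old_block always locks in ascending order.
 */
static BlockNumber
toy_target_above(const ToyFsm *fsm, BlockNumber old_block, size_t needed)
{
    assert(old_block < fsm->nblocks);

    for (BlockNumber blk = old_block + 1; blk < fsm->nblocks; blk++)
    {
        if (fsm->free_bytes[blk] >= needed)
            return blk;
    }
    return fsm->nblocks;    /* nothing suitable above: extend */
}
```

With free space {4000, 100, 50, 900, 0} and the old tuple on block 1, a
500-byte request lands on block 3. Block 0 is skipped despite its 4000
free bytes, which is the space-wise impact in question; that space stays
available to updates whose old tuple lives on a lower block.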

> I was wondering about speculatively asking for a free page with a
> lower block number than the origin page, if one is available, before
> locking the origin page.

Do you want to lock it as well? In any case, I think adding that code
before deciding whether the update can be accommodated in the current page
could prove costly.

> Then after locking the origin page, if it
> turns out you need a page but didn't get it earlier, asking for a free
> page with a higher block number than the origin page.
>

Something like that might work out if it is feasible and people agree on
pursuing such an approach.
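
To make the two-phase idea concrete, a rough C sketch follows. It is only
a model under stated assumptions: the free-space array, toy_find_free(),
ToyChoice and toy_choose_target() are all hypothetical, no real locking is
done, and "fits in the origin page" is reduced to a free-space comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;
#define TOY_INVALID_BLOCK ((BlockNumber) 0xFFFFFFFF)

/* First block in [lo, hi) with enough free space, else TOY_INVALID_BLOCK. */
static BlockNumber
toy_find_free(const size_t *free_bytes, BlockNumber lo, BlockNumber hi,
              size_t needed)
{
    for (BlockNumber blk = lo; blk < hi; blk++)
    {
        if (free_bytes[blk] >= needed)
            return blk;
    }
    return TOY_INVALID_BLOCK;
}

typedef struct ToyChoice
{
    BlockNumber target;       /* block receiving the new tuple version;
                               * nblocks means "extend the relation" */
    int         wasted_probe; /* 1 if a lower page was found up front
                               * but the update fit in the origin page */
} ToyChoice;

static ToyChoice
toy_choose_target(const size_t *free_bytes, BlockNumber nblocks,
                  BlockNumber origin, size_t needed)
{
    ToyChoice choice;

    assert(origin < nblocks);

    /* Phase 1: before locking origin, look for a page below it. */
    BlockNumber lower = toy_find_free(free_bytes, 0, origin, needed);

    /*
     * The origin page would be locked here. If `lower` is valid it was
     * taken first, so the two locks are still acquired in ascending order.
     */
    if (free_bytes[origin] >= needed)
    {
        choice.target = origin;
        choice.wasted_probe = (lower != TOY_INVALID_BLOCK);
        return choice;
    }

    choice.wasted_probe = 0;
    if (lower != TOY_INVALID_BLOCK)
    {
        choice.target = lower;
        return choice;
    }

    /* Phase 2: nothing was reserved earlier, so only look above origin. */
    BlockNumber higher = toy_find_free(free_bytes, origin + 1, nblocks, needed);

    choice.target = (higher != TOY_INVALID_BLOCK) ? higher : nblocks;
    return choice;
}
```

The wasted_probe flag marks the case raised above: the lookup for a lower
page is paid for up front even when the update ends up fitting in the
origin page.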

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
