Re: zheap: a new storage format for PostgreSQL

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: zheap: a new storage format for PostgreSQL
Date: 2018-12-07 03:27:27
Message-ID: CAA4eK1KvmovSF=ytK1F+G_Ls1jUMYyRscYNJouXT2X6EpojDxg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 6, 2018 at 9:32 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> > On Thu, Dec 6, 2018 at 16:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> >> > I have trouble imagining it. When the fill factor is low, there is a high risk of heavy fragmentation, or something has to do defragmentation.
> >>
> >> I don't understand this.
> >
> > I don't know whether zheap has any tools for eliminating fragmentation of the free space on a page. But I expect that after some set of updates, when the record size is mutable, the free space on the page will be fragmented. Usually, when you have less memory, fragmentation happens faster.
>
> Still not sure I completely understand, but it's true that zheap
> sometimes needs to compact free space on a page. For example, if
> you've got a page with a 100-byte hole, and somebody updates a tuple
> to make it 2 bytes bigger, you've got to shift that tuple and any that
> precede it backwards to reduce the size of the hole to 98 bytes, so
> that you can fit the new version of the tuple. If, later, somebody
> shrinks that tuple back to the original size, you've now got 100 bytes
> of free space on the page, but they are fragmented: 98 bytes in the
> "hole," and 2 bytes following the newly-shrunk tuple. If someone
> tries to insert a 100-byte tuple in that page, we'll need to
> reorganize the page a second time to bring all that free space back
> together in a single chunk.
>
> In my view, and I'm not sure if this is how the code currently works,
> we should have just one routine to do a zheap page reorganization
> which can cope with all possible scenarios. I imagine that you would
> give it the page as it currently exists plus a "minimum tuple size"
> for one or more tuples on the page (which must not be smaller than the
> current size of that tuple, but could be bigger). It then reorganizes
> the page so that every tuple for which a minimum size was given
> consumes exactly that amount of space, every other tuple consumes the
> minimum possible amount of space, and the remaining space goes into
> the hole. So if you call this function with no minimum tuple sizes,
> it does a straight defragmentation; if you give it minimum tuple
> sizes, then it rearranges the page to make it suitable for a pending
> in-place update of those tuples.
>

Yeah, the code is along these lines as well. However, as of now, the
API takes input for only one tuple: its offset number and the delta
space (the additional space required by an update that makes the tuple
bigger). As of now, we don't have a requirement for multiple tuples,
but if such a case arises, I think the API can be adapted. One more
thing we do during repair-fragmentation is to arrange tuples in their
offset order so that future sequential scans can be faster.
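
To make the discussion concrete, here is a toy sketch of such a
routine in Python. It is not the zheap code: the names Slot,
free_chunks and repair_fragmentation, the 400-byte data area, and the
choice to pack tuples from the start of the data area are all
assumptions made for illustration. It takes one tuple's offset number
plus a delta, repacks tuples in offset-number order, and replays the
100-byte hole example quoted above.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    start: int   # byte position of the tuple inside the page's data area
    length: int  # current tuple length in bytes


def free_chunks(slots, data_size):
    """Return the sizes of the free gaps in physical order (empty gaps omitted)."""
    gaps, pos = [], 0
    for slot in sorted(slots.values(), key=lambda s: s.start):
        if slot.start > pos:
            gaps.append(slot.start - pos)
        pos = slot.start + slot.length
    if data_size > pos:
        gaps.append(data_size - pos)
    return gaps


def repair_fragmentation(slots, data_size, target=None, delta=0):
    """Repack tuples in offset-number order (hypothetical API, not zheap's).

    If `target` is given, `delta` extra bytes are reserved right after that
    tuple so that a pending in-place update can grow it. Returns the size of
    the single contiguous hole that remains at the end of the data area.
    """
    new_starts, pos = {}, 0
    for offnum in sorted(slots):
        new_starts[offnum] = pos
        pos += slots[offnum].length
        if offnum == target:
            pos += delta
    if pos > data_size:
        raise ValueError("page does not have enough free space")
    # Only move tuples once we know the new layout fits.
    for offnum, start in new_starts.items():
        slots[offnum].start = start
    return data_size - pos


DATA_SIZE = 400
# Three 100-byte tuples, physically out of offset order, with a 100-byte hole.
slots = {1: Slot(200, 100), 2: Slot(0, 100), 3: Slot(100, 100)}
print(free_chunks(slots, DATA_SIZE))        # [100]

# An update makes tuple 2 two bytes bigger: reserve the space, then grow it.
hole = repair_fragmentation(slots, DATA_SIZE, target=2, delta=2)
slots[2].length += 2
print(hole, free_chunks(slots, DATA_SIZE))  # 98 [98]

# A later update shrinks tuple 2 back: 100 free bytes, but in two pieces.
slots[2].length -= 2
print(free_chunks(slots, DATA_SIZE))        # [2, 98]

# Calling the routine with no target is a straight defragmentation.
print(repair_fragmentation(slots, DATA_SIZE))  # 100
```

In this sketch the no-target call plays the role of the straight
defragmentation, and supporting several tuples at once would only mean
replacing the single target/delta pair with a mapping from offset
number to delta.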

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
