Re: zheap: a new storage format for PostgreSQL

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mithun Cy <mithun(dot)cy(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: zheap: a new storage format for PostgreSQL
Date: 2018-12-06 16:01:46
Message-ID: CA+TgmoY1Xquzci-1Zi8TxitfYVkgBFYi4js5WrswRjAoqmRuEQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 6, 2018 at 10:53 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> On Thu, Dec 6, 2018 at 16:26, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>> On Thu, Dec 6, 2018 at 10:23 AM Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> > I have trouble imagining it. When the fill factor is low, there is a high risk of heavy fragmentation, or somebody has to do defragmentation.
>>
>> I don't understand this.
>
> I don't know whether zheap has any tools for eliminating fragmentation of the free space on a page. But I expect that after some set of updates, when the record size is mutable, the free space on the page will be fragmented. Usually, when you have less memory, fragmentation happens faster.

Still not sure I completely understand, but it's true that zheap
sometimes needs to compact free space on a page. For example, if
you've got a page with a 100-byte hole, and somebody updates a tuple
to make it 2 bytes bigger, you've got to shift that tuple and any that
precede it backwards to reduce the size of the hole to 98 bytes, so
that you can fit the new version of the tuple. If, later, somebody
shrinks that tuple back to the original size, you've now got 100 bytes
of free space on the page, but they are fragmented: 98 bytes in the
"hole," and 2 bytes following the newly-shrunk tuple. If someone
tries to insert a 100-byte tuple in that page, we'll need to
reorganize the page a second time to bring all that free space back
together in a single chunk.
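
To make the arithmetic concrete, here is a toy Python model of that
scenario. It is only a sketch, not zheap code: the 260-byte page, the
three tuples, and every function name are made up for illustration,
and page headers and line pointers are ignored.

```python
# Toy model only, not zheap code. Tuples are listed in page order; the
# hole is everything between HOLE_START and the first tuple.
HOLE_START = 0
PAGE_END = 260


def hole(tuples):
    """Contiguous free bytes in front of the first tuple."""
    return tuples[0]["start"] - HOLE_START


def gaps(tuples):
    """Free bytes stranded behind tuples, outside the hole."""
    total = 0
    for i, t in enumerate(tuples):
        next_start = tuples[i + 1]["start"] if i + 1 < len(tuples) else PAGE_END
        total += next_start - (t["start"] + t["len"])
    return total


def grow(tuples, idx, new_len):
    """Grow tuple idx by shifting it and every earlier tuple into the hole."""
    delta = new_len - tuples[idx]["len"]
    assert 0 <= delta <= hole(tuples)
    for t in tuples[: idx + 1]:
        t["start"] -= delta
    tuples[idx]["len"] = new_len


def shrink(tuples, idx, new_len):
    """Shrink tuple idx in place; the freed bytes stay behind it."""
    assert new_len <= tuples[idx]["len"]
    tuples[idx]["len"] = new_len


def compact(tuples):
    """Pack every tuple against the end of the page."""
    pos = PAGE_END
    for t in reversed(tuples):
        pos -= t["len"]
        t["start"] = pos


def make_page():
    # 100-byte hole, then tuples of 50, 50 and 60 bytes.
    return [
        {"start": 100, "len": 50},
        {"start": 150, "len": 50},
        {"start": 200, "len": 60},
    ]


page = make_page()
grow(page, 1, 52)               # hole shrinks from 100 to 98
shrink(page, 1, 50)             # 2 bytes are now stranded after the tuple
print(hole(page), gaps(page))   # prints "98 2"
compact(page)                   # second reorganization
print(hole(page), gaps(page))   # prints "100 0"
```

Total free space is 100 bytes both before and after compact(); only
the second layout can take a 100-byte insert.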

In my view, and I'm not sure if this is how the code currently works,
we should have just one routine to do a zheap page reorganization
which can cope with all possible scenarios. I imagine that you would
give it the page as it currently exists plus a "minimum tuple size"
for one or more tuples on the page (which must not be smaller than the
current size of that tuple, but could be bigger). It then reorganizes
the page so that every tuple for which a minimum size was given
consumes exactly that amount of space, every other tuple consumes the
minimum possible amount of space, and the remaining space goes into
the hole. So if you call this function with no minimum tuple sizes,
it does a straight defragmentation; if you give it minimum tuple
sizes, then it rearranges the page to make it suitable for a pending
in-place update of those tuples.
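
A minimal sketch of what such a routine's contract might look like,
again in toy Python rather than anything taken from the zheap
patches. The name reorganize(), its arguments, and the (start, size)
layout it returns are all assumptions made for illustration.

```python
def reorganize(tuples, data_start, page_end, min_sizes=None):
    """Return (hole_size, layout) for a toy page.

    tuples: list of (name, current_len) in page order.
    min_sizes: optional {name: bytes to reserve}; must be >= current_len.
    layout: {name: (new_start, reserved_len)}, packed against page_end.
    """
    min_sizes = min_sizes or {}
    reserved = []
    for name, cur in tuples:
        want = min_sizes.get(name, cur)
        if want < cur:
            raise ValueError(f"minimum size for {name} is below its current size")
        reserved.append((name, want))
    if sum(size for _, size in reserved) > page_end - data_start:
        raise ValueError("requested sizes do not fit on the page")

    # Lay tuples out back to front; whatever is left over is the hole.
    layout = {}
    pos = page_end
    for name, size in reversed(reserved):
        pos -= size
        layout[name] = (pos, size)
    return pos - data_start, layout


tuples = [("A", 50), ("B", 50), ("C", 60)]

# No minimum sizes: straight defragmentation.
free, layout = reorganize(tuples, 0, 260)
print(free)            # prints 100

# Reserve 52 bytes for B ahead of an in-place update.
free, layout = reorganize(tuples, 0, 260, {"B": 52})
print(free, layout["B"])   # prints "98 (148, 52)"
```

The same code path serves both callers; the only difference is
whether min_sizes is empty.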

Actually, I think Amit and I discussed further refining this by
splitting the page reorganization function in half. One half would
make a plan for where to put each tuple on the page following the
reorg, but would not actually do anything. That would be executed
before entering a critical section, and might fail if the requested
minimum tuple sizes can't be satisfied. The other half would take the
previously-constructed plan as input and perform the reorganization.
That would be done in the critical section.
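
The split could be sketched roughly as follows. This is a toy Python
illustration under stated assumptions, not the zheap implementation:
plan_reorg(), execute_reorg(), and the shape of the plan are all
hypothetical, and the snapshot copy in the second half is a shortcut
that real code would presumably replace with carefully ordered moves
or a scratch buffer set up in advance.

```python
def plan_reorg(tuples, data_start, page_end, min_sizes=None):
    """Phase 1: decide where each tuple goes. Touches nothing; may raise.

    tuples: list of (name, start, length) in page order.
    Returns a list of moves (old_start, new_start, length, reserved).
    """
    min_sizes = min_sizes or {}
    moves = []
    pos = page_end
    for name, start, length in reversed(tuples):
        reserved = min_sizes.get(name, length)
        if reserved < length:
            raise ValueError(f"minimum size for {name} is below its current size")
        pos -= reserved
        if pos < data_start:
            raise ValueError("requested sizes do not fit on the page")
        moves.append((start, pos, length, reserved))
    return moves


def execute_reorg(page, moves):
    """Phase 2: apply a plan. All validation already happened in phase 1."""
    snapshot = bytes(page)  # toy shortcut: sidesteps overlapping copies
    for old, new, length, reserved in moves:
        page[new:new + length] = snapshot[old:old + length]
        # Zero the extra bytes reserved for the pending update.
        page[new + length:new + reserved] = bytes(reserved - length)


page = bytearray(260)
page[100:150] = b"A" * 50
page[150:200] = b"B" * 50
page[200:260] = b"C" * 60
tuples = [("A", 100, 50), ("B", 150, 50), ("C", 200, 60)]

moves = plan_reorg(tuples, 0, 260, {"B": 52})  # outside the critical section
execute_reorg(page, moves)                     # inside the critical section
```

The point of the split is that every check that can fail lives in
plan_reorg(), so the page is never modified unless the whole
reorganization is already known to fit.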

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
