Re: The Free Space Map: Problems and Opportunities

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Jan Wieck <jan(at)wi3ck(dot)info>, Gregory Smith <gregsmithpgsql(at)gmail(dot)com>, John Naylor <john(dot)naylor(at)enterprisedb(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: The Free Space Map: Problems and Opportunities
Date: 2021-08-20 16:00:36
Message-ID: CAH2-WzkEVAPce9S4Z+4FOHFEPf2BySOipenh5QSOtmy=2=ww4g@mail.gmail.com
Lists: pgsql-hackers

On Fri, Aug 20, 2021 at 8:34 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I expect they ran more than zero tests before selecting that value, so
> it's probably a decent choice in their system. However, that does seem
> rather low. I would have guessed that a good value would be in the
> 50-80 percent range.

They don't have to deal with non-HOT updates, which effectively move
the row to another block. Actually, that's not quite true -- they do
have a concept called row migration. But it's the strategy of last
resort, and is presumably very rare -- much rarer than a non-HOT
update that moves a row.

My point is that it's easier to believe that you'll run into sparse
deletion patterns in a version of Postgres with open and closed heap
pages, because deletes aren't the only thing that takes rows off a
page: non-HOT updates do too.

> It's hard to know, though, partly because
> everything is workload dependent, and partly because you're balancing
> two good things that are qualitatively different. A lower value
> figures to reduce the degree of "mixing" of older and newer data
> within the same pages, but it also risks permanently wasting space
> that could have been put to efficient use.

All true -- to me this is all about adapting to workloads at a fine granularity.

If we have closed pages, then non-HOT updates that cannot keep the row
on the same heap page now actually "relieve the pressure", which is
not currently true. If we started out with a heap page that had 100
tuples (say with fill factor 90), and then we find that we cannot keep
the entire page together...then maybe we'll have more luck with 99
tuples, or just 90, or even fewer. By firmly sticking with our original
goal, we have some chance to learn what really will be stable for a
given heap page. There is nothing wrong with learning that lesson
through trial and error, based on negative information from the
workload.

Right now we go ahead and reuse the space at the next opportunity (we
even particularly favor pages like this). So we never learn from our
mistakes.
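A toy sketch of the contrast drawn above, in C. Everything here is hypothetical: `PageState`, `page_fill`, `page_row_migrated_away`, and `page_offers_space` are invented for illustration and do not correspond to PostgreSQL's heap or FSM code. The rule that a closed page lowers its target to the number of rows it actually managed to hold, and withholds freed space from unrelated inserts, is just one possible reading of "learning from negative information", not a worked-out design.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical per-page bookkeeping, for illustration only; these are not
 * PostgreSQL's real structures or FSM functions.
 */
typedef struct PageState
{
    int  target_tuples; /* rows we currently hope this page can keep together */
    int  live_tuples;   /* rows currently stored on the page */
    bool closed;        /* closed pages do not offer free space to new rows */
} PageState;

/* Fill a page up to its initial target (e.g. 100 rows under fillfactor 90). */
static PageState
page_fill(int initial_tuples, bool use_closed_pages)
{
    PageState page = {initial_tuples, initial_tuples, use_closed_pages};
    return page;
}

/*
 * A non-HOT update could not keep its row on this page, so the new version
 * went elsewhere. A closed page treats this as negative feedback and lowers
 * its target to what it actually held (100 -> 99 -> ... -> 90).
 * An open page keeps its original target and simply has more free space.
 */
static void
page_row_migrated_away(PageState *page)
{
    assert(page->live_tuples > 0);
    page->live_tuples--;
    if (page->closed && page->live_tuples < page->target_tuples)
        page->target_tuples = page->live_tuples;
}

/*
 * Would this page's freed space be offered to an unrelated insert?
 * Open pages (roughly today's behavior) say yes as soon as space frees up;
 * closed pages hold the space back for their own rows' future versions.
 */
static bool
page_offers_space(const PageState *page)
{
    return !page->closed && page->live_tuples < page->target_tuples;
}
```

For example, filling a closed page with 100 rows and then letting ten of them migrate away leaves its target at 90 with its free space unadvertised, while an open page keeps a target of 100 and offers the freed space to the next insert right away, so it never records what the page could actually sustain.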

--
Peter Geoghegan
