generalized conveyor belt storage

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: generalized conveyor belt storage
Date: 2021-12-14 23:00:34
Message-ID: CA+Tgmoa_VNzG4ZouZyQQ9h=oRiy=ZQV5+xHQXxMWmep4Ygg8Dg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi!

Back when we were working on zheap, we realized that we needed some
way of storing undo records that would permit us to discard old undo
efficiently when it was no longer needed. Thomas Munro dubbed this
"conveyor belt storage," the idea being that items are added at one
end and removed from the other. In the zheap patches, Thomas took an
approach similar to what we've done elsewhere for CLOG and WAL: keep
creating new files, put a relatively small amount of data in each one,
and remove the old files in their entirety when you can prove that
they are no longer needed. While that did and does seem reasonable, I
came to dislike it, because it meant we needed a separate smgr for
undo as compared with everything else, which was kind of complicated.
Also, that approach was tightly integrated with and thus only useful
for zheap, and as Thomas observed at the time, the problem seems to be
fairly general. I got interested in this problem again because of the
idea discussed in
https://www.postgresql.org/message-id/CA%2BTgmoZgapzekbTqdBrcH8O8Yifi10_nB7uWLB8ajAhGL21M6A%40mail.gmail.com
of having a "dead TID" relation fork in which to accumulate TIDs that
have been marked as dead in the table but not yet removed from the
indexes, so as to permit a looser coupling between table vacuum and
index vacuum. That's yet another case where you accumulate new data
and then at a certain point the oldest data can be thrown away because
its intended purpose has been served.

So here's a patch. Basically, it lets you initialize a relation fork
as a "conveyor belt," and then you can add pages of basically
arbitrary data to the conveyor belt and then throw away old ones and,
modulo bugs, it will take care of recycling space for you. There's a
fairly detailed README in the patch if you want a more detailed
description of how the whole thing works. It's missing some features
that I want it to have: for example, I'd like to have on-line
compaction, where whatever logical page numbers of data currently
exist can be relocated to lower physical page numbers thus allowing
you to return space to the operating system, hopefully without
requiring a strong heavyweight lock. But that's not implemented yet,
and it's also missing a few other things, like test cases, performance
results, more thorough debugging, better write-ahead logging
integration, and some code to use it to do something useful. But
there's enough here, I think, for you to form an opinion about whether
you think this is a reasonable direction, and give any design-level
feedback that you'd like to give. My colleagues Dilip Kumar and Mark
Dilger have contributed to this effort with some testing help, but all
the code in this patch is mine.

When I was chatting with Andres about this, he jumped to the question
of whether this could be used to replace SLRUs. To be honest, it's not
really designed for applications that are quite that intense. I think
we would get too much contention on the metapage, which you have to
lock and often modify for just about every conveyor belt operation.
Perhaps that problem can be dodged somehow, and it might even be a
good idea, because (1) then we'd have that data in shared_buffers
instead of a separate tiny buffer space and (2) the SLRU code is
pretty crappy. But I'm more interested in using this for new things
than I am in replacing existing core technology where any new bugs
will break everything for everyone. Still, I'm happy to hear ideas
around this kind of thing, or to hear the results of any
experimentation you may want to do.

Let me know what you think.

Thanks,

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachment Content-Type Size
v1-0001-WIP-Conveyor-belt-storage.patch application/octet-stream 200.3 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2021-12-14 23:03:00 Re: Life cycles of tuple descriptors
Previous Message Dag Lem 2021-12-14 22:34:00 Re: daitch_mokotoff module