[GSoC 2026] - B-tree Index Bloat Reduction - Introduction

From: Salma El-Sayed <salmasayed182003(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: [GSoC 2026] - B-tree Index Bloat Reduction - Introduction
Date: 2026-05-01 09:52:54
Message-ID: CANBEAPFW64QUGmMYoPekV6jGrn5Yx8kTNfeEtcTG9CCBUUBoLQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello Hackers!

I will be working on the B-tree Index Bloat Reduction (Page Merge) project
for GSoC 2026 [1]: For the last 2 months, I have been actively engaging
with the mentors about the project. Here is a brief introduction about
myself, the project, and the work done so far.

I am being mentored by Kirk, Andreas, Pavlo, Nikolay, and Andrei. I am
sending this introductory email to share my progress and to learn how to
properly engage with the PostgreSQL Community.

*Brief introduction about me*
I am a final-year Computer Engineering student. I recently completed a
training program at STMicroelectronics where I built various projects in C,
including a custom Unix-like shell [2] and a dynamic memory manager
overriding libc's malloc [3]. For the last couple of months, I have also
been working on the CMU 15-445/645
<https://15445.courses.cs.cmu.edu/fall2025/> database systems course
alongside my university work. Through this course, I built a Buffer Pool
Manager and a B+Tree Index. (I cannot provide the code publicly due to
academic integrity rules, but I passed all tests for both projects [4]).

Working with my mentors, I have reviewed and defined my proposed solution
to the bloat problem [5] and have already written tooling to help visualize
the nbtree before and after a merge [6].
*project review*
B-tree indexes can become severely bloated after heavy UPDATE/DELETE
workloads - in production systems, indexes with 90%+ bloat are common. The
core engine currently has no way to consolidate two sparse pages into one:
VACUUM can only delete pages that are completely empty.

*proposed approach*
To solve this, I propose introducing a B-tree page merge mechanism. The
core idea is to identify adjacent sparse pages that share the same parent,
consolidate their tuples into a single page, and update the parent node's
downlinks accordingly.

While my initial proposal outlines this specific mechanism, I intend to
keep the approach completely flexible. I fully expect the design to evolve
based on the community's feedback and what we learn during the analysis
phase. Our immediate plan is to finalize this analysis, identify impacted
areas (WAL, reverse/forward scans, RECOVERY), and ensure we have
comprehensive testing strategies mapped out.

As a GSoC participant, I am willing to do the hard work and I know this
might not get committed during the GSoC2026, but I will stick around if it
is close, and if not, I will try to leave it in good enough shape that a
more permanent Hacker can pick this up and carry it through.

-----
[1]
wiki.postgresql.org/wiki/GSoC_2026#B-tree_Index_Bloat_Reduction_(Page_Merge)
[2] github.com/salmaaliia/My-Shell
[3] github.com/salmaaliia/Heap-Memory-Manager
[4] B-Tree Index
<https://www.linkedin.com/feed/update/urn:li:activity:7432588522382835713/>,
Buffer Pool Manager
<https://www.linkedin.com/feed/update/urn:li:activity:7406286985495023616/>
[5]
docs.google.com/document/d/1PIMG0N__4BIB0uDWOfcK-AruatNL8SCTlfTlBTCaoCo/edit?usp=sharing
[6] github.com/salmaaliia/postgres/tree/tooling

Best regards,
Salma El-Sayed

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2026-05-01 10:41:39 Re: [PATCH] Fix stale relation close in sequence synchronization
Previous Message Ayush Tiwari 2026-05-01 09:14:07 Re: Refactor code around GUC default_toast_compression