Re: LSM tree for Postgres

From: Dmitry Dolgov <9erthalion6(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LSM tree for Postgres
Date: 2020-08-05 11:09:38
Message-ID: 20200805110938.hwgxtwyzo5wvy6sc@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Tue, Aug 04, 2020 at 11:22:13AM +0300, Konstantin Knizhnik wrote:
>
> Then I think about implementing ideas of LSM using standard Postgres
> nbtree.
>
> We need two indexes: one small for fast inserts and another - big
> (main) index. This top index is small enough to fit in memory so
> inserts in this index are very fast. Periodically we will merge data
> from top index to base index and truncate the top index. To prevent
> blocking of inserts in the table while we are merging indexes we can
> add ... on more index, which will be used during merge.
>
> So final architecture of Lsm3 is the following: two top indexes used
> in cyclic way and one main index. When top index reaches some
> threshold value we initiate merge with main index, done by bgworker
> and switch to another top index. As far as merging indexes is done in
> background, it doesn't  affect insert speed. Unfortunately Postgres
> Index AM has not bulk insert operation, so we have to perform normal
> inserts. But inserted data is already sorted by key which should
> improve access locality and partly solve random reads problem for base
> index.
>
> Certainly to perform search in Lsm3 we have to make lookups in all
> three indexes and merge search results.

Thanks for sharing this! In fact this reminds me more of partitioned
b-trees [1] (and more older [2]) rather than LSM as it is (although
could be that the former was influenced by the latter). What could be
interesting is that quite often in these and many other whitepapers
(e.g. [3]) to address the lookup overhead the design includes bloom
filters in one or another way to avoid searching not relevant part of an
index. Tomas mentioned them in this thread as well (in the different
context), probably the design suggested here could also benefit from it?

[1]: Riegger Christian, Vincon Tobias, Petrov Ilia. Write-optimized
indexing with partitioned b-trees. (2017). 296-300. 10.1145/3151759.3151814.
[2]: Graefe Goetz. Write-Optimized B-Trees. (2004). 672-683.
10.1016/B978-012088469-8/50060-7.
[3]: Huanchen Zhang, David G. Andersen, Andrew Pavlo, Michael Kaminsky,
Lin Ma, and Rui Shen. Reducing the Storage Overhead of Main-Memory OLTP
Databases with Hybrid Indexes. (2016). 1567–1581. 10.1145/2882903.2915222.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-08-05 11:19:15 Re: display offset along with block number in vacuum errors
Previous Message Masahiko Sawada 2020-08-05 09:56:35 Re: Add MAIN_RELATION_CLEANUP and SECONDARY_RELATION_CLEANUP options to VACUUM