Re: LSM tree for Postgres

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LSM tree for Postgres
Date: 2020-08-05 06:13:12
Message-ID: 6d224b6c-194e-11a0-3924-446492b10f71@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05.08.2020 02:59, Alexander Korotkov wrote:
>
> The things you're writing makes me uneasy. I initially understood
> lsm3 as a quick and dirty prototype, while you're probably keeping
> some design in your mind (for instance, original design of LSM).
> However, your message makes me think you're trying to defend the
> approach currently implemented in lsm3 extension. Therefore, I've to
> criticise this approach.
>
> 1) The base index can degrade. At first, since merge can cause page
> splits. Therefore logical ordering of pages will become less
> correlated with their physical ordering with each merge.
> 2) If your workload will include updates and/or deletes, page
> utilization may also degrade.
> 3) While base index degrades, merge performance also degrades.
> Traverse of base index in logical order will require more and more
> random reads (at some point almost every page read will be random).
> While the base index becomes large and/or bloat, you push less top
> index tuples to a single base index page (at some point you will push
> one tuple per page).
>
> Original LSM design implies strict guarantees over average resources
> spent per index operation. Your design doesn't. Moreover, I bet lsm3
> will degrade significantly even on insert-only workload. It should
> degrade to the performance level of B-tree once you insert enough
> data. Try something like number_of_merges =
> numer_of_tuples_per_index_page * 2 and you should see this
> degradation. Real LSM doesn't degrade that way.

I mostly agree with your critices.
My Lsm3 is not true LSM, but from my point of view it preserves basic
principles of LSM: fast and small top index and bulk updates of main index.
My experiments with RocksDB shows that degradation also takes place in
this case. More experiments are needed to compare two approaches.

Concerning degrade of basic index - B-Tree itself is balanced tree. Yes,
insertion of random keys can cause split of B-Tree page.
In the worst case half of B-Tree page will be empty. So B-Tree size will
be two times larger than ideal tree.
It may cause degrade up to two times. But that is all. There should not
be infinite degrade of speed tending to zero.

>
> Right now vacuum process Lsm3 indexes in usual way.
> Removing records from top indexes may be not needed at all (just because
> this indexes will be truncated in any case).
> But as far as size of top index is expected to be small enough
> vacuumming it should not take a long time,
> so I didn't to avoid it (although it should not be difficult - just
> disable ambulkdelete for correspondent nbtree wrappers).
> It doesn't seem important, but I don't get your point here. Postgres
> expects ambulkdelete to delete TIDs from index. If you don't delete
> it from the top index, this TID will be merged to the base index. And
> that could lead wrong query answers unless you eliminate those TIDs in
> a different way (during the merge stage or something).

Yes, your are right. It is not possible to avoid delete TIDs from top
indexes.
>> Concerning deletes from main index - I do not understand how it can be
>> optimized.
> This is a trick you can learn from almost every LSM implementation.
> For instance, check docs for leveldb [1] about "delete marker". For
> sure, that requires some redesign of the vacuum and can't be done in
> extension (at least in the reasonable way). But, frankly speaking, I
> think core modifications are inevitable to utilize the power of LSM in
> PostgreSQL.

The main idea of Lsm3 was to investigate whether it is possible to
achieve the same result as with "classical" LSM
using standard Postgres nbtree indexes. Right now it seems t me that
answer is positive, but I have not performed
exhaustive measurements. For example I have not measured vacuum overhead
(it was enabled, so vacuumming takes place
in my benchmark, but I have not tries to separate its overhead and
influence on performance), index search speed,...

> Links
> 1. https://github.com/google/leveldb/blob/master/doc/impl.md
>
> ------
> Regards,
> Alexander Korotkov

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Asim Praveen 2020-08-05 07:08:41 Re: [PATCH] - Provide robust alternatives for replace_string
Previous Message Joel Mariadasan (jomariad) 2020-08-05 06:08:15 Reg. Postgres 13