Re: LSM tree for Postgres

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LSM tree for Postgres
Date: 2020-08-08 14:07:31
Message-ID: 7c0f264f-8e13-321b-8035-9c930682966e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07.08.2020 15:31, Alexander Korotkov wrote:
> ср, 5 авг. 2020 г., 09:13 Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>:
>> Concerning degrade of basic index - B-Tree itself is balanced tree. Yes,
>> insertion of random keys can cause split of B-Tree page.
>> In the worst case half of B-Tree page will be empty. So B-Tree size will
>> be two times larger than ideal tree.
>> It may cause degrade up to two times. But that is all. There should not
>> be infinite degrade of speed tending to zero.
> My concerns are not just about space utilization. My main concern is
> about the order of the pages. After the first merge the base index
> will be filled in key order. So physical page ordering perfectly
> matches their logical ordering. After the second merge some pages of
> base index splits, and new pages are added to the end of the index.
> Splits also happen in key order. So, now physical and logical
> orderings match within two extents corresponding to first and second
> merges, but not within the whole tree. While there are only few such
> extents, disk page reads may in fact be mostly sequential, thanks to
> OS cache and readahead. But finally, after many merges, we can end up
> with mostly random page reads. For instance, leveldb doesn't have a
> problem of ordering degradation, because it stores levels in sorted
> files.
>
I agree with your that loosing sequential order of B-Tree pages may have
negative impact on performance.
But it first of all critical for order-by and range queries, when we
should traverse several subsequent leave pages.
It is less critical for exact-search or delete/insert operations.
Efficiency of merge operations mostly depends on how much keys
will be stored at the same B-Tree page. And it is first of all
determined by size of top index and key distribution.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2020-08-08 15:09:14 Re: get rid of distprep?
Previous Message Amit Kapila 2020-08-08 10:32:27 Re: [Patch] Optimize dropping of relation buffers using dlist