Re: LSM tree for Postgres

From: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: LSM tree for Postgres
Date: 2020-08-04 18:55:30
Message-ID: bfed551a-b165-e4aa-16c5-f729a05d3f55@postgrespro.ru
Lists: pgsql-hackers

On 04.08.2020 20:44, Tomas Vondra wrote:
>> Unique indexes are not supported now.
>> And I do not see any acceptable solution here.
>> If we have to check for the presence of duplicates at insert time,
>> then it will eliminate all advantages of the LSM approach.
>> And if we postpone the check to the moment of merge, then... I am
>> afraid that it will be too late.
>>
>
> Ummm, but in your response to Stephen you said:
>
>     But search locates not ANY record with specified key in top index
>     but record which satisfies snapshot of the transaction. Why do we
>     need more records if we know that there are no duplicates?
>
> So how do you know there are no duplicates, if unique indexes are not
> supported (and may not be for LSM)?
>

In the index AM I marked the Lsm3 index as not supporting unique
constraints, so it cannot be used to enforce a unique constraint.
But it is possible to specify "unique" in the index properties.
In this case it is the responsibility of the programmer to guarantee
that there are no duplicates in the index.
This option enables the search optimization: locate the first record
satisfying the snapshot and do not touch the other indexes.
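
To illustrate the optimization, here is a minimal Python sketch of a
lookup over several indexes (top indexes first, then the main index).
It is not the Lsm3 code: the names IndexEntry, is_visible and lookup,
and the simplified visibility rule, are assumptions made only for this
example.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class IndexEntry(NamedTuple):
    key: int
    tid: int   # hypothetical tuple identifier
    xmin: int  # hypothetical id of the inserting transaction


# Toy index: key -> list of entries for that key.
Index = Dict[int, List[IndexEntry]]


def is_visible(entry: IndexEntry, snapshot_xid: int) -> bool:
    # Simplified visibility rule (assumption): an entry is visible if it
    # was inserted at or before the snapshot.
    return entry.xmin <= snapshot_xid


def lookup(indexes: Sequence[Index], key: int, snapshot_xid: int,
           unique: bool) -> Tuple[List[IndexEntry], int]:
    """Return (visible matches, number of indexes probed)."""
    matches: List[IndexEntry] = []
    probed = 0
    for index in indexes:
        probed += 1
        for entry in index.get(key, []):
            if is_visible(entry, snapshot_xid):
                matches.append(entry)
                if unique:
                    # The programmer promised there are no duplicates,
                    # so the remaining indexes need not be touched.
                    return matches, probed
    return matches, probed


top = {1: [IndexEntry(key=1, tid=100, xmin=5)]}
main = {2: [IndexEntry(key=2, tid=51, xmin=2)]}

with_hint = lookup([top, main], key=1, snapshot_xid=10, unique=True)
without_hint = lookup([top, main], key=1, snapshot_xid=10, unique=False)
```

With the "unique" hint the lookup stops after the top index; without it
the main index is probed as well, although it returns the same record.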

>>>
>>> Isn't it a bit suspicious that with more clients the throughput
>>> actually drops significantly? Is this merely due to PoC stage, or is
>>> there some inherent concurrency bottleneck?
>>>
>> My explanation is the following (I am not 100% sure that it is
>> true): multiple clients insert records faster than the merge bgworker
>> is able to merge them into the main index. This causes the top index
>> to blow up, and as a result it doesn't fit in memory any more.
>> So we lose the advantage of fast inserts. If we have N top indexes
>> instead of just 2, we can keep the size of each top index small
>> enough. But in this case search operations will have to merge N
>> indexes, and so search is almost N times slower (the fact that each
>> top index fits in memory doesn't mean that all of them fit in memory
>> at the same time, so we still have to read pages from disk during
>> lookups in top indexes).
>>
>
> Hmmm, maybe. Should be easy to verify by monitoring the size of the top
> index, and limiting it to some reasonable value to keep good
> performance. Something like gin_pending_list_size I guess.
>
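
The suspected effect can be illustrated with a toy model. This is only a
sketch: the function names and all rates below are made-up assumptions,
not measurements, and the real merge worker does not behave this
linearly.

```python
from typing import List, Optional


def top_index_size_over_time(clients: int, inserts_per_client: int,
                             merge_rate: int, seconds: int) -> List[int]:
    """Return the top index size (in entries) after each second."""
    size = 0
    sizes = []
    for _ in range(seconds):
        size += clients * inserts_per_client  # entries added by all clients
        size = max(0, size - merge_rate)      # entries drained by the merge
        sizes.append(size)
    return sizes


def first_overflow(sizes: List[int], memory_budget: int) -> Optional[int]:
    """Return the first second at which the top index exceeds the budget."""
    for second, size in enumerate(sizes, start=1):
        if size > memory_budget:
            return second
    return None


one_client = top_index_size_over_time(1, 100, 150, 3)
four_clients = top_index_size_over_time(4, 100, 150, 3)
```

With one client the merge keeps up and the top index stays empty; with
four clients it grows by 250 entries per second and eventually exceeds
any fixed memory budget, which is why monitoring and limiting its size
makes sense.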

Lsm3 provides functions for getting the size of the active top index,
explicitly forcing a merge of the top index, and waiting for completion
of the merge operation.
One of the use cases of Lsm3 may be delayed update of indexes.
For some applications insert speed is critical: they cannot lose
data which is received at a high rate.
In this case we insert data into the small index during working hours
and at night initiate a merge of this index with the main index.
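
The delayed-update workflow can be sketched in Python as follows. This
is a toy in-memory model, not the Lsm3 interface: the class ToyLsmIndex
and its methods top_index_size, start_merge and wait_merge_completion
are hypothetical names chosen to mirror the three operations described
above.

```python
import threading
from typing import Dict, Optional


class ToyLsmIndex:
    """Toy two-level index: a small top index plus a main index."""

    def __init__(self) -> None:
        self.top: Dict[int, int] = {}
        self.frozen: Dict[int, int] = {}  # top index currently being merged
        self.main: Dict[int, int] = {}
        self._lock = threading.Lock()
        self._merge_thread: Optional[threading.Thread] = None

    def insert(self, key: int, tid: int) -> None:
        with self._lock:
            self.top[key] = tid  # inserts only touch the small top index

    def top_index_size(self) -> int:
        with self._lock:
            return len(self.top)

    def start_merge(self) -> None:
        self.wait_merge_completion()  # at most one merge at a time
        with self._lock:
            self.frozen, self.top = self.top, {}
        self._merge_thread = threading.Thread(target=self._merge)
        self._merge_thread.start()

    def _merge(self) -> None:
        # Move the frozen entries into the main index in key order.
        with self._lock:
            for key in sorted(self.frozen):
                self.main[key] = self.frozen[key]
            self.frozen = {}

    def wait_merge_completion(self) -> None:
        if self._merge_thread is not None:
            self._merge_thread.join()
            self._merge_thread = None

    def search(self, key: int) -> Optional[int]:
        with self._lock:
            for index in (self.top, self.frozen, self.main):
                if key in index:
                    return index[key]
            return None


index = ToyLsmIndex()
for k in range(1000):          # "working hours": fast inserts only
    index.insert(k, k * 10)
size_before = index.top_index_size()
index.start_merge()            # "at night": force the merge
index.wait_merge_completion()
size_after = index.top_index_size()
```

After the merge completes, the top index is empty again and all entries
are found through the main index.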
