Re: All Taxi Services need Index Clustered Heap Append

From: Evgeniy Shishkin <itparanoia(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: Darafei Komяpa Praliaskouski <me(at)komzpa(dot)net>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>
Subject: Re: All Taxi Services need Index Clustered Heap Append
Date: 2018-03-06 18:34:30
Message-ID: 8C124B05-35A1-41E9-B182-183299885F38@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Mar 6, 2018, at 04:57, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:
>
> On 3 March 2018 at 00:30, Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net <mailto:me(at)komzpa(dot)net>> wrote:
>
> I gave this all some thought and it looks like it all could have not happened if Postgres was able to cluster heap insertions by (id, ts) index. We're ok with synchronuous_commit=off, so amplified write won't immediately hit disk and can get cooled down in progress. Clustering doesn't require perfect sorting: we need to minimize number of pages fetched, it's ok if the pages are not consecutive on disk.
>
> I'm surprised nobody has mentioned BRIN yet.
>
> Ever since BRIN was introduced, I've thought it would be very interesting to use it + the freespace map for coarse-grained tuple routing in heap inserts. Try to find the best-match range with BRIN and look for free space there. I think there was discussion of this early on, so you may want to look up the BRIN threads.
>
> The main challenge probably comes when a range is exhausted. Your writes would start spilling over into new heap pages and get intermixed again. Some support for pre-allocating new ranges would be needed.

Basically Poor Man's Clustering solution consists of two parts:
1. Find eligible pages to insert
2. Make sure you don't screw new page

At first we want to define how we want to cluster our data, it can be a mix of equality, hash, range and geo operators.
In Darafei's case appropriate clustering would be: CLUSTER BY (hash(id), ts asc) or (eq(id), range(ts, '1 day interval')).
To support fast search of pages to insert, there should be present Bloom index on id and BRIN or Btree index on ts
for the first example, and Btree or Hash on id and Btree or BRIN or Gist for the second.

So, we can bitmap scan all needed indexes to find pages with tuples with already present clustered keys,
we AND those bitmaps and consult fsm and proceed to insert in pages if place allows.

Now, if we have not found pages with the same cluster keys or the pages are full, we need to find free page and don't screw it.
To do so, we must estimate ideal clustering of a page and either insert or make a completely new page.

In order not to intermix page with hash or equality clustering, we should examine already present keys and consult statistics for them.
If key is in MCV - better find new page, otherwise we need to calculate how many tuples per page on average we have for the key
and compare with number of tuples on the page with our new key. I.e. if the page we about to insert have 30 tuples with 3 clustering keys,
and we conclude that on average we have 20 tuples per key this means that there would be 30 more tuples with that keys and 20 for our key.
If the page can hold that 80 tuples, we insert there. I believe existing statistics are enough to make this work.

For range based clustering we should estimate we should estimate how many tuples there would be on a page and its interval and compare
if we would break this interval. For example we have plain ordering clustering, we insert value 130, the page have min value 100 and number
of tuples is 20. We estimate that on average a page max value minus min value is 100 and number of tuples is 30. So each tuple increments
range by 3. So our page with min value of 100 and 20 tuples have future closing max value of 200, as our value is 130 we proceed to insert.
I don't know if there are enough stats for this already, or if BRIN can help here. But it is doable.

Now for the geo part. This is the least thought through. We definitely need an index to consult MBR and use gist penalty functions to decide.

Do you think the above rough design is doable and desirable?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-03-06 18:51:53 Re: using index or check in ALTER TABLE SET NOT NULL
Previous Message David Steele 2018-03-06 18:32:49 Re: PATCH: Configurable file mode mask