Re: zheap: a new storage format for PostgreSQL

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: me(at)komzpa(dot)net
Cc: Adam Brusselback <adambrusselback(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: zheap: a new storage format for PostgreSQL
Date: 2018-11-19 03:55:29
Message-ID: CAA4eK1L9QEFtOgfxV23qnBwETNvfcZMBnvRSod-sxtpNh4qGcA@mail.gmail.com
Lists: pgsql-hackers

On Sun, Nov 18, 2018 at 3:42 PM Darafei "Komяpa" Praliaskouski
<me(at)komzpa(dot)net> wrote:
>
> On Sat, Nov 17, 2018 at 8:51 AM Adam Brusselback <adambrusselback(at)gmail(dot)com> wrote:
>>
>> > I don't know how much what I write on this thread is read by others or
>> > how useful this is for others who are following this work
>>
>> I've been following this thread and many others like it, silently soaking it up, because I don't feel like I'd have anything useful to add in most cases. It is very interesting to see the development take place, though, so just know it's appreciated, at least from my perspective.
>
> I'm also following the development and have hopes about it going forward. There are not many low-level details I can comment on, though :)
>
> In PostGIS workloads, UPDATE table SET geom = ST_CostyFunction(geom, magicnumber); is one of the biggest time-eaters during the initial load and cleanup of your data. It is commonly followed by CLUSTER table using table_geom_idx; to make sure you're back at full speed and no VACUUM is needed, and your table (usually static after that) is more or less spatially ordered. I see that zheap can remove the need for VACUUM, which is a big win already. If you can do something that allows tuples to be reordered according to an index during an UPDATE that rewrites most of the table, that would be a game changer :)
>

If the tuples are already in the order of the index, then we would
retain the order; otherwise, we might not want to do anything special for
ordering w.r.t. the index. I think this is important as we are not sure of
the user's intention, and I guess it won't be easy to do such a
rearrangement during an UPDATE statement.

> Another story is the Visibility Map and Index-Only Scans. Right now there is a huge gap between the insertion of rows and the moment they are available for index-only scans, as VACUUM is required. Do I understand correctly that for zheap this can all be inverted, and UNDO can become an "invisibility map" that may be quite small and discarded quickly?
>

Yeah, eventually that is our goal, with the help of delete-marking in
indexes; however, for the first version, we still need to rely on
visibility maps for index-only scans.

Thank you for showing interest in this work.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
