Re: zheap: a new storage format for PostgreSQL

From: Darafei "Komяpa" Praliaskouski <me(at)komzpa(dot)net>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Adam Brusselback <adambrusselback(at)gmail(dot)com>, Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>, "Tsunakawa, Takayuki" <tsunakawa(dot)takay(at)jp(dot)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: zheap: a new storage format for PostgreSQL
Date: 2018-11-20 07:23:25
Message-ID: CAC8Q8tKn_LB_m3SH=KMooZJcn6A1=FqiFcWRn36JJ=zq75pUNQ@mail.gmail.com
Lists: pgsql-hackers

>
> > In PostGIS workloads, UPDATE table SET geom = ST_CostyFunction(geom,
> > magicnumber); is one of the biggest time-eaters that happens upon
> > initial load and cleanup of your data. It is commonly followed by
> > CLUSTER table USING table_geom_idx; to make sure you're back at full
> > speed, no VACUUM is needed, and your table (usually static after that)
> > is more or less spatially ordered. I see that zheap can remove the need
> > for VACUUM, which is a big win already. If you can do something that
> > lets tuples be reordered according to an index during an UPDATE that
> > rewrites most of the table, that would be a game changer :)
> >
>
> If the tuples are already in the order of the index, then we would
> retain the order; otherwise, we might not want to do anything special
> for ordering w.r.t. the index. I think this is important as we are not
> sure of the user's intention, and I guess it won't be easy to do such
> rearrangement during an UPDATE statement.
>

The user's clustering intention is recorded by the existence of a CLUSTER
index on the table. At the moment nothing other than the CLUSTER command
uses it, though.

When I was looking into the current heap implementation, it seemed possible
to hook in a lookup for a couple of blocks holding values adjacent to the
new value, and to prefer them over the FSM lookup and the "current page"
for a clustered table. Because of dead tuples, free space runs out very
quickly in the usual heap, so it probably doesn't make sense there: you're
consuming space with the old version in the old page and the new version
in the new page.
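
A toy sketch of that lookup, in Python rather than PostgreSQL C and not
taken from any patch. The names choose_block, PAGE_CAPACITY and fsm_lookup
are hypothetical, keys are plain numbers, and a sorted list stands in for
the CLUSTER index:

```python
from bisect import bisect_left

PAGE_CAPACITY = 4  # hypothetical: tuples that fit on one page in this model


def choose_block(index, pages, new_key, fsm_lookup, neighbours=2):
    """Pick a block for new_key, preferring blocks that hold adjacent keys.

    index:      sorted list of (key, block_no), standing in for the CLUSTER index
    pages:      dict block_no -> number of tuples currently on that block
    fsm_lookup: fallback callable returning some block with free space
    """
    # Position where new_key would be inserted in index order.
    pos = bisect_left(index, (new_key,))
    lo = max(0, pos - neighbours)
    hi = min(len(index), pos + neighbours)
    # Try the blocks of the nearest keys first.
    candidates = sorted(index[lo:hi], key=lambda entry: abs(entry[0] - new_key))
    for _key, block in candidates:
        if pages[block] < PAGE_CAPACITY:
            return block
    # No adjacent block has room: fall back to the usual free space lookup.
    return fsm_lookup()


index = [(10, 0), (20, 0), (30, 1), (40, 1), (50, 2)]
pages = {0: 4, 1: 3, 2: 1}
chosen = choose_block(index, pages, 22, fsm_lookup=lambda: 2)
```

Here block 0 holds the nearest key but is full, so the sketch picks block 1,
the next closest block with room, before ever consulting the fallback.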

If I understand correctly, in zheap an update would not leave a dead tuple
in the old page, so space is not going to run out immediately, and this may
unblock the path for such further developments. That is, if there is a
spot in the code where such or similar logic can be plugged in :)
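
To put rough numbers on the space argument, here is a deliberately crude
model, again in Python. The function versions_kept_on_page is hypothetical;
it assumes all tuple versions have the same size, ignores any cleanup of old
versions, and models in-place update only as far as I understand it from the
zheap description:

```python
def versions_kept_on_page(live_tuples, page_capacity, in_place):
    """Count updated tuples whose new version stays on the original page.

    in_place=False: the old version keeps its slot, so every update needs
                    a free slot on the page (toy model of the current heap).
    in_place=True:  the new version reuses the old slot (toy model of an
                    in-place update).
    """
    if in_place:
        return live_tuples
    free_slots = page_capacity - live_tuples
    return min(live_tuples, free_slots)


full_heap = versions_kept_on_page(100, 100, in_place=False)
full_in_place = versions_kept_on_page(100, 100, in_place=True)
```

On a fully packed page the first model keeps no new versions next to their
neighbours, while the second keeps all of them, which is why an adjacency
lookup looks more promising with in-place updates.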

I've described the business case in [1].

[1] https://www.postgresql.org/message-id/flat/CAC8Q8tLBeAxR%2BBXWuKK%2BHP5m8tEVYn270CVrDvKXt%3D0PkJTY9g%40mail.gmail.com

--
Darafei Praliaskouski
Support me: http://patreon.com/komzpa
