Quick Links

Re: Status of the table access method work

From:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Dmitry Dolgov <9erthalion6(at)gmail(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject:	Re: Status of the table access method work
Date:	2019-04-10 17:14:17
Message-ID:	CAPpHfdu286b3ALZLxvCJ_dD-ccdRwv2GV-2e=WyCMTGSh0w3xQ@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Fri, Apr 5, 2019 at 11:25 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I want to thank Haribabu, Alvaro, Alexander, David, Dmitry and all the
> others that collaborated on making tableam happen. It was/is a huge
> project.

Thank you so much for bringing this project to commit! Excellent work!

Your explanation of existing limitations looks very good and
convincing. But I think there is one you didn't mention. We require
new table AMs to basically save old "contract" between heap and
indexes. We have "all or nothing" choice during updates. So, storage
can either insert to none of indexes or insert to all of them
(HOT-like update). I think any storage, which is going to address
"write amplification" point raised by Uber, needs to break this
"contract".

For example, zheap is promised to implement delete-marking indexes.
But it's not yet published. And for me it's not clear that this
approach is better among the alternatives. With delete-marking
approach you need to update index tuples corresponding to old values
of updated fields. But additionally to that it's not trivial to
delete index tuple. In order to do that, you need to both locate this
index tuple and know that this index value isn't present in undo
chain. So, it's likely required another index lookup during purging
of undo chain. Thus, we basically need to random lookup index twice
for every deleted index tuple. Also, it becomes more complex to
lookup appropriate heap tuple during index scan. Then you need to
check not only visibility, but also matching index value (here we need
to adjust index_fetch_tuple interface). Because it might happen that
visible to you version have different index value. That may lead to
O(N^2) performance while accessing single row with N versions (MySQL
InnoDB has this problem).

Alternative idea is to have MVCC-aware indexes. This approach looks
more attractive for me. In this approach you basically need xmin,
xmax fields in index tuples. On insertion of index tuple you fill
it's xmin. On update, previous index tuple is marked with xmax.
After that outdated index tuples might be deleted in the lazy manner
when page space is required. So, only one random access is required
for deleted index tuple. With this approach fetching single row is
O(N). Also, index-only scan becomes very easy and doesn't even need a
visibility map. The only problem here is extra space requirements for
index tuples. But I think, this is well-isolated problem, which is
easy to attack. For instance, some visibility information could be
evicted to undo chain (like zheap does for its tuples). Also, we can
have special bit for "all visible" index tuples. With "all visible"
bit set this tuple can get rid of visibility fields. We can do this
for index tuples, because if index tuple requires extra space we can
split the page, in spite of heap where tuples are fixed in pages and
xmax needs to be updated in-place.

I understand that delete-marking indexes have some advantages, and
some people find them more appealing. But my point is that we
shouldn't builtin one of this approaches into API unless we have
concrete proof that this approach is strongly overcomes another. It
would be better to have our table-AM API flexible enough to implement
both. I can imagine we have proper encapsulation here bringing more
interaction with indexes to the table AM side.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Status of the table access method work at 2019-04-05 20:25:38 from Andres Freund

Responses

Re: Status of the table access method work at 2019-04-10 17:32:16 from Andres Freund
Re: Status of the table access method work at 2019-04-18 06:27:06 from Simon Riggs
Re: Status of the table access method work at 2019-06-11 15:47:11 from Alvaro Herrera

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2019-04-10 17:32:16	Re: Status of the table access method work
Previous Message	Robert Haas	2019-04-10 17:08:55	Re: block-level incremental backup