[proposal] de-TOAST'ing using a iterator

From: Binguo Bao <djydewang(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Владимир Лесков <vladimirlesk(at)yandex-team(dot)ru>, pgsql-hackers(at)postgresql(dot)org
Subject: [proposal] de-TOAST'ing using a iterator
Date: 2019-06-19 13:51:27
Message-ID: CAL-OGks_onzpc9M9bXPCztMofWULcFkyeCeKiAgXzwRL8kXiag@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi hackers!
This proposal aims to provide the ability to de-TOAST a fully TOAST'd and
compressed field using an iterator and then update the appropriate parts of
the code to use the iterator where possible instead of de-TOAST'ing and
de-compressing the entire value. Examples where this can be helpful include
using position() from the beginning of the value, or doing a pattern or
substring match.

de-TOAST iterator overview:
1. The caller requests the slice of the attribute value from the de-TOAST
iterator.
2. The de-TOAST iterator checks if there is a slice available in the output
buffer, if there is, return the result directly,
otherwise goto the step3.
3. The de-TOAST iterator checks if there is the slice available in the
input buffer, if there is, goto step44. Otherwise,
call fetch_datum_iterator to fetch datums from disk to input buffer.
4. If the data in the input buffer is compressed, extract some data from
the input buffer to the output buffer until the caller's
needs are met.

I've implemented the prototype and apply it to the position() function to
test performance.
Test tables:
-----------------------------------------------------------------------------------------------------
create table detoast_c (id serial primary key,
a text
);
insert into detoast_c (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from
generate_series(1,100);

create table detoast_u (id serial primary key,
a text
);
alter table detoast_u alter a set storage external;
insert into detoast_u (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000)||'321' as a from
generate_series(1,100);
**************************************************************************************
-----------------------------------------------------------------------------------------------------
query |
master (ms) | patch (ms) |
-----------------------------------------------------------------------------------------------------
select position('123' in a) from detoast_c; | 4054.838 |
1440.735 |
-----------------------------------------------------------------------------------------------------
select position('321' in a) from detoast_c; | 25549.270 |
27696.245 |
-----------------------------------------------------------------------------------------------------
select position('123' in a) from detoast_u; | 8116.996 |
1386.802 |
-----------------------------------------------------------------------------------------------------
select position('321' in a) from detoast_u | 28442.116 |
27672.319 |
-----------------------------------------------------------------------------------------------------
**************************************************************************************
It can be seen that the iterator greatly improves the efficiency of partial
de-TOAST when it has almost no degradation in full de-TOAST efficiency.
Next, I will continue to study how to apply iterators to more queries
and improve iterator efficiency, such as using macros instead of function
calls.

The patch is also available on github[1].
Any suggestions or comments would be much appreciated:)

Best regards, Binguo Bao.

[1] https://github.com/djydewang/postgres/pull/1/files

Attachment Content-Type Size
0001-de-TOASTing-using-a-iterator.patch text/x-patch 22.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2019-06-19 13:57:29 Re: Index Skip Scan
Previous Message Dilip Kumar 2019-06-19 13:12:55 Re: POC: Cleaning up orphaned files using undo logs