Optimize partial TOAST decompression

From: Binguo Bao <djydewang(at)gmail(dot)com>
To: simon(at)2ndquadrant(dot)com
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Optimize partial TOAST decompression
Date: 2019-06-02 14:48:34
Message-ID: CAL-OGkthU9Gs7TZchf5OWaL-Gsi=hXqufTxKv9qpNG73d5na_g@mail.gmail.com
Lists: pgsql-hackers

Hi, hackers!
I'm a student participating in GSoC 2019, and my project is related to TOAST
slices.
While getting familiar with the PostgreSQL codebase, I found that
PG_DETOAST_DATUM_SLICE, when run on a compressed TOAST entry, fetches
all compressed data chunks and then extracts the relevant slice. This
is unnecessary: we only need to fetch the data chunks that cover the slice.
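
To illustrate the idea, here is a rough sketch in Python. It is an
illustration only, not the code in the attached patch: the function names,
the chunk size of 1996 bytes, and the worst-case bound (one control bit per
literal byte, so 9 compressed bits per raw byte) are assumptions made for
the example.

```python
# Assumed chunk payload size; hypothetical constant for this sketch.
TOAST_CHUNK_SIZE = 1996


def max_compressed_prefix(raw_len, total_compressed):
    """Upper bound on the compressed bytes needed to produce raw_len raw bytes.

    Assumes the worst case where every byte is stored as a literal and
    costs one extra control bit, i.e. 9 bits per raw byte, rounded up.
    The bound can never exceed the total compressed size.
    """
    bound = (raw_len * 9 + 7) // 8
    return min(bound, total_compressed)


def chunks_needed(raw_len, total_compressed, chunk_size=TOAST_CHUNK_SIZE):
    """Number of leading chunks to fetch for a slice starting at offset 0."""
    prefix = max_compressed_prefix(raw_len, total_compressed)
    return (prefix + chunk_size - 1) // chunk_size  # ceiling division


# A 20-byte prefix of a value stored in about 200 chunks needs only the
# first chunk under these assumptions.
print(chunks_needed(20, 200 * TOAST_CHUNK_SIZE))
```

Under these assumptions, a short prefix slice touches a single chunk no
matter how large the compressed value is.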

The patch optimizes partial TOAST decompression.
As an illustration of the possible improvement, consider this trivial example:
---------------------------------------------------------------------
create table slicingtest (
id serial primary key,
a text
);

insert into slicingtest (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 1000000) as a from
generate_series(1,100);
\timing
select sum(length(substr(a, 0, 20))) from slicingtest;
---------------------------------------------------------------------
Environment: Linux 4.15.0-33-generic #36~16.04.1-Ubuntu x86_64 GNU/Linux
On master, I get
Time: 28.123 ms (average of ten runs)
With the patch, I get
Time: 2.306 ms (average of ten runs)

This is an improvement of more than 10x. In this case there are about 200
related TOAST data chunks for each entry; with more chunks per entry, I
believe the patch would help even more.

Related discussion:
https://www.postgresql.org/message-id/flat/CACowWR07EDm7Y4m2kbhN_jnys%3DBBf9A6768RyQdKm_%3DNpkcaWg%40mail.gmail.com

Best regards, Binguo Bao.

Attachment Content-Type Size
0001-Optimize-partial-TOAST-decompression.patch text/x-patch 2.5 KB
