Re: [proposal] de-TOAST'ing using a iterator

From: Binguo Bao <djydewang(at)gmail(dot)com>
To: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <x4mmm(at)yandex-team(dot)ru>, Atri Sharma <atri(dot)jiit(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Владимир Лесков <vladimirlesk(at)yandex-team(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [proposal] de-TOAST'ing using a iterator
Date: 2019-07-16 14:14:41
Message-ID: CAL-OGku4+Q-9fcSe0=SKgoiQy1Hdggqb44dQ2C7EXzWmiZzM8A@mail.gmail.com
Lists: pgsql-hackers

Hi, John

> First, I'd like to advocate for caution when using synthetic
> benchmarks involving compression. Consider this test:
> insert into detoast_c (a)
> select
> 'abc'||
> repeat(
> (SELECT string_agg(md5(chr(i)), '')
> FROM generate_series(1,127) i)
> , 10000)
> ||'xyz'
> from generate_series(1,100);
> The results for the uncompressed case were not much different than
> your test. However, in the compressed case the iterator doesn't buy us
> much with beginning searches since full decompression is already fast:
>                 master     patch
> comp. beg.       869ms     837ms
> comp. end      14100ms   16100ms
> uncomp. beg.    6360ms     800ms
> uncomp. end    21100ms   21400ms
> and with compression it's 14% slower searching to the end. This is
> pretty contrived, but I include it for demonstration.

I've reproduced the test case on my laptop, using the attached test
scripts:

                   master         patch
comp. beg.     2686.77 ms    1532.79 ms
comp. end      17971.8 ms    21206.3 ms
uncomp. beg.   8358.79 ms    1556.93 ms
uncomp. end    23559.7 ms    22547.1 ms

In the compressed beginning case, my result differs from yours: the
patch is ~1.75x faster than master rather than showing no improvement.
Interestingly, the patch is also about 4% faster than master in the
uncompressed end case. I can't figure out the reason yet.
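
For reference, here is a small sketch of how the ratios quoted above can be derived from the table. It only redoes the arithmetic on the reported timings. The names `timings_ms` and `speedup` are made up for this illustration and are not part of the attached scripts.

```python
# Timings in milliseconds, copied from the table above: (master, patch).
timings_ms = {
    "comp. beg.": (2686.77, 1532.79),
    "comp. end": (17971.8, 21206.3),
    "uncomp. beg.": (8358.79, 1556.93),
    "uncomp. end": (23559.7, 22547.1),
}


def speedup(master_ms: float, patch_ms: float) -> float:
    """Return master time divided by patch time (>1 means the patch is faster)."""
    return master_ms / patch_ms


speedups = {case: speedup(m, p) for case, (m, p) in timings_ms.items()}

for case, ratio in speedups.items():
    print(f"{case:<13} {ratio:.2f}x")
```

A ratio below 1 (as in the compressed end case) means the patch is slower than master for that case.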

> Reading the thread where you're working on optimizing partial
> decompression [1], it seems you have two separate solutions for the
> two problems. Maybe this is fine, but I'd like to bring up the
> possibility of using the same approach for both kinds of callers.

> I'm not an expert on TOAST, but maybe one way to solve both problems
> is to work at the level of whole TOAST chunks. In that case, the
> current patch would look like this:
> 1. The caller requests more of the attribute value from the de-TOAST
> iterator.
> 2. The iterator gets the next chunk and either copies or decompresses
> the whole chunk into the buffer. (If inline, just decompress the whole
> thing)

Thanks for your suggestion. It is indeed possible to implement
PG_DETOAST_DATUM_SLICE using the de-TOAST iterator.
IMO the iterator is better suited to situations where the caller doesn't
know the slice size in advance. If the caller does know the slice size,
it is reasonable to fetch enough chunks in one go and then decompress
them all at once.
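
To illustrate the distinction, here is a toy model, not PostgreSQL code. It ignores compression entirely, and `ChunkStore`, `find_with_iterator`, `fetch_slice` and the tiny `CHUNK_SIZE` are all hypothetical names and values chosen for the example. It only shows the access pattern: a search that doesn't know how much data it needs pulls chunks one at a time and stops early, while a caller that knows the slice length fetches the required chunks in one go.

```python
from typing import Iterator, List

CHUNK_SIZE = 8  # toy value for illustration; not PostgreSQL's real chunk size


def make_chunks(value: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a value into fixed-size chunks, standing in for stored chunks."""
    return [value[i:i + chunk_size] for i in range(0, len(value), chunk_size)]


class ChunkStore:
    """Hypothetical chunk storage that counts how many chunks were fetched."""

    def __init__(self, value: bytes) -> None:
        self._chunks = make_chunks(value)
        self.fetched = 0

    def fetch(self, index: int) -> bytes:
        self.fetched += 1
        return self._chunks[index]

    def __len__(self) -> int:
        return len(self._chunks)


def iterate_chunks(store: ChunkStore) -> Iterator[bytes]:
    """Iterator-style access: hand out one chunk per request."""
    for i in range(len(store)):
        yield store.fetch(i)


def find_with_iterator(store: ChunkStore, needle: bytes) -> int:
    """Search when the needed length is unknown: pull chunks until found."""
    buf = bytearray()
    for chunk in iterate_chunks(store):
        # A new match must end inside the new chunk, so only rescan the tail.
        start = max(0, len(buf) - len(needle) + 1)
        buf += chunk
        pos = buf.find(needle, start)
        if pos != -1:
            return pos
    return -1


def fetch_slice(store: ChunkStore, length: int) -> bytes:
    """Slice-style access: the length is known, so fetch all needed chunks."""
    needed = min(-(-length // CHUNK_SIZE), len(store))  # ceiling division
    data = b"".join(store.fetch(i) for i in range(needed))
    return data[:length]
```

With a value shaped like the test data ('abc' at the start, 'xyz' at the end), searching for the prefix touches a single chunk, searching for the suffix touches all of them, and a known-length slice touches exactly as many chunks as the length requires.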
--
Best regards,
Binguo Bao

Attachment Content-Type Size
init-test.sh application/x-shellscript 704 bytes
iterator-test.sh application/x-shellscript 649 bytes
