Re: pglz performance

From: Binguo Bao <djydewang(at)gmail(dot)com>
To: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pglz performance
Date: 2019-05-23 14:27:09
Message-ID: CAL-OGkuVCjsHfCE0wa9sz0CsMZvk53jYCW8a1Wi0TTagFgLsDQ@mail.gmail.com

Hi hackers!
I am a student participating in GSoC 2019. I am looking forward to working
with you all and learning from you.
My project aims to provide the ability to de-TOAST a fully TOAST'd and
compressed field using an iterator.
For more details, please take a look at my proposal[0]. Any suggestions or
comments on my early ideas would be much appreciated :)

I've implemented the first step of the project: segmented pglz
compression, which makes it possible to read a subset of the raw data
without decompressing the entire field.
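
To illustrate the general idea of segment-wise compression, here is a minimal sketch. It is not the code from the patch: zlib stands in for pglz, and SEGMENT_SIZE, compress_segmented and read_slice are hypothetical names chosen for this example. Each fixed-size segment is compressed independently, so a slice can be read by decompressing only the segments it overlaps.

```python
import zlib

SEGMENT_SIZE = 2048  # hypothetical segment length in bytes


def compress_segmented(raw: bytes, segment_size: int = SEGMENT_SIZE) -> list[bytes]:
    """Compress each fixed-size segment on its own (zlib stands in for pglz)."""
    return [
        zlib.compress(raw[i:i + segment_size])
        for i in range(0, len(raw), segment_size)
    ]


def read_slice(segments: list[bytes], offset: int, length: int,
               segment_size: int = SEGMENT_SIZE) -> bytes:
    """Return raw[offset:offset + length], touching only the overlapping segments."""
    if length <= 0:
        return b""
    first = offset // segment_size
    last = (offset + length - 1) // segment_size
    # Decompress just the segments that cover the requested byte range.
    chunk = b"".join(zlib.decompress(s) for s in segments[first:last + 1])
    start = offset - first * segment_size
    return chunk[start:start + length]


raw = bytes(range(256)) * 64            # 16384 bytes of sample data
segments = compress_segmented(raw)
piece = read_slice(segments, 5000, 300)  # decompresses one segment, not all eight
```

In a scheme like this, a segment cannot refer back to data in earlier segments, so some loss of compression ratio compared with whole-field compression is to be expected.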
I've also run some tests[1] on the compressor. The results are as
follows:
NOTICE: Test summary:
NOTICE: Payload 000000010000000000000001
NOTICE: Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked     | 23.747444 | 0.578344 | 0.159809
NOTICE: pglz_decompress_hacked8    | 23.764193 | 0.677800 | 0.159809
NOTICE: pglz_decompress_hacked16   | 23.740351 | 0.704730 | 0.159809
NOTICE: pglz_decompress_vanilla    | 23.797917 | 1.227868 | 0.159809
NOTICE: pglz_decompress_hacked_seg | 12.261808 | 0.625634 | 0.184952

Comment: Compression speed increased by nearly 100% (about 23.8 -> 12.3
ns/bit), while the compression ratio got about 16% worse (0.159809 ->
0.184952).

NOTICE: Payload 000000010000000000000001 sliced by 2Kb
NOTICE: pglz_decompress_hacked     | 12.616956 | 0.621223 | 0.156953
NOTICE: pglz_decompress_hacked8    | 12.583685 | 0.756741 | 0.156953
NOTICE: pglz_decompress_hacked16   | 12.512636 | 0.774980 | 0.156953
NOTICE: pglz_decompress_vanilla    | 12.493062 | 1.262820 | 0.156953
NOTICE: pglz_decompress_hacked_seg | 11.986554 | 0.622654 | 0.159590
NOTICE: Payload 000000010000000000000001 sliced by 4Kb
NOTICE: pglz_decompress_hacked     | 15.514469 | 0.565565 | 0.154213
NOTICE: pglz_decompress_hacked8    | 15.529144 | 0.699675 | 0.154213
NOTICE: pglz_decompress_hacked16   | 15.514040 | 0.721145 | 0.154213
NOTICE: pglz_decompress_vanilla    | 15.558958 | 1.237237 | 0.154213
NOTICE: pglz_decompress_hacked_seg | 14.650309 | 0.563228 | 0.153652
NOTICE: Payload 000000010000000000000006
NOTICE: Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked     | 8.610177 | 0.153577 | 0.052294
NOTICE: pglz_decompress_hacked8    | 8.566785 | 0.168002 | 0.052294
NOTICE: pglz_decompress_hacked16   | 8.643126 | 0.167537 | 0.052294
NOTICE: pglz_decompress_vanilla    | 8.574498 | 0.930738 | 0.052294
NOTICE: pglz_decompress_hacked_seg | 7.394731 | 0.171924 | 0.056081
NOTICE: Payload 000000010000000000000006 sliced by 2Kb
NOTICE: pglz_decompress_hacked     | 6.724060 | 0.295043 | 0.065541
NOTICE: pglz_decompress_hacked8    | 6.623018 | 0.318527 | 0.065541
NOTICE: pglz_decompress_hacked16   | 6.898034 | 0.318360 | 0.065541
NOTICE: pglz_decompress_vanilla    | 6.712711 | 1.045430 | 0.065541
NOTICE: pglz_decompress_hacked_seg | 6.630743 | 0.302589 | 0.068471
NOTICE: Payload 000000010000000000000006 sliced by 4Kb
NOTICE: pglz_decompress_hacked     | 6.624067 | 0.220942 | 0.058865
NOTICE: pglz_decompress_hacked8    | 6.659424 | 0.240183 | 0.058865
NOTICE: pglz_decompress_hacked16   | 6.763864 | 0.240564 | 0.058865
NOTICE: pglz_decompress_vanilla    | 6.743574 | 0.985348 | 0.058865
NOTICE: pglz_decompress_hacked_seg | 6.613123 | 0.227582 | 0.060330
NOTICE: Payload 000000010000000000000008
NOTICE: Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked     | 52.425957 | 1.050544 | 0.498941
NOTICE: pglz_decompress_hacked8    | 52.204561 | 1.261592 | 0.498941
NOTICE: pglz_decompress_hacked16   | 52.328491 | 1.466751 | 0.498941
NOTICE: pglz_decompress_vanilla    | 52.465308 | 1.341271 | 0.498941
NOTICE: pglz_decompress_hacked_seg | 31.896341 | 1.113260 | 0.600998
NOTICE: Payload 000000010000000000000008 sliced by 2Kb
NOTICE: pglz_decompress_hacked     | 30.620611 | 0.768542 | 0.351941
NOTICE: pglz_decompress_hacked8    | 30.557334 | 0.907421 | 0.351941
NOTICE: pglz_decompress_hacked16   | 32.064903 | 1.208913 | 0.351941
NOTICE: pglz_decompress_vanilla    | 30.489886 | 1.014197 | 0.351941
NOTICE: pglz_decompress_hacked_seg | 27.145243 | 0.774193 | 0.352868
NOTICE: Payload 000000010000000000000008 sliced by 4Kb
NOTICE: pglz_decompress_hacked     | 36.567903 | 1.054633 | 0.514047
NOTICE: pglz_decompress_hacked8    | 36.459124 | 1.267731 | 0.514047
NOTICE: pglz_decompress_hacked16   | 36.791718 | 1.479650 | 0.514047
NOTICE: pglz_decompress_vanilla    | 36.241913 | 1.303136 | 0.514047
NOTICE: pglz_decompress_hacked_seg | 31.526327 | 1.059926 | 0.526875
NOTICE: Payload 16398
NOTICE: Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked     | 9.508625 | 0.435190 | 0.071816
NOTICE: pglz_decompress_hacked8    | 9.546987 | 0.473871 | 0.071816
NOTICE: pglz_decompress_hacked16   | 9.534496 | 0.471662 | 0.071816
NOTICE: pglz_decompress_vanilla    | 9.559053 | 1.352561 | 0.071816
NOTICE: pglz_decompress_hacked_seg | 8.479486 | 0.441536 | 0.073232
NOTICE: Payload 16398 sliced by 2Kb
NOTICE: pglz_decompress_hacked     | 6.808167 | 0.326570 | 0.082775
NOTICE: pglz_decompress_hacked8    | 6.790743 | 0.361720 | 0.082775
NOTICE: pglz_decompress_hacked16   | 6.886097 | 0.364549 | 0.082775
NOTICE: pglz_decompress_vanilla    | 6.918429 | 1.191265 | 0.082775
NOTICE: pglz_decompress_hacked_seg | 6.752811 | 0.340805 | 0.085705
NOTICE: Payload 16398 sliced by 4Kb
NOTICE: pglz_decompress_hacked     | 7.244472 | 0.261872 | 0.076860
NOTICE: pglz_decompress_hacked8    | 7.290275 | 0.295988 | 0.076860
NOTICE: pglz_decompress_hacked16   | 7.340706 | 0.294683 | 0.076860
NOTICE: pglz_decompress_vanilla    | 7.429289 | 1.151645 | 0.076860
NOTICE: pglz_decompress_hacked_seg | 7.054166 | 0.267896 | 0.078325
NOTICE: Payload shakespeare.txt
NOTICE: Decompressor name          | Compression time (ns/bit) | Decompression time (ns/bit) | ratio
NOTICE: pglz_decompress_hacked     | 25.998753 | 1.345542 | 0.281363
NOTICE: pglz_decompress_hacked8    | 26.121630 | 1.917667 | 0.281363
NOTICE: pglz_decompress_hacked16   | 26.139312 | 2.101329 | 0.281363
NOTICE: pglz_decompress_vanilla    | 26.155571 | 2.082123 | 0.281363
NOTICE: pglz_decompress_hacked_seg | 16.792089 | 1.951269 | 0.436558

Comment: In this case, the compression ratio got dramatically worse
(0.281363 -> 0.436558, about 55%).

NOTICE: Payload shakespeare.txt sliced by 2Kb
NOTICE: pglz_decompress_hacked     | 14.992793 | 1.923663 | 0.436270
NOTICE: pglz_decompress_hacked8    | 14.982428 | 2.695319 | 0.436270
NOTICE: pglz_decompress_hacked16   | 15.211803 | 2.846615 | 0.436270
NOTICE: pglz_decompress_vanilla    | 15.113214 | 2.580098 | 0.436270
NOTICE: pglz_decompress_hacked_seg | 15.120852 | 1.922596 | 0.439199
NOTICE: Payload shakespeare.txt sliced by 4Kb
NOTICE: pglz_decompress_hacked     | 18.083400 | 1.687598 | 0.366936
NOTICE: pglz_decompress_hacked8    | 18.185038 | 2.395928 | 0.366936
NOTICE: pglz_decompress_hacked16   | 18.096120 | 2.554812 | 0.366936
NOTICE: pglz_decompress_vanilla    | 18.435380 | 2.329129 | 0.366936
NOTICE: pglz_decompress_hacked_seg | 18.103267 | 1.705517 | 0.368400
NOTICE:

Decompressor score (summ of all times):
NOTICE: Decompressor pglz_decompress_hacked result 11.288848
NOTICE: Decompressor pglz_decompress_hacked8 result 14.438165
NOTICE: Decompressor pglz_decompress_hacked16 result 15.716280
NOTICE: Decompressor pglz_decompress_vanilla result 21.034867
NOTICE: Decompressor pglz_decompress_hacked_seg result 12.090609
NOTICE:

compressor score (summ of all times):
NOTICE: compressor pglz_compress_vanilla result 276.776671
NOTICE: compressor pglz_compress_hacked_seg result 222.407850
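
For readers who want the score totals above as relative figures, the arithmetic can be sketched as follows. The numbers are copied from the summary lines; the helper name time_saved_pct and the choice of the vanilla implementation as the baseline are assumptions made for this example.

```python
# Totals copied from the "score (summ of all times)" lines above.
decompress_total = {
    "pglz_decompress_hacked": 11.288848,
    "pglz_decompress_hacked8": 14.438165,
    "pglz_decompress_hacked16": 15.716280,
    "pglz_decompress_vanilla": 21.034867,
    "pglz_decompress_hacked_seg": 12.090609,
}
compress_total = {
    "pglz_compress_vanilla": 276.776671,
    "pglz_compress_hacked_seg": 222.407850,
}


def time_saved_pct(baseline: float, candidate: float) -> float:
    """Percentage of the baseline's total time that the candidate saves."""
    return (baseline - candidate) / baseline * 100.0


# Decompression: every variant compared against the vanilla total.
decompress_saved = {
    name: time_saved_pct(decompress_total["pglz_decompress_vanilla"], total)
    for name, total in decompress_total.items()
}

# Compression: the segmented compressor compared against vanilla.
compress_saved = time_saved_pct(
    compress_total["pglz_compress_vanilla"],
    compress_total["pglz_compress_hacked_seg"],
)

for name, saved in decompress_saved.items():
    print(f"{name}: {saved:.1f}% less total decompression time than vanilla")
print(f"pglz_compress_hacked_seg: {compress_saved:.1f}% less total compression time")
```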

Some open questions remain:
1. The segmented compression algorithm is currently not compatible with
the original compression algorithm.
2. If the idea works, we need to test on more data. What kind of data would
be most appropriate?
Any comments are much appreciated.

Best regards, Binguo Bao.

[0]
https://docs.google.com/document/d/1V4oXV5vGrGx24deBTKKM7bVdO3Cy-zfj-wQ4dkBUCl4/edit
[1] https://github.com/djydewang/test_pglz
