jsonb format is pessimal for toast compression

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Larry White <ljw1001(at)gmail(dot)com>
Subject: jsonb format is pessimal for toast compression
Date: 2014-08-08 03:17:43
Message-ID: 27839.1407467863@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I looked into the issue reported in bug #11109. The problem appears to be
that jsonb's on-disk format is designed in such a way that the leading
portion of any JSON array or object will be fairly incompressible, because
it consists mostly of a strictly-increasing series of integer offsets.
This interacts poorly with the code in pglz_compress() that gives up if
it's found nothing compressible in the first first_success_by bytes of a
value-to-be-compressed. (first_success_by is 1024 in the default set of
compression parameters.)

As an example, here's gdb's report of the bitwise representation of the
example JSON value in the bug thread:

0x2ab85ac: 0x20000005 0x00000004 0x50003098 0x0000309f
0x2ab85bc: 0x000030ae 0x000030b8 0x000030cf 0x000030da
0x2ab85cc: 0x000030df 0x000030ee 0x00003105 0x6b6e756a
0x2ab85dc: 0x400000de 0x00000034 0x00000068 0x0000009c
0x2ab85ec: 0x000000d0 0x00000104 0x00000138 0x0000016c
0x2ab85fc: 0x000001a0 0x000001d4 0x00000208 0x0000023c
0x2ab860c: 0x00000270 0x000002a4 0x000002d8 0x0000030c
0x2ab861c: 0x00000340 0x00000374 0x000003a8 0x000003dc
0x2ab862c: 0x00000410 0x00000444 0x00000478 0x000004ac
0x2ab863c: 0x000004e0 0x00000514 0x00000548 0x0000057c
0x2ab864c: 0x000005b0 0x000005e4 0x00000618 0x0000064c
0x2ab865c: 0x00000680 0x000006b4 0x000006e8 0x0000071c
0x2ab866c: 0x00000750 0x00000784 0x000007b8 0x000007ec
0x2ab867c: 0x00000820 0x00000854 0x00000888 0x000008bc
0x2ab868c: 0x000008f0 0x00000924 0x00000958 0x0000098c
0x2ab869c: 0x000009c0 0x000009f4 0x00000a28 0x00000a5c
0x2ab86ac: 0x00000a90 0x00000ac4 0x00000af8 0x00000b2c
0x2ab86bc: 0x00000b60 0x00000b94 0x00000bc8 0x00000bfc
0x2ab86cc: 0x00000c30 0x00000c64 0x00000c98 0x00000ccc
0x2ab86dc: 0x00000d00 0x00000d34 0x00000d68 0x00000d9c
0x2ab86ec: 0x00000dd0 0x00000e04 0x00000e38 0x00000e6c
0x2ab86fc: 0x00000ea0 0x00000ed4 0x00000f08 0x00000f3c
0x2ab870c: 0x00000f70 0x00000fa4 0x00000fd8 0x0000100c
0x2ab871c: 0x00001040 0x00001074 0x000010a8 0x000010dc
0x2ab872c: 0x00001110 0x00001144 0x00001178 0x000011ac
0x2ab873c: 0x000011e0 0x00001214 0x00001248 0x0000127c
0x2ab874c: 0x000012b0 0x000012e4 0x00001318 0x0000134c
0x2ab875c: 0x00001380 0x000013b4 0x000013e8 0x0000141c
0x2ab876c: 0x00001450 0x00001484 0x000014b8 0x000014ec
0x2ab877c: 0x00001520 0x00001554 0x00001588 0x000015bc
0x2ab878c: 0x000015f0 0x00001624 0x00001658 0x0000168c
0x2ab879c: 0x000016c0 0x000016f4 0x00001728 0x0000175c
0x2ab87ac: 0x00001790 0x000017c4 0x000017f8 0x0000182c
0x2ab87bc: 0x00001860 0x00001894 0x000018c8 0x000018fc
0x2ab87cc: 0x00001930 0x00001964 0x00001998 0x000019cc
0x2ab87dc: 0x00001a00 0x00001a34 0x00001a68 0x00001a9c
0x2ab87ec: 0x00001ad0 0x00001b04 0x00001b38 0x00001b6c
0x2ab87fc: 0x00001ba0 0x00001bd4 0x00001c08 0x00001c3c
0x2ab880c: 0x00001c70 0x00001ca4 0x00001cd8 0x00001d0c
0x2ab881c: 0x00001d40 0x00001d74 0x00001da8 0x00001ddc
0x2ab882c: 0x00001e10 0x00001e44 0x00001e78 0x00001eac
0x2ab883c: 0x00001ee0 0x00001f14 0x00001f48 0x00001f7c
0x2ab884c: 0x00001fb0 0x00001fe4 0x00002018 0x0000204c
0x2ab885c: 0x00002080 0x000020b4 0x000020e8 0x0000211c
0x2ab886c: 0x00002150 0x00002184 0x000021b8 0x000021ec
0x2ab887c: 0x00002220 0x00002254 0x00002288 0x000022bc
0x2ab888c: 0x000022f0 0x00002324 0x00002358 0x0000238c
0x2ab889c: 0x000023c0 0x000023f4 0x00002428 0x0000245c
0x2ab88ac: 0x00002490 0x000024c4 0x000024f8 0x0000252c
0x2ab88bc: 0x00002560 0x00002594 0x000025c8 0x000025fc
0x2ab88cc: 0x00002630 0x00002664 0x00002698 0x000026cc
0x2ab88dc: 0x00002700 0x00002734 0x00002768 0x0000279c
0x2ab88ec: 0x000027d0 0x00002804 0x00002838 0x0000286c
0x2ab88fc: 0x000028a0 0x000028d4 0x00002908 0x0000293c
0x2ab890c: 0x00002970 0x000029a4 0x000029d8 0x00002a0c
0x2ab891c: 0x00002a40 0x00002a74 0x00002aa8 0x00002adc
0x2ab892c: 0x00002b10 0x00002b44 0x00002b78 0x00002bac
0x2ab893c: 0x00002be0 0x00002c14 0x00002c48 0x00002c7c
0x2ab894c: 0x00002cb0 0x00002ce4 0x00002d18 0x32343231
0x2ab895c: 0x74653534 0x74656577 0x33746577 0x77673534
0x2ab896c: 0x74657274 0x33347477 0x72777120 0x20717771
0x2ab897c: 0x65727771 0x20777120 0x66647372 0x73616b6c
0x2ab898c: 0x33353471 0x71772035 0x72777172 0x71727771
0x2ab899c: 0x77203277 0x72777172 0x71727771 0x33323233
0x2ab89ac: 0x6b207732 0x20657773 0x73616673 0x73207372
0x2ab89bc: 0x64736664 0x32343231 0x74653534 0x74656577
0x2ab89cc: 0x33746577 0x77673534 0x74657274 0x33347477
0x2ab89dc: 0x72777120 0x20717771 0x65727771 0x20777120
0x2ab89ec: 0x66647372 0x73616b6c 0x33353471 0x71772035
0x2ab89fc: 0x72777172 0x71727771 0x77203277 0x72777172
0x2ab8a0c: 0x71727771 0x33323233 0x6b207732 0x20657773
0x2ab8a1c: 0x73616673 0x73207372 0x64736664 0x32343231
0x2ab8a2c: 0x74653534 0x74656577 0x33746577 0x77673534
0x2ab8a3c: 0x74657274 0x33347477 0x72777120 0x20717771
0x2ab8a4c: 0x65727771 0x20777120 0x66647372 0x73616b6c
0x2ab8a5c: 0x33353471 0x71772035 0x72777172 0x71727771
0x2ab8a6c: 0x77203277 0x72777172 0x71727771 0x33323233
0x2ab8a7c: 0x6b207732 0x20657773 0x73616673 0x73207372
0x2ab8a8c: 0x64736664 0x32343231 0x74653534 0x74656577
0x2ab8a9c: 0x33746577 0x77673534 0x74657274 0x33347477
0x2ab8aac: 0x72777120 0x20717771 0x65727771 0x20777120
0x2ab8abc: 0x66647372 0x73616b6c 0x33353471 0x71772035
0x2ab8acc: 0x72777172 0x71727771 0x77203277 0x72777172
0x2ab8adc: 0x71727771 0x33323233 0x6b207732 0x20657773
0x2ab8aec: 0x73616673 0x73207372 0x64736664 0x32343231
0x2ab8afc: 0x74653534 0x74656577 0x33746577 0x77673534
...
0x2abb61c: 0x74657274 0x33347477 0x72777120 0x20717771
0x2abb62c: 0x65727771 0x20777120 0x66647372 0x73616b6c
0x2abb63c: 0x33353471 0x71772035 0x72777172 0x71727771
0x2abb64c: 0x77203277 0x72777172 0x71727771 0x33323233
0x2abb65c: 0x6b207732 0x20657773 0x73616673 0x73207372
0x2abb66c: 0x64736664 0x537a6962 0x41706574 0x73756220
0x2abb67c: 0x73656e69 0x74732073 0x45617065 0x746e6576
0x2abb68c: 0x656d6954 0x34313032 0x2d38302d 0x32203730
0x2abb69c: 0x33323a31 0x2e33333a 0x62393434 0x6f4c7a69
0x2abb6ac: 0x69746163 0x61506e6f 0x74736972 0x736e6172
0x2abb6bc: 0x69746361 0x61446e6f 0x30326574 0x302d3431
0x2abb6cc: 0x37302d38 0x3a313220 0x333a3332 0x34342e33

There is plenty of compressible data once we get into the repetitive
strings in the payload part --- but that starts at offset 944, and up to
that point there is nothing that pg_lzcompress can get a handle on. There
are, by definition, no sequences of 4 or more repeated bytes in that area.
I think in principle pg_lzcompress could decide to compress the 3-byte
sequences consisting of the high-order 24 bits of each offset; but it
doesn't choose to do so, probably because of the way its lookup hash table
works:

* pglz_hist_idx -
*
* Computes the history table slot for the lookup by the next 4
* characters in the input.
*
* NB: because we use the next 4 characters, we are not guaranteed to
* find 3-character matches; they very possibly will be in the wrong
* hash list. This seems an acceptable tradeoff for spreading out the
* hash keys more.

For jsonb header data, the "next 4 characters" are *always* different, so
only a chance hash collision can result in a match. There is therefore a
pretty good chance that no compression will occur before it gives up
because of first_success_by.

I'm not sure if there is any easy fix for this. We could possibly change
the default first_success_by value, but I think that'd just be postponing
the problem to larger jsonb objects/arrays, and it would hurt performance
for genuinely incompressible data. A somewhat painful, but not yet
out-of-the-question, alternative is to change the jsonb on-disk
representation. Perhaps the JEntry array could be defined as containing
element lengths instead of element ending offsets. Not sure though if
that would break binary searching for JSON object keys.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2014-08-08 03:22:17 Re: Reporting the commit LSN at commit time
Previous Message Michael Paquier 2014-08-08 03:12:04 Re: Reporting the commit LSN at commit time