Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)

From: Kisung Kim <kskim(at)bitnine(dot)net>
To: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
Date: 2016-08-10 23:24:15
Message-ID: CABF0Rr07yJop9OEgaWz5jFYoqqo_-cAv6g-Qy92VV9u5WMMwTA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've run YCSB(Yahoo! Cloud Service Benchmark) on PostgreSQL and MongoDB
with WiredTiger.
And I found some interesting results and some issues(maybe) on Btree index
of PostgreSQL.

Here is my experiments and results.
YCSB is for document store benchmark and I build following schema in PG.

CREATE TABLE usertable (
YCSB_KEY varchar(30) primary key,
FIELDS jsonb);

And the benchmark generated avg-300-byte-length Json documents and loaded
100M rows in PG and Mongo.

First I found that the size difference between PG and Mongo:
I configured Mongo not to use any compression for both storage and index.

MongoDB index size: 2.1 GB
PostgreSQL index size: 5.5 GB

When I used the index bloating estimation script in
https://github.com/ioguix/pgsql-bloat-estimation,
the result is as follows:
current_database | schemaname | tblname | idxname
| real_size | extra_size | extra_ratio | fillfactor |
bloat_size | bloat_ratio | is_na
------------------+------------+------------------+-----------------------------------+------------+------------+-------------------+------------+------------+-------------------+-------
ycsb | public | usertable | usertable_pkey
| 5488852992 | 2673713152 | 48.7116917850949 | 90 |
2362122240 | 43.0348971532448 | f

It says that the bloat_ratio is 42 for the index.

So, I rebuilded the index and the result was changed:

current_database | schemaname | tblname | idxname
| real_size | extra_size | extra_ratio | fillfactor |
bloat_size | bloat_ratio | is_na
------------------+------------+------------------+-----------------------------------+------------+------------+-------------------+------------+------------+-------------------+-------
ycsb | public | usertable | usertable_pkey
| 3154264064 | 339132416 | 10.7515543758863 | 90 |
27533312 | 0.872891788428275 | f

I am curious about the results
1) why the index was bloated even though rows were only inserted not
removed or updated.
2) And then why the bloating is removed when it is rebuilded

I guess that the splits of Btree nodes during inserting rows caused the
bloating but I don't know exact reasons.
And given that MongoDB's index size is much smaller than PG after they
handled the same workload (only insert),
then is there any chances to improve PG's index behavior.

Thank you very much.

--

Bitnine Global Inc., Kisung Kim, Ph.D
https://sites.google.com/site/kisungresearch/
E-mail : kskim(at)bitnine(dot)net
Office phone : 070-4800-5890, 408-606-8602
US Mobile phone : 408-805-2192

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2016-08-11 01:05:19 Re: Improved ICU patch - WAS: Implementing full UTF-8 support (aka supporting 0x00)
Previous Message Marko Tiikkaja 2016-08-10 23:09:26 Re: Assertion failure in REL9_5_STABLE