Remove 1MB size limit in tsvector

From: Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Remove 1MB size limit in tsvector
Date: 2017-08-01 14:08:46
Message-ID: 20170801170846.66e3ab06@wp.localdomain
Lists: pgsql-hackers

Hello, hackers!

Historically, the tsvector type has not been able to hold more than 1MB
of data. I want to propose a patch that removes that limit.

The limit comes from the 'pos' field of WordEntry, which has only
20 bits of storage.
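
To make the arithmetic concrete, here is a minimal sketch of where the 1MB figure comes from. The struct mirrors the bit-field layout of WordEntry as I understand it from src/include/tsearch/ts_type.h (1 bit haspos, 11 bits len, 20 bits pos); the names SketchWordEntry, SKETCH_POS_BITS, SKETCH_MAX_POS and sketch_offset_fits are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_POS_BITS 20      /* width of the 'pos' bit-field */
#define SKETCH_LEN_BITS 11      /* width of the 'len' bit-field */

/* Largest lexeme offset a 20-bit field can represent: 2^20 - 1. */
#define SKETCH_MAX_POS ((1u << SKETCH_POS_BITS) - 1)

/* Illustrative copy of the pre-patch entry layout (an assumption). */
typedef struct
{
    uint32_t    haspos:1,               /* entry has position data */
                len:SKETCH_LEN_BITS,    /* lexeme length, max 2KB */
                pos:SKETCH_POS_BITS;    /* lexeme offset, max 1MB */
} SketchWordEntry;

/* Can this byte offset be stored in 'pos' without truncation? */
static bool
sketch_offset_fits(uint32_t offset)
{
    return offset <= SKETCH_MAX_POS;
}
```

With 20 bits the largest storable offset is 2^20 - 1 = 1048575, so any lexeme that would start at or beyond the 1MB mark cannot be addressed.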

In the proposed patch I removed this field; instead, an offset is stored
only at every Nth item of the WordEntry array. N is currently set to 4,
because that gave the best results in my benchmarks. It can be increased
in the future without affecting data already stored in the database.
Removing the field also improves the compression of tsvectors.
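
The idea of keeping an offset only at every Nth item can be sketched roughly as follows. This is not the layout used in the patch: it is a simplified illustration that assumes a separate array of anchor offsets, ignores alignment padding and position data, and uses hypothetical names throughout (SketchEntry, OFFSET_STRIDE, sketch_build_anchors, sketch_entry_offset). The offset of an arbitrary entry is recovered by starting from the nearest preceding anchor and adding at most N - 1 lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OFFSET_STRIDE 4     /* "N" from the message; hypothetical name */

/* Hypothetical simplified entry: only the lexeme length is kept. */
typedef struct
{
    uint32_t    len;
} SketchEntry;

/*
 * Record the absolute byte offset of every OFFSET_STRIDE-th entry.
 * anchors[] needs room for (n + OFFSET_STRIDE - 1) / OFFSET_STRIDE values.
 */
static void
sketch_build_anchors(const SketchEntry *entries, size_t n, uint32_t *anchors)
{
    uint32_t    off = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (i % OFFSET_STRIDE == 0)
            anchors[i / OFFSET_STRIDE] = off;
        off += entries[i].len;
    }
}

/*
 * Compute the byte offset of entry idx: take the nearest preceding
 * anchor, then add the lengths of the entries between it and idx.
 */
static uint32_t
sketch_entry_offset(const SketchEntry *entries, const uint32_t *anchors,
                    size_t idx)
{
    size_t      base = idx - idx % OFFSET_STRIDE;
    uint32_t    off = anchors[base / OFFSET_STRIDE];

    for (size_t i = base; i < idx; i++)
        off += entries[i].len;
    return off;
}
```

In this sketch a lookup costs at most N - 1 additions, and the anchors are full-width integers, so the offset is no longer bounded by a 20-bit field. A larger N stores fewer anchors at the cost of a longer walk per lookup.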

I also simplified the code by adding functions for building tsvectors.
Previously the code fragments that built a tsvector were duplicated in
several places.

The patch also frees some space in WordEntry, which can be used to
store additional information about the saved words.

Ildus Kurbangaliev
Postgres Professional:
Russian Postgres Company

Attachment Content-Type Size
tsvector_stretched_v1.patch text/x-patch 78.6 KB

