WIP: store additional info in GIN index

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: WIP: store additional info in GIN index
Date: 2012-11-18 21:54:53
Message-ID: CAPpHfdtSt47PpRQBK6OawHePLJk8PF-wNhswaUpre7_+cc_kmA@mail.gmail.com
Lists: pgsql-hackers

Hackers,

The attached patch enables GIN to store additional information with item
pointers in posting lists and trees.
Such additional information could be positions of words, positions of
trigrams, lengths of arrays and so on.
This is the first and largest patch in a series of GIN improvements that
was presented at PGConf.EU:
http://wiki.postgresql.org/images/2/25/Full-text_search_in_PostgreSQL_in_milliseconds-extended-version.pdf

The patch modifies the GIN interface as follows:
1) Two arguments are added to extractValue:
   Datum **addInfo, bool **addInfoIsNull
2) Two arguments are added to consistent:
   Datum addInfo[], bool addInfoIsNull[]
3) A new method, config, is introduced. It returns the datatype OID of the
additional information (by analogy with the SP-GiST config method).
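
To make the first change more concrete, here is a minimal standalone sketch of
an extractValue-like function that fills the two new output arrays with word
positions. It is not code from the patch: the name toy_extract_value, the
simplified argument list, the Datum typedef and the splitting on spaces are all
assumptions made for illustration, and a real opclass would use the PostgreSQL
headers and emit one key per distinct word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;		/* stand-in for PostgreSQL's Datum (assumption) */

/*
 * Hypothetical toy: split text on spaces and return one key per word.
 * For every key, addInfo holds the 1-based word position and
 * addInfoIsNull is false.  All arrays are malloc'd; the caller frees them.
 */
static Datum *
toy_extract_value(const char *text, int32_t *nkeys,
				  Datum **addInfo, bool **addInfoIsNull)
{
	size_t		max_keys = strlen(text) / 2 + 1;	/* upper bound on words */
	Datum	   *keys = malloc(max_keys * sizeof(Datum));
	int32_t		n = 0;
	const char *p = text;

	*addInfo = malloc(max_keys * sizeof(Datum));
	*addInfoIsNull = malloc(max_keys * sizeof(bool));
	assert(keys != NULL && *addInfo != NULL && *addInfoIsNull != NULL);

	while (*p != '\0')
	{
		const char *start;
		size_t		wlen;
		char	   *word;

		while (*p == ' ')		/* skip separators */
			p++;
		if (*p == '\0')
			break;

		start = p;
		while (*p != '\0' && *p != ' ')
			p++;
		wlen = (size_t) (p - start);

		word = malloc(wlen + 1);
		assert(word != NULL);
		memcpy(word, start, wlen);
		word[wlen] = '\0';

		keys[n] = (Datum) word;				/* the key itself */
		(*addInfo)[n] = (Datum) (n + 1);	/* additional info: word position */
		(*addInfoIsNull)[n] = false;		/* position is always known here */
		n++;
	}

	*nkeys = n;
	return keys;
}
```

A consistent function would then receive the same kind of arrays for the
matched keys and could, for instance, compare positions to check a phrase.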

The patch completely changes the storage format of posting lists and of leaf
pages of posting trees. It uses varbyte encoding for BlockNumber and
OffsetNumber. BlockNumbers are stored incrementally (as deltas) within a
page. Additionally, one bit of OffsetNumber is reserved for the additional
information NULL flag. To make it possible to find a position in a leaf data
page quickly, the patch introduces a small index at the end of the page.
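
As a rough illustration of the encoding idea, the sketch below packs an item
pointer as a varbyte-encoded block number delta followed by a varbyte-encoded
offset whose lowest bit carries the NULL flag. It is not taken from the patch:
the ToyItem struct, the function names, the 7-bits-per-byte layout and the
choice of the lowest bit for the flag are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one posting list entry. */
typedef struct ToyItem
{
	uint32_t	block;				/* stands in for BlockNumber */
	uint16_t	offset;				/* stands in for OffsetNumber */
	bool		addinfo_is_null;	/* additional information NULL flag */
} ToyItem;

/* Write val with 7 data bits per byte, low group first; the high bit of a
 * byte means "more bytes follow".  Returns the number of bytes written. */
static size_t
varbyte_encode(uint32_t val, uint8_t *out)
{
	size_t		n = 0;

	while (val >= 0x80)
	{
		out[n++] = (uint8_t) (val | 0x80);
		val >>= 7;
	}
	out[n++] = (uint8_t) val;
	return n;
}

/* Inverse of varbyte_encode.  Returns the number of bytes consumed. */
static size_t
varbyte_decode(const uint8_t *in, uint32_t *val)
{
	uint32_t	result = 0;
	int			shift = 0;
	size_t		n = 0;
	uint8_t		b;

	do
	{
		b = in[n++];
		result |= (uint32_t) (b & 0x7F) << shift;
		shift += 7;
	} while (b & 0x80);

	*val = result;
	return n;
}

/* Encode one item relative to the previous item's block number. */
static size_t
toy_item_encode(const ToyItem *item, uint32_t prev_block, uint8_t *out)
{
	uint32_t	off;
	size_t		n;

	assert(item->block >= prev_block);	/* items are assumed sorted */
	n = varbyte_encode(item->block - prev_block, out);
	/* Assumed layout: offset shifted left by one, NULL flag in bit 0. */
	off = ((uint32_t) item->offset << 1) | (item->addinfo_is_null ? 1u : 0u);
	n += varbyte_encode(off, out + n);
	return n;
}

/* Decode one item; prev_block must match the value used when encoding. */
static size_t
toy_item_decode(const uint8_t *in, uint32_t prev_block, ToyItem *item)
{
	uint32_t	delta;
	uint32_t	off;
	size_t		n;

	n = varbyte_decode(in, &delta);
	n += varbyte_decode(in + n, &off);
	item->block = prev_block + delta;
	item->offset = (uint16_t) (off >> 1);
	item->addinfo_is_null = (off & 1) != 0;
	return n;
}
```

Because each block number depends on the previous one, decoding has to start
from a known point, which is presumably why a small index of entry points into
the page is useful.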

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
ginaddinfo.1.patch.gz application/x-gzip 31.3 KB
