Re: GIN vs. Partial Indexes

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, postgres hackers <pgsql-hackers(at)postgresql(dot)org>, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>
Subject: Re: GIN vs. Partial Indexes
Date: 2010-10-08 21:44:50
Message-ID: AANLkTi=kQK3UFrQ5G82uMpdyy0bnS9KWSKAaovvyJNaZ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 8, 2010 at 1:47 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> On Thu, Oct 7, 2010 at 10:52 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> IMO, what's needed is to fix GIN so it doesn't go insane for empty
>>> values or non-restrictive queries, by ensuring there's at least one
>>> index entry for every row.  This has been discussed before; see the TODO
>>> section for GIN.
>
>> That seems like it could waste an awful lot of disk space (and
>> therefore I/O, etc.).  No?
>
> How so?  In a typical application, there would not likely be very many
> such rows --- we're talking about cases like documents containing zero
> indexable words.  In any case, the problem right now is that GIN has
> significant functional limitations because it fails to make any index
> entry at all for such rows.  Even if there are in fact no such rows
> in a particular table, it has to fail on some queries because there
> *might* be such rows.  There is no way to fix those limitations
> unless it undertakes to have some index entry for every row.  That
> will take disk space, but it's *necessary*.  (To adapt the old saw,
> I can make this index arbitrarily small if it doesn't have to give
> the right answers.)
>
> In any case, I would expect that GIN could actually do this quite
> efficiently.  What we'd probably want is a concept of a "null word",
> with empty indexable rows entered in the index as if they contained the
> null word.  So there'd be just one index entry with a posting list of
> however many such rows there are.

<thinks about it more>

Yeah, I think you're right.
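
For anyone following along, a minimal sketch (hypothetical table and column
names) of the kind of query that trips over the current limitation: a row
whose array is empty has no GIN entries at all, yet it can still satisfy a
containment predicate, so the index alone can't answer the query.

CREATE TABLE docs (id integer, tags integer[]);
CREATE INDEX docs_tags_gin ON docs USING gin (tags);

-- A row with tags = '{}' matches this predicate (the empty array is
-- contained in any array), but it has no entries in the GIN index,
-- so the index can't return it today.  With a "null word" entry made
-- for every empty row, it could.
SELECT id FROM docs WHERE tags <@ ARRAY[1, 2, 3];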

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
