Re: Postgres + Xapian (was Re: fulltext searching via a custom index type )

From: Eric Ridge <ebr(at)tcdi(dot)com>
To: Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Postgres + Xapian (was Re: fulltext searching via a custom index type )
Date: 2004-01-05 16:00:36
Message-ID: 4CF83855-3F98-11D8-ADB4-000A95BB5944@tcdi.com
Lists: pgsql-general pgsql-hackers

On Jan 2, 2004, at 4:54 PM, Alvaro Herrera wrote:
> I think your approach is too ugly. You will have tons of problems the
> minute you start thinking about concurrency (unless you want to allow
> only a single user accessing the index)

It might be ugly, but it's very fast. Surprisingly fast, actually.

Concerning concurrency, Xapian internally supports multiple readers and
only 1 concurrent writer. So the locking requirements should be far
less complex than a true concurrent solution. Now, I'm not arguing
that this is ideal, but if Xapian is a search engine you're interested in,
then you've already made up your mind that you're willing to deal with
1 writer at a time.

However, Xapian does have built-in support for searching multiple
databases at once. One thought I've had is to simply create a new
1-document database on every INSERT/UPDATE beyond the initial CREATE
INDEX. Then whenever you do an index scan, tell Xapian to use all the
little databases that exist in the index. This would give some bit of
concurrency. Then on VACUUM (or VACUUM FULL), all these little databases
could be merged back into the main index.
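
A rough sketch of the scheme described above, in Python. It is a toy model and does not use Xapian's real API: TinyIndex, MultiIndex and their methods are hypothetical names made up for illustration. Each write after the initial build gets its own 1-document index, a scan consults all of them, and a vacuum step folds them back into the main index. Removing the old version of an updated document is ignored here.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index standing in for one Xapian database (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings[term].add(doc_id)

    def search(self, term):
        # .get() avoids creating empty entries for unknown terms
        return set(self.postings.get(term.lower(), ()))


class MultiIndex:
    """Main index plus one small index per INSERT/UPDATE after CREATE INDEX."""

    def __init__(self):
        self.main = TinyIndex()
        self.deltas = []  # the "little databases"

    def insert(self, doc_id, text):
        # Each write builds its own 1-document index and never touches main.
        delta = TinyIndex()
        delta.add(doc_id, text)
        self.deltas.append(delta)

    def search(self, term):
        # An index scan consults main and every delta, then unions the hits.
        hits = self.main.search(term)
        for delta in self.deltas:
            hits |= delta.search(term)
        return hits

    def vacuum(self):
        # Fold every delta back into main, then drop the deltas.
        for delta in self.deltas:
            for term, ids in delta.postings.items():
                self.main.postings[term] |= ids
        self.deltas.clear()
```

The point of the design is that writers only ever create new small indexes, so the single-writer limit applies to the main index only at vacuum time. The cost is that every scan gets slower as the number of little databases grows between vacuums.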

> and recovery (unless you want to force users to REINDEX when the
> system crashes).

I don't yet understand how the WAL stuff works. I haven't looked at
the APIs yet, but if you can record something like "write these bytes
to this BlockNumber at this offset", or "index Tuple X from Relation
Y", then it seems like recovery is still possible.

If ya can't do any of that, then I need to go look at WAL further.
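
To make the two kinds of log record mentioned above concrete, here is a small Python sketch. It is not PostgreSQL's WAL API; PhysicalRecord, LogicalRecord, replay and BLOCK_SIZE are hypothetical names, and the reindex callback is assumed to be supplied by whatever owns the index.

```python
from dataclasses import dataclass

BLOCK_SIZE = 16  # tiny block size, for illustration only


@dataclass
class PhysicalRecord:
    """'Write these bytes to this BlockNumber at this offset.'"""
    block_number: int
    offset: int
    data: bytes


@dataclass
class LogicalRecord:
    """'Index Tuple X from Relation Y.'"""
    relation: str
    tuple_id: int


def replay(log, blocks, reindex):
    """Apply log records in order to rebuild index state after a crash."""
    for rec in log:
        if isinstance(rec, PhysicalRecord):
            if rec.offset + len(rec.data) > BLOCK_SIZE:
                raise ValueError("record does not fit in one block")
            block = blocks.setdefault(rec.block_number, bytearray(BLOCK_SIZE))
            # Overwrite the byte range; replaying the same record twice
            # leaves the block unchanged.
            block[rec.offset:rec.offset + len(rec.data)] = rec.data
        else:
            # Logical record: ask the index to re-add the tuple.
            reindex(rec.relation, rec.tuple_id)
```

Physical records need no knowledge of the index format at replay time, while logical records are smaller but require the indexing code itself to be rerun during recovery.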

> I think one way of attacking the problem would be using the existing
> nbtree by allowing it to store the five btrees. First read the README
> in the nbtree dir, and then poke at the metapage's only structure. You
> will see that it has a BlockNumber to the root page of the index.

Right, I had gotten this far in my investigation already. The daunting
thing about trying to use the nbtree code is the code itself. It's
very complex. Plus, I just don't know how well the rest of Xapian
would respond to all of a sudden having a concurrent backend. It's
likely that it would make no difference, but it's just an unknown to me
at this time.

> Try modifying that to make it have a BlockNumber to every index's root
> page.
> You will have to provide ways to access each root page and maybe other
> nonstandard things (such as telling the root split operation what root
> page are you going to split), but you will get recovery and concurrency
> (at least to a point) for free.

And I'm not convinced that recovery and concurrency would be "for free"
in this case either. The need to keep essentially 5 different trees in
sync greatly complicates the concurrency issue, I would think.
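
A toy Python model of the suggestion and of the worry about it. This is not the nbtree code: MetaPage, index_document and in_sync are hypothetical names, and each "tree" is just a set. The metapage holds one root block number per tree, a root split has to say which tree's root moved, and a single document still has to land in all five trees.

```python
from dataclasses import dataclass, field

NUM_TREES = 5
INVALID_BLOCK = -1  # placeholder for "no root yet"


@dataclass
class MetaPage:
    """Metapage with one root BlockNumber per tree instead of a single one."""
    roots: list = field(default_factory=lambda: [INVALID_BLOCK] * NUM_TREES)

    def root_of(self, tree_no):
        return self.roots[tree_no]

    def set_root(self, tree_no, new_root_block):
        # A root split must name the tree whose root is being replaced.
        self.roots[tree_no] = new_root_block


def index_document(trees, doc_id):
    """One document touches every tree; all five writes must land together."""
    for tree in trees:
        tree.add(doc_id)


def in_sync(trees):
    """True when every tree holds the same set of document ids."""
    return all(tree == trees[0] for tree in trees)
```

If a crash or a concurrent reader arrives between the first and fifth write inside index_document, the trees disagree, which is the extra coordination that per-tree locking and recovery alone would not cover.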

thanks for your time!

eric
