Re: Why not keeping positions in GIN?

From: "Hitoshi Harada" <hitoshi_harada(at)forcia(dot)com>
To: "'Oleg Bartunov'" <oleg(at)sai(dot)msu(dot)su>
Cc: <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why not keeping positions in GIN?
Date: 2007-05-26 13:30:16
Message-ID: 000001c79f99$ff6106f0$5f01a8c0@daraha
Lists: pgsql-hackers

> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
> n-gram index would be more universal for asian languages.
Yeah, I know, but with the tsearch2 Japanese sample you must use an external
morphological analysis library to separate words. It is powerful, but I
need a more "lightweight" approach. Also, especially when you search
non-document text (such as titles, names, or patterns in a genome), that
approach is not so useful.
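For illustration, here is a toy in-memory model of the positional n-gram
index described in the original message. It is written in Python, not
GIN's real C structures or API; it assumes character bigrams, and the
names ngrams, build_index and search are hypothetical. Without positions,
any document containing every query bigram somewhere matches; with
positions, a document matches only if the bigrams occur at consecutive
offsets.

```python
from collections import defaultdict

# Toy model only: hypothetical names, not GIN's actual structures or API.

def ngrams(text, n=2):
    """Split text into overlapping character n-grams with start offsets."""
    return [(text[i:i + n], i) for i in range(len(text) - n + 1)]

def build_index(docs, n=2):
    """Map each n-gram to {doc_id: [positions]}: a posting list plus positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for gram, pos in ngrams(text, n):
            index[gram][doc_id].append(pos)
    return index

def search(index, query, n=2, use_positions=True):
    grams = ngrams(query, n)
    if not grams:
        return set()  # query shorter than n: not handled in this sketch
    # Posting-list intersection: documents containing every query n-gram.
    candidates = set(index.get(grams[0][0], {}))
    for gram, _ in grams[1:]:
        candidates &= set(index.get(gram, {}))
    if not use_positions:
        return candidates
    # Keep a document only if the n-grams appear at consecutive offsets.
    hits = set()
    for doc_id in candidates:
        for start in index[grams[0][0]][doc_id]:
            if all(start + off in index[gram][doc_id] for gram, off in grams[1:]):
                hits.add(doc_id)
                break
    return hits
```

With docs {1: "東京都庁", 2: "京都と東京"}, both documents contain 東京 and
京都, so the position-free lookup for "東京都" returns {1, 2}, while the
positional lookup returns only {1}.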

As I mentioned, GIN is also powerful for searching array data types, so I am
really hoping it will be able to store additional information (such as positions).
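The same idea applies to int arrays. A GIN containment query (@>) can only
say which rows hold both elements; if each posting-list entry also carried
the element's positions, adjacency could be checked during the posting-list
merge. A minimal sketch, assuming a hypothetical layout where each entry is
(row_id, positions) and lists are kept sorted by row id; this is not GIN's
on-disk format, and positional_postings and adjacent_rows are made-up names.

```python
# Hypothetical layout: each posting entry is (row_id, [positions in that row]).

def positional_postings(rows):
    """Build {element: [(row_id, [positions...]), ...]} sorted by row_id."""
    postings = {}
    for row_id in sorted(rows):
        positions = {}
        for pos, elem in enumerate(rows[row_id]):
            positions.setdefault(elem, []).append(pos)
        for elem, plist in positions.items():
            postings.setdefault(elem, []).append((row_id, plist))
    return postings

def adjacent_rows(postings, a, b):
    """Row ids where element b appears immediately after element a."""
    left, right = postings.get(a, []), postings.get(b, [])
    i = j = 0
    result = []
    # Two-pointer merge of the two posting lists, both sorted by row id.
    while i < len(left) and j < len(right):
        row_a, pos_a = left[i]
        row_b, pos_b = right[j]
        if row_a < row_b:
            i += 1
        elif row_a > row_b:
            j += 1
        else:
            # Same row: use the stored positions to test adjacency.
            following = set(pos_b)
            if any(p + 1 in following for p in pos_a):
                result.append(row_a)
            i += 1
            j += 1
    return result
```

For rows {1: [3, 7, 9], 2: [7, 3], 3: [5, 3, 7]}, every row contains both 3
and 7, but adjacent_rows(postings, 3, 7) returns [1, 3] and
adjacent_rows(postings, 7, 3) returns [2].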

Anyway, thanks a lot for all the information. I will try to read it.

Regards,

Hitoshi Harada

> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg(at)sai(dot)msu(dot)su]
> Sent: Saturday, May 26, 2007 10:12 PM
> To: Hitoshi Harada
> Cc: pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] Why not keeping positions in GIN?
>
> On Fri, 25 May 2007, Hitoshi Harada wrote:
>
> > Hi,
> >
> > I was walking through the GIN am source code these days, and found that
> > it has only posting lists, with no positions related to them.
> >
> > The reason I was doing that is to try to implement an n-gram text search
> > index on GIN myself. As you know, Japanese is not like English or other
> > European languages. If you index Japanese (or other text without word
> > separators) by n-grams, the index should keep entry positions alongside
> > the posting lists, because you must know whether the split query keys are
> > adjacent to each other in the data. To know this, the positions must be
> > there.
>
> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
> n-gram index would be more universal for asian languages.
>
> >
> > It's not only about Japanese. When you search for a "phrase" in English
> > text, the same logic is needed. I haven't researched tsearch2, but is
> > there any problem with this? Also, in some cases an int-array inverted
> > index needs entry positions as well, I guess. Keeping positions with the
> > posting lists is "general" enough for GIN, isn't it?
> >
> > Are there any future plans for it?
>
> Yes, we do have plans. See our todo,
> http://www.sai.msu.su/~megera/wiki/todo
> You may also read FTSBOOK, http://www.sai.msu.su/~megera/postgres/fts/doc
> and slides from PGCon2007,
> http://www.sai.msu.su/~megera/postgres/talks/fts-pgcon2007.pdf
> >
> >
> > Regards,
> >
> > Hitoshi Harada
> >
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg(at)sai(dot)msu(dot)su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
