Re: Enhancing phonetic search support for more languages - GSoC 2010

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Dhiraj Lohiya <lohiya(dot)dhiraj(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>, Selena Deckelmann <selenamarie(at)gmail(dot)com>, Dave Page <dpage(at)pgadmin(dot)org>
Subject: Re: Enhancing phonetic search support for more languages - GSoC 2010
Date: 2010-04-07 23:39:52
Message-ID: 4BBD17C8.3040509@agliodbs.com
Lists: pgsql-hackers

Dhiraj,

> For instance, if many users(above a threshold set by us) insert some
> search string for which no wanted search result is retrieved, we
> could track what he finally selects and then accordingly append/modify
> our set of phonetic rules based on the phonetic mismatch amongst the
> query inserted and result wanted according to our set of rules. Using
> this, the *rule sets it could evolve itself when we collect usage
> statistics from users based on their experience.* This feature would
> add a new dimension to the search functionality and would surely stand
> out.

You're mixing two completely different kinds of features here. One is a
backend function and the other is an application for building soundex
rules. While both of these are interesting projects, it is unlikely you
can complete both in one summer.

What I'd suggest focussing on for SoC is creating a new soundex function
(suggested name: soundex_ml) which includes a facility for loadable
algorithms and callability on a per-language basis. That would be
plenty of work by itself. From there, you could then continue your
undergraduate work on the tool to build the algorithms in the first place.
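
To make the shape of that suggestion concrete, here is a rough sketch in
Python rather than backend C. Everything in it apart from the suggested
name soundex_ml is an assumption made for illustration: the registry,
register_algorithm, and soundex_en are hypothetical helpers, not an
existing PostgreSQL API. The only algorithm loaded is classic American
Soundex for English; other languages would register their own rule sets.

```python
# Hypothetical registry of phonetic algorithms, keyed by language code.
_ALGORITHMS = {}

# Classic American Soundex digit groups.
_CODES = {}
for _digit, _letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                         "4": "l", "5": "mn", "6": "r"}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex_en(word):
    """American Soundex: first letter plus up to three digits, zero padded."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (result + "000")[:4]


def register_algorithm(lang, func):
    """Load (or replace) the phonetic algorithm used for one language."""
    _ALGORITHMS[lang] = func


def soundex_ml(text, lang="en"):
    """Dispatch to whichever algorithm is loaded for the given language."""
    try:
        algorithm = _ALGORITHMS[lang]
    except KeyError:
        raise ValueError("no phonetic algorithm loaded for language %r" % lang)
    return algorithm(text)


register_algorithm("en", soundex_en)

print(soundex_ml("Robert", "en"))  # R163
print(soundex_ml("Rupert", "en"))  # R163
```

Robert and Rupert collapsing to the same code shows how coarse plain
soundex is. A per-language algorithm would be plugged in with another
register_algorithm call, leaving callers of soundex_ml unchanged.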

I'm also curious why you chose to focus on the extremely imprecise
soundex instead of the more discriminating metaphone.

--
-- Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
