Re: GSoC

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>
Cc: Anindya Jyoti Roy <anindyar(at)iitk(dot)ac(dot)in>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: GSoC
Date: 2010-03-31 00:07:33
Message-ID: 603c8f071003301707x4f5af8c9ua6e23c1b16b8a29d@mail.gmail.com
Lists: pgsql-hackers

On Tue, Mar 30, 2010 at 9:36 AM, Dimitri Fontaine
<dfontaine(at)hi-media(dot)com> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>
>> On Tue, Mar 30, 2010 at 12:56 AM, Anindya Jyoti Roy <anindyar(at)iitk(dot)ac(dot)in> wrote:
>>> As Jeff Davis pointed out, I followed the modification he suggested and now
>>> I want to have a basic matching only. I think atleast the fingerprint
>>> processing can be done in summer (if not the image processing). Is it a good
>>> GSoC project now?
>>
>> I'm not sure.  Can you provide a more detailed design?
>
> Apply the following to fingerprint searches ?
>
>  http://www.postgresql.org/docs/current/static/gist-implementation.html
>  http://wiki.postgresql.org/wiki/Image:Prato_2008_prefix.pdf
>
> I guess that what remains to be defined is how you get those
> fingerprint, what the datatype is named, is it fixed size or varlena,
> what operators you want to make available, and which will have index
> support. That's GiST + GIN, right ?

Well, yeah. I think the fingerprinting and operator support are the
real questions. My fear is that the student who is asking this
question does not really have a good handle on that aspect of the
project. Maybe I'm wrong. However, the description that was given
was:

2> the database search engine will be able to search for image also
3> it will list the matching images in the order of degree of match.
4> in this matching system I will likely use the system of dividing
the image into important parts and match them.

That's pretty vague. If someone came and said, I'm going to use the XYZ
system from the following academic papers, that would inspire a lot
more confidence, at least for me. Also I think this item from the
original email reflects a fundamental misunderstanding of how this
would integrate into PostgreSQL:

5> The database will also contain fingerprints, that may be the primary key.

Again, if the student had said, the XYZ system above will work well
with GIN indexing because we can construct the posting lists like
thus-and-so, or if they had said, it will work well with GiST because
there is a similarity metric we can use to construct the penalty and
picksplit functions, I would feel a lot better. But the description
given is so general in terms of both what is to be done on the image
processing side and what is to be done on the PostgreSQL side that I
am afraid that the student is going to be in far too deep. Compare
this description with the one from the student who wants to implement
JSON support - that sounds a whole lot closer to something that
someone (perhaps him) could sit down and code.
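
To make the GiST half of that concrete, here is a rough sketch of what
"a similarity metric we can use to construct the penalty and picksplit
functions" might look like. It is written in Python purely for
illustration; real GiST support functions are C functions registered
in an operator class. It assumes fingerprints are fixed-width bit
signatures stored as integers, that Hamming distance is the similarity
metric, and that an internal key is the bitwise OR of its children.
All of those are assumptions, as are the function names; nothing here
comes from the proposal itself.

```python
from itertools import combinations


def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def hamming(a: int, b: int) -> int:
    """Hamming distance between two bit signatures."""
    return popcount(a ^ b)


def union(keys):
    """Key for an internal page: bitwise OR of all child signatures (assumed)."""
    out = 0
    for k in keys:
        out |= k
    return out


def penalty(page_key: int, new_key: int) -> int:
    """Cost of inserting new_key under page_key: how many extra bits
    the page's union key would gain. Lower is better."""
    return popcount(page_key | new_key) - popcount(page_key)


def picksplit(keys):
    """Split an overfull page in two (requires at least two keys).
    Seeds are the two most distant signatures; every other key goes
    to the seed it is closer to. No balancing is attempted here."""
    li, ri = max(
        combinations(range(len(keys)), 2),
        key=lambda p: hamming(keys[p[0]], keys[p[1]]),
    )
    left, right = [keys[li]], [keys[ri]]
    for i, k in enumerate(keys):
        if i in (li, ri):
            continue
        if hamming(k, keys[li]) <= hamming(k, keys[ri]):
            left.append(k)
        else:
            right.append(k)
    return left, right
```

A proposal would still have to say where the signatures come from and
which operators the index supports; the sketch only shows the shape of
the argument I would want to see.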

My point here is not to discourage anyone or turn them off on trying
to submit a GSoC project related to PostgreSQL. Indeed, I really hope
they do. But it will benefit the project much more if the projects
are small and successful than it will if they are large and not
successful, or successful according to some metric but not actually
producing code that will be widely used or merged into core.

...Robert

In response to

  • Re: GSoC at 2010-03-30 13:36:25 from Dimitri Fontaine
