Re: sha1, sha2 functions into core?

From: "ktm(at)rice(dot)edu" <ktm(at)rice(dot)edu>
To: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sha1, sha2 functions into core?
Date: 2011-09-03 18:59:39
Message-ID: 20110903185939.GX19360@staff-mud-56-27.rice.edu
Lists: pgsql-hackers

On Fri, Sep 02, 2011 at 04:27:46PM -0500, Ross J. Reedstrom wrote:
> On Fri, Sep 02, 2011 at 02:05:45PM -0500, ktm(at)rice(dot)edu wrote:
> > On Fri, Sep 02, 2011 at 09:54:07PM +0300, Peter Eisentraut wrote:
> > > On ons, 2011-08-31 at 13:12 -0500, Ross J. Reedstrom wrote:
> > > > Hmm, this thread seems to have petered out without a conclusion. Just
> > > > wanted to comment that there _are_ non-password storage uses for these
> > > > digests: I use them in a context of storing large files in a bytea
> > > > column, as a means of doing data deduplication, and avoiding pushing
> > > > files from clients to server and back.
> > >
> > > But I suppose you don't need the hash function in the database system
> > > for that.
> > >
> >
> > It is very useful to have the same hash function used internally by
> > PostgreSQL exposed externally. I know you can get the code and add an
> > equivalent one of your own...
> >
> Thanks for the support, Ken, but Peter's right: the only backend use in
> my particular case is to let the backend do the hash calc during bulk
> loads: in the production code path, having the hash in two places
> doesn't save any work, since the client code has to calculate the hash
> in order to test for its existence in the backend. I suppose if the
> network cost was negligible, I could just push the files anyway, and
> have a before-insert trigger calculate the hash and do the dedup: then
> it'd be hidden in the backend completely. But as is, I can do all the
> work in the client.
>
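A minimal sketch of the client-side deduplication check described in the quoted message: the client hashes the file contents and only pushes the bytes when that digest is not already stored. This is an illustration under stated assumptions, not Ross's actual code. The in-memory dict `stored_files` is a hypothetical stand-in for the bytea table, `upload_if_new` and `content_digest` are hypothetical names, and SHA-256 from Python's `hashlib` is an assumed choice (SHA-1 would fit the same pattern).

```python
import hashlib

# Hypothetical stand-in for the server-side table: maps the hex digest of a
# file's contents to the stored bytes (a bytea column in the real setup).
stored_files: dict[str, bytes] = {}


def content_digest(data: bytes) -> str:
    """Hash the file contents. SHA-256 is an assumed choice here."""
    return hashlib.sha256(data).hexdigest()


def upload_if_new(data: bytes) -> tuple[str, bool]:
    """Compute the digest on the client and only 'push' the bytes if that
    digest is not already stored. Returns (digest, uploaded)."""
    digest = content_digest(data)
    if digest in stored_files:
        return digest, False  # duplicate: skip the network transfer
    stored_files[digest] = data  # in practice: an INSERT into the table
    return digest, True


d1, up1 = upload_if_new(b"large file contents")
d2, up2 = upload_if_new(b"large file contents")  # same bytes: not re-sent
```

The second call finds the digest already present and skips the transfer, which is why, as Ross notes, computing the hash again on the server would not save any work in this path.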

While it is true that it doesn't save any work, my motivation for having
it exposed is that "good" hash functions are non-trivial to find. I have
dealt with computational artifacts produced by hash functions that seemed
good at first. We use a very well-behaved function within the database,
and exposing it would help users avoid poor hash function implementations
of their own.
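To illustrate the kind of artifact a hash function can produce even when it looks reasonable at first, here is a small sketch comparing a deliberately poor hash with SHA-256. The message does not say which internal PostgreSQL function is meant, so SHA-256 from Python's `hashlib` is only a stand-in for "a well-behaved function", and `naive_hash` is a hypothetical example of a bad user implementation.

```python
import hashlib


def naive_hash(data: bytes) -> int:
    """Hypothetical poor hash: sum of byte values mod 2**32.
    It ignores byte order, so any permutation of the same bytes collides."""
    return sum(data) % (2 ** 32)


def sha256_hex(data: bytes) -> str:
    """Stand-in for a well-behaved hash function."""
    return hashlib.sha256(data).hexdigest()


a, b = b"abc", b"cba"
print(naive_hash(a) == naive_hash(b))  # True: reordered input collides
print(sha256_hex(a) == sha256_hex(b))  # False
```

For deduplication, a collision like this would make two different files look identical.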

Regards,
Ken
