Re: sha1, sha2 functions into core?

From: Daniel Farina <daniel(at)heroku(dot)com>
To: "Ross J(dot) Reedstrom" <reedstrm(at)rice(dot)edu>
Cc: Marko Kreen <markokr(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sha1, sha2 functions into core?
Date: 2011-09-01 21:44:16
Message-ID: CAAZKuFYeJfHU2EpgNSPLB3-CMk1D1fmKGtAAhkE1-ESEvQVpcg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Aug 31, 2011 at 11:12 AM, Ross J. Reedstrom <reedstrm(at)rice(dot)edu> wrote:
> Hmm, this thread seems to have petered out without a conclusion. Just
> wanted to comment that there _are_ non-password storage uses for these
> digests: I use them in a context of storing large files in a bytea
> column, as a means to doing data deduplication, and avoiding pushing
> files from clients to server and back.

Yes, agreed: there is no decent content-addressing type in PostgreSQL,
so one rolls one's own using SHA digests and joins; I've seen this more
than once. It's a useful way to get a non-bloated index on a series of
values (each larger than a sha1 digest) where one only cares about the
equality operator. (Hash indexes, as unattractive as they were before
in PostgreSQL's implementation, are even less so now with streaming
replication.)
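
For anyone who hasn't seen the digest-and-join pattern, here is a
minimal sketch of it. It is an illustration only, under loud
assumptions: it uses Python's hashlib and an in-memory sqlite3 database
as a stand-in for PostgreSQL so it runs anywhere, and the table and
function names (blobs, files, put_file, get_file) are hypothetical, not
taken from any real application.

```python
import hashlib
import sqlite3


def sha1_digest(data: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest used as the content address."""
    return hashlib.sha1(data).digest()


def open_store() -> sqlite3.Connection:
    """Create an in-memory store (sqlite3 stands in for PostgreSQL here)."""
    conn = sqlite3.connect(":memory:")
    # Each distinct content is stored once, keyed by its digest.
    conn.execute(
        "CREATE TABLE blobs (digest BLOB PRIMARY KEY, content BLOB NOT NULL)"
    )
    # Many names may point at the same content through the digest.
    conn.execute(
        "CREATE TABLE files (name TEXT PRIMARY KEY, digest BLOB NOT NULL)"
    )
    return conn


def put_file(conn: sqlite3.Connection, name: str, content: bytes) -> bool:
    """Register a file; return True only if the content was newly stored."""
    digest = sha1_digest(content)
    known = conn.execute(
        "SELECT 1 FROM blobs WHERE digest = ?", (digest,)
    ).fetchone()
    if known is None:
        conn.execute(
            "INSERT INTO blobs (digest, content) VALUES (?, ?)",
            (digest, content),
        )
    conn.execute(
        "INSERT OR REPLACE INTO files (name, digest) VALUES (?, ?)",
        (name, digest),
    )
    return known is None


def get_file(conn: sqlite3.Connection, name: str):
    """Fetch content by name via an equality join on the digest."""
    row = conn.execute(
        "SELECT b.content FROM files f "
        "JOIN blobs b ON b.digest = f.digest WHERE f.name = ?",
        (name,),
    ).fetchone()
    return None if row is None else row[0]
```

The index that matters is on the fixed-size digest rather than on the
large value itself, and a client that computes the digest locally can
ask whether the content is already present before sending it.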

When that content to be addressed can be submitted from another
source, anything with md5 is correctly met with suspicion. We have
gone to the trouble of using pgcrypto to get sha1 access, but I know
of other applications that would have preferred to use sha but settle
for md5 simply because it's known to be bundled in core everywhere.

CREATE EXTENSION -- particularly if there is *any* way (is there? even
with ugliness like utility statement hooks) to configure it on the
provider end to not require superuser for common extensions like
'pgcrypto' -- could eliminate this issue, and one could get off the
hash "treadmill", including md5. But I think that would be a mistake.
Applications need a high quality digest to enable any kind of
principled content addressing use case, and I think making that any
harder than a builtin is going to negatively impact the state of
things at large. As a compromise, I'd also be happy with making
CREATE EXTENSION so trivial that everyone who has that use case can
get pgcrypto on any hosting provider.

--
fdr
