From: | Merlin Moncure <mmoncure(at)gmail(dot)com> |
---|---|
To: | Dominique Devienne <ddevienne(at)gmail(dot)com> |
Cc: | pgsql-general(at)lists(dot)postgresql(dot)org |
Subject: | Re: Aggregate versions of hashing functions (md5, sha1, etc...) |
Date: | 2025-07-10 19:02:59 |
Message-ID: | CAHyXU0ztigAau2x5zut6tdjNVXENEGR00F-Mvx4X-20zFz-NnQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-general |
On Thu, Jul 10, 2025 at 5:48 AM Dominique Devienne <ddevienne(at)gmail(dot)com>
wrote:
> We store scientific information in PostgreSQL, and some of that is
> bytea and large, thus we must "chunk it" both for performance, and not
> be limited to 1GB (we do exceed that, in rare occasions).
>
> Recently I added md5/sha1 hashing support for such values (for various
> reasons, to track corruptions in our ETLs, now fixed, but also in the
> future for custom smart sync and such), and was surprised to discover
> there are no aggregate versions of those functions, neither the
> built-in md5 one (now with bytea overload), nor for pgcrypto. Did I
> miss something?
>
> Any chance this might be added in the future?
>
> Seems so logical to me, that these hashing functions were available
> are aggregates, I can't be the first one to think of that, can it?
>
> Thanks, --DD
>
> PS: The holly-grail IMHO, would be built-in support for hashing, with
> intelligent lazy-compute and persistence correctly reset on changes.
> Probably opt-in.
>
> PPS: Built-in Support for xxh64, or at least in official pgcrypto,
> would also be nice. With aggregate!
>
> PPPS: I remember Oracle LOBs supporting a built-in implicit mime-type
> attribute on them, which was used as the content-type with served over
> the built-in WebDAV extension for the DB. I wish PostgreSQL had
> something like that too.
>
> PPPPS: the lo extension is not viable for us, FWIW. All the above
> should be opt-in on bytea columns IMHO.
>
Does it have to match the md5 of the 'unchunked' variant exactly? If not,
maybe you can rig a custom aggregate that would just "hash amend" the
chunks using the n-1 chunk has as salt, this would be fast and easy, at the
cost of not matching the unchunked variant.
I had to do something very similar with compression, I had a need to be
able to compress bytea values with lz4 at SQL level due to limited support
for extensions in the cloud. it works great...if a bit slow, and requires
the database to handle the decompression.
merlin
From | Date | Subject | |
---|---|---|---|
Next Message | Greg Hennessy | 2025-07-10 19:39:12 | optimizing number of workers |
Previous Message | David G. Johnston | 2025-07-10 18:04:35 | Re: having temp_tablespaces on less reliable storage |