Re: Aggregate versions of hashing functions (md5, sha1, etc...)

From: Dominique Devienne <ddevienne(at)gmail(dot)com>
To: Ron Johnson <ronljohnsonjr(at)gmail(dot)com>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: Aggregate versions of hashing functions (md5, sha1, etc...)
Date: 2025-07-11 12:11:44
Message-ID: CAFCRh-9dMQC99F22VreuOF9sv7kNjqVzXvaHZQerk0aBHUyhTA@mail.gmail.com
Lists: pgsql-general

On Fri, Jul 11, 2025 at 11:00 AM Dominique Devienne <ddevienne(at)gmail(dot)com> wrote:
> The current md5() and pgcrypto.digest() functions roll the x1
> init, xN process, and x1 finish into a single call, processing a
> single bytea (or perhaps more intelligently for TOAST'ed values, the
> 2K "rows" of those in streaming-fashion, hopefully. Can a dev confirm?)

FWIW, I've [asked ChatGPT about that][1]. Assuming it's right (md5
and pgcrypto.digest do not leverage the "substring optimization" on
TOASTed bytea), that's an unfortunate lost opportunity, especially for
byteas close to the 1GB limit. And again (sorry to lay it on
thick...), when one has to chunk manually for sizes > 1GB, the lack
of an aggregate is a bit crippling, I'm afraid.
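
To make the init / process / finish split concrete, here is a minimal
sketch in Python (not PostgreSQL code). It only illustrates that feeding
a digest chunk by chunk yields the same result as hashing the whole
value at once, which is what a streaming scalar digest or an aggregate
would rely on. The helper names chunked_digest and split_chunks are made
up for illustration, the 2000-byte chunk size is just a rough stand-in
for TOAST chunking, and sha256 is an arbitrary choice of digest.

```python
import hashlib


def chunked_digest(chunks):
    """Hash an iterable of bytes chunks: one init, N updates, one finish."""
    h = hashlib.sha256()      # x1 init
    for chunk in chunks:      # xN process, one call per chunk
        h.update(chunk)
    return h.hexdigest()      # x1 finish


def split_chunks(data, size):
    """Yield consecutive slices of `data`, each at most `size` bytes long."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size]


# Hypothetical payload; a real one would come from the database in pieces.
data = bytes(range(256)) * 40

whole = hashlib.sha256(data).hexdigest()
streamed = chunked_digest(split_chunks(data, 2000))
print(whole == streamed)
```

Only one chunk needs to be in memory at a time, so the same pattern
would apply to values split across several rows because they exceed 1GB.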

So again, can a dev confirm what ChatGPT blurted out?

And if true, is there any interest in improving that, so that the
current scalar digests hash TOASTed values in a true streaming fashion?

And of course, the main point of this thread: could (true streaming)
aggregate support be added in a future version?

Thanks, --DD

[1]: https://chatgpt.com/share/6870fe03-416c-800e-8633-a76e478a794a
