About Custom Aggregates, C Extensions and Memory

From: Marthin Laubscher <postgres(at)lobeshare(dot)co(dot)za>
To: <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: About Custom Aggregates, C Extensions and Memory
Date: 2025-08-15 15:01:44
Message-ID: CFBCA6B7-043D-4F16-B054-D669279550F8@lobeshare.co.za
Lists: pgsql-hackers

Hi,

I’ll skip over context (until someone asks or I can showcase the results) and cut to the chase:

A custom aggregate seems the best vehicle for what I seek to implement. Given the processing involved, it’s probably best written in C.

That makes the aggregate an opaque value, encoded and compressed into an internal format that allows direct equality testing and comparison. For everything else it needs to be decoded into memory, worked on, and then encoded back into a value as expected by the database ecosystem.

The challenge is that decoding and encoding present a massive overhead (easily two orders of magnitude or more) compared to the lightning-fast operations that, for example, add or remove a value from the aggregate while it is in memory. That overhead kills performance and limits potential.

Naturally, I’m looking for feasible options to retain and reuse accumulator values decoded in memory, at least between successive calls when aggregating a set of values and, ideally, also when the aggregate is used again later in the same session or query, within reason, of course.

I’ve tried, but failed, to gain from the documentation a sufficient understanding of the lifetime and limits of palloc/palloc0 memory in the aggregate and C extension context to reformulate my algorithms for such an environment around whatever opportunities might exist.

My project needs it, and I cannot postpone much longer. But I need a sherpa: someone who knows the terrain, pathways and pitfalls, and who’ll engage with me for a little while to alleviate my ignorance and uncertainties. If I knew in advance what all the questions were, they’d no longer be questions, but I don’t. It can be as async and terse as you like, but I need a conversation that can discard unworkable alternatives early and focus instead on whatever is compatible with the environment.

When we’re done, I’d be happy to write up and suggest the documentation updates I believe would have circumvented this cry for help.

Who’s willing to help me find my way up this mountain, or turn it into a molehill?

Regards,

Marthin Laubscher
