| From: | Lukas Fittl <lukas(at)fittl(dot)com> |
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
| Cc: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_plan_advice |
| Date: | 2026-03-29 18:58:39 |
| Message-ID: | CAP53PkxJp=9PMzD8mKB3=4fK3Qp1iFZnA4pQSavP0nkpiNyoaA@mail.gmail.com |
| Lists: | pgsql-hackers |
Hi Robert,
On Fri, Mar 27, 2026 at 8:44 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Mar 27, 2026 at 4:31 AM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
> > For the other one 0004 (pg_stash_advice) I feel its worth trying to
> > get it in, if we can figure out the remaining issues. I'll try to do
> > another pass on that tomorrow after some sleep.
>
> Let me just talk a little about the way that I see the space of
> possible designs here. Let's set aside what we do or do not have time
> to code for a moment and just think about what makes some kind of
> sense in theory. I believe we can divide this up along a few different
> axes:
>
> 1. Where is advice stored for on-the-fly retrieval? Possible answers
> include: in shared memory, in local memory, in a table, on disk. But
> realistically, I doubt that "in a table" is a realistic option. Even
> if we hard-coded a direct index lookup i.e. without going through the
> query planner, I think this would be a fairly expensive thing to do
> for every query, and if this is going to be usable on production
> systems, it has to be fast. I am absolutely certain that "on disk" is
> not a realistic option. Local memory is an option. It has the
> advantage of making server-lifespan memory leaks impossible, and the
> further advantage of avoiding contention between different backends,
> since each backend would have its own copy of the data. The big
> disadvantage is memory consumption. If we think the advice store is
> going to be small, then that might be fine, but somebody is
> potentially going to have thousands of advice strings stored,
> duplicating that across hundreds (or, gulp, thousands) of backends is
> pretty bad.
I've been pondering this particular question for the last few days
(because pg_hint_plan uses a table, so in a sense the question is, why
not do that), and I've come to the conclusion that your choice of
shared memory is reasonable, especially in combination with the stash
GUC being a way to control which stash gets used.
I don't think local memory makes as much sense, because realistically
one would want their advice applied to all backends, and whilst we
could invent some synchronization mechanism, that seems more brittle
than managing shared memory.
The other problem of using a table (if we were to go that route),
besides performance overhead, is that advice would have to be tied to
individual databases where the extension was created - and for
multi-tenant use cases where you have a one-customer-per-database
setup, it'd basically be unusable. I think the combination of shared
memory and the GUC mechanism would work really well for that since you
could pick the advice stash to use for a given database using ALTER
DATABASE, without repeating the advice definitions.
It is worth noting that in such a use case one could still have a
problem with queryids differing between databases, but as of
787514b30bb ("Use relation name instead of OID in query jumbling for
RangeTblEntry") that should no longer be the case, provided the
schemas/queries match.
>
> 2. What provision do we have for durability? Possible answers include:
> in a table, on disk, nothing. I went with nothing, partly on the
> theory that it gives users more flexibility. We don't really care
> where they store their query IDs and advice strings, as long as they
> have a way to feed them into the mechanism. But of course I was also
> trying to save myself implementation complexity. There are some tricky
> things about a table: as implemented, the advice stores are
> cluster-wide objects, but tables are database objects. If we're
> supposed to automatically load advice strings from a table that might
> be in another database into an in-memory store, well, we can't. That
> could be fixed by scoping stores to a specific database, which would
> be inconvenient only for users who have the same schema in multiple
> databases and would want to share stashes across DBs, which is
> probably not common. A disk file is also an option. It requires
> inventing some kind of custom format that we can generate and parse,
> which has some complexity, but reading from a table has some
> complexity too; I'm not sure which is messier.
I think a simple disk file is the way to go, similar to how
autoprewarm works with its "autoprewarm.blocks" file. It's a bit
awkward that it just sits in the main data directory, but since
pg_prewarm already does that today, I think it's okay for another
contrib module to do the same. As noted, I'm mainly worried about
restarts the user didn't control causing previously set advice to be
lost.
I've attached a patch showing how that could look on top of your v23;
it copies the modified stash information to a
"pg_stash_advice.entries" file and loads it back after restarts.
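To illustrate the general idea (this is a hedged standalone sketch, not the actual format or function names used in the attached patch): each record could simply be a queryid followed by a length-prefixed advice string, written out on change and read back at startup.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical on-disk record: a uint64 queryid followed by a
 * length-prefixed advice string. Illustrative only; not the
 * actual format used by the patch.
 */
static int
dump_entry(FILE *fp, uint64_t queryid, const char *advice)
{
    uint32_t len = (uint32_t) strlen(advice);

    if (fwrite(&queryid, sizeof(queryid), 1, fp) != 1 ||
        fwrite(&len, sizeof(len), 1, fp) != 1 ||
        fwrite(advice, 1, len, fp) != len)
        return -1;
    return 0;
}

/*
 * Read one record; returns a malloc'd advice string and sets
 * *queryid, or NULL on EOF or error.
 */
static char *
load_entry(FILE *fp, uint64_t *queryid)
{
    uint32_t len;
    char   *advice;

    if (fread(queryid, sizeof(*queryid), 1, fp) != 1 ||
        fread(&len, sizeof(len), 1, fp) != 1)
        return NULL;
    advice = malloc(len + 1);
    if (advice == NULL || fread(advice, 1, len, fp) != len)
    {
        free(advice);
        return NULL;
    }
    advice[len] = '\0';
    return advice;
}
```

In practice one would write to a temporary file and rename() it over the final name, the way autoprewarm does, so a crash mid-dump can't leave a truncated file behind.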
> 3. How do we limit memory usage? One possible approach involves
> limiting the size of the hash table by entries or by memory
> consumption, but I don't think that's too valuable if that's all we
> do, because presumably all that's going to do is annoy people who hit
> the limit. It could make sense if we switched to a design where the
> superuser creates the stash, assigns it a memory limit, and then
> there's a way to give permission to use that stash to some particular
> user who is bound by the memory limit. In that kind of design, the
> person setting aside the memory has different and superior privileges
> to the person using it, so the cap serves a real purpose. A
> complicating factor here is that dshash doesn't seem to be well set up
> to enforce memory limits. You can do it if each dshash uses a separate
> DSA, but that's clunky, too. A completely different direction is to
> treat the in-memory component of the system as a cache, backed by a
> table or file from which cold entries are retrieved as needed. The
> problem with this - and I think it's a pretty big problem - is that
> performance will probably fall off a cliff as soon as you overflow the
> cache. I mean, it might not, if most of the requests can be satisfied
> from cache and a small percentage get fetched from cold storage, but
> if it's a case where a uniformly-accessed working set is 10% larger
> than the cache, the cache hit ratio will go down the tubes and so will
> performance.
Because the number of entries here is controlled by the user (i.e.
it's not a function of the workload, but of how much advice you as a
user have set), I'm much less worried about memory usage, as long as
we clearly document how to measure the amount of memory used.
We could also consider having a parameter that sets a maximum
size/number of entries, and require you to remove entries if that
limit is exceeded.
I agree with your concern that a hybrid design (i.e. one that falls
back to on-disk) has a performance cliff that will be unexpected. I
don't think we need to solve for the use case of having that many
advice strings right now, as I think the main utility of
pg_stash_advice is to fix a handful of badly behaving queries through
manual intervention, not to automatically fix thousands of them.
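To make that cliff concrete, here's a back-of-envelope model (all numbers invented for illustration): under uniform access the hit ratio is cache_size / working_set, so a working set only 10% larger than the cache still misses about 9% of the time, and if a miss costs 100x a hit, the average lookup cost jumps roughly 10x.

```c
/*
 * Toy model of expected lookup cost for a cache in front of cold
 * storage, assuming uniformly distributed accesses. Purely
 * illustrative; the cost ratios are made up.
 */
static double
avg_cost(double cache, double working_set,
         double hit_cost, double miss_cost)
{
    double hit_ratio = cache >= working_set ? 1.0 : cache / working_set;

    return hit_ratio * hit_cost + (1.0 - hit_ratio) * miss_cost;
}
```

With cache = working set the cost is just the hit cost; with working set 1.1x the cache and a 100x miss penalty, the average is about 10x worse, which is the cliff in question.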
> 4. How do we match advice strings to queries? The query ID is the
> obvious answer, but you could also think of having an option to match
> on query text, for cases when query IDs collide. You could do
> something like store a tag in the query string or in a GUC and look
> that up, instead of the query ID, but then I'm not sure why you don't
> just store the advice string instead of the tag, and then you don't
> need a hash table lookup anymore. You could do some kind of pattern
> matching on the query string rather than using the query ID, but that
> feels like it would be hard to get right, and also probably more
> expensive. There are probably other options here but I don't know what
> they are. I doubt that it makes sense from a performance standpoint to
> delegate out to a function written in a high-level language, and if
> you want to delegate to C code then you can just forget about
> pg_stash_advice and just use the add_advisor hook directly. I really
> feel like I'm probably missing some possible techniques, here. I
> wonder if other people will come up with some clever ideas.
I think for what makes sense in-tree at this point, simple queryid
matching is the way to go.
I'm sure there are better ways to match queries (i.e. make it
dependent on parameters in some way, be smart about significant table
size changes, etc.), but we can let the community iterate on that out
of tree, since they can just build an extension like pg_stash_advice
that uses the functions provided by pg_plan_advice.
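For the simple queryid-matching case, the lookup is conceptually just a hash table keyed on the 64-bit queryid. A minimal open-addressing sketch (nothing like the dshash-based implementation in the patch; names and sizes are my own invention):

```c
#include <stdint.h>
#include <stddef.h>

#define STASH_SLOTS 1024        /* must be a power of two */

typedef struct StashEntry
{
    uint64_t    queryid;        /* 0 means empty slot */
    const char *advice;
} StashEntry;

static StashEntry stash[STASH_SLOTS];

/* Store advice for a queryid; returns 0 on success, -1 if full. */
static int
stash_put(uint64_t queryid, const char *advice)
{
    for (uint32_t i = 0; i < STASH_SLOTS; i++)
    {
        uint32_t slot = (uint32_t) (queryid + i) & (STASH_SLOTS - 1);

        if (stash[slot].queryid == 0 || stash[slot].queryid == queryid)
        {
            stash[slot].queryid = queryid;
            stash[slot].advice = advice;
            return 0;
        }
    }
    return -1;
}

/* Look up advice by queryid; NULL if there is no entry. */
static const char *
stash_get(uint64_t queryid)
{
    for (uint32_t i = 0; i < STASH_SLOTS; i++)
    {
        uint32_t slot = (uint32_t) (queryid + i) & (STASH_SLOTS - 1);

        if (stash[slot].queryid == 0)
            return NULL;
        if (stash[slot].queryid == queryid)
            return stash[slot].advice;
    }
    return NULL;
}
```

The point is that a plain hash probe keeps the per-query overhead near zero, which is what makes queryid matching viable on hot paths where pattern matching on query text would not be.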
> 5. What should the security model be? Right now it's as simple as can
> be: the superuser gets to decide who can use the mechanism. But also,
> not to be overlooked, an individual session can always opt out of
> automatic advice application by clearing the stash_name GUC. It
> shouldn't be possible to force wrong answers on another session even
> if you can impose arbitrary plan_advice, but there is a good chance
> that you could find a way to make their session catastrophically slow,
> so it's good that users can opt themselves out of the mechanism.
> Nonetheless, if we really want to have mutually untrusting users be
> able to use this facility, then stashes should have owners, and you
> should only be able to access a stash whose owner has authorized you
> to do so. This is all a bit awkward because there's no way for an
> extension to integrate with the built-in permissions mechanisms for
> handling e.g. dropped users, and in-memory objects are a poor fit
> anyway. Also, all of this is bound to have a performance cost: if
> every access to a stash involves permissions checking, that's going to
> add a possibly-significant amount of overhead to a code path that
> might be very hot. And, in many environments, that permissions check
> would be a complete waste of cycles. Things could maybe be simplified
> by deciding that stash access can't be delegated: a stash has one and
> only one owner, and it can affect only that user and not any other.
> That is unlike what we do for built-in objects, but it simplifies the
> permissions check to strcmp(), which is super-cheap compared to
> catalog lookups. All in all, I don't really know which way to jump
> here: what I've got right now looks almost stupidly primitive, and I
> suspect it is, but adding complexity along what seem to be the obvious
> lines isn't an obvious win, either.
I think the current trade-off is probably okay: requiring a superuser
or equivalent to grant access to create stash entries, and allowing
unprivileged users to opt out of applied advice.
In practice, for a good amount of our user base these days the
question will be "Does my cloud provider give me access to create
stash entries?", so it's maybe worth considering whether we could also
allow pg_maintain to manage entries by default.
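On the single-owner idea quoted above: the appeal is that the whole permissions check collapses to a string comparison, with no catalog lookup on the hot path. A sketch of that check (the function and parameter names here are hypothetical, not from the patch):

```c
#include <string.h>
#include <stdbool.h>

/*
 * Single-owner model: a stash belongs to exactly one role, and its
 * advice is applied only to sessions running as that same role.
 * This makes the access check a strcmp(), which is cheap enough
 * for a per-query code path; no catalog lookup is needed.
 */
static bool
stash_accessible(const char *stash_owner, const char *session_user)
{
    return strcmp(stash_owner, session_user) == 0;
}
```

The trade-off, as Robert notes, is that this forbids delegation entirely, unlike the built-in object permission model.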
> 6. How do we tell users when there's a problem? Right now, the only
> available answer is to set pg_plan_advice.feedback_warnings, which I
> don't think is unreasonable. If you're using advice as intended, i.e.
> only for particular queries that really need it, then it really
> shouldn't be generating any output unless something's gone wrong, but
> I also don't think everyone is going to want this information going to
> the main log file. Somehow percolating the advice feedback back to the
> advisor that provided it -- pg_stash_advice or whatever -- feels like
> a thing that is probably worth doing. pg_stash_advice could then
> summarize the feedback and provide reports: hey, look, of the 100
> queries for which you stashed advice, 97 of them always showed /*
> matched */ for every item of advice, but the last 3 had varying
> numbers of other results. Being able to query that via SQL seems like
> it would be quite useful. I tend to think that it's almost a given
> that we're going to eventually want something like this, but the
> details aren't entirely clear to me. It could for example be developed
> as a separate extension that can work for any advisor, rather than
> being coupled to pg_stash_advice specifically -- but I'm also not
> saying that's better, and I think it might be worse. Figuring out
> exactly what we want to do here is a lot less critical than the items
> mentioned above because, no matter what we do about any of those
> things, this can always be added afterwards if and when desired. But,
> I'm mentioning it for completeness.
I think figuring out a feedback mechanism would be a good thing, but
we could definitely add that in a later release without a major design
change to pg_stash_advice. And as you note,
pg_plan_advice.feedback_warnings exists today as a mechanism to
validate whether advice was applied as expected.
> If there are other design axes we should be talking about, I'm keen to
> hear them. This is just what came to mind off-hand. Of course, I'm
> also interested in everyone's views on what the right decisions are
> from among the available options (or other options they may have in
> mind).
I can't think of other angles at this point, at least not in the
context of what makes sense to do in-tree for this release.
Thanks,
Lukas
--
Lukas Fittl
| Attachment | Content-Type | Size |
|---|---|---|
| nocfbot-0001-Make-pg_stash_advice-dump-advice-to-disk-wh.patch | application/octet-stream | 12.3 KB |