| From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
|---|---|
| To: | Lukas Fittl <lukas(at)fittl(dot)com> |
| Cc: | Jakub Wartak <jakub(dot)wartak(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: pg_plan_advice |
| Date: | 2026-04-02 16:15:09 |
| Message-ID: | CA+Tgmob87qsWa-VugofU6epuV0H5XjWZGMbQas4Q-ADKmvSyBg@mail.gmail.com |
| Lists: | pgsql-hackers |
On Tue, Mar 31, 2026 at 10:25 PM Lukas Fittl <lukas(at)fittl(dot)com> wrote:
> To be clear, I think it's okay to go ahead with merging pg_stash_advice
> without that and make it a best effort to get the file saving in too -
> but I think the current design in this patch represents a
> reasonable solution to what we can do in terms of persistence across
> restarts in either 19 or 20.
Somewhat against my better judgement, I have attempted to put your
patch into committable shape. In the process, I rewrote pretty much
the whole thing. So here's v24, also dropping pg_collect_advice.
0001 is the pg_stash_advice patch from v23, but with a number of
changes motivated by your desire to add persistence. I split the code
into two files, because I felt that file was starting to get a little
bit large, and I didn't want to just add a whole bunch more stuff to
it. For the most part, this is just code movement, but I did make a
couple of substantive changes. First, I restricted stash names to
alphanumeric characters and underscores, basically looking like
identifier names. This is partly because I got thinking about the
escaping requirements for the persistence file, but it's also because
I realized that letting somebody name their stashes with spaces or
non-printable characters in the name or a bunch of random punctuation
was probably more confusing than useful. Second,
pgsa_set_advice_string() was previously taking a lock itself, but most
of its sister functions require the caller to do that; I changed it to
match. Third, the lock is now also held when calling
pgsa_clear_advice_string(), which may not be strictly necessary, but it
seems safer and shouldn't cost anything meaningful.
0002 adds persistence. Here's a list of changes from your version:
- I changed the GUC name pg_stash_advice.save to
pg_stash_advice.persist, since it controls both whether advice is
saved automatically and also whether it's loaded automatically at
startup time.
- I added a GUC pg_stash_advice.persist_interval, so that the interval
between writes can be configured.
- Instead of a dump_requested flag, I added a pg_atomic_uint64
change_count. This avoids needing to take &pgsa_state->lock in
LW_EXCLUSIVE mode. Even leaving that aside, I don't think a Boolean is
adequate here. Your patch cleared the flag before dumping, but that
means if the act of dumping fails, you forget that it needed to be
done. If you instead clear the flag after dumping, then you don't
realize you need to do it again if any concurrent changes happen.
- I set the restart time to BGW_DEFAULT_RESTART_INTERVAL rather than
0. Restarting the worker in a tight loop is a bad plan.
- I added a function pg_start_stash_advice_worker(). You could do
something like add pg_stash_advice to session_preload_libraries, start
using it, and use this to kick off the worker. Then eventually you can
restart the server with pg_stash_advice moved to
shared_preload_libraries.
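A hedged sketch of that workflow (the function name is taken from this message; the rest is ordinary configuration):

```sql
-- With pg_stash_advice loaded via session_preload_libraries,
-- start the persistence worker without restarting the server:
SELECT pg_start_stash_advice_worker();
```

Later, move pg_stash_advice to shared_preload_libraries in postgresql.conf and restart, at which point the worker is started automatically.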
- As you had coded it, any interrupt that jostled the worker would
trigger an immediate write-out if one was pending. That behavior seems
hard to explain and document, so I made it work more like autoprewarm,
which always respects the configured interval even in case of
interrupts.
- I added a mechanism to prevent the user from manually manipulating
stashes or stash entries when persistence is enabled but before the
dump file has been reloaded. Without this, reloading the dump file
could error out if, for example, the user already managed to recreate
a stash with the same name as one that exists in the dump file.
- As you had coded it, data is inserted into the dynamic shared memory
structures as it's read from the disk file. I felt that could produce
rather odd behavior, especially in view of the lack of the lockout
mechanism mentioned in the previous point. We might partially process
the dump file and then die with some data loaded and other data not
loaded. Other backends could see the partial results. While the
lockout mechanism by itself is sufficient to prevent that, I felt
uncomfortable about relying on that completely. It means we start
consuming shared memory even before we know whether there's an error,
and continue to consume it after we've died from an error, and it also
means we have a very hard dependency on the lockout mechanism, which
really only works if there's only ever one load and save file and
loading only happens at startup time. I felt it was better to slurp
all the data into memory first, parse it completely, and then start
making changes to shared memory only if we don't find any problems, so
I made it work that way. We replace the tabs and newlines that end
fields and lines with \0 on the fly so that we can just point into
that buffer, instead of having to pstrdup() anything. (Note that, even
if we stuck with your approach of something based on pg_get_line(), it
would probably be better to use one of the other variants, e.g.
pg_get_line_buf(), to avoid allocating new buffers constantly.)
- I completely reworked the string escaping. Your pgsa_escape_string()
had a bug where the strpbrk call didn't check for \r. Also, I didn't
like the behavior of just ignoring a backslash when it was followed by
end of string or something unexpected; I felt those should be errors.
Given the decision to slurp the whole file, as mentioned in the
previous point, it also made sense to do the unescaping in place, so
that we didn't need to allocate additional memory. I particularly
didn't like the decision to sometimes allocate memory and sometimes
not. While it was economical in a sense, it meant that the memory
consumption could be very different depending on how many entries
needed escaping.
- I completely reworked the error reporting. Now, if we hit an error
parsing the file (or doing anything else), we just signal an ERROR,
and we rely on the fact that the postmaster will restart us. It's an
explicit goal not to apply incremental changes when the file overall
is not valid, which also means that we are only concerned about
reporting the first error that we detect, which also seems good for
avoiding log spam. On the other hand, the error reports are more
specific and detailed, and now all follow the same general pattern:
"syntax error in file \"%s\" line %u: problem description here". (I
was also not entirely happy with the fact that you could potentially
call fprintf() lots of times before finally reporting an error. While
you did save and restore errno around the calls to FreeFile() and
unlink(), I think it makes the code hard to reason about; e.g. what if
a later fprintf call hit a different error than an earlier one?)
- I felt it was a bit odd to install a zero length file, so I made the
persistence mechanism remove the existing file if there's nothing to
persist when it goes to write out. I am not totally sure this was the
right call.
- I added a message when saving entries, symmetrical to the one you had
for loading entries, and also some DEBUG1/2 messages in case someone
needs more details.
- I added a TAP test. This isn't as comprehensive as it could be -- in
particular, it doesn't cover all the possible error cases that occur
when trying to reload data from disk. I could add that, but it would
mean stopping and restarting the server a bunch of times, and I wasn't
sure that it was a good idea to add that much overhead for a few lines
of code coverage.
- Your documentation changes still said that the stash data would be
"restored when the first session attaches after a server restart" but
that doesn't sound like something that a user will understand, and
also wasn't what actually happened since you added the background
worker. I rewrote this.
There's almost none of your code remaining at this point, but I listed
you as a co-author for 0002. I think a case could be made for Author
or for no listing. Let me know if you have an opinion. And, please
review!
--
Robert Haas
EDB: http://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v24-0002-pg_stash_advice-Allow-stashed-advice-to-be-persi.patch | application/octet-stream | 45.9 KB |
| v24-0001-Add-pg_stash_advice-contrib-module.patch | application/octet-stream | 62.9 KB |