PostgreSQL Weekly News - July 11, 2021

Posted on 2021-07-12 by PWN

Person of the week

PostgreSQL Product News

pg-wrapper 1.0.0, a wrapper for PHP's pgsql extension, released.

JDBC 42.2.23 released.

Ora2Pg 22.1, a tool for migrating Oracle databases to PostgreSQL, released.

credcheck 0.1.1, a password checking mechanism for plain text passwords, released.

pg_builder 1.0.0, a PHP query builder for PostgreSQL, released.

PG-Strom 3.0, a PostgreSQL extension which uses GPUs and related hardware to accelerate OLAP queries, released.

Datasentinel Version 2021.05, an application that monitors and reports on, among other things, PostgreSQL, released.

PostgreSQL Jobs for July

PostgreSQL in the News

Planet PostgreSQL:

PostgreSQL Weekly News is brought to you this week by David Fetter.

Submit news and announcements by Sunday at 3:00pm PST8PDT to

Applied Patches

Amit Kapila pushed:

Peter Eisentraut pushed:

Dean Rasheed pushed:

  • Prevent numeric overflows in parallel numeric aggregates. Formerly various numeric aggregate functions supported parallel aggregation by having each worker convert partial aggregate values to Numeric and use numeric_send() as part of serializing their state. That's problematic, since the range of Numeric is smaller than that of NumericVar, so it's possible for it to overflow (on either side of the decimal point) in cases that would succeed in non-parallel mode. Fix by serializing NumericVars instead, to avoid the overflow risk and ensure that parallel and non-parallel modes work the same. A side benefit is that this improves the efficiency of the serialization/deserialization code, which can make a noticeable difference to performance with large numbers of parallel workers. No back-patch due to risk from changing the binary format of the aggregate serialization states, as well as lack of prior field complaints and low probability of such overflows in practice. Patch by me. Thanks to David Rowley for review and performance testing, and Ranier Vilela for an additional suggestion. Discussion:

  • Fix numeric_mul() overflow due to too many digits after decimal point. This fixes an overflow error that occurred when using the numeric * operator if the result had more than 16383 digits after the decimal point; such results are now rounded instead. Overflow errors should only occur if the result has too many digits before the decimal point. Discussion:
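The numeric_mul() change (round instead of raising an error when a product has too many fractional digits) can be illustrated with Python's decimal module. This is a minimal sketch, not PostgreSQL's numeric code: the helper name is hypothetical, and rounding ties away from zero (ROUND_HALF_UP in decimal's terms) is an assumption made for the example.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

# Limit on digits after the decimal point, as cited in the commit message.
MAX_DSCALE = 16383


def numeric_mul_sketch(a: str, b: str) -> Decimal:
    """Hypothetical helper: multiply exactly, then round to at most
    MAX_DSCALE fractional digits instead of raising an overflow error."""
    x, y = Decimal(a), Decimal(b)
    # Enough precision to hold every digit of the exact product.
    digits = len(x.as_tuple().digits) + len(y.as_tuple().digits)
    product = Context(prec=digits + 1).multiply(x, y)
    if -product.as_tuple().exponent > MAX_DSCALE:
        # Too many fractional digits: round (ties away from zero, assumed)
        # to exactly MAX_DSCALE digits after the decimal point.
        ctx = Context(prec=digits + MAX_DSCALE + 2, rounding=ROUND_HALF_UP)
        product = product.quantize(Decimal(1).scaleb(-MAX_DSCALE), context=ctx)
    return product
```

Ordinary products such as 1.5 * 2.25 are unaffected, while 1e-10000 * 1e-10000, whose exact result has 20000 fractional digits, comes back as zero at scale 16383 rather than as an error.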

Tom Lane pushed:

  • Rethink blocking annotations in detach-partition-concurrently-[34]. In 741d7f104, I tried to make the reports from canceled steps come out after the pg_cancel_backend() steps, since that was the most common ordering before. However, that doesn't ensure that a canceled step doesn't report even later, as shown in a recent failure on buildfarm member idiacanthus. Rather than complicating things even more with additional annotations, let's just force the cancel's effect to be reported first. It's not that unnatural-looking. Back-patch to v14 where these test cases appeared. Report:

  • Reduce overhead of cache-clobber testing in LookupOpclassInfo(). Commit 03ffc4d6d added logic to bypass all caching behavior in LookupOpclassInfo when CLOBBER_CACHE_ALWAYS is enabled. It doesn't look like I stopped to think much about what that would cost, but recent investigation shows that the cost is enormous: it roughly doubles the time needed for cache-clobber test runs. There does seem to be value in this behavior when trying to test the opclass-cache loading logic itself, but for other purposes the cost is excessive. Hence, let's back off to doing this only when debug_invalidate_system_caches_always is at least 3; or in older branches, when CLOBBER_CACHE_RECURSIVELY is defined. While here, clean up some other minor issues in LookupOpclassInfo. Re-order the code so we aren't left with broken cache entries (leading to later core dumps) in the unlikely case that we suffer OOM while trying to allocate space for a new entry. (That seems to be my oversight in 03ffc4d6d.) Also, in >= v13, stop allocating one array entry too many. That's evidently left over from sloppy reversion in 851b14b0c. Back-patch to all supported branches, mainly to reduce the runtime of cache-clobbering buildfarm animals. Discussion:

  • Doc: add info about timestamps with fractional-minute UTC offsets. Our code has supported fractional-minute UTC offsets for ages, but there was no mention of the possibility in the main docs, and only a very indirect reference in Appendix B. Improve that. Discussion:

  • Avoid doing catalog lookups in postgres_fdw's conversion_error_callback. As in 50371df26, this is a bad idea since the callback can't really know what error is being thrown and thus whether or not it is safe to attempt catalog accesses. Rather than pushing said accesses into the mainline code where they'd usually be a waste of cycles, we can look at the query's rangetable instead. This change does mean that we'll be printing query aliases (if any were used) rather than the table or column's true name. But that doesn't seem like a bad thing: it's certainly a more useful definition in self-join cases, for instance. In any case, it seems unlikely that any applications would be depending on this detail, so it seems safe to change. Patch by me. Original complaint by Andres Freund; Bharath Rupireddy noted the connection to conversion_error_callback. Discussion:

  • Reduce the cost of planning deeply-nested views. Joel Jacobson reported that deep nesting of trivial (flattenable) views results in O(N^3) growth of planning time for N-deep nesting. It turns out that a large chunk of this cost comes from copying around the "subquery" sub-tree of each view's RTE_SUBQUERY RTE. But once we have successfully flattened the subquery, we don't need that anymore, because the planner isn't going to do anything else interesting with that RTE. We already zap the subquery pointer during setrefs.c (cf. add_rte_to_flat_rtable), but it's useless baggage earlier than that too. Clearing the pointer as soon as pull_up_simple_subquery is done with the RTE reduces the cost from O(N^3) to O(N^2), which is still not great, but it's quite a lot better. Further improvements will require rethinking of the RTE data structure, which is being considered in another thread. Patch by me; thanks to Dean Rasheed for review. Discussion:

  • Allow CustomScan providers to say whether they support projections. Previously, all CustomScan providers had to support projections, but there may be cases where this is inconvenient. Add a flag bit to say if it's supported. Important item for the release notes: this is non-backwards-compatible since the default is now to assume that CustomScan providers can't project, instead of assuming that they can. It's fail-soft, but could result in visible performance penalties due to adding unnecessary Result nodes. Sven Klemm, reviewed by Aleksander Alekseev; some cosmetic fiddling by me. Discussion:

  • Fix crash in postgres_fdw for provably-empty remote UPDATE/DELETE. In 86dc90056, I'd written find_modifytable_subplan with the assumption that if the immediate child of a ModifyTable is a Result, it must be a projecting Result with a subplan. However, if the UPDATE or DELETE has a provably-constant-false WHERE clause, that's not so: we'll generate a dummy subplan with a childless Result. Add the missing null-check so we don't crash on such cases. Per report from Alexander Pyhalov. Discussion:

  • Reject cases where a query in WITH rewrites to just NOTIFY. Since the executor can't cope with a utility statement appearing as a node of a plan tree, we can't support cases where a rewrite rule inserts a NOTIFY into an INSERT/UPDATE/DELETE command appearing in a WITH clause of a larger query. (One can imagine ways around that, but it'd be a new feature not a bug fix, and so far there's been no demand for it.) RewriteQuery checked for this, but it missed the case where the DML command rewrites to only a NOTIFY. That'd lead to crashes later on in planning. Add the missed check, and improve the level of testing of this area. Per bug #17094 from Yaoguang Chen. It's been busted since WITH was introduced, so back-patch to all supported branches. Discussion:

  • Update configure's probe for libldap to work with OpenLDAP 2.5. The separate libldap_r is gone and libldap itself is now always thread-safe. Unfortunately there seems to be no easy way to tell by inspection whether libldap is thread-safe, so we have to take it on faith that libldap is thread-safe if there's no libldap_r. That should be okay, as it appears that libldap_r was a standard part of the installation going back at least 20 years. Report and patch by Adrian Ho. Back-patch to all supported branches, since people might try to build any of them with a newer OpenLDAP. Discussion:

  • Avoid creating a RESULT RTE that's marked LATERAL. Commit 7266d0997 added code to pull up simple constant function results, converting the RTE_FUNCTION RTE to a dummy RTE_RESULT RTE since it no longer needs to be scanned. But I forgot to clear the LATERAL flag if the RTE has it set. If the function reduced to a constant, it surely contains no lateral references, so this simplification is logically OK. It's needed because various other places will Assert that RESULT RTEs aren't LATERAL. Per bug #17097 from Yaoguang Chen. Back-patch to v13 where the faulty code came in. Discussion:

  • Un-break AIX build. In commit d0a02bdb8, I'd supposed that uniformly probing for ldap_bind would make the intent clearer. However, that seems not to work on AIX, for obscure reasons (maybe it's a macro there?). Revert to the former behavior of probing ldap_simple_bind for thread-safe cases and ldap_bind otherwise. Per buildfarm member hoverfly. Discussion:

  • Un-break AIX build, take 2. I incorrectly diagnosed the reason why hoverfly is unhappy. Looking closer, it appears that it fails to link libldap unless libssl is also present; so the problem was my idea of clearing LIBS before making the check. Revert to essentially the original coding, except that instead of failing when libldap_r isn't there, use libldap. Per buildfarm member hoverfly. Discussion:

  • Fix busted test for ldap_initialize. Sigh ... I was expecting AC_CHECK_LIB to do something it didn't, namely update LIBS. This led to not finding ldap_initialize. Fix by moving the probe for ldap_initialize. In some sense this is more correct anyway, since (at least for now) we care about whether ldap_initialize exists in libldap, not libldap_r. Per buildfarm member elver and local testing. Discussion:

  • Lock the extension during ALTER EXTENSION ADD/DROP. Although we were careful to lock the object being added or dropped, we failed to get any sort of lock on the extension itself. This allowed the ALTER to proceed in parallel with a DROP EXTENSION, which is problematic for a couple of reasons. If both commands succeeded we'd be left with a dangling link in pg_depend, which would cause problems later. Also, if the ALTER failed for some reason, it might try to print the extension's name, and that could result in a crash or (in older branches) a silly error message complaining about extension "(null)". Per bug #17098 from Alexander Lakhin. Back-patch to all supported branches. Discussion:
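The documentation change about fractional-minute UTC offsets concerns offsets such as +05:53:28, which include a seconds component. Offsets like this show up, for example, as local mean time (LMT) entries in historical time zone data. The minimal sketch below uses Python's datetime purely to show what such an offset means when converting to UTC; it does not reflect PostgreSQL's own input syntax, which the new documentation covers.

```python
from datetime import datetime, timedelta, timezone

# A local timestamp whose UTC offset is 5 hours, 53 minutes and 28 seconds,
# i.e. not a whole number of minutes.
stamp = datetime.fromisoformat("2021-07-11T12:00:00+05:53:28")

offset = stamp.utcoffset()
utc = stamp.astimezone(timezone.utc)

print(offset == timedelta(hours=5, minutes=53, seconds=28))  # True
print(utc.isoformat())  # 2021-07-11T06:06:32+00:00
```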

Michaël Paquier pushed:

  • Use WaitLatch() instead of pg_usleep() at the end of backups. This concerns pg_stop_backup() and BASE_BACKUP, when waiting for the WAL segments required for a backup to be archived. This somewhat simplifies the handling of the wait event used in this code path. Author: Bharath Rupireddy Reviewed-by: Michael Paquier, Stephen Frost Discussion:

  • Refactor SASL code with a generic interface for its mechanisms. The SCRAM and SASL code have been tightly linked ever since SCRAM was added to the core code, which made it hard to add new SASL mechanisms, even though these are by design different facilities, with SCRAM being one mechanism available through SASL. This refactors the code related to both so that the backend and the frontend use a set of callbacks for SASL mechanisms, documenting along the way what is expected of anybody adding a new SASL mechanism. The separation between the two layers is clean, using two sets of callbacks, one for the frontend and one for the backend, to mark the boundary between the two facilities. The shape of the callbacks is directly inspired by the routines used by SCRAM, so the code change is straightforward, and the SASL code is moved into its own set of files. These will likely change depending on how, and whether, new SASL mechanisms get added in the future. Author: Jacob Champion Reviewed-by: Michael Paquier Discussion:

  • Add forgotten LSN_FORMAT_ARGS() in xlogreader.c. These should have been part of 4035cd5, which introduced LZ4 support for wal_compression.

  • Add more sanity checks in SASL exchanges. The following checks are added to make the SASL infrastructure better at detecting defects when new mechanisms are implemented: - In the backend, check that a mechanism generates no output when an exchange fails, failing if a message is waiting to be sent. - Handle zero-length messages in the frontend. The backend already handles them, and SCRAM would complain about empty messages since this mechanism does not allow them, but other mechanisms may want this capability (the SASL specification allows it). - In the frontend, make sure that a mechanism generates a message in the middle of the exchange. SCRAM, as implemented, already respects all these requirements, and the recent refactoring of SASL done in 9fd8557 helps document that more cleanly. Analyzed-by: Jacob Champion Author: Michael Paquier Reviewed-by: Jacob Champion Discussion:
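The split described in the two SASL commits, a generic layer that calls per-mechanism callbacks and checks what they return, can be sketched in Python. All names here are hypothetical; PostgreSQL's real callback structs are written in C and differ in detail, and this sketch folds the backend and frontend checks into a single function for brevity.

```python
from enum import Enum
from typing import Optional, Protocol, Tuple


class Status(Enum):
    CONTINUE = "continue"
    SUCCESS = "success"
    FAILURE = "failure"


class Mechanism(Protocol):
    """Hypothetical per-mechanism callback interface."""

    def exchange(self, data: bytes) -> Tuple[Status, Optional[bytes]]: ...


class ToyChallenge:
    """Toy two-step mechanism: send a challenge, succeed if it is echoed."""

    def __init__(self) -> None:
        self.sent = False

    def exchange(self, data: bytes) -> Tuple[Status, Optional[bytes]]:
        if not self.sent:
            self.sent = True
            return Status.CONTINUE, b"challenge"
        if data == b"challenge":
            # A zero-length final message is legal: b"" is not None.
            return Status.SUCCESS, b""
        return Status.FAILURE, None


def checked_exchange(mech: Mechanism, data: bytes) -> Tuple[Status, Optional[bytes]]:
    """Generic layer: run one mechanism step and enforce the sanity checks."""
    status, output = mech.exchange(data)
    if status is Status.FAILURE and output is not None:
        raise RuntimeError("mechanism left a message to send after failing")
    if status is Status.CONTINUE and output is None:
        raise RuntimeError("mechanism produced no message mid-exchange")
    return status, output
```

Keeping the checks in the generic layer means a new mechanism only has to implement exchange(), and a defective one is caught by the shared code rather than misbehaving on the wire.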

David Rowley pushed:

  • Reduce the number of pallocs when building partition bounds. In each of the create_*_bound() functions for LIST, RANGE and HASH partitioning, there were a large number of palloc calls which could be reduced down to a much smaller number. In each of these functions, an array was built so that we could qsort it before making the PartitionBoundInfo. For LIST and HASH partitioning, an array of pointers was allocated then each element was allocated within that array. Since the number of items of each dimension is known beforehand, we can just allocate a single chunk of memory for this. Similarly, with all partition strategies, we're able to reduce the number of allocations to build the ->datums field. This is an array of Datum pointers, but there's no need for the Datums that each element points to to be singly allocated. One big chunk will do. For RANGE partitioning, the PartitionBoundInfo->kind field can get the same treatment. We can apply the same optimizations to partition_bounds_copy(). Doing this might have a small effect on cache performance when searching for the correct partition during partition pruning or DML on a partitioned table. However, that's likely to be small and this is mostly about reducing palloc overhead. Author: Nitin Jadhav, Justin Pryzby, David Rowley Reviewed-by: Justin Pryzby, Zhihong Yu Discussion:

  • Fix typo in comment. Author: James Coleman Discussion:

  • Use a hash table to speed up NOT IN(values). Similar to 50e17ad28, which allowed hash tables to be used for IN clauses with a set of constants, here we add the same feature for NOT IN clauses. NOT IN evaluates the same as: WHERE a <> v1 AND a <> v2 AND a <> v3. Obviously, if we're using a hash table we must be exactly equivalent to that and return the same result, taking into account that either side of the condition could contain a NULL. This requires a little bit of special handling to make it work with the hash table version. When processing NOT IN, the ScalarArrayOpExpr's operator will be the <> operator. To be able to build and look up a hash table we must use the <>'s negator operator. The planner checks if that exists and is hashable and sets the relevant fields in ScalarArrayOpExpr to instruct the executor to use hashing. Author: David Rowley, James Coleman Reviewed-by: James Coleman, Zhihong Yu Discussion:

  • Fix incorrect return value in pg_size_pretty(bigint). Due to how pg_size_pretty(bigint) was implemented, it was possible that, when given a negative number of bytes, the returned value would not match the value returned for the equivalent positive number of bytes. This was due to two separate issues. 1. The function used bit shifting to convert the number of bytes into larger units. The rounding performed by bit shifting is not the same as that of division. For example, -3 >> 1 = -2, but -3 / 2 = -1. These two operations are only equivalent for positive numbers. 2. The half_rounded() macro rounded towards positive infinity. This meant that negative numbers rounded towards zero and positive numbers rounded away from zero. Here we fix #1 by dividing the values instead of bit shifting, and fix #2 by adjusting the half_rounded macro always to round away from zero. Additionally, adjust the pg_size_pretty(numeric) function to be more explicit that it's using division rather than bit shifting. A casual observer might have believed bit shifting was used because a static function was named numeric_shift_right. However, that function calculated the divisor from the number of bits and performed division. Here we make that clearer. This change is just cosmetic and does not affect the return value of the numeric version of the function. Here we also add a set of regression tests for both versions of pg_size_pretty() which test the values directly before and after the function switches to the next unit. This bug was introduced in 8a1fab36a. Prior to that, negative values were always displayed in bytes. Author: Dean Rasheed, David Rowley Discussion: Backpatch-through: 9.6, where the bug was introduced.

  • Use a lookup table for units in pg_size_pretty and pg_size_bytes. We've grown 2 versions of pg_size_pretty over the years, one for BIGINT and one for NUMERIC. Both should output the same, but keeping them in sync is harder than needed due to neither function sharing a source of truth about which units to use and how to transition to the next largest unit. Here we add a static array which defines the units that we recognize and have both pg_size_pretty and pg_size_pretty_numeric use it. This will make adding any units in the future a very simple task. The table contains all information required to allow us to also modify pg_size_bytes to use the lookup table, so adjust that too. There are no behavioral changes here. Author: David Rowley Reviewed-by: Dean Rasheed, Tom Lane, David Christensen Discussion:

  • Teach pg_size_pretty and pg_size_bytes about petabytes. There was talk about adding units all the way up to yottabytes, but it seems quite far-fetched that anyone would need those. Since such large units are not exactly commonplace, it seems unlikely that having pg_size_pretty output any unit larger than petabytes would actually be helpful to anyone. Since petabytes are on the horizon, let's just add those. Maybe one day we'll get to add additional units, but it will likely be a while before we'll need to think beyond petabytes with regard to the size of a database. Author: David Christensen Discussion:
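The NULL handling that the hashed NOT IN must preserve can be made concrete with a small Python model. In this sketch None stands in for SQL NULL, and the function names are hypothetical. The first function spells out a <> v1 AND a <> v2 AND ..., and the second gets the same answer by probing a hash set with equality, the negator of <>. The executor builds its table once and reuses it for every row; the sketch rebuilds it per call for brevity.

```python
from typing import Iterable, Optional


def not_in_linear(a, values: Iterable) -> Optional[bool]:
    """Reference semantics: a <> v1 AND a <> v2 AND ... (None is NULL)."""
    result: Optional[bool] = True
    for v in values:
        if a is None or v is None:
            result = None        # a <> NULL is NULL; TRUE AND NULL is NULL
        elif a == v:
            return False         # a single FALSE makes the whole AND FALSE
    return result


def not_in_hashed(a, values: Iterable) -> Optional[bool]:
    """Same answer using a hash set probed with '=' (the negator of '<>')."""
    values = list(values)
    if not values:
        return True              # an AND over no terms is TRUE
    has_null = any(v is None for v in values)
    table = {v for v in values if v is not None}
    if a is None:
        return None              # NULL <> anything is NULL
    if a in table:
        return False             # found: some a <> v is FALSE
    return None if has_null else True  # not found: NULL if any v was NULL
```

For example, 1 NOT IN (2, NULL) yields NULL rather than TRUE, while 2 NOT IN (2, NULL) yields FALSE, and both functions agree on these cases.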
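The three pg_size_pretty changes (truncating division instead of shifts, symmetric half rounding, and a unit lookup table that now reaches petabytes) can be pulled together in a simplified Python sketch. The table layout, the 10 * 1024 cutoff, and all function names are assumptions for illustration; PostgreSQL's actual implementation is C and may differ in its exact thresholds.

```python
# Hypothetical unit table: (name, log2 of the unit size in bytes).
# Supporting another unit, such as petabytes, is a one-line change here.
UNITS = [("kB", 10), ("MB", 20), ("GB", 30), ("TB", 40), ("PB", 50)]
LIMIT = 10 * 1024  # assumed cutoff for moving on to the next unit


def c_div(a: int, b: int) -> int:
    """Integer division truncating toward zero, like C's '/' operator.

    Python's '//' floors instead (-3 // 2 == -2), as does '>>'
    (-3 >> 1 == -2), which is the asymmetry the fix removes.
    """
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def half_rounded(x: int) -> int:
    """Halve x, rounding .5 away from zero for both signs."""
    return c_div(x + (-1 if x < 0 else 1), 2)


def size_pretty(size: int) -> str:
    if abs(size) < LIMIT:
        return f"{size} bytes"
    for i, (name, bits) in enumerate(UNITS):
        # Size in half-units of `name`, truncated toward zero, then rounded.
        value = half_rounded(c_div(size, 1 << (bits - 1)))
        if abs(value) < LIMIT or i == len(UNITS) - 1:
            return f"{value} {name}"
    raise AssertionError("unreachable")
```

With this sketch, 10752 bytes and -10752 bytes come out as "11 kB" and "-11 kB", so the sign no longer changes the magnitude, and 10 * 2**50 bytes is reported as "10 PB".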

Álvaro Herrera pushed:

Fujii Masao pushed:

  • postgres_fdw: Tighten up allowed values for batch_size, fetch_size options. Previously, values such as '100$%$#$#' and '9,223,372,' were accepted and treated as valid integers for the postgres_fdw options batch_size and fetch_size, whereas the same values were rejected with an error for the fdw_startup_cost and fdw_tuple_cost options. This was because endptr was not checked when converting strings to integers using strtol. This commit changes the logic to use the parse_int function instead of strtol, since parse_int returns false if it is unable to convert the string to an integer. Note that this function also rounds values such as '100.456' to 100 and '100.567' or '100.678' to 101. While at it, use parse_real for the fdw_startup_cost and fdw_tuple_cost options. Since parse_int and parse_real are already used for reloptions and GUCs, it is more appropriate to use them in postgres_fdw than to call strtol and strtod directly. Back-patch to v14. Author: Bharath Rupireddy Reviewed-by: Ashutosh Bapat, Tom Lane, Kyotaro Horiguchi, Fujii Masao Discussion:

  • doc: Fix description about pg_stat_statements.track_planning. This commit fixes wrong wording like "a fewer kinds" in the description about track_planning option. Back-patch to v13 where pg_stat_statements.track_planning was added. Author: Justin Pryzby Reviewed-by: Julien Rouhaud, Fujii Masao Discussion:
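The difference between the old and new postgres_fdw option parsing can be shown with a minimal Python sketch. It does not call PostgreSQL's strtol() or parse_int(); both function names are hypothetical. The first function imitates strtol() with its end pointer ignored, and the second requires the whole string to be numeric and rounds fractional values, matching the examples in the commit message.

```python
import re

# An optional sign followed by digits with an optional fractional part.
_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def lax_parse(value: str) -> int:
    """Old behavior: like strtol() with its end pointer ignored.

    Leading whitespace, an optional sign and leading digits are read,
    and anything after them is silently discarded.
    """
    m = re.match(r"\s*[+-]?\d+", value)
    return int(m.group()) if m else 0  # strtol() returns 0 if no digits


def strict_parse(value: str) -> int:
    """New behavior: the whole string must be numeric; fractions are rounded."""
    text = value.strip()
    if _NUMBER.fullmatch(text) is None:
        raise ValueError(f"invalid value for an integer option: {value!r}")
    # Python's round() sends exact .5 ties to the even neighbor.
    return round(float(text))
```

Under the lax rules '100$%$#$#' silently becomes 100 and '9,223,372,' becomes 9; under the strict rules both are rejected, while '100.456' and '100.567' round to 100 and 101.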

Daniel Gustafsson pushed:

Thomas Munro pushed:

Jeff Davis pushed:

Pending Patches

Jie Zhang sent in a patch to make libpq's PQsendFlushRequest return 0 as documented instead of false as coded.

Gilles Darold sent in another revision of a patch to add new events XACT_EVENT_COMMAND_START and SUBXACT_EVENT_COMMAND_START that can be caught in the xact callbacks when a new command is to be executed.

Ranier Vilela sent in a patch to fix a possible uninitialized variable declaration in src/backend/utils/adt/varlena.c.

Ronan Dunklau and Ranier Vilela traded patches to allow Sort nodes to use the fast "single datum" tuplesort.

Andrey V. Lepikhov sent in two more revisions of a patch to teach the optimizer to consider partitionwise join of non-partitioned tables with each partition of a partitioned table, and to disallow the asymmetric machinery for joining two partitioned (or appended) relations, because it could cause huge consumption of CPU and memory during reparameterization of the NestLoop path.

Victor Spirin sent in two revisions of a patch to make renames on Windows atomic.

Peter Smith sent in a patch to add more subtlety to psql's tab completion of CREATE PUBLICATION.

Amit Langote sent in another revision of a patch to export get_partition_for_tuple(), and use same to avoid using SPI for some RI checks.

Hou Zhijie sent in another revision of a patch to make it possible to annotate a table as safely allowing (or not) parallel DML, use same in making it possible to use parallel queries in INSERT ... SELECT, and add a pg_get_table_parallel_dml_safety(regclass) function.

Dilip Kumar sent in a patch to make CREATE DATABASE a WAL logged action in order to avert the checkpoints and storm of writes that made the previous implementation burdensome.

Zeng Wenjing sent in a patch to check for synchronous standbys earlier.

Vigneshwaran C sent in another revision of a patch to identify missing publications from publisher during CREATE/ALTER SUBSCRIPTION.

Dipesh Pandit sent in a patch to mitigate the O(N^2) directory scan in the WAL archiver by tracking the log segment number of the file currently being archived and incrementing it by one to get the next WAL file.
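The idea behind this patch can be sketched in Python: if the archiver knows the segment number of the file it just archived, the next candidate's name can be computed rather than found by scanning the archive status directory. The sketch assumes PostgreSQL's 24-hex-digit WAL file naming (timeline, then the segment number split into two 8-hex-digit halves) and the default 16 MB segment size. The function names are hypothetical, and a real implementation would presumably still need a fallback, for example after a timeline switch, which this sketch ignores.

```python
# Assumption: the default 16 MB WAL segment size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE


def wal_file_name(timeline: int, segno: int) -> str:
    """Timeline, then the segment number split into two 8-hex-digit halves."""
    return "%08X%08X%08X" % (
        timeline,
        segno // SEGMENTS_PER_XLOGID,
        segno % SEGMENTS_PER_XLOGID,
    )


def next_wal_file_name(timeline: int, segno: int) -> str:
    """Predict the next file to archive by incrementing the segment number."""
    return wal_file_name(timeline, segno + 1)
```

For example, after archiving 0000000100000000000000FF on timeline 1, the predicted next file is 000000010000000100000000, with the carry into the middle field handled by the arithmetic.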

Justin Pryzby sent in a patch to add some ASSERTIONs in procarray.c, and add a new oldest-transaction-id optional argument to pg_resetwal.c.

Gilles Darold sent in three more revisions of a patch to add pushdown for CASE clauses to the PostgreSQL FDW.

Bharath Rupireddy and Peter Smith traded patches to make the parse_subscription_options function responsible for zapping the SubOpts param up-front, instead of hoping the caller will do it, and remove redundant condition checks for "supported_opts" where we already know the option must be supported.

Ajin Cherian, Amit Kapila, and Peter Smith traded patches to add support for prepared transactions to logical replication.

Vigneshwaran C sent in two more revisions of a patch to enhance error messages to include the option name in case of duplicate option errors.

Greg Nancarrow sent in another revision of a patch to add a new "client_connection" event and client connection trigger support for same.

Gurjeet Singh sent in a patch to state explicitly that --sync-only does not modify data.

Kyotaro HORIGUCHI sent in a patch to make FPI_FOR_HINT follow standard FPI emitting policy.

David Rowley sent in another revision of a patch to remove useless int64 range checks on BIGINT sequence MINVALUE/MAXVALUE values.

Bharath Rupireddy sent in another revision of a patch to improve publication error messages by being more specific about the reason an object can't be added to same.

Andy Fan sent in another revision of a patch to expand the way uniqueness is used in the planner.

Gurjeet Singh sent in a patch to issue a warning when initdb's --sync-only option is mixed with other options.

Dean Rasheed sent in another revision of a patch to fix a loss-of-precision bug and prevent some overflow errors in exponentiation on the NUMERIC type.

Bruce Momjian sent in three more revisions of a patch to fix a bug that corrupted the visibility map.

Dagfinn Ilmari Mannsåker sent in a patch to use the l*_node() family of functions where appropriate.

Li Japin and Ranier Vilela traded patches to validate slot_name in parse_subscription_options.

Yugo Nagata sent in another revision of a patch to avert some pgbench errors.

Euler Taveira de Oliveira and Greg Nancarrow traded patches to implement row filtering for logical replication.

Takamichi Osumi sent in a patch to fix a bug that manifested as failed transaction statistics to measure logical replication progress.

Fabien COELHO sent in two more revisions of a patch to replace rand48 with a better PRNG.

Masahiko Sawada sent in a patch to clarify the documentation of ALTER SUBSCRIPTION with respect to the refresh_option option.

Robert Haas sent in another revision of a patch to refactor basebackup.c.

Seino Yuki sent in a patch to track statistics for materialized views.

Georgios Kokolatos sent in another revision of a patch to teach pg_receivewal to use lz4 compression.

Kyotaro HORIGUCHI sent in two revisions of a patch to be strict about rejecting invalid numeric parameters on the command line, and to complain about same in environment variables that are supposed to be numeric.

Alexander Lakhin and Michaël Paquier traded patches to fix pg_ls_dir.

Quan Zongliang sent in four revisions of a patch to fix a bug that caused the function page_header of pageinspect to return negative numbers when the blocksize is 32k.
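The news item does not say what causes the negative numbers. One common way to get this symptom, assumed here rather than taken from the patch, is an unsigned 16-bit page offset being returned through a signed 16-bit type such as smallint, whose maximum is 32767. With 32 kB blocks, an offset can reach 32768, for instance a special-space offset equal to the block size on a page with no special space. A minimal Python illustration of that reinterpretation:

```python
import struct

BLCKSZ = 32 * 1024  # a 32 kB block size


def as_int16(value: int) -> int:
    """Reinterpret an unsigned 16-bit page field as a signed 16-bit value."""
    return struct.unpack("<h", struct.pack("<H", value))[0]


# An offset of 32768 fits in 16 unsigned bits but not in a signed 16-bit type.
print(as_int16(8 * 1024))  # 8192
print(as_int16(BLCKSZ))    # -32768
```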

Bharath Rupireddy sent in another revision of a patch to disambiguate error messages that use "non-negative".

Atsushi Torikoshi sent in another revision of a patch to add a function to log the untruncated query string and its plan for the query currently running on the backend with the specified process ID.

Hou Zhijie sent in another revision of a patch to add schema level support for publication.

Bertrand Drouvot sent in another revision of a patch to fix a bug where logical decoding of a relation rewrite with TOAST did not reset toast_hash.

Georgios Kokolatos sent in a patch to test gzip compression in pg_receivewal.

Pavel Borisov sent in another revision of a patch to automatically generate partitions by LIST and HASH.

Amul Sul sent in another revision of a patch to add a RelationGetSmgr inline function.

David Rowley sent in another revision of a patch to track non-pruned partitions in RelOptInfo, and allow ordered partition scans in more cases.

Dagfinn Ilmari Mannsåker sent in a patch to add tab completion for CREATE SCHEMA to psql.

Zhihong Yu sent in a patch to shorten the test for needed columns in find_hash_columns().

Nathan Bossart sent in another revision of a patch to pre-allocate WAL segments.

Tomáš Vondra sent in another revision of a patch to make it possible to dump functions alone in pg_dump.

Peifeng Qiu sent in a patch to support Kerberos authentication for postgres_fdw.

Erik Rijkers sent in a patch to include JSON operations as candidates for JIT compilation.

Fabien COELHO sent in another revision of a patch to factor out the echo code in psql.

David Rowley sent in a patch to change the name of the Result Cache node to Memoize.

Soumyadeep Chakraborty sent in another revision of a patch to display length and bounds histograms in pg_stats.

Ranier Vilela sent in a patch to protect against possible memory corruption in src/backend/access/nbtree/nbtxlog.c by checking the maximum limit of array items.

Thomas Munro sent in another revision of a patch to add a PSQL_WATCH_PAGER setting for psql's \watch command.