From: David Fetter <david(at)fetter(dot)org>
To: PostgreSQL Announce <pgsql-announce(at)postgresql(dot)org>
Subject: == PostgreSQL Weekly News - January 6, 2019 ==
== PostgreSQL Weekly News - January 6, 2019 ==
== PostgreSQL Local ==
FOSDEM PGDay 2019, a one-day conference held before the main FOSDEM event, will
be held in Brussels, Belgium, on February 1, 2019.
Prague PostgreSQL Developer Day 2019 (P2D2 2019) is a two-day
conference that will be held on February 13-14, 2019 in Prague, Czech Republic.
PGConf India 2019 will be on February 13-15, 2019 in Bengaluru, Karnataka, India.
pgDay Paris 2019 will be held in Paris, France on March 12, 2019
at 199bis rue Saint-Martin.
PGConf APAC 2019 will be held in Singapore March 19-21, 2019.
PGDay.IT 2019 will take place May 16th and May 17th in Bologna, Italy. Both the
CfP https://2019.pgday.it/en/blog/cfp and the Call for Workshops
https://2019.pgday.it/en/blog/cfw are open until January 15, 2019.
Swiss PGDay 2019 will take place in Rapperswil (near Zurich) on June 28, 2019.
The CfP is open January 17, 2019 through April 18, 2019, and registration will
open January 17, 2019.
== PostgreSQL in the News ==
Planet PostgreSQL: http://planet.postgresql.org/
PostgreSQL Weekly News is brought to you this week by David Fetter
Submit news and announcements by Sunday at 3:00pm PST8PDT to david(at)fetter(dot)org(dot)
== Applied Patches ==
Tom Lane pushed:
- Teach eval_const_expressions to constant-fold LEAST/GREATEST expressions.
Doing this requires an assumption that the invoked btree comparison function
is immutable. We could check that explicitly, but in other places such as
contain_mutable_functions we just assume that it's true, so we may as well do
likewise here. (If the comparison function's behavior isn't immutable, the
sort order in indexes built with it would be unstable, so it seems certainly
wrong for it not to be so.) Vik Fearing Discussion:
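The effect can be checked from psql (a sketch; assumes a build containing this commit):

```sql
-- Before this change, LEAST/GREATEST over constants survived into the plan
-- as a MinMaxExpr; with constant-folding the planner reduces it up front.
EXPLAIN (VERBOSE, COSTS OFF)
SELECT LEAST(1, 2, 3);
-- The target list should now show the folded constant rather than LEAST(...).
```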
- Support parameterized TidPaths. Up to now we've not worried much about joins
where the join key is a relation's CTID column, reasoning that storing a
table's CTIDs in some other table would be pretty useless. However, there are
use-cases for this sort of query involving self-joins, so that argument
doesn't really hold water. This patch allows generating plans for joins on
CTID that use a nestloop with inner TidScan, similar to what we might do with
an index on the join column. This is the most efficient way to join when the
outer side of the nestloop is expected to yield relatively few rows. This
change requires upgrading tidpath.c and the generated TidPaths to work with
RestrictInfos instead of bare qual clauses, but that's long-postponed
technical debt anyway. Discussion:
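A hypothetical self-join illustrating the new plan shape (table and column names are invented for illustration):

```sql
-- With this commit, a join whose key is CTID can be planned as a nestloop
-- with a parameterized TID scan on the inner side, fetching each matching
-- tuple directly by its physical location.
EXPLAIN
SELECT cur.*
FROM my_table cur
JOIN my_table prev ON cur.ctid = prev.ctid
WHERE prev.id = 42;
-- Look for "Tid Scan" with a TID Cond referencing the outer relation.
```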
- Add a hash opclass for type "tid". Up to now we've not worried much about
joins where the join key is a relation's CTID column, reasoning that storing a
table's CTIDs in some other table would be pretty useless. However, there are
use-cases for this sort of query involving self-joins, so that argument
doesn't really hold water. With larger relations, a merge or hash join is
desirable. We had a btree opclass for type "tid", allowing merge joins on
CTID, but no hash opclass so that hash joins weren't possible. Add the
missing infrastructure. This also potentially enables hash aggregation on
"tid", though the use-cases for that aren't too clear. Discussion:
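To see the new opclass in action (sketch; relation names are hypothetical), one can coax the planner toward a hash join on CTID:

```sql
-- Joins on CTID between larger relations can now use a hash join, since
-- type "tid" has a hash opclass in addition to the existing btree opclass.
SET enable_nestloop = off;
EXPLAIN
SELECT t.*
FROM big_table t
JOIN saved_tids s ON t.ctid = s.saved_ctid;
```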
- Fix oversight in commit b5415e3c2187ab304390524f5ae66b4bd2c58279. While
rearranging code in tidpath.c, I overlooked the fact that we ought to check
restriction_is_securely_promotable when trying to use a join clause as a TID
qual. Since tideq itself is leakproof, this wouldn't really allow any
interesting leak AFAICT, but it still seems like we had better check it. For
consistency with the corresponding logic in indxpath.c, also check
rinfo->pseudoconstant. I'm not sure right now that it's possible for that to
be set in a join clause, but if it were, a match couldn't be made anyway.
- Update leakproofness markings on some btree comparison functions. Mark pg_lsn
and oidvector comparison functions as leakproof. Per discussion, these
clearly are leakproof so we might as well mark them so. On the other hand,
remove leakproof markings from name comparison functions other than
equal/not-equal. Now that these depend on varstr_cmp, they can't be
considered leakproof if text comparison isn't. (This was my error in commit
586b98fdf.) While at it, add some opr_sanity queries to catch cases where
related functions do not have the same volatility and leakproof markings. This
would clearly be bogus for commutator or negator pairs. In the domain of
btree comparison functions, we do have some exceptions, because text equality
is leakproof but inequality comparisons are not. That's odd on first glance
but is reasonable (for now anyway) given the much greater complexity of the
inequality code paths. Discussion:
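The markings are visible in the catalogs; for example, the asymmetry between text equality and inequality mentioned above can be confirmed with:

```sql
-- texteq (and textne) are marked leakproof, while ordering comparisons such
-- as text_lt and the btree support function bttextcmp are not.
SELECT proname, proleakproof
FROM pg_proc
WHERE proname IN ('texteq', 'textne', 'text_lt', 'bttextcmp')
ORDER BY proname;
```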
- Ensure link commands list *.o files before LDFLAGS. It's important for link
commands to list *.o input files before -l switches for libraries, as library
code may not get pulled into the link unless referenced by an earlier
command-line entry. This is certainly necessary for static libraries (.a
style). Apparently on some platforms it is also necessary for shared
libraries, as reported by Donald Dong. We often put -l switches for
within-tree libraries into LDFLAGS, meaning that link commands that list *.o
files after LDFLAGS are hazardous. Most of our link commands got this right,
but a few did not. In particular, places that relied on gmake's default
implicit link rule failed, because that puts LDFLAGS first. Fix that by
overriding the built-in rule with our own. The implicit link rules in
src/makefiles/Makefile.* for single-.o-file shared libraries mostly got this
wrong too, so fix them. I also changed the link rules for the backend and a
couple of other places for consistency, even though they are not (currently)
at risk because they aren't adding any -l switches to LDFLAGS. Arguably, the
real problem here is that we're abusing LDFLAGS by putting -l switches in it
and we should stop doing that. But changing that would be quite invasive, so
I'm not eager to do so. Perhaps this is a candidate for back-patching, but so
far it seems that problems can only be exhibited in test code we don't
normally build, and at least some of the problems are new in HEAD anyway. So
I'll refrain for now. Donald Dong and Tom Lane Discussion:
- Don't believe MinMaxExpr is leakproof without checking. MinMaxExpr invokes the
btree comparison function for its input datatype, so it's only leakproof if
that function is. Many such functions are indeed leakproof, but others are
not, and we should not just assume that they are. Hence, adjust
contain_leaked_vars to verify the leakproofness of the referenced function
explicitly. I didn't add a regression test because it would need to depend on
some particular comparison function being leaky, and that's a moving target,
per discussion. This has been wrong all along, so back-patch to supported
branches. Discussion: https://email@example.com
- Improve ANALYZE's handling of concurrent-update scenarios. This patch changes
the rule for whether or not a tuple seen by ANALYZE should be included in its
sample. When we last touched this logic, in commit 51e1445f1, we weren't
thinking very hard about tuples being UPDATEd by a long-running concurrent
transaction. In such a case, we might see the pre-image as either LIVE or
DELETE_IN_PROGRESS depending on timing; and we might see the post-image not at
all, or as INSERT_IN_PROGRESS. Since the existing code will not sample either
DELETE_IN_PROGRESS or INSERT_IN_PROGRESS tuples, this leads to
concurrently-updated rows being omitted from the sample entirely. That's not
very helpful, and it's especially the wrong thing if the concurrent
transaction ends up rolling back. The right thing seems to be to sample
DELETE_IN_PROGRESS rows just as if they were live. This makes the "sample it"
and "count it" decisions the same, which seems good for consistency. It's
clearly the right thing if the concurrent transaction ends up rolling back; in
effect, we are sampling as though IN_PROGRESS transactions haven't happened
yet. Also, this combination of choices ensures maximum robustness against the
different combinations of whether and in which state we might see the pre- and
post-images of an update. It's slightly annoying that we end up recording
immediately-out-of-date stats in the case where the transaction does commit,
but on the other hand the stats are fine for columns that didn't change in the
update. And the alternative of sampling INSERT_IN_PROGRESS rows instead seems
like a bad idea, because then the sampling would be inconsistent with the way
rows are counted for the stats report. Per report from Mark Chambers; thanks
to Jeff Janes for diagnosing what was happening. Back-patch to all supported branches.
- Use symbolic references for pg_language OIDs in the bootstrap data. This patch
teaches genbki.pl to replace pg_language names by OIDs in much the same way as
it already does for pg_am names etc, and converts pg_proc.dat to use such
symbolic references in the prolang column. Aside from getting rid of a few
more magic numbers in the initial catalog data, this means that Gen_fmgrtab.pl
no longer needs to read pg_language.dat, since it doesn't have to know the OID
of the "internal" language; now it's just looking for the string "internal".
No need for a catversion bump, since the contents of postgres.bki don't
actually change at all. John Naylor Discussion:
- Move the built-in conversions into the initial catalog data. Instead of
running a SQL script to create the standard conversion functions and
pg_conversion entries, put those entries into the initial data in
postgres.bki. This shaves a few percent off the runtime of initdb, and also
allows accurate comments to be attached to the conversion functions; the
previous script labeled them with machine-generated comments that were not
quite right for multi-purpose conversion functions. Also, we can get rid of
the duplicative Makefile and MSVC perl implementations of the generation code
for that SQL script. A functional change is that these pg_proc and
pg_conversion entries are now "pinned" by initdb. Leaving them unpinned was
perhaps a good thing back while the conversions feature was under development,
but there seems no valid reason for it now. Also, the conversion functions
are now marked as immutable, where before they were volatile by virtue of
lacking any explicit specification. That seems like it was just an oversight.
To avoid using magic constants in pg_conversion.dat, extend genbki.pl to allow
encoding names to be converted, much as it does for language, access method,
etc names. John Naylor Discussion:
- Support plpgsql variable names that conflict with unreserved SQL keywords. A
variable name matching a statement-introducing keyword, such as "comment" or
"update", caused parse failures if one tried to write a statement using that
keyword. Commit bb1b8f69 already addressed this scenario for the case of
variable names matching unreserved plpgsql keywords, but we didn't think about
unreserved core-grammar keywords. The same heuristic (viz, it can't be a
variable name unless the next token is assignment or '[') should work fine for
that case too, and as a bonus the code gets shorter and less duplicative. Per
bug #15555 from Feike Steenbergen. Since this hasn't been complained of
before, and is easily worked around anyway, I won't risk a back-patch.
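The previously failing case looks like this (a sketch; the commented-out statement assumes some table exists):

```sql
-- A plpgsql variable named after an unreserved core keyword.  Before this
-- fix, writing a statement introduced by that same keyword parse-failed.
DO $$
DECLARE
  comment text := 'demo';
BEGIN
  RAISE NOTICE 'variable value: %', comment;
  -- COMMENT ON TABLE some_table IS 'x';  -- this line used to fail to parse
END
$$;
```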
- Fix program build rule in src/bin/scripts/Makefile. Commit 69ae9dcb4 added a
globally-visible "%: %.o" rule, but we failed to notice that
src/bin/scripts/Makefile already had such a rule. Apparently, the later
occurrence of the same rule wins in nearly all versions of gmake ... but not
in the one used by buildfarm member jacana. jacana is evidently using the
global rule, which says to link "$<", ie just the first dependency. But the
scripts makefile needs to link "$^", ie all the dependencies listed for the
target. There is, fortunately, no good reason not to use "$^" in the global
version of the rule, so we can just do that and get rid of the local version.
- Replace the data structure used for keyword lookup. Previously,
ScanKeywordLookup was passed an array of string pointers. This had some
performance deficiencies: the strings themselves might be scattered all over
the place depending on the compiler (and some quick checking shows that at
least with gcc-on-Linux, they indeed weren't reliably close together). That
led to very cache-unfriendly behavior as the binary search touched strings in
many different pages. Also, depending on the platform, the string pointers
might need to be adjusted at program start, so that they couldn't be simple
constant data. And the ScanKeyword struct had been designed with an eye to
32-bit machines originally; on 64-bit it requires 16 bytes per keyword, making
it even more cache-unfriendly. Redesign so that the keyword strings
themselves are allocated consecutively (as part of one big char-string
constant), thereby eliminating the touch-lots-of-unrelated-pages syndrome.
And get rid of the ScanKeyword array in favor of three separate arrays: uint16
offsets into the keyword array, uint16 token codes, and uint8 keyword
categories. That reduces the overhead per keyword to 5 bytes instead of 16
(even less in programs that only need one of the token codes and categories);
moreover, the binary search only touches the offsets array, further reducing
its cache footprint. This also lets us put the token codes somewhere else
than the keyword strings are, which avoids some unpleasant build dependencies.
While we're at it, wrap the data used by ScanKeywordLookup into a struct that
can be treated as an opaque type by most callers. That doesn't change things
much right now, but it will make it less painful to switch to a hash-based
lookup method, as is being discussed in the mailing list thread. Most of the
change here is associated with adding a generator script that can build the
new data structure from the same list-of-PG_KEYWORD header representation we
used before. The PG_KEYWORD lists that plpgsql and ecpg used to embed in
their scanner .c files have to be moved into headers, and the Makefiles have
to be taught to invoke the generator script. This work is also necessary if
we're to consider hash-based lookup, since the generator script is what would
be responsible for constructing a hash table. Aside from saving a few
kilobytes in each program that includes the keyword table, this seems to speed
up raw parsing (flex+bison) by a few percent. So it's worth doing even as it
stands, though we think we can gain even more with a follow-on patch to switch
to hash-based lookup. John Naylor, with further hacking by me Discussion:
Peter Eisentraut pushed:
- Change "checkpoint starting" message to use "wal". This catches up with the
recent renaming of all user-facing mentions of "xlog" to "wal". Discussion:
- Convert unaccent tests to UTF-8. This makes it easier to add new tests that
are specific to Unicode features. The files were previously in KOI8-R.
- Switch pg_regress to output unified diffs by default. Author: Christoph Berg
- unaccent: Make generate_unaccent_rules.py Python 3 compatible. Python 2 is
still supported. Author: Hugh Ranalli <hugh(at)whtc(dot)ca> Discussion:
- Make sort-test.py Python 3 compatible. Python 2 is still supported.
Álvaro Herrera pushed:
- Remove some useless code. In commit 8b08f7d4820f I added member relationId to
IndexStmt struct. I'm now not sure why; DefineIndex doesn't need it, since the
relation OID is passed as a separate argument anyway. Remove it. Also remove
a redundant assignment to the relationId argument (it wasn't redundant when
added by commit e093dcdd285, but should have been removed in commit
5f173040e3), and use relationId instead of stmt->relation when locking the
relation in the second phase of CREATE INDEX CONCURRENTLY, which is not only
confusing but it means we resolve the name twice for no reason.
- Rename macro to RELKIND_HAS_STORAGE. The original name was an unfortunate choice.
- Don't create relfilenode for relations without storage. Some relation kinds
had relfilenode set to some non-zero value, but apparently the actual files
did not really exist because creation was prevented elsewhere. Get rid of the
phony pg_class.relfilenode values. Catversion bumped, but only because the
sanity_test check will fail if run in a system initdb'd with the previous
version. Reviewed-by: Kyotaro HORIGUCHI, Michael Paquier Discussion:
Noah Misch pushed:
- pg_regress: Promptly detect failed postmaster startup. Detect it the way
pg_ctl's wait_for_postmaster() does. When pg_regress spawned a postmaster
that failed startup, we were detecting that only with "pg_regress: postmaster
did not respond within 60 seconds". Back-patch to 9.4 (all supported
versions). Reviewed by Tom Lane. Discussion:
- Send EXTRA_INSTALL errors to install.log, not stderr. We already redirected
other temp-install stderr and all temp-install stdout in this way. Back-patch
to v10, like the next commit. Discussion:
- Process EXTRA_INSTALL serially, during the first temp-install. This closes a
race condition in "make -j check-world"; the symptom was EEXIST errors.
Back-patch to v10, before which parallel check-world had worse problems.
Michaël Paquier pushed:
- Improve comments and logs in do_pg_stop/start_backup. The function name
pg_stop_backup() has been included for ages in some log messages when stopping
the backup, which is confusing for base backups taken with the replication
protocol because this function is never called. Some other comments and
messages in this area are improved while on it. The new wording is based on
input and suggestions from several people, all listed below. Author: Michael
Paquier Reviewed-by: Peter Eisentraut, Álvaro Herrera, Tom Lane Discussion:
- Fix generation of padding message before encrypting Elgamal in pgcrypto.
fe0a0b5, which has added a stronger random source in Postgres, has introduced
a thinko when creating a padding message which gets encrypted for Elgamal.
The padding message cannot contain zero bytes, so any zeros generated are
supposed to be replaced with random bytes. However, if pg_strong_random()
failed, the message could still be considered correctly shaped for encryption
while containing zeros. Author: Tom Lane
Reviewed-by: Michael Paquier Discussion:
https://firstname.lastname@example.org Backpatch-through: 10
- Remove configure switch --disable-strong-random. This removes a portion of
infrastructure introduced by fe0a0b5 to allow compilation of Postgres in
environments where no strong random source is available, meaning that there is
no linking to OpenSSL and no /dev/urandom (Windows having its own CryptoAPI).
No systems shipped this century lack /dev/urandom, and the buildfarm is
actually not testing this switch at all, so just remove it. This simplifies
particularly some backend code which included a fallback implementation using
shared memory, and removes a set of alternate regression output files from
pgcrypto. Author: Michael Paquier Reviewed-by: Tom Lane Discussion:
Bruce Momjian pushed:
- Update copyright for 2019. Backpatch-through: certain files through 9.4
== Pending Patches ==
Dilip Kumar sent in another revision of a patch to implement undo worker and
Pavel Stěhule sent in two more revisions of a patch to implement schema
Noah Misch sent in a patch to pg_regress to ensure it promptly detects failed
postmaster startup.
Thomas Munro sent in three more revisions of a patch to add parameterized
vectors and sorting/searching support, and refactor the fsync machinery to
support future SMGR implementations.
Adam Brightwell sent in another revision of a patch to fix some infelicities
between function calls and SQL inlining.
Amit Kapila sent in two revisions of a patch to implement logical decoding for
operations on zheap tables.
Nikolay Shaplov sent in three revisions of a patch to check for ctags and etags
utilities in make_ctags and make_etags, respectively.
Álvaro Herrera sent in a patch to make it possible to track the progress of
CREATE INDEX CONCURRENTLY.
Andrey Borodin sent in another revision of a patch to add a GiST verification
function for amcheck.
Andreas Karlsson sent in another revision of a patch to inline CTEs where
Nikolay Shaplov sent in another revision of a patch to replace StdRdOptions with
individual binary reloptions representation for each relation kind.
Tomáš Vondra sent in another revision of a patch to make it possible for EXPLAIN
to include mention of relevant modified GUCs.
Amit Kapila and Dilip Kumar traded patches to add an UNDO log manager, provide
access to undo log data via the buffer manager, and provide an interface to
prepare, insert, and fetch undo records.
Nikolay Shaplov and Álvaro Herrera traded patches to add an enum relation option
Justin Pryzby sent in a patch to use correlation statistics in costing for
Alexey Klyukin sent in two more revisions of a patch to reserve connection slots
Surafel Temesgen sent in another revision of a patch to implement FETCH FIRST
... WITH TIES.
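The proposed syntax (per the patch, not yet committed; table name hypothetical):

```sql
-- Return the top 3 rows by score, plus any further rows tied with the third.
SELECT name, score
FROM results
ORDER BY score DESC
FETCH FIRST 3 ROWS WITH TIES;
```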
David Rowley and Surafel Temesgen traded patches to add a pg_dump option which
allows INSERTs, if chosen, to be multi-value.
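Illustrative only (not actual pg_dump output): a multi-value INSERT packs several rows into one statement, which typically restores faster than one INSERT per row.

```sql
-- One INSERT per row (current --inserts behavior):
INSERT INTO t (id, name) VALUES (1, 'a');
INSERT INTO t (id, name) VALUES (2, 'b');

-- Multi-value form the proposed option would emit:
INSERT INTO t (id, name) VALUES
  (1, 'a'),
  (2, 'b');
```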
Andrey Borodin and Heikki Linnakangas traded patches to add a Physical GiST scan
in VACUUM, and delete pages during GiST VACUUM.
Peter Eisentraut sent in another revision of a patch to implement chained
transactions per the SQL standard.
Chengchao Yu sent in another revision of a patch to fix a deadlock issue in
single user mode when an I/O failure occurs.
Noah Misch sent in a patch to fix the emacs configuration for new perltidy
Pavan Deolasee sent in another revision of a patch to implement MERGE.
Peter Eisentraut sent in a patch to configure to update the python search order.
Peter Eisentraut sent in another revision of a patch to implement REINDEX
Álvaro Herrera and Fabien COELHO traded patches to pgbench to let it store
select results into variables.
Joerg Sonnenberger and Tom Lane traded patches to make and use a minimal perfect
hash function to make ScanKeyword faster and more efficient.
Peter Eisentraut sent in another revision of a patch to add a documentation link
from sslinfo to pg_stat_ssl, add tests for pg_stat_ssl system view, fix
pg_stat_ssl.clientdn, and add more columns to pg_stat_ssl.
KaiGai Kohei sent in a patch to fix an issue where add_partial_path() may remove
dominated path but still leave it in use.
Nikhil Sontakke sent in another revision of a patch to implement logical
decoding of two-phase transactions.
Mithun Cy sent in two more revisions of a patch to avoid creation of the free
space map for small tables.
Etsuro Fujita sent in another revision of a patch to the PostgreSQL FDW to fix
an oddity in costing aggregate pushdown paths.
Andrey Borodin sent in a patch to implement a radix tree for GiST vacuum.
Pavel Stěhule sent in another revision of a patch to add a unique statement ID
Tom Lane sent in another revision of a patch to get rid of empty jointrees.
Nathan Bossart sent in another revision of a patch to add some new options to
Peter Eisentraut sent in another revision of a patch to use atexit() in
pg_basebackup, initdb, and isolationtester.