== PostgreSQL Weekly News - December 02 2012 ==
== PostgreSQL Local ==
The FOSDEM PGDay conference will be held before FOSDEM in
Brussels, Belgium, on Feb 1st, 2013. The CfPs for both this event and
the PG track of FOSDEM are open.
PGDay NYC 2013 will be held on March 22, 2013 in New York City. The
CfP submission deadline is January 7th, 2013 at noon Eastern time.
Submit proposals to papers AT nycpug DOT org.
== PostgreSQL in the News ==
Planet PostgreSQL: http://planet.postgresql.org/
PostgreSQL Weekly News is brought to you this week by David Fetter.
Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to david(at)fetter(dot)org, German language
to pwn(at)pgug(dot)de, Italian language to pwn(at)itpug(dot)org, Spanish language
== Applied Patches ==
Tom Lane pushed:
- Fix SELECT DISTINCT with index-optimized MIN/MAX on inheritance
trees. In a query such as "SELECT DISTINCT min(x) FROM tab", the
DISTINCT is pretty useless (there being only one output row), but
nonetheless it shouldn't fail. But it could fail if "tab" is an
inheritance parent, because planagg.c's code for fixing up
equivalence classes after making the index-optimized MIN/MAX
transformation wasn't prepared to find child-table versions of the
aggregate expression. The least ugly fix seems to be to add an
option to mutate_eclass_expressions() to skip child-table
equivalence class members, which aren't used anymore at this stage
of planning so it's not really necessary to fix them. Since child
members are ignored in many cases already, it seems plausible for
mutate_eclass_expressions() to have an option to ignore them too.
Per bug #7703 from Maxim Boguk. Back-patch to 9.1. Although the
same code exists before that, it cannot encounter child-table
aggregates AFAICS, because the index optimization transformation
cannot succeed on inheritance trees before 9.1 (for lack of
- Revert patch for taking fewer snapshots. This reverts commit
d573e239f03506920938bf0be56c868d9c3416da, "Take fewer snapshots".
While that seemed like a good idea at the time, it caused execution
to use a snapshot that had been acquired before locking any of the
tables mentioned in the query. This created user-visible anomalies
that were not present in any prior release of Postgres, as reported
by Tomas Vondra. While this whole area could do with a redesign
(since there are related cases that have anomalies anyway), it
doesn't seem likely that any future patch would be reasonably
back-patchable; and we don't want 9.2 to exhibit a behavior that's
subtly unlike either past or future releases. Hence, revert to
prior code while we rethink the problem.
- Add explicit casts in ilist.h's inline functions. Needed to silence
C++ errors, per report from Peter Eisentraut. Andres Freund
- Fix assorted bugs in CREATE/DROP INDEX CONCURRENTLY. Commit
8cb53654dbdb4c386369eb988062d0bbb6de725e, which introduced DROP
INDEX CONCURRENTLY, managed to break CREATE INDEX CONCURRENTLY via a
poor choice of catalog state representation. The pg_index state for
an index that's reached the final pre-drop stage was the same as the
state for an index just created by CREATE INDEX CONCURRENTLY. This
meant that the (necessary) change to make RelationGetIndexList
ignore about-to-die indexes also made it ignore freshly-created
indexes, which is catastrophic because the latter do need to be
considered in HOT-safety decisions. Failure to do so leads to
incorrect index entries and subsequently wrong results from queries
depending on the concurrently-created index. To fix, add an
additional boolean column "indislive" to pg_index, so that the
freshly-created and about-to-die states can be distinguished. (This
change obviously is only possible in HEAD. This patch will need to
be back-patched, but in 9.2 we'll use a kluge consisting of
overloading the formerly-impossible state of indisvalid = true and
indisready = false.) In addition, change CREATE/DROP INDEX
CONCURRENTLY so that the pg_index flag changes they make without
exclusive lock on the index are made via heap_inplace_update()
rather than a normal transactional update. The latter is not very
safe because moving the pg_index tuple could result in concurrent
SnapshotNow scans finding it twice or not at all, thus possibly
resulting in index corruption. This is a pre-existing bug in CREATE
INDEX CONCURRENTLY, which was copied into the DROP code. In
addition, fix various places in the code that ought to check to make
sure that the indexes they are manipulating are valid and/or ready
as appropriate. These represent bugs that have existed since 8.2,
since a failed CREATE INDEX CONCURRENTLY could leave a corrupt or
invalid index behind, and we ought not try to do anything that might
fail with such an index. Also fix RelationReloadIndexInfo to ensure
it copies all the pg_index columns that are allowed to change after
initial creation. Previously we could have been left with stale
values of some fields in an index relcache entry. It's not clear
whether this actually had any user-visible consequences, but it's at
least a bug waiting to happen. In addition, do some code and docs
review for DROP INDEX CONCURRENTLY; some cosmetic code cleanup but
mostly addition and revision of comments. This will need to be
back-patched, but in a noticeably different form, so I'm committing
it to HEAD before working on the back-patch. Problem reported by
Amit Kapila, diagnosis by Pavan Deolasee, fix by Tom Lane and
- Suppress parallel build in interfaces/ecpg/preproc/. This is to see
if it will stop intermittent build failures on buildfarm member
okapi. We know that gmake 3.82 has some problems with sometimes not
honoring dependencies in parallel builds, and it seems likely that
this is more of the same. Since the vast bulk of the work in the
preproc directory is associated with creating preproc.c and then
preproc.o, parallelism buys us hardly anything here anyway. Also,
make both this .NOTPARALLEL and the one previously added in
interfaces/ecpg/Makefile be conditional on "ifeq
($(MAKE_VERSION),3.82)". The known bug in gmake is fixed upstream
and should not be present in 3.83 and up, and there's no reason to
think it affects older releases.
- Produce a more useful error message for over-length Unix socket
paths. The length of a socket path name is constrained by the size
of struct sockaddr_un, and there's not a lot we can do about it
since that is a kernel API. However, it would be a good thing if we
produced an intelligible error message when the user specifies a
socket path that's too long --- and getaddrinfo's standard API is
too impoverished to do this in the natural way. So insert explicit
tests at the places where we construct a socket path name. Now
you'll get an error that makes sense and even tells you what the
limit is, rather than something generic like "Non-recoverable
failure in name resolution". Per trouble report from Jeremy Drake
and a fix idea from Andrew Dunstan.
- Add missing buffer lock acquisition in GetTupleForTrigger(). If we
had not been holding buffer pin continuously since the tuple was
initially fetched by the UPDATE or DELETE query, it would be
possible for VACUUM or a page-prune operation to move the tuple
while we're trying to copy it. This would result in a garbage "old"
tuple value being passed to an AFTER ROW UPDATE or AFTER ROW DELETE
trigger. The preconditions for this are somewhat improbable, and
the timing constraints are very tight; so it's not so surprising
that this hasn't been reported from the field, even though the bug
has been there a long time. Problem found by Andres Freund.
Back-patch to all active branches.
- Take buffer lock while inspecting btree index pages in
contrib/pageinspect. It's not safe to examine a shared buffer
without any lock.
- Allow adding values to an enum type created in the current
transaction. Normally it is unsafe to allow ALTER TYPE ADD VALUE in
a transaction block, because instances of the value could be added
to indexes later in the same transaction, and then they would still
be accessible even if the transaction rolls back. However, we can
allow this if the enum type itself was created in the current
transaction, because then any such indexes would have to go away
entirely on rollback. The reason for allowing this is to support
pg_upgrade's new usage of pg_restore --single-transaction: in
--binary-upgrade mode, pg_dump emits enum types as a succession of
ALTER TYPE ADD VALUE commands so that it can preserve the values'
OIDs. The support is a bit limited, so we'll leave it undocumented.
- Make sure sharedir/extension/ directory is created when needed. The
previous coding worked as long as MODULEDIR wasn't set explicitly,
because we create sharedir/$(datamoduledir) and the default value of
that is "extension". But if some other value is specified for
MODULEDIR then the installation directory needed for the control
file wasn't made. Cédric Villemain
- Prevent passing gmake's environment variables down through
pg_regress. When we do "make install" to create a temp
installation, we don't want that instance of make to try to
communicate with any instance of make that might be calling us.
This is known to cause problems if the upper make has a -jN flag,
and in principle could cause problems even without that. Unset the
relevant environment variables to prevent such issues. Andres
- Don't advance checkPoint.nextXid near the end of a checkpoint
sequence. This reverts commit
c11130690d6dca64267201a169cfb38c1adec5ef in favor of actually fixing
the problem: namely, that we should never have been modifying the
checkpoint record's nextXid at this point to begin with. The
nextXid should match the state as of the checkpoint's logical WAL
position (ie the redo point), not the state as of its physical
position. It's especially bogus to advance it in some wal_levels
and not others. In any case there is no need for the checkpoint
record to carry the same nextXid shown in the XLOG_RUNNING_XACTS
record just emitted by LogStandbySnapshot, as any replay operation
will already have adopted that value as current. This fixes bug
#7710 from Tarvi Pillessaar, and probably also explains bug #6291
from Daniel Farina, in that if a checkpoint were in progress at the
instant of XID wraparound, the epoch bump would be lost as reported.
(And, of course, these days there's at least a 50-50 chance of a
checkpoint being in progress at any given instant.) Diagnosed by me
and independently by Andres Freund. Back-patch to all branches
supporting hot standby.
- Recommend triggers, not rules, in the CREATE VIEW reference page.
We've generally recommended use of INSTEAD triggers over rules since
that feature was added; but this old text in the CREATE VIEW
reference page didn't get the memo. Noted by Thomas Kellerer.
- Update time zone data files to tzdata release 2012j. DST law
changes in Cuba, Israel, Jordan, Libya, Palestine, Western Samoa,
and portions of Brazil.
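The over-length Unix socket path commit in the list above boils down to one comparison: the full path, including its terminating NUL, must fit in the sun_path member of struct sockaddr_un, and the check has to happen where the path is built rather than inside getaddrinfo. A minimal C sketch, assuming a POSIX system and PostgreSQL's usual ".s.PGSQL.<port>" socket file name; the helper fill_unix_sockaddr and its error wording are hypothetical, not the code that was committed:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Capacity of sun_path, including the terminating NUL (108 bytes on Linux). */
#define SUN_PATH_SIZE sizeof(((struct sockaddr_un *) 0)->sun_path)

/*
 * Hypothetical helper: fill *addr with "<dir>/.s.PGSQL.<port>", or print an
 * error that states the limit and return false if the path cannot fit.
 */
static bool fill_unix_sockaddr(struct sockaddr_un *addr, const char *dir, int port)
{
    char path[1024];
    int  n;

    assert(addr != NULL && dir != NULL);
    n = snprintf(path, sizeof(path), "%s/.s.PGSQL.%d", dir, port);
    if (n < 0 || (size_t) n >= sizeof(path) || (size_t) n >= SUN_PATH_SIZE)
    {
        /* Name the limit instead of a generic "name resolution" failure. */
        fprintf(stderr,
                "Unix-domain socket path \"%s/.s.PGSQL.%d\" is too long (maximum %zu bytes)\n",
                dir, port, SUN_PATH_SIZE - 1);
        return false;
    }
    memset(addr, 0, sizeof(*addr));
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path, path, (size_t) n + 1);   /* copy including the NUL */
    return true;
}
```

Checking up front keeps the user-visible message specific; by the time getaddrinfo would fail, the only information left is a generic error code.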
Heikki Linnakangas pushed:
- Add OpenTransientFile, with automatic cleanup at end-of-xact. Files
opened with BasicOpenFile or PathNameOpenFile are not automatically
cleaned up on error. That puts unnecessary burden on callers that
only want to keep the file open for a short time. There is
AllocateFile, but that returns a buffered FILE * stream, which in
many cases is not the nicest API to work with. So add function
called OpenTransientFile, which returns an unbuffered fd that's
cleaned up like the FILE* returned by AllocateFile(). This plugs a
few rare fd leaks in error cases: 1. copy_file() - fixed by using
OpenTransientFile instead of BasicOpenFile 2. XLogFileInit() - fixed
by adding close() calls to the error cases. Can't use
OpenTransientFile here because the fd is supposed to persist over
transaction boundaries. 3. lo_import/lo_export - fixed by using
OpenTransientFile instead of PathNameOpenFile. In addition to
plugging those leaks, this replaces many BasicOpenFile() calls with
OpenTransientFile() that were not leaking, because the code
meticulously closed the file on error. That wasn't strictly
necessary, but IMHO it's good for robustness. The same leaks exist
in older versions, but given the rarity of the issues, I'm not
backpatching this. Not yet, anyway - it might be good to backpatch
later, after this mechanism has had some more testing in master
- If we don't have a backup-end-location, don't claim we've reached
it. This was apparently a typo, which caused recovery to think that
it immediately reached the end of backup, and allowed the database
to start up too early. Reported by Jeff Janes. Backpatch to 9.2,
where this code was introduced.
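The OpenTransientFile commit above relies on bookkeeping: every descriptor opened through it is remembered, so end-of-transaction processing (commit or abort) can close whatever the caller did not. The following is a simplified standalone C sketch of that idea, assuming POSIX open/close; open_transient_file, close_transient_file and at_eoxact_cleanup are hypothetical names, not the backend's fd.c implementation:

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define MAX_TRANSIENT_FILES 32

/* Descriptors opened during the current transaction and not yet closed. */
static int transient_fds[MAX_TRANSIENT_FILES];
static int num_transient_fds = 0;

/* Open an unbuffered fd and remember it for end-of-transaction cleanup. */
static int open_transient_file(const char *path, int flags, mode_t mode)
{
    int fd;

    assert(path != NULL);
    if (num_transient_fds >= MAX_TRANSIENT_FILES)
        return -1;                  /* sketch only: real code would raise an error */
    fd = open(path, flags, mode);
    if (fd >= 0)
        transient_fds[num_transient_fds++] = fd;
    return fd;
}

/* Close a transient fd explicitly and stop tracking it. */
static int close_transient_file(int fd)
{
    for (int i = 0; i < num_transient_fds; i++)
    {
        if (transient_fds[i] == fd)
        {
            transient_fds[i] = transient_fds[--num_transient_fds];
            return close(fd);
        }
    }
    return -1;                      /* not opened via open_transient_file */
}

/* Run at commit or abort: close anything still open; returns how many. */
static int at_eoxact_cleanup(void)
{
    int leaked = num_transient_fds;

    while (num_transient_fds > 0)
        close(transient_fds[--num_transient_fds]);
    return leaked;
}
```

A caller that errors out between open and close therefore leaks nothing once abort processing runs. The same property is why XLogFileInit could not use it: that fd is meant to outlive the transaction.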
Alvaro Herrera pushed:
- Split out rmgr rm_desc functions into their own files. This is
necessary (but not sufficient) to have them compilable outside of a
- Change test ExceptionalCondition to return void. Commit 81107282a
changed it in assert.c, but overlooked this other file.
Michael Meskes pushed:
- When processing nested structure pointer variables ecpg always
expected an array datatype which of course is wrong. Applied patch
by Muhammad Usama <m(dot)usama(at)gmail(dot)com> to fix this.
Robert Haas pushed:
- Basic binary heap implementation. There are probably other places
where this can be used, but for now, this just makes MergeAppend use
it, so that this code will have test coverage. There is other work
in the queue that will use this, as well. Abhijit Menon-Sen,
reviewed by Andres Freund, Robert Haas, Álvaro Herrera, Tom Lane,
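The binary heap commit above makes MergeAppend its first user. MergeAppend merges several already-sorted inputs, and a min-heap keyed on each input's current head value yields the next output value in O(log n) per step. A standalone C sketch of that use, assuming integer keys and at most 64 inputs; the IntHeap type and its functions are hypothetical and do not reproduce the committed binaryheap API:

```c
#include <assert.h>

/* One heap entry: the current head value of an input and which input it is. */
typedef struct { int key; int src; } HeapNode;

typedef struct { HeapNode nodes[64]; int size; } IntHeap;

static void swap_nodes(HeapNode *a, HeapNode *b) { HeapNode t = *a; *a = *b; *b = t; }

/* Append at the end, then sift up while smaller than the parent. */
static void heap_push(IntHeap *h, HeapNode n)
{
    int i;

    assert(h->size < 64);
    i = h->size++;
    h->nodes[i] = n;
    while (i > 0 && h->nodes[(i - 1) / 2].key > h->nodes[i].key)
    {
        swap_nodes(&h->nodes[(i - 1) / 2], &h->nodes[i]);
        i = (i - 1) / 2;
    }
}

/* Remove the minimum: move the last node to the root, then sift down. */
static HeapNode heap_pop(IntHeap *h)
{
    HeapNode top;
    int      i = 0;

    assert(h->size > 0);
    top = h->nodes[0];
    h->nodes[0] = h->nodes[--h->size];
    for (;;)
    {
        int left = 2 * i + 1, right = left + 1, smallest = i;

        if (left < h->size && h->nodes[left].key < h->nodes[smallest].key)
            smallest = left;
        if (right < h->size && h->nodes[right].key < h->nodes[smallest].key)
            smallest = right;
        if (smallest == i)
            break;
        swap_nodes(&h->nodes[i], &h->nodes[smallest]);
        i = smallest;
    }
    return top;
}

/* MergeAppend-style merge of n sorted arrays into out; returns values written. */
static int merge_sorted(const int *const *inputs, const int *lens, int n, int *out)
{
    IntHeap h = { .size = 0 };
    int     pos[64] = { 0 };
    int     count = 0;

    assert(n <= 64);
    for (int s = 0; s < n; s++)
        if (lens[s] > 0)
            heap_push(&h, (HeapNode) { inputs[s][0], s });
    while (h.size > 0)
    {
        HeapNode top = heap_pop(&h);

        out[count++] = top.key;
        if (++pos[top.src] < lens[top.src])  /* refill from the same input */
            heap_push(&h, (HeapNode) { inputs[top.src][pos[top.src]], top.src });
    }
    return count;
}
```

Only one entry per input is in the heap at any time, so its size is bounded by the number of inputs rather than the number of rows.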
Simon Riggs pushed:
- Cleanup VirtualXact at end of Hot Standby.
- Correctly init fast path fields on PGPROC
- COPY FREEZE and mark committed on fresh tables. When a relfilenode
is created in this subtransaction or a committed child transaction
and it cannot otherwise be seen by our own process, mark tuples
committed ahead of transaction commit for all COPY commands in same
transaction. If FREEZE specified on COPY and pre-conditions met then
rows will also be frozen. Both options designed to avoid revisiting
rows after commit, increasing performance of subsequent commands
after data load and upgrade. pg_restore changes later. Simon Riggs,
review comments from Heikki Linnakangas, Noah Misch and design input
from Tom Lane, Robert Haas and Kevin Grittner
- Tweak tests in COPY FREEZE
- Second tweak of COPY FREEZE
- Clarify operation of online checkpoints. The previous comments were
left in place, but they were too obscure for such an important aspect of the system.
- XidEpoch++ if wraparound during checkpoint. If wal_level =
hot_standby we update the checkpoint nextxid, though in the case
where a wraparound occurred half-way through a checkpoint we would
neglect updating the epoch also. Updating the nextxid is arguably
the wrong thing to do, but changing that may introduce subtle bugs
into hot standby startup, while updating the value doesn't cause any
known bugs yet. Minimal fix now to HEAD and backbranches, wider fix
later in HEAD. Bug reported in #6291 by Daniel Farina and slightly
differently in Cause analysis and recommended fixes from Tom Lane
and Andres Freund. Applied patch is minimal version of Andres
- Rearrange storage of data in xl_running_xacts. Previously we stored
all xids mixed together. Now we store top-level xids first,
followed by all subxids. Also skip logging any subxids if the
snapshot is suboverflowed, since there are potentially large numbers
of them and they are not useful in that case anyway. Has value in
the envisaged design for decoding of WAL. No planned effect on Hot
Standby. Andres Freund, reviewed by me
- Reduce scope of changes for COPY FREEZE. Allow support only for
freezing tuples by explicit command. Previous coding mistakenly
extended slightly beyond what was agreed as correct on -hackers. So
essentially a partial revoke of earlier work, leaving just the COPY
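Both the XidEpoch++ commit and Tom Lane's nextXid fix turn on how the 32-bit XID counter is extended with an epoch. A checkpoint derives its epoch from the previous checkpoint and bumps it when its nextXid is numerically smaller, meaning the counter wrapped around in between. A simplified C sketch of that arithmetic, assuming a plain unsigned comparison and ignoring the special low XIDs; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical: the XID fields a checkpoint record carries. */
typedef struct
{
    uint32_t      nextXidEpoch;
    TransactionId nextXid;
} CheckpointXids;

/* 64-bit view of an XID: epoch in the high half, 32-bit XID in the low half. */
static uint64_t full_xid(uint32_t epoch, TransactionId xid)
{
    return ((uint64_t) epoch << 32) | xid;
}

/*
 * Derive the new checkpoint's epoch from the previous checkpoint: if
 * nextXid went numerically backwards, the counter wrapped in between.
 */
static CheckpointXids start_checkpoint(CheckpointXids prev, TransactionId current_next_xid)
{
    CheckpointXids cp;

    cp.nextXid = current_next_xid;
    cp.nextXidEpoch = prev.nextXidEpoch;
    if (cp.nextXid < prev.nextXid)
        cp.nextXidEpoch++;
    return cp;
}
```

The epoch is only correct if it is derived from the same nextXid that ends up in the record. If the epoch is computed before a wraparound and nextXid is then overwritten with a post-wraparound value later in the checkpoint, the 64-bit value moves backwards, which is the lost epoch bump described for bug #6291. Tom Lane's fix avoids the problem by not modifying nextXid at that point at all.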
Magnus Hagander pushed:
- Add libpq function PQconninfo(). This allows a caller to get back
the exact conninfo array that was used to create a connection,
including parameters read from the environment. In doing this,
restructure how options are copied from the conninfo to the actual
connection. Zoltan Boszormenyi and Magnus Hagander
Andrew Dunstan pushed:
- Clean environment for pg_upgrade test. This removes existing PG
settings from the environment for pg_upgrade tests, just like
- Add mode where contrib installcheck runs each module in a separately
named database. Normally each module is tested in a database named
contrib_regression, which is dropped and recreated at the beginning
of each pg_regress run. This mode, enabled by adding
USE_MODULE_DB=1 to the make command line, runs most modules in a
database with the module name embedded in it. This will make
testing pg_upgrade on clusters with the contrib modules a lot
easier. Still to be done: adapt to the MSVC build system.
Backpatch to 9.0, which is the earliest version it is reasonably
possible to test upgrading from.
Bruce Momjian pushed:
- Move long_options structures to the top of main() functions, for
consistency. Per suggestion from Tom.
- In pg_upgrade, dump each database separately and use
--single-transaction to restore each database schema. This yields
performance improvements for databases with many tables. Also,
remove split_old_dump() as it is no longer needed.
- Split initdb.c main() code into multiple functions, for easier
- In pg_upgrade, improve status wording now that we have per-database
status output for dump/restore.
- Remove pg_restore's --single-transaction option, as it throws errors
in certain cases.
- Revert: In pg_upgrade, remove pg_restore's --single-transaction
option, as it throws errors in certain cases.
- In pg_upgrade, remove 'set -x' from test script.
Peter Eisentraut pushed:
- doc: Fix broken links to DocBook wiki
Tatsuo Ishii pushed:
- Fix psql crash while parsing SQL file whose encoding is different
from client encoding and the client encoding is not a *safe* one. An
example is a file encoded in UTF-8 with the client encoding set to SJIS.
Patch contributed by Jiang Guiqing.
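SJIS counts as an unsafe client encoding because the second byte of a two-byte character can fall in the ASCII range. For example, the character 表 is encoded as 0x95 0x5C, and 0x5C is a backslash. A scanner walking the input byte by byte therefore sees escape or quote characters that are not really there. A minimal C sketch of SJIS-aware stepping, assuming the usual lead-byte ranges 0x81-0x9F and 0xE0-0xFC; the helper names are hypothetical and this is not psql's lexer code:

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the SJIS character whose first byte is c. */
static int sjis_char_len(unsigned char c)
{
    if ((c >= 0x81 && c <= 0x9F) || (c >= 0xE0 && c <= 0xFC))
        return 2;               /* lead byte of a double-byte character */
    return 1;                   /* ASCII or half-width katakana */
}

/* Naive count: treats every 0x5C byte as a backslash. */
static int count_backslashes_naive(const char *s, size_t len)
{
    int n = 0;

    for (size_t i = 0; i < len; i++)
        if ((unsigned char) s[i] == 0x5C)
            n++;
    return n;
}

/* Encoding-aware count: skips the trail bytes of double-byte characters. */
static int count_backslashes_sjis(const char *s, size_t len)
{
    int    n = 0;
    size_t i = 0;

    while (i < len)             /* s[i] is only read while i < len */
    {
        int clen = sjis_char_len((unsigned char) s[i]);

        if (clen == 1 && s[i] == '\\')
            n++;
        i += clen;              /* a truncated final character just ends the loop */
    }
    return n;
}
```

The bounds check in the loop matters when the input is not really SJIS. For instance, the UTF-8 lead byte 0xE3 also looks like an SJIS lead byte, so a scanner that trusts the computed character length without checking how many bytes remain can step past the end of its buffer. This is one plausible way a file/client encoding mismatch leads to trouble, offered as an illustration rather than as a description of the committed fix.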
== Rejected Patches (for now) ==
No one was disappointed this week :-)
== Pending Patches ==
Etsuro Fujita sent in another revision of the patch to add PRE- and
POST-processor options to COPY.
Alvaro Herrera and Kevin Grittner traded patches to implement foreign
Bruce Momjian sent in three more revisions of a patch to fix an
infelicity in pg_upgrade when there are many tables.
Heikki Linnakangas and Alexander Korotkov traded patches and Erik
Rijkers sent in a series of tests for adding index support to certain
kinds of regular expression searches.
Alvaro Herrera sent in another revision of a patch to create a generic
wal reading facility dubbed XLogReader.
Dimitri Fontaine sent in another revision of a patch to add event
Andres Freund sent in a patch to fix an infelicity in ilist.h with
respect to C++ compilers.
Pavel Stehule sent in two more revisions of a patch to implement
Amit Kapila sent in another revision of a patch to compute the max LSN
of data pages.
Shigeru HANADA sent in another revision of a patch to create a FDW for
Amit Kapila sent in another revision of a patch to enable changing
configuration parameters from SQL.
Tom Lane sent in another revision of a patch to refactor the flex and
Heikki Linnakangas sent in a patch to refactor the standby mode logic.
Andres Freund sent in two revisions of a patch to fix "make -jN".
Zoltan Boszormenyi sent in another revision of a patch to enable
pg_basebackup to configure and start a standby.
KaiGai Kohei sent in another revision of a patch to refactor the ALTER
Bruce Momjian sent in a patch to address infelicities in pg_upgrade
for the case of a large number of tables.
KaiGai Kohei sent in another revision of a patch to create
OAT_POST_ALTER object access hooks.
Pavel Stehule sent in a patch to fix a corner use case of variadic
Phil Sorber sent in another revision of a patch to create a pg_ping
Alastair Turner sent in a patch to allow checking file parameters to
psql before password prompt.
Jan Wieck sent in another revision of a patch to help fix a
performance issue by using an autovacuum truncate exclusive lock.
Andrew Dunstan sent in a patch to allow using a separate database for
contrib module testing in "make check".
Tomas Vondra sent in another revision of a patch to allow people to
reduce the load that statistics collection puts on the system.