== PostgreSQL Weekly News - November 06 2011 ==
== PostgreSQL Product News ==
ezNcrypt for Databases now supports PostgreSQL.
pgpool-II 3.0.5, a connection pooler and more, released.
RHQ 4.2, a systems management and monitoring tool that runs atop
== PostgreSQL Local ==
PGConf.DE 2011, the German-speaking PostgreSQL Conference, will
take place on November 11th in the Rheinisches Industriemuseum in
Oberhausen, Germany. The schedule is now available, and registration
The fifth edition of the Italian PostgreSQL Day (PGDay.IT 2011) will
be held on November 25, 2011 in Prato, Italy.
The Call for Papers is open for PostgreSQL Session #3, which will be
held in Paris, Feb 2nd, 2012. The deadline for proposals is the 30th
November 2011 and selected speakers will be notified by the 14th
December 2011. Proposals (in French or English) should be submitted
to call-for-paper AT postgresql-sessions DOT org.
More information at: http://www.postgresql-sessions.org/en/3/
The Call for Papers is open for FLOSS UK, which will be held in
Edinburgh from the 20th to the 22nd March 2012. The deadline for
proposals is the 18th November 2011 and selected speakers will be
notified by the 25th November 2011. Proposals should be submitted to
postgresql2012 AT flossuk DOT org. More information at:
== PostgreSQL in the News ==
Planet PostgreSQL: http://planet.postgresql.org/
PostgreSQL Weekly News is brought to you this week by David Fetter
Submit news and announcements by Sunday at 3:00pm Pacific time.
Please send English language ones to david(at)fetter(dot)org, German language
to pwn(at)pgug(dot)de, Italian language to pwn(at)itpug(dot)org. Spanish language
== Applied Patches ==
Tom Lane pushed:
- Stop btree indexscans upon reaching nulls in either direction. The
existing scan-direction-sensitive tests were overly complex, and
failed to stop the scan in cases where it's perfectly legitimate to
do so. Per bug #6278 from Maksym Boguk. Back-patch to 8.3, which
is as far back as the patch applies easily. Doesn't seem worth
sweating over a relatively minor performance issue in 8.2 at this
late date. (But note that this was a performance regression from
8.1 and before, so 8.2 is being left as an outlier.)
- Fix race condition with toast table access from a stale syscache
entry. If a tuple in a syscache contains an out-of-line toasted
field, and we try to fetch that field shortly after some other
transaction has committed an update or deletion of the tuple, there
is a race condition: vacuum could come along and remove the toast
tuples before we can fetch them. This leads to transient failures
like "missing chunk number 0 for toast value NNNNN in
pg_toast_2619", as seen in recent reports from Andrew Hammond and
Tim Uckun. The design idea of syscache is that access to stale
syscache entries should be prevented by relation-level locks, but
that fails for at least two cases where toasted fields are possible:
ANALYZE updates pg_statistic rows without locking out sessions that
might want to plan queries on the same table, and CREATE OR REPLACE
FUNCTION updates pg_proc rows without any meaningful lock at all.
The least risky fix seems to be an idea that Heikki suggested when
we were dealing with a related problem back in August: forcibly
detoast any out-of-line fields before putting a tuple into syscache
in the first place. This avoids the problem because at the time we
fetch the parent tuple from the catalog, we should be holding an
MVCC snapshot that will prevent removal of the toast tuples, even if
the parent tuple is outdated immediately after we fetch it. (Note:
I'm not convinced that this statement holds true at every instant
where we could be fetching a syscache entry at all, but it does
appear to hold true at the times where we could fetch an entry that
could have a toasted field. We will need to be a bit wary of adding
toast tables to low-level catalogs that don't have them already.)
An additional benefit is that subsequent uses of the syscache entry
should be faster, since they won't have to detoast the field.
Back-patch to all supported versions. The problem is significantly
harder to reproduce in pre-9.0 releases, because of their
willingness to flush every entry in a syscache whenever the
underlying catalog is vacuumed (cf CatalogCacheFlushRelation); but
there is still a window for trouble.
- Preserve Var location information during flatten_join_alias_vars.
This allows us to give correct syntax error pointers when
complaining about ungrouped variables in a join query with
aggregates or GROUP BY. It's pretty much irrelevant for the
planner's use of the function, though perhaps it might aid debugging
- Revert "Stop btree indexscans upon reaching nulls in either
direction." This reverts commit
048fffed55ff1d6d346130e4a6b7be434e81e82c. As pointed out by Naoya
Anzai, we need to do more work to make that idea handle end-of-index
cases, and it is looking like too much risk for a back-patch. So
bug #6278 is only going to be fixed in HEAD.
- Fix btree stop-at-nulls logic properly. As pointed out by Naoya
Anzai, my previous try at this was a few bricks shy of a load,
because I had forgotten that the initial-positioning logic might not
try to skip over nulls at the end of the index the scan will start
from. We ought to fix that, because it represents an unnecessary
inefficiency, but first let's get the scan-stop logic back to a safe
state. With this patch, we preserve the performance benefit
requested in bug #6278 for the case of scanning forward into NULLs
(in a NULLS LAST index), but the reverse case of scanning backward
across NULLs when there's no suitable initial-positioning qual is
- Avoid scanning nulls at the beginning of a btree index scan. If we
have an inequality key that constrains the other end of the index,
it doesn't directly help us in doing the initial positioning ... but
it does imply a NOT NULL constraint on the index column. If the
index stores nulls at this end, we can use the implied NOT NULL
condition for initial positioning, just as if it had been stated
explicitly. This avoids wasting time when there are a lot of nulls
in the column. This is the reverse of the examples given in bugs
#6278 and #6283, which were about failing to stop early when we
encounter nulls at the end of the indexscan.
- Fix handling of PlaceHolderVars in nestloop parameter management.
If we use a PlaceHolderVar from the outer relation in an inner
indexscan, we need to reference the PlaceHolderVar as such as the
value to be passed in from the outer relation. The previous code
effectively tried to reconstruct the PHV from its component
expression, which doesn't work since (a) the Vars therein aren't
necessarily bubbled up far enough, and (b) it would be the wrong
semantics anyway because of the possibility that the PHV is supposed
to have gone to null at some point before the current join. Point
(a) led to "variable not found in subplan target list" planner
errors, but point (b) would have led to silently wrong answers. Per
report from Roger Niederland.
- Fix inline_set_returning_function() to allow multiple OUT
parameters. inline_set_returning_function failed to distinguish
functions returning generic RECORD (which require a column list in
the RTE, as well as run-time type checking) from those with multiple
OUT parameters (which do not). This prevented inlining from
happening. Per complaint from Jay Levitt. Back-patch to 8.4 where
this capability was introduced.
- Improve comments for TSLexeme data structure. Mostly, clean up
long-ago pgindent damage.
- Fix bogus code in contrib/ tsearch dictionary examples. Both
dict_int and dict_xsyn were blithely assuming that whatever memory
palloc gives back will be pre-zeroed. This would typically work for
just about long enough to run their regression tests, and no longer
:-(. The pre-9.0 code in dict_xsyn was even lamer than that, as it
would happily give back a pointer to the result of palloc(0),
encouraging its caller to access off the end of memory. Again, this
would just barely fail to fail as long as memory contained nothing
but zeroes. Per a report from Rodrigo Hjort that code based on
these examples didn't work reliably.
- Don't assume that a tuple's header size is unchanged during
toasting. This assumption can be wrong when the toaster is passed a
raw on-disk tuple, because the tuple might pre-date an ALTER TABLE
ADD COLUMN operation that added columns without rewriting the table.
In such a case the tuple's natts value is smaller than what we
expect from the tuple descriptor, and so its t_hoff value could be
smaller too. In fact, the tuple might not have a null bitmap at
all, and yet our current opinion of it is that it contains some
trailing nulls. In such a situation, toast_insert_or_update did the
wrong thing, because to save a few lines of code it would use the
old t_hoff value as the offset where heap_fill_tuple should start
filling data. This did not leave enough room for the new nulls
bitmap, with the result that the first few bytes of data could be
overwritten with null flag bits, as in a recent report from Hubert
Depesz Lubaczewski. The particular case reported requires ALTER
TABLE ADD COLUMN followed by CREATE TABLE AS SELECT * FROM ... or
INSERT ... SELECT * FROM ..., and further requires that there be
some out-of-line toasted fields in one of the tuples to be copied;
else we'll not reach the troublesome code. The problem can only
manifest in this form in 8.4 and later, because before commit
a77eaa6a95009a3441e0d475d1980259d45da072, CREATE TABLE AS or
INSERT/SELECT wouldn't result in raw disk tuples getting passed
directly to heap_insert --- there would always have been at least a
junkfilter in between, and that would reconstitute the tuple header
with an up-to-date t_natts and hence t_hoff. But I'm backpatching
the tuptoaster change all the way anyway, because I'm not convinced
there are no older code paths that present a similar risk.
- Un-break horology regression test. Adjust ill-considered
timezone-dependent tests added in commit
8a3d33c8e6c681d512f79af4a521ee0c02befcef so that they won't fail on
DST transition days. Per all-pink buildfarm.
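The header-size item above ("Don't assume that a tuple's header size is unchanged during toasting") can be illustrated with a little arithmetic. What follows is only a sketch in Python, not the server's C code. It assumes a 23-byte fixed tuple header before the null bitmap, 8-byte MAXALIGN, and no OID column; the function names are made up for the illustration.

```python
# Sketch of how a heap tuple's data offset (t_hoff) grows once a null
# bitmap is needed. Assumptions, not taken from the newsletter:
# 23-byte fixed header, 8-byte MAXALIGN, no OID column.

FIXED_HEADER_BYTES = 23  # bytes before the null bitmap (assumed)
MAXALIGN_BYTES = 8       # typical on 64-bit platforms (assumed)


def maxalign(length: int) -> int:
    """Round length up to the next multiple of MAXALIGN_BYTES."""
    return (length + MAXALIGN_BYTES - 1) // MAXALIGN_BYTES * MAXALIGN_BYTES


def bitmap_len(natts: int) -> int:
    """Bytes needed for a null bitmap holding one bit per attribute."""
    return (natts + 7) // 8


def header_offset(natts: int, has_nulls: bool) -> int:
    """Offset at which user data starts for a tuple with natts attributes."""
    length = FIXED_HEADER_BYTES
    if has_nulls:
        length += bitmap_len(natts)
    return maxalign(length)


# A tuple written while the table had 9 columns, none of them null.
old_hoff = header_offset(natts=9, has_nulls=False)
# The same row seen after ALTER TABLE ADD COLUMN: 10 attributes, and the
# new trailing one is null, so a null bitmap is now required.
new_hoff = header_offset(natts=10, has_nulls=True)
print(old_hoff, new_hoff)  # prints "24 32"
```

Under these assumptions the rebuilt tuple needs its data to start 8 bytes later than in the on-disk tuple, which is the room that reusing the old t_hoff value failed to leave.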
Magnus Hagander pushed:
- Document that multiple LDAP servers can be specified
- Pre-pad WAL files when streaming transaction log. Instead of
filling files as they appear, pre-pad the WAL files received when
streaming xlog the same way that the server does. Data is streamed
into a .partial file which is then rename()d into place when it's
complete, but it will always be 16MB. This also means that the
starting position for pg_receivexlog is now simply right after the
last complete segment, and we never need to deal with partial
segments there. Patch by me, review by Fujii Masao
- Properly close replication connection in pg_receivexlog
- Add missing space in comment
- Make psql \d on a sequence show the table/column owning it
- Show statistics target for columns in \d+ on a table
- Update regression tests for \d+ modification. Noted by Tom
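The pre-padding scheme described in the "Pre-pad WAL files when streaming transaction log" item above can be sketched roughly as follows. This is an illustration in Python, not pg_receivexlog's actual code; the function name, the segment name and the incoming data are hypothetical.

```python
import os
import tempfile

SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB, the segment size named above


def receive_segment(directory, name, chunks):
    """Write chunks into a zero-padded .partial file; rename it when full.

    Returns the path of the resulting file. Hypothetical helper, not the
    real pg_receivexlog logic.
    """
    final_path = os.path.join(directory, name)
    partial_path = final_path + ".partial"
    received = 0
    with open(partial_path, "wb") as f:
        f.write(b"\0" * SEGMENT_SIZE)  # pre-pad to full size with zeros
        f.seek(0)
        for chunk in chunks:
            if received + len(chunk) > SEGMENT_SIZE:
                raise ValueError("more data than fits in one segment")
            f.write(chunk)
            received += len(chunk)
        f.flush()
        os.fsync(f.fileno())
    if received == SEGMENT_SIZE:
        # Only a complete segment gets its final name.
        os.replace(partial_path, final_path)
        return final_path
    return partial_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # An interrupted stream: only a few bytes arrived.
        path = receive_segment(tmp, "000000010000000000000001", [b"abc"])
        print(os.path.basename(path), os.path.getsize(path))
```

Because the file is always full-sized, a reader never has to guess how much of a segment is valid from its length alone; the .partial suffix carries that information instead.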
Simon Riggs pushed:
- Split work of bgwriter between 2 processes: bgwriter and
checkpointer. bgwriter is now a much less important process,
responsible for page cleaning duties only. checkpointer is now
responsible for checkpoints and so has a key role in shutdown. Later
patches will correct doc references to the now old idea that
bgwriter performs checkpoints. Has beneficial effect on performance
at high write rates, but mainly refactoring to more easily allow
changes for power reduction by simplifying previously tortuous code
required to allow page cleaning and checkpointing to time
slice in the same process. Patch by me, Review by Dickson Guedes
- Add new file for checkpointer.c
- Have checkpointer send stats once each processing loop. Noted by
- Comment changes to show bgwriter no longer performs checkpoints.
- Fix timing of Startup CLOG and MultiXact during Hot Standby. Patch
by me, bug report by Chris Redekop, analysis by Florian Pflug
- Start Hot Standby faster when initial snapshot is incomplete. If
the initial snapshot had overflowed then we can start whenever the
latest snapshot is empty, not overflowed, or, as we did already, start
when the xmin on primary was higher than xmax of our starting
snapshot, which proves we have full snapshot data. Bug report by
- Remove spurious entry from missed catch while patch juggling
- Derive oldestActiveXid at correct time for Hot Standby. There was a
timing window between when oldestActiveXid was derived and when it
should have been derived that only shows itself under heavy load.
Move code around to ensure correct timing of derivation. No change
to StartupSUBTRANS() code, which is where this failed. Bug report
by Chris Redekop
- Refactor xlog.c to create src/backend/postmaster/startup.c. Startup
process now has its own dedicated file, just like all other
special/background processes. Reduces role and size of xlog.c
- Reduce checkpoints and WAL traffic on low activity database server.
Previously, we skipped a checkpoint if no WAL had been written since
last checkpoint, though this does not appear in user documentation.
As of now, we skip a checkpoint until we have written at least
enough WAL to switch to the next WAL file. This greatly reduces the
level of activity and number of WAL messages generated by a very low
activity server. This is safe because the purpose of a checkpoint is
to act as a starting place for a recovery, in case of crash. This
patch maintains minimal WAL volume for replay in case of crash, thus
maintaining very low crash recovery time.
- Update more comments about checkpoints being done by bgwriter
- Improve docs for timing and skipping of checkpoints. Greg Smith
- Move user functions related to WAL into xlogfuncs.c
Bruce Momjian pushed:
- Allow pg_upgrade to upgrade an old cluster that doesn't have a
- Update pg_upgrade comment on missing 'postgres' database.
- Adjust pg_upgrade "new database skip" code, e.g. 'postgres', to more
cleanly handle old/new database mismatches.
Peter Eisentraut pushed:
- Clean up whitespace and indentation in parser and scanner files.
These are not touched by pgindent, so clean them up a bit manually.
- Add note about using GNU tar warning options for base backups
- Fix archive_command example. The given archive_command example
didn't use %p or %f, which wouldn't really work in practice.
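To see why an archive_command without %p or %f cannot work, it helps to look at what the placeholders stand for: %p is replaced by the path of the file to archive and %f by its file name alone. The sketch below mimics that substitution in Python. It is not the server's code, and the function name, the command template and the paths are illustrative assumptions.

```python
import os


def expand_archive_command(template: str, wal_path: str) -> str:
    """Expand %p, %f and %% in template for one WAL file (illustrative)."""
    file_name = os.path.basename(wal_path)
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template):
            nxt = template[i + 1]
            if nxt == "p":    # path of the file to archive
                out.append(wal_path)
                i += 2
                continue
            if nxt == "f":    # file name only
                out.append(file_name)
                i += 2
                continue
            if nxt == "%":    # literal percent sign
                out.append("%")
                i += 2
                continue
        out.append(ch)
        i += 1
    return "".join(out)


# Hypothetical template and segment path, chosen for the example only.
cmd = expand_archive_command(
    "cp %p /mnt/server/archivedir/%f",
    "pg_xlog/000000010000000000000001",
)
print(cmd)
```

A template with neither placeholder would expand to the same command for every segment, so it could not name the file to copy.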
Robert Haas pushed:
- Initialize myProcLocks queues just once, at postmaster startup. In
assert-enabled builds, we assert during the shutdown sequence that
the queues have been properly emptied, and during process startup
that we are inheriting empty queues. In non-assert enabled builds,
we just save a few cycles.
- Check the return value of getcwd(), instead of assuming success.
- Silence bogus compiler warning.
Heikki Linnakangas pushed:
- Support range data types. Selectivity estimation functions are
missing for some range type operators, which is a TODO. Jeff Davis
- Oops, forgot to fix the catversion when I committed the range types
patch. It was inadvertently changed to 201111111, which is a wrong
date. Change it to current date, and remove the comment that was
supposed to remind me to fix it before committing.
Andrew Dunstan pushed:
- Do not treat a superuser as a member of every role for HBA purposes.
This makes it possible to use reject lines with group roles. Andrew
Dunstan, reviewed by Robert Haas.
- Role membership of superusers is only by explicit membership for
HBA. Document that this rule applies to 'samerole' as well as to
named roles. Per gripe from Tom Lane.
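The effect of the superuser change on group-role matching can be modelled in a few lines. This is a toy model in Python, not the server's HBA code; the function, the role names and the `legacy` switch are all hypothetical.

```python
def matches_group(user, group, explicit_members, superusers, legacy=False):
    """Return True if user counts as a member of group for an HBA line.

    Toy model: explicit_members maps a group role to the set of its
    explicitly granted members.
    """
    if legacy and user in superusers:
        # Old behavior: a superuser was treated as a member of every role.
        return True
    return user in explicit_members.get(group, set())


explicit_members = {"blocked": {"carol"}}
superusers = {"postgres"}

# Old rule: a reject line on the group would also catch the superuser.
print(matches_group("postgres", "blocked", explicit_members, superusers,
                    legacy=True))
# New rule: only explicit membership counts.
print(matches_group("postgres", "blocked", explicit_members, superusers))
```

In this model, the first call returns True and the second False, which is why reject lines on group roles become usable once superusers stop matching every group.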
Alvaro Herrera pushed:
- Implement a dry-run mode for isolationtester. This mode prints out
the permutations that would be run by the given spec file, in the
same format used by the permutation lines in spec files. This helps
in building new spec files. Author: Alexander Shulgin, with some
tweaks by me
- Unbreak isolationtester on Win32. I broke it in a previous commit
because I neglected to install the necessary incantations to have
getopt() work on Windows. Per red blots in buildfarm.
== Rejected Patches (for now) ==
No one was disappointed this week :-)
== Pending Patches ==
Scott Mead sent in two revisions of a patch to see some context around
<IDLE> IN TRANSACTION.
Shigeru HANADA sent in another revision of the patch to add a
Peter Eisentraut sent in another revision of the patch to enable psql
to switch automatically between normal and \x mode depending on the
width of the output.
Robert Haas sent in three revisions of a patch to drop the "=>"
notation from hstore.
Andrew Dunstan sent in another revision of the patch to add an
--exclude-table-data option to pg_dump.
KaiGai Kohei sent in two more revisions of the patch to fix certain
types of information leaks in VIEWs.
Andrew Dunstan sent in another revision of the patch to add a \setenv
command to psql.
KaiGai Kohei sent in a patch to add checks for INSERT permission on
new tables constructed by SELECT INTO or CREATE TABLE AS.
Simon Riggs and Robert Haas traded revisions of a patch to skip busy
pages during VACUUM.
Alvaro Herrera sent in another revision of the patch to add foreign
Pavan Deolasee sent in a patch to store hot members of PGPROC out of
band, a performance optimization.
Gabriele Bartolini sent in a WIP patch to allow arrays to be foreign
keys to scalar primary keys.
Tomas Vondra sent in a patch that would allow optional "cleaning" of
queries tracked in pg_stat_statements, compressing the result and
making it more readable.
Greg Smith sent in a patch that adds a new function to the pageinspect
extension for measuring total free space, in either tables or indexes.
It returns the free space as a percentage, so higher numbers mean more
J Smith sent in a fix to some corner-case bugs in the unaccent module.