Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [doc] remove reference to pg_dump pre-8.1 switch behaviour
Date: 2020-10-23 20:09:26
Message-ID: fd93f1c5-7818-a02c-01e5-1075ac0d4def@iki.fi
Lists: pgsql-hackers

On 23/10/2020 17:51, Tom Lane wrote:
> But anyway, this was about documentation not code. What I'm wondering
> about is when to drop things like, say, this bit in the regex docs:
>
> Two significant incompatibilities exist between AREs and the ERE syntax
> recognized by pre-7.4 releases of <productname>PostgreSQL</productname>:
> (etc etc)
>
> Seems like we could have gotten rid of that by now, but when exactly
> does it become fair game? And can we have a non-ad-hoc process for
> getting rid of such cruft?

Let's try to zoom in on a rule:

Anything that talks about 9.4 or above (min supported version - 1)
should definitely be left in place.

Something around 9.0 is possibly still useful to someone upgrading or
updating an application. Or someone might still bump into old blog posts
from that era.

Before that, I don't see much value. Although you could argue that I
jumped the gun on the notice about pre-8.2 pg_dump -t behavior. pg_dump
still supports servers down to 8.0, so someone might also have an 8.0
pg_dump binary lying around, and might be confused that -t behaves
differently. On the whole though, I think removing it was fair game.
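
As a rough illustration, the rule of thumb above can be written as a small triage function. This is only a sketch: the function names and labels are hypothetical, the 9.4 cutoff comes from the text, and treating "around 9.0" as "9.0 up to 9.3" is an assumption. As noted below, context still decides each individual case.

```python
# Hypothetical triage helper for version mentions in the docs.
# Cutoffs: 9.4 is stated in the rule above; 9.0 is an assumed reading
# of "something around 9.0".
KEEP_FROM = (9, 4)
MAYBE_FROM = (9, 0)


def parse_version(text):
    """Turn '8.2' into (8, 2) and '13' into (13, 0)."""
    parts = [int(p) for p in text.split(".")]
    return (parts[0], parts[1] if len(parts) > 1 else 0)


def triage(version_text):
    """Classify a mentioned version as 'keep', 'maybe' or 'review'."""
    version = parse_version(version_text)
    if version >= KEEP_FROM:
        return "keep"      # definitely leave in place
    if version >= MAYBE_FROM:
        return "maybe"     # possibly still useful to someone upgrading
    return "review"        # little value by default, but context matters


for v in ("13", "9.4", "9.1", "8.2"):
    print(v, triage(v))
```

A "review" result is only a starting point; several of the findings below are kept despite mentioning very old versions.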

I did some grepping for strings like "version 7", "pre-8" and so on. I
couldn't come up with a clear rule on what could be removed. Context
matters. In text that talks about protocol versions or libpq functions
like PQlibVersion() it seems sensible to go back as far as possible, for
completeness. And subtle user-visible differences in behavior are
more important to document than changes in internal C APIs that cause a
compiler failure, for example.
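
A search of this kind could be sketched as follows. This is not the command that was actually used; the regular expressions and the function name are assumptions chosen for illustration. The SGML tags are stripped first because the version number usually follows a <productname> element rather than the plain word.

```python
import re

# Hypothetical patterns: strip SGML tags, then look for "pre-X.Y",
# "version X.Y" or "PostgreSQL X.Y".
TAG = re.compile(r"<[^>]+>")
MENTION = re.compile(r"(?:\bpre-|\bversion\s+|\bPostgreSQL\s+)(\d+(?:\.\d+)?)")


def find_version_mentions(sgml):
    """Return the version numbers mentioned in a chunk of SGML text."""
    # Replace tags with spaces and collapse whitespace so that mentions
    # split across lines or around tags are still found.
    text = " ".join(TAG.sub(" ", sgml).split())
    return [m.group(1) for m in MENTION.finditer(text)]


sample = """
Before <productname>PostgreSQL</productname> 8.3, these functions
would silently accept values of several non-string data types.
"""
print(find_version_mentions(sample))
```

The output of such a search is a list of candidates only; each hit still has to be read in context.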

Other notices are about old syntax that's kept for backwards
compatibility, but still works. It makes sense to mention the old
version in those cases, even if it's very old, because the alternative
would be to just say something like "very old version", which is not any
shorter, just less precise.

Findings in detail follow. And attached is a patch about the stuff that
I think can be removed pretty straightforwardly.

array.sgml:
<para>
If the value written for an element is <literal>NULL</literal> (in
any case
variant), the element is taken to be NULL. The presence of any quotes
or backslashes disables this and allows the literal string value
<quote>NULL</quote> to be entered. Also, for backward compatibility
with
pre-8.2 versions of <productname>PostgreSQL</productname>, the <xref
linkend="guc-array-nulls"/> configuration parameter can be turned
<literal>off</literal> to suppress recognition of
<literal>NULL</literal> as a NULL.
</para>

The GUC still exists, so we should keep this.

catalogs.sgml:
<para>
The view <structname>pg_group</structname> exists for backwards
compatibility: it emulates a catalog that existed in
<productname>PostgreSQL</productname> before version 8.1.
It shows the names and members of all roles that are marked as not
<structfield>rolcanlogin</structfield>, which is an approximation to
the set
of roles that are being used as groups.
</para>

pg_group still exists, and that paragraph explains why. We should keep
it. (There's a similar paragraph for pg_shadow.)

config.sgml (on synchronized_scans):

<para>
This allows sequential scans of large tables to synchronize
with each
other, so that concurrent scans read the same block at about the
same time and hence share the I/O workload. When this is enabled,
a scan might start in the middle of the table and then <quote>wrap
around</quote> the end to cover all rows, so as to synchronize
with the
activity of scans already in progress. This can result in
unpredictable changes in the row ordering returned by queries that
have no <literal>ORDER BY</literal> clause. Setting this
parameter to
<literal>off</literal> ensures the pre-8.3 behavior in which a
sequential
scan always starts from the beginning of the table. The default
is <literal>on</literal>.
</para>

We could remove the reference to version 8.3. I'm inclined to keep it
though.

func.sgml (String Functions and Operators):
<note>
<para>
Before <productname>PostgreSQL</productname> 8.3, these functions
would
silently accept values of several non-string data types as well,
due to
the presence of implicit coercions from those data types to
<type>text</type>. Those coercions have been removed because they
frequently
caused surprising behaviors. However, the string concatenation
operator
(<literal>||</literal>) still accepts non-string input, so long as
at least one
input is of a string type, as shown in <xref
linkend="functions-string-sql"/>. For other cases, insert an explicit
coercion to <type>text</type> if you need to duplicate the
previous behavior.
</para>
</note>

Could remove the reference to 8.3, but the information about || still
makes sense. I'm inclined to just keep it.

func.sgml:
<note>
<para>
Before <productname>PostgreSQL</productname> 8.2, the containment
operators <literal>@&gt;</literal> and <literal>&lt;@</literal>
were respectively
called <literal>~</literal> and <literal>@</literal>. These names
are still
available, but are deprecated and will eventually be removed.
</para>
</note>

The old names are still available, so should keep this.

func.sgml:
<para>
Before <productname>PostgreSQL</productname> 8.1, the arguments of the
sequence functions were of type <type>text</type>, not
<type>regclass</type>, and
the above-described conversion from a text string to an OID value would
happen at run time during each call. For backward compatibility, this
facility still exists, but internally it is now handled as an implicit
coercion from <type>text</type> to <type>regclass</type> before the
function is
invoked.
</para>

Let's remove this.

func.sgml:
<para>
<xref linkend="array-operators-table"/> shows the specialized operators
available for array types.
In addition to those, the usual comparison operators shown in <xref
linkend="functions-comparison-op-table"/> are available for
arrays. The comparison operators compare the array contents
element-by-element, using the default B-tree comparison function for
the element data type, and sort based on the first difference.
In multidimensional arrays the elements are visited in row-major order
(last subscript varies most rapidly).
If the contents of two arrays are equal but the dimensionality is
different, the first difference in the dimensionality information
determines the sort order. (This is a change from versions of
<productname>PostgreSQL</productname> prior to 8.2: older versions
would claim
that two arrays with the same contents were equal, even if the
number of dimensions or subscript ranges were different.)
</para>

Could remove it.

<note>
<para>
There are two differences in the behavior of
<function>string_to_array</function>
from pre-9.1 versions of <productname>PostgreSQL</productname>.
First, it will return an empty (zero-element) array rather
than <literal>NULL</literal> when the input string is of zero length.
Second, if the delimiter string is <literal>NULL</literal>, the
function
splits the input into individual characters, rather than
returning <literal>NULL</literal> as before.
</para>
</note>

Feels too early to remove.
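
To make the two differences in that note concrete, here is a small Python model of them. It is not PostgreSQL code: the function names are made up, Python's None stands in for SQL NULL, and only the cases described in the note are modelled (other edge cases, such as an empty delimiter, are outside the sketch).

```python
def string_to_array_pre91(s, delim):
    """Model of the pre-9.1 behaviour described in the note."""
    if s is None or delim is None:
        return None            # NULL delimiter gave NULL
    if s == "":
        return None            # zero-length input gave NULL
    return s.split(delim)


def string_to_array_current(s, delim):
    """Model of the behaviour since 9.1 described in the note."""
    if s is None:
        return None
    if delim is None:
        return list(s)         # NULL delimiter: split into characters
    if s == "":
        return []              # zero-length input: empty array
    return s.split(delim)


print(string_to_array_pre91("", ","), string_to_array_current("", ","))
print(string_to_array_pre91("abc", None), string_to_array_current("abc", None))
```

Ordinary inputs behave the same in both models, which is why an application can run for years without noticing the change.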

<note>
<para>
Prior to <productname>PostgreSQL</productname> 8.2, the
<literal>&lt;</literal>, <literal>&lt;=</literal>,
<literal>&gt;</literal> and <literal>&gt;=</literal>
cases were not handled per SQL specification. A comparison like
<literal>ROW(a,b) &lt; ROW(c,d)</literal>
was implemented as
<literal>a &lt; c AND b &lt; d</literal>
whereas the correct behavior is equivalent to
<literal>a &lt; c OR (a = c AND b &lt; d)</literal>.
</para>
</note>

An important incompatibility, although very old. I'm inclined to keep it.
If we remove it, it'd still be useful to explain the new behavior.
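
The difference between the two definitions in that note is easy to see with concrete values. The sketch below models both in Python for two-column rows; the function names are hypothetical and NULL handling is deliberately ignored.

```python
def row_lt_pre82(a, b, c, d):
    """ROW(a,b) < ROW(c,d) as implemented before 8.2: a < c AND b < d."""
    return a < c and b < d


def row_lt_sql(a, b, c, d):
    """ROW(a,b) < ROW(c,d) per the SQL spec: a < c OR (a = c AND b < d)."""
    return a < c or (a == c and b < d)


# (1, 5) sorts before (2, 3) lexicographically, but the old rule says no
# because 5 < 3 is false.
print(row_lt_pre82(1, 5, 2, 3))   # False
print(row_lt_sql(1, 5, 2, 3))     # True

# The spec behaviour matches Python's own tuple ordering.
print((1, 5) < (2, 3))            # True
```

Any rewritten note could use a pair like ROW(1,5) and ROW(2,3) to explain the current behaviour without mentioning 8.2 at all.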

gin.sgml:
<title>GIN Tips and Tricks</title>

<variablelist>
<varlistentry>
<term>Create vs. insert</term>
<listitem>
<para>
Insertion into a <acronym>GIN</acronym> index can be slow
due to the likelihood of many keys being inserted for each item.
So, for bulk insertions into a table it is advisable to drop the GIN
index and recreate it after finishing bulk insertion.
</para>

<para>
As of <productname>PostgreSQL</productname> 8.4, this advice is less
necessary since delayed indexing is used (see <xref
linkend="gin-fast-update"/> for details). But for very large updates
it may still be best to drop and recreate the index.
</para>
</listitem>
</varlistentry>

I think that's old enough, but the paragraph would need some
copy-editing, not just removal.

high-availability.sgml (Record-based log shipping):
<sect2 id="warm-standby-record">
<title>Record-Based Log Shipping</title>

<para>
It is also possible to implement record-based log shipping using this
alternative method, though this requires custom development, and
changes
will still only become visible to hot standby queries after a full WAL
file has been shipped.
</para>

<para>
An external program can call the
<function>pg_walfile_name_offset()</function>
function (see <xref linkend="functions-admin"/>)
to find out the file name and the exact byte offset within it of
the current end of WAL. It can then access the WAL file directly
and copy the data from the last known end of WAL through the
current end
over to the standby servers. With this approach, the window for data
loss is the polling cycle time of the copying program, which can be
very
small, and there is no wasted bandwidth from forcing partially-used
segment files to be archived. Note that the standby servers'
<varname>restore_command</varname> scripts can only deal with whole
WAL files,
so the incrementally copied data is not ordinarily made available to
the standby servers. It is of use only when the primary dies &mdash;
then the last partial WAL file is fed to the standby before allowing
it to come up. The correct implementation of this process requires
cooperation of the <varname>restore_command</varname> script with
the data
copying program.
</para>

<para>
Starting with <productname>PostgreSQL</productname> version 9.0,
you can use
streaming replication (see <xref linkend="streaming-replication"/>) to
achieve the same benefits with less effort.
</para>
</sect2>

I think we should remove this whole section. Writing your own
record-level log shipping by polling pg_walfile_name_offset() is
malpractice on modern versions, when you could use streaming replication
instead. The whole "Alternative Method for Log Shipping" section is
pretty outdated.

indexam.sgml:
<para>
As of <productname>PostgreSQL</productname> 8.4,
<function>amvacuumcleanup</function> will also be called at
completion of an
<command>ANALYZE</command> operation. In this case
<literal>stats</literal> is always
NULL and any return value will be ignored. This case can be
distinguished
by checking <literal>info-&gt;analyze_only</literal>. It is recommended
that the access method do nothing except post-insert cleanup in such a
call, and that only in an autovacuum worker process.
</para>

Let's remove the "As of PostgreSQL 8.4".

<para>
The standard installation provides all the header files needed for
client
application development as well as for server-side program
development, such as custom functions or data types written in C.
(Prior to <productname>PostgreSQL</productname> 8.0, a separate
<literal>make
install-all-headers</literal> command was needed for the latter,
but this
step has been folded into the standard install.)
</para>

Remove.

<listitem>
<para>
Interrogates the frontend/backend protocol being used.
<synopsis>
int PQprotocolVersion(const PGconn *conn);
</synopsis>
Applications might wish to use this function to determine
whether certain
features are supported. Currently, the possible values are 2 (2.0
protocol), 3 (3.0 protocol), or zero (connection bad). The
protocol version will
not change after connection startup is complete, but it could
theoretically change during a connection reset. The 3.0 protocol
will normally be used when communicating with
<productname>PostgreSQL</productname> 7.4 or later servers;
pre-7.4 servers
support only protocol 2.0. (Protocol 1.0 is obsolete and not
supported by <application>libpq</application>.)
</para>
</listitem>

Talking about old versions, even very old ones, seems appropriate for a
function like PQprotocolVersion().

libpq.sgml, on PQlibVersion():
<note>
<para>
This function appeared in <productname>PostgreSQL</productname>
version 9.1, so
it cannot be used to detect required functionality in earlier
versions, since calling it will create a link dependency
on version 9.1 or later.
</para>
</note>

Seems appropriate to keep.

libpq.sgml:
<para>
<xref linkend="libpq-PQinitSSL"/> has been present since
<productname>PostgreSQL</productname> 8.0, while <xref
linkend="libpq-PQinitOpenSSL"/>
was added in <productname>PostgreSQL</productname> 8.4, so <xref
linkend="libpq-PQinitSSL"/>
might be preferable for applications that need to work with older
versions of <application>libpq</application>.
</para>

Keep.

lobj.sgml:
<para>
<indexterm><primary>lo_creat</primary></indexterm>
The function
<synopsis>
Oid lo_creat(PGconn *conn, int mode);
</synopsis>
creates a new large object.
The return value is the OID that was assigned to the new large object,
or <symbol>InvalidOid</symbol> (zero) on failure.

<replaceable class="parameter">mode</replaceable> is unused and
ignored as of <productname>PostgreSQL</productname> 8.1; however, for
backward compatibility with earlier releases it is best to
set it to <symbol>INV_READ</symbol>, <symbol>INV_WRITE</symbol>,
or <symbol>INV_READ</symbol> <literal>|</literal>
<symbol>INV_WRITE</symbol>.
(These symbolic constants are defined
in the header file <filename>libpq/libpq-fs.h</filename>.)
</para>

We need to say something about 'mode'. Keep.

pgfreespacemap.sgml:
<note>
<para>
The interface was changed in version 8.4, to reflect the new FSM
implementation introduced in the same version.
</para>
</note>

Remove.

pgstandby.sgml:
<para>
<application>pg_standby</application> is designed to work with
<productname>PostgreSQL</productname> 8.2 and later.
</para>

IMHO we should remove pg_standby altogether. Until we get around to
that, I think we should keep that note because it gives you a hint that
it's old :-).

pgarchivecleanup.sgml:
<para>
<application>pg_archivecleanup</application> is designed to work with
<productname>PostgreSQL</productname> 8.0 and later when used as a
standalone utility,
or with <productname>PostgreSQL</productname> 9.0 and later when
used as an
archive cleanup command.
</para>

Ditto.

planstats.sgml:
<para>
The examples shown below use tables in the
<productname>PostgreSQL</productname>
regression test database.
The outputs shown are taken from version 8.3.
The behavior of earlier (or later) versions might vary.

Should refresh the outputs.

plpgsql.sgml:
<para>
When used with a
<literal>BEGIN</literal> block, <literal>EXIT</literal> passes
control to the next statement after the end of the block.
Note that a label must be used for this purpose; an unlabeled
<literal>EXIT</literal> is never considered to match a
<literal>BEGIN</literal> block. (This is a change from
pre-8.4 releases of <productname>PostgreSQL</productname>, which
would allow an unlabeled <literal>EXIT</literal> to match
a <literal>BEGIN</literal> block.)
</para>

Maybe keep for a couple more years.

protocol.sgml:
<para>
This document describes version 3.0 of the protocol, implemented in
<productname>PostgreSQL</productname> 7.4 and later. For descriptions
of the earlier protocol versions, see previous releases of the
<productname>PostgreSQL</productname> documentation. A single server
can support multiple protocol versions. The initial startup-request
message tells the server which protocol version the client is
attempting to
use. If the major version requested by the client is not supported by
the server, the connection will be rejected (for example, this would
occur
if the client requested protocol version 4.0, which does not exist as of
this writing). If the minor version requested by the client is not
supported by the server (e.g., the client requests version 3.1, but the
server supports only 3.0), the server may either reject the connection or
may respond with a NegotiateProtocolVersion message containing the
highest
minor protocol version which it supports. The client may then choose
either
to continue with the connection using the specified protocol version or
to abort the connection.
</para>

Keep.
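
The version handling described in that paragraph can be sketched as a small decision function. This is a simplified model, not server code: it assumes a server that speaks only protocol 3.0, it always takes the NegotiateProtocolVersion branch where the text allows either that or a rejection, and the function and constant names are hypothetical.

```python
# Assumed server capabilities for this sketch.
SUPPORTED_MAJOR = 3
HIGHEST_MINOR = 0


def handle_startup(major, minor):
    """Return (action, version) for a client's requested protocol version."""
    if major != SUPPORTED_MAJOR:
        # Unsupported major version: the connection is rejected.
        return ("reject", None)
    if minor > HIGHEST_MINOR:
        # Unsupported minor version: report the highest minor version
        # supported; the client may then continue or abort.
        return ("negotiate", (SUPPORTED_MAJOR, HIGHEST_MINOR))
    return ("accept", (major, minor))


print(handle_startup(4, 0))
print(handle_startup(3, 1))
print(handle_startup(3, 0))
```

Because this logic depends on knowing which versions exist, it is one of the places where mentioning very old releases remains useful.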

<varlistentry>
<term>AuthenticationSCMCredential</term>
<listitem>
<para>
This response is only possible for local Unix-domain connections
on platforms that support SCM credential messages. The frontend
must issue an SCM credential message and then send a single data
byte. (The contents of the data byte are uninteresting; it's
only used to ensure that the server waits long enough to receive
the credential message.) If the credential is acceptable,
the server responds with an
AuthenticationOk, otherwise it responds with an ErrorResponse.
(This message type is only issued by pre-9.1 servers. It may
eventually be removed from the protocol specification.)
</para>
</listitem>
</varlistentry>

Keep. It's surely still referred to in client libraries.

<para>
Data of a particular data type might be transmitted in any of several
different <firstterm>formats</firstterm>. As of
<productname>PostgreSQL</productname> 7.4
the only supported formats are <quote>text</quote> and
<quote>binary</quote>,
but the protocol makes provision for future extensions. The desired
format for any value is specified by a <firstterm>format
code</firstterm>.
Clients can specify a format code for each transmitted parameter value
and for each column of a query result. Text has format code zero,
binary has format code one, and all other format codes are reserved
for future definition.
</para>

Could replace the "as of PostgreSQL 7.4" with "Currently", but it's not
much shorter.

<para>
For a <command>COPY</command> command, the tag is
<literal>COPY <replaceable>rows</replaceable></literal> where
<replaceable>rows</replaceable> is the number of rows copied.
(Note: the row count appears only in
<productname>PostgreSQL</productname> 8.2 and later.)
</para>

I think we should keep, since we mentioned earlier that the protocol
documentation is for 7.4 and later.

alter_opfamily.sgml and create_opclass.sgml:
<para>
Before <productname>PostgreSQL</productname> 8.4, the
<literal>OPERATOR</literal>
clause could include a <literal>RECHECK</literal> option. This is
no longer
supported because whether an index operator is <quote>lossy</quote>
is now
determined on-the-fly at run time. This allows efficient handling of
cases where an operator might or might not be lossy.
</para>

Keep, since the syntax is still supported (but ignored).

cluster.sgml:
<para>
The syntax
<synopsis>
CLUSTER <replaceable class="parameter">index_name</replaceable> ON
<replaceable class="parameter">table_name</replaceable>
</synopsis>
is also supported for compatibility with pre-8.3
<productname>PostgreSQL</productname>
versions.
</para>

Keep, since the syntax is still supported.

copy.sgml:
<para>
The following syntax was used before
<productname>PostgreSQL</productname>
version 9.0 and is still supported:
...
<para>
The following syntax was used before
<productname>PostgreSQL</productname>
version 7.3 and is still supported:

Keep, since the syntax is still supported.

create_function.sgml:
<para>
Before <productname>PostgreSQL</productname> version 8.3, the
<literal>SET</literal> clause was not available, and so older
functions may
contain rather complicated logic to save, set, and restore
<varname>search_path</varname>. The <literal>SET</literal> clause
is far easier
to use for this purpose.
</para>

Keep; those old functions with complicated logic might still exist in the wild.

create_type.sgml:
<para>
Before <productname>PostgreSQL</productname> version 8.3, the name of
a generated array type was always exactly the element type's name
with one
underscore character (<literal>_</literal>) prepended. (Type names were
therefore restricted in length to one less character than other names.)
While this is still usually the case, the array type name may vary from
this in case of maximum-length names or collisions with user type names
that begin with underscore. Writing code that depends on this
convention
is therefore deprecated. Instead, use
<structname>pg_type</structname>.<structfield>typarray</structfield>
to locate the array type
associated with a given type.
</para>

Let's keep it. We could remove the reference to 8.3, but would still
need to explain the behaviour, and I think it's easiest to explain
through its history.

create_type.sgml:
<para>
Before <productname>PostgreSQL</productname> version 8.2, the shell-type
creation syntax
<literal>CREATE TYPE <replaceable>name</replaceable></literal> did
not exist.
The way to create a new base type was to create its input function
first.
In this approach, <productname>PostgreSQL</productname> will first see
the name of the new data type as the return type of the input function.
The shell type is implicitly created in this situation, and then it
can be referenced in the definitions of the remaining I/O functions.
This approach still works, but is deprecated and might be disallowed in
some future release. Also, to avoid accidentally cluttering
the catalogs with shell types as a result of simple typos in function
definitions, a shell type will only be made this way when the input
function is written in C.
</para>

The deprecated way still works, so keep.

grant.sgml:
<para>
Since <productname>PostgreSQL</productname> 8.1, the concepts of
users and
groups have been unified into a single kind of entity called a role.
It is therefore no longer necessary to use the keyword
<literal>GROUP</literal>
to identify whether a grantee is a user or a group.
<literal>GROUP</literal>
is still allowed in the command, but it is a noise word.
</para>

The GROUP keyword is still accepted, so let's keep it.

pg_config-ref.sgml:
<para>
The options <option>--docdir</option>, <option>--pkgincludedir</option>,
<option>--localedir</option>, <option>--mandir</option>,
<option>--sharedir</option>, <option>--sysconfdir</option>,
<option>--cc</option>, <option>--cppflags</option>,
<option>--cflags</option>, <option>--cflags_sl</option>,
<option>--ldflags</option>, <option>--ldflags_sl</option>,
and <option>--libs</option> were added in
<productname>PostgreSQL</productname> 8.1.
The option <option>--htmldir</option> was added in
<productname>PostgreSQL</productname> 8.4.
The option <option>--ldflags_ex</option> was added in
<productname>PostgreSQL</productname> 9.0.
</para>

Let's keep these. This could still be relevant if someone is maintaining
an extension that's backwards compatible to old versions.

pg_dumpall.sgml:
<varlistentry>
<term><option>--lock-wait-timeout=<replaceable
class="parameter">timeout</replaceable></option></term>
<listitem>
<para>
Do not wait forever to acquire shared table locks at the
beginning of
the dump. Instead, fail if unable to lock a table within the
specified
<replaceable class="parameter">timeout</replaceable>. The
timeout may be
specified in any of the formats accepted by <command>SET
statement_timeout</command>. Allowed values vary depending on
the server
version you are dumping from, but an integer number of milliseconds
is accepted by all versions since 7.3. This option is ignored when
dumping from a pre-7.3 server.
</para>
</listitem>
</varlistentry>

pg_dump no longer supports pre-8.0 versions, so this is definitely
obsolete. Remove.

psql-ref.sgml:
<listitem>
<para>
Before <productname>PostgreSQL</productname> 8.4,
<application>psql</application> allowed the
first argument of a single-letter backslash command to start
directly after the command, without intervening whitespace.
Now, some whitespace is required.
</para>
</listitem>

Keep for a few more years.

psql-ref.sgml:
<para><literal>old-ascii</literal> style uses plain
<acronym>ASCII</acronym>
characters, using the formatting style used
in <productname>PostgreSQL</productname> 8.4 and earlier.
Newlines in data are shown using a <literal>:</literal>
symbol in place of the left-hand column separator.
When the data is wrapped from one line
to the next without a newline character, a <literal>;</literal>
symbol is used in place of the left-hand column separator.
</para>

Keep, as long as we keep the format.

<note>
<para>
Before <productname>PostgreSQL</productname> 8.2, the
<literal>.*</literal> syntax was not expanded in row constructors, so
that writing <literal>ROW(t.*, 42)</literal> created a two-field
row whose first
field was another row value. The new behavior is usually more useful.
If you need the old behavior of nested row values, write the inner
row value without <literal>.*</literal>, for instance
<literal>ROW(t, 42)</literal>.
</para>
</note>

I'm inclined to keep this. Someone might still need that behaviour, not
necessarily for backwards compatibility but because they might want to do
that in an application. Or rewrite it without the reference to 8.2.

<para>
For comparison, the <productname>PostgreSQL</productname> 8.1
documentation
contained 10,441 unique words, a total of 335,420 words, and the most
frequent word <quote>postgresql</quote> was mentioned 6,127 times in 655
documents.
</para>

<!-- TODO we need to put a date on these numbers? -->
<para>
Another example &mdash; the <productname>PostgreSQL</productname>
mailing
list archives contained 910,989 unique words with 57,491,343 lexemes in
461,020 messages.
</para>

Refresh the numbers.

<note>
<para>
In the SQL standard, there is a clear distinction between users and
roles,
and users do not automatically inherit privileges while roles do. This
behavior can be obtained in <productname>PostgreSQL</productname>
by giving
roles being used as SQL roles the <literal>INHERIT</literal>
attribute, while
giving roles being used as SQL users the
<literal>NOINHERIT</literal> attribute.
However, <productname>PostgreSQL</productname> defaults to giving
all roles
the <literal>INHERIT</literal> attribute, for backward
compatibility with pre-8.1
releases in which users always had use of permissions granted to groups
they were members of.
</para>
</note>

Keep, since that's still how it behaves.

xindex.sgml:
<note>
<para>
Prior to <productname>PostgreSQL</productname> 8.3, there was no
concept
of operator families, and so any cross-data-type operators intended
to be
used with an index had to be bound directly into the index's operator
class. While this approach still works, it is deprecated because it
makes an index's dependencies too broad, and because the planner can
handle cross-data-type comparisons more effectively when both data
types
have operators in the same operator family.
</para>
</note>

Keep, because the old method still works.

- Heikki

Attachment Content-Type Size
remove-doc-mentions-of-old-incompatibilies.patch text/x-patch 7.2 KB
