Re: Add A Glossary

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Jürgen Purtz <juergen(at)purtz(dot)de>, Roger Harkavy <rogerharkavy(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Michael Paquier <michael(at)paquier(dot)xyz>
Subject: Re: Add A Glossary
Date: 2020-03-20 19:58:41
Message-ID: 20200320195841.GA13662@telsasoft.com
Lists: pgsql-docs pgsql-hackers

On Thu, Mar 19, 2020 at 09:11:22PM -0300, Alvaro Herrera wrote:
> + <glossterm>Aggregate</glossterm>
> + <glossdef>
> + <para>
> + To combine a collection of data values into a single value, whose
> + value may not be of the same type as the original values.
> + <glossterm>Aggregate</glossterm> <glossterm>Functions</glossterm>
> + combine multiple <glossterm>Rows</glossterm> that share a common set
> + of values into one <glossterm>Row</glossterm>, which means that the
> + only data visible in the values in common, and the aggregates of the

IS the values in common ?
(or, "is the shared values")

> + <glossterm>Analytic</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Function</glossterm> whose computed value can reference
> + values found in nearby <glossterm>Rows</glossterm> of the same
> + <glossterm>Result Set</glossterm>.

> + <glossterm>Archiver</glossterm>

Can you change that to archiver process ?

> + <glossterm>Atomic</glossterm>
..
> + <para>
> + In reference to an operation: An event that cannot be completed in
> + part: it must either entirely succeed or entirely fail. A series of

Can you say: "an action which is not allowed to partially succeed and then fail,
..."

> + <glossterm>Autovacuum</glossterm>

Say autovacuum process ?

> + <glossdef>
> + <para>
> + Processes that remove outdated <acronym>MVCC</acronym>

I would say "A set of processes that remove..."

> + <glossterm>Records</glossterm> of the <glossterm>Heap</glossterm> and

I'm not sure, can you say "tuples" ?

> + <glossterm>Backend Process</glossterm>
> + <glossdef>
> + <para>
> + Processes of an <glossterm>Instance</glossterm> which act on behalf of

Say DATABASE instance

> + <glossterm>Backend Server</glossterm>
> + <glossdef>
> + <para>
> + See <glossterm>Instance</glossterm>.
same

> + <glossterm>Background Worker</glossterm>
> + <glossdef>
> + <para>
> + Individual processes within an <glossterm>Instance</glossterm>, which
same

> + run system- or user-supplied code. Typical use cases are processes
> + which handle parts of an <acronym>SQL</acronym> query to take
> + advantage of parallel execution on servers with multiple
> + <acronym>CPUs</acronym>.

I would say "A typical use case is"

> + <glossterm>Background Writer</glossterm>

Add "process" ?

> + <glossdef>
> + <para>
> + Writes continuously dirty pages from <glossterm>Shared

Say "Continuously writes"

> + Memory</glossterm> to the file system. It starts periodically, but

Hm, maybe "wakes up periodically"

> + <glossterm>Cast</glossterm>
> + <glossdef>
> + <para>
> + A conversion of a <glossterm>Datum</glossterm> from its current data
> + type to another data type.

maybe just say
A conversion of a <glossterm>Datum</glossterm> to another data type.

> + <glossterm>Catalog</glossterm>
> + <glossdef>
> + <para>
> + The <acronym>SQL</acronym> standard uses this standalone term to
> + indicate what is called a <glossterm>Database</glossterm> in
> + <productname>PostgreSQL</productname>'s terminology.

Maybe remove "standalone" ?

> + <glossterm>Checkpointer</glossterm>

Process

> + A process that writes dirty pages and <glossterm>WAL
> + Records</glossterm> to the file system and creates a special

Does the checkpointer actually write WAL ?

> + checkpoint record. This process is initiated when predefined
> + conditions are met, such as a specified amount of time has passed, or
> + a certain volume of records have been collected.

collected or written?

I would say:
> + A checkpoint is usually initiated by
> + a specified amount of time having passed, or
> + a certain volume of records having been written.

> + <glossterm>Checkpoint</glossterm>
> + <glossdef>
> + <para>
> + A <link linkend="sql-checkpoint"> Checkpoint</link> is a point in time

Extra space

> + <glossentry id="glossary-connection">
> + <glossterm>Connection</glossterm>
> + <glossdef>
> + <para>
> + A <acronym>TCP/IP</acronym> or socket line for inter-process

I don't know if I've ever heard the phrase "socket line"
I guess you mean a unix socket.

> + <glossterm>Constraint</glossterm>
> + <glossdef>
> + <para>
> + A concept of restricting the values of data allowed within a
> + <glossterm>Table</glossterm>.

Just say: "A restriction on the values..."?

> + <glossterm>Data Area</glossterm>

Remove this ? I've never heard this phrase before.

> + <glossdef>
> + <para>
> + The base directory on the filesystem of a
> + <glossterm>Server</glossterm> that contains all data files and
> + subdirectories associated with a <glossterm>Cluster</glossterm> with
> + the exception of tablespaces. The environment variable

Should add an entry for "tablespace".

> + <glossterm>Datum</glossterm>
> + <glossdef>
> + <para>
> + The internal representation of a <acronym>SQL</acronym> data type.

I'm not sure if it should use "a SQL" or "an SQL", but not both.

> + <glossterm>Delete</glossterm>
> + <glossdef>
> + <para>
> + A <acronym>SQL</acronym> command whose purpose is to remove

just say "which removes"

> + <glossentry id="glossary-file-segment">
> + <glossterm>File Segment</glossterm>
> + <glossdef>
> + <para>
> + If a heap or index file grows in size over 1 GB, it will be split

1GB is the default "segment size", which you should define.

> + <glossentry id="glossary-foreign-data-wrapper">
> + <glossterm>Foreign Data Wrapper</glossterm>
> + <glossdef>
> + <para>
> + A means of representing data that is not contained in the local
> + <glossterm>Database</glossterm> as if were in local
> + <glossterm>Table</glossterm>(s).

I'd say:

+ A means of representing data as a <glossterm>Table</glossterm>(s) even though
+ it is not contained in the local <glossterm>Database</glossterm>

> + <glossentry id="glossary-foreign-key">
> + <glossterm>Foreign Key</glossterm>
> + <glossdef>
> + <para>
> + A type of <glossterm>Constraint</glossterm> defined on one or more
> + <glossterm>Column</glossterm>s in a <glossterm>Table</glossterm> which
> + requires the value in those <glossterm>Column</glossterm>s to uniquely
> + identify a <glossterm>Row</glossterm> in the specified
> + <glossterm>Table</glossterm>.

An FK doesn't require the values in its table to be unique, right ?
I'd say something like: "..which enforces that the values in those Columns are
also present in an(other) table."
Reference Referential Integrity?
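
To illustrate the point, here is a minimal sketch. It assumes SQLite (via
Python's standard sqlite3 module) as a stand-in for PostgreSQL, and the table
names "parent" and "child" are made up for the example. It only shows the
general FK semantics: the referencing column may hold duplicates, but every
value must be present in the referenced table.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE child (parent_id INTEGER REFERENCES parent(id))")

conn.execute("INSERT INTO parent VALUES (1)")

# The same referenced value may appear many times in the referencing table.
conn.execute("INSERT INTO child VALUES (1)")
conn.execute("INSERT INTO child VALUES (1)")

# A value that is absent from the referenced table is rejected.
try:
    conn.execute("INSERT INTO child VALUES (2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

child_rows = conn.execute("SELECT count(*) FROM child").fetchone()[0]
print(child_rows, rejected)
```

So the uniqueness requirement is on the referenced side, not on the columns
that carry the constraint.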

> + <glossterm>Function</glossterm>
> + <glossdef>
> + <para>
> + Any pre-defined transformation of data. Many
> + <glossterm>Functions</glossterm> are already defined within
> + <productname>PostgreSQL</productname> itself, but can also be
> + user-defined.

I would remove "pre-", since you mentioned that it can be user-defined.

> + <glossterm>Global SQL Object</glossterm>
> + <glossdef>
> + <para>
> + <!-- FIXME -->
> + Not all <glossterm>SQL Objects</glossterm> belong to a certain
> + <glossterm>Schema</glossterm>. Some belong to the complete
> + <glossterm>Database</glossterm>, or even to the complete
> + <glossterm>Cluster</glossterm>. These are referred to as
> + <glossterm>Global SQL Objects</glossterm>. Collations and Extensions
> + such as <glossterm>Foreign Data Wrappers</glossterm> reside at the
> + <glossterm>Database</glossterm> level; <glossterm>Database</glossterm>
> + names, <glossterm>Roles</glossterm>,
> + <glossterm>Tablespaces</glossterm>, <glossterm>Replication</glossterm>
> + origins, and subscriptions for logical
> + <glossterm>Replication</glossterm> at the
> + <glossterm>Cluster</glossterm> level.

I think "complete" is the wrong word.
I would say:
"An object which is not specific to a given database, but instead shared across
the entire Cluster".

> + <glossentry id="glossary-grant">
> + <glossterm>Grant</glossterm>
> + <glossdef>
> + <para>
> + A <acronym>SQL</acronym> command that is used to enable

I'd say "allow"

> + <glossentry id="glossary-heap">
> + <glossterm>Heap</glossterm>
> + <glossdef>
> + <para>
> + Contains the original values of <glossterm>Row</glossterm> attributes

I'm not sure what "original" means here ?

> + (i.e. the data). The <glossterm>Heap</glossterm> is realized within
> + <glossterm>Database</glossterm> files and mirrored in
> + <glossterm>Shared Memory</glossterm>.

I wouldn't say mirrored, and probably just remove at least the part after "and".

> + <glossentry id="glossary-host">
> + <glossterm>Host</glossterm>
> + <glossdef>
> + <para>
> + See <glossterm>Server</glossterm>.

Or client. Or proxy at some layer or other intermediate thing. Maybe just
remove this.

> + <glossentry id="glossary-index">
> + <glossterm>Index</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Relation</glossterm> that contains data derived from a
> + <glossterm>Table</glossterm> (or <glossterm>Relation</glossterm> such
> + as a <glossterm>Materialized View</glossterm>). It's internal

Its

> + structure supports very fast retrieval of and access to the original
> + data.

> + <glossterm>Instance</glossterm>
> + <glossdef>
> + <para>
...
> + <para>
> + Many <glossterm>Instances</glossterm> can run on the same server as
> + long as they use different <acronym>IP</acronym> ports and manage

I would say "as long as their TCP/IP ports or sockets don't conflict, and manage..."

> + <glossterm>Join</glossterm>
> + <glossdef>
> + <para>
> + A technique used with <command>SELECT</command> statements for
> + correlating data in one or more <glossterm>Relations</glossterm>.

I would refer to this as a SQL keyword that allows combining data from multiple
relations.

> + <glossterm>Lock</glossterm>
> + <glossdef>
> + <para>
> + A mechanism for one process temporarily preventing data from being
> + manipulated by any other process.

I'd say:

+ A mechanism by which a process protects simultaneous access to a resource
+ by other processes.

(I said "protects" since shared locks don't prevent all access, and it's easier
than explaining "unsafe access").

> + <glossentry id="glossary-log-file">
> + <glossterm>Log File</glossterm>
> + <glossdef>
> + <para>
> + <link linkend="logfile-maintenance">LOG files</link> contain readable
> + text lines about serious and non-serious events, e.g.: use of wrong
> + password, long-running queries, ... .

Serious and non-serious?

> + <glossterm>Log Writer</glossterm>

process

> + <glossdef>
> + <para>
> + If activated and parameterized, the

I don't know what parameterized means here

> + <link linkend="runtime-config-logging">Log Writer</link> process
> + writes information about database events into the current
> + <glossterm>Log file</glossterm>. When reaching certain time- or
> + volume-dependent criterias, he <!-- FIXME "he"? --> creates a new

I think criteria is the plural..

> + <glossterm>Log Record</glossterm>

Can we remove this ?
A couple of releases ago, "pg_xlog" was renamed to pg_wal.
I'd prefer to avoid defining something called "Log Record" about WAL that's
right next to text logs.

> + <glossterm>Logged</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Table</glossterm> is considered
> + <glossterm>Logged</glossterm> if changes to it are sent to the
> + <glossterm>WAL Log</glossterm>. By default, all regular
> + <glossterm>Tables</glossterm> are <glossterm>Logged</glossterm>. A
> + <glossterm>Table</glossterm> can be speficied as unlogged either at
> + creation time or via the <command>ALTER TABLE</command> command. The
> + primary use of unlogged <glossterm>Tables</glossterm> is for storing
> + transient work data that must be shared across processes, but with a
> + final result stored in logged <glossterm>Tables</glossterm>.
> + <glossterm>Temporary Tables</glossterm> are always unlogged.
> + </para>
> + </glossdef>
> + </glossentry>

Maybe it'd be better to define "unlogged", since 1) logged is the default; and
2) it's right next to text logs.

> + <glossterm>Master</glossterm>
> + <glossdef>
> + <para>
> + When two or more <glossterm>Databases</glossterm> are linked via
> + <glossterm>Replication</glossterm>, the <glossterm>Server</glossterm>
> + that is considered the authoritative source of information is called
> + the <glossterm>Master</glossterm>.

I think it'd actually be the <<instance>> which is authoritative, in case they're
running on the same <<Server>>

> + <glossentry id="glossary-materialized">
> + <glossterm>Materialized</glossterm>
> + <glossdef>
> + <para>
> + The act of storing information rather than just the means of accessing

remove "means of" ?

> + the information. This term is used in <glossterm>Materialized
> + Views</glossterm> meaning that the data derived from the
> + <glossterm>View</glossterm> is actually stored on disk separate from

separately

> + the sources of that data. When the term
> + <glossterm>Materialized</glossterm> is used in speaking about
> + mulit-step queries, it means that the data of a given step is stored

multi

> + (in memory, but that storage may spill over onto disk).
> + </para>
> + </glossdef>
> + </glossentry>
> +
> + <glossentry id="glossary-materialized-view">
> + <glossterm>Materialized View</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Relation</glossterm> that is defined in the same way that
> + a <glossterm>View</glossterm> is, but it stores data in the same way

change "it stores" to stores

> + <glossentry id="glossary-partition">
> + <glossterm>Partition</glossterm>
> + <glossdef>
> + <para>
> + <!-- FIXME should this use the style used in "atomic"? -->
> + a) A <glossterm>Table</glossterm> that can be queried independently by
> + its own name, but can also be queried via another

just say "on its own" or "directly"

> + <glossterm>Table</glossterm>, a partitionend

partitioned
also, put it in parens, like "via another table (a partitioned table)..."

> + <glossterm>Table</glossterm>, which is a collection of

Say "set" here since you later talk about "subsets" and sets.

> + <glossentry id="glossary-primary-key">
> + <glossterm>Primary Key</glossterm>
> + <glossdef>
> + <para>
> + A special case of <glossterm>Unique Index</glossterm> defined on a
> + <glossterm>Table</glossterm> or other <glossterm>Relation</glossterm>
> + that also guarantees that all of the <glossterm>Attributes</glossterm>
> + within the <glossterm>Primary Key</glossterm> do not have
> + <glossterm>Null</glossterm> values. As the name implies, there can be
> + only one <glossterm>Primary Key</glossterm> per
> + <glossterm>Table</glossterm>, though it is possible to have multiple
> + <glossterm>Unique Indexes</glossterm> that also have no
> + <glossterm>Null</glossterm>-capable <glossterm>Attributes</glossterm>.

I would say "multiple >>unique indexes<< on >>attributes<< defined as not
nullable".

> + <glossterm>Procedure</glossterm>
> + <glossdef>
> + <para>
> + A defined set of instructions for manipulating data within a
> + <glossterm>Database</glossterm>. <glossterm>Procedure</glossterm> can

"procedures" or "a procedure"

> + <glossterm>Record</glossterm>
> + <glossdef>
> + <para>
> + See <link linkend="sql-revoke">Tupple</link>.

Tupple is back. And again below.

> + A single <glossterm>Row</glossterm> of a <glossterm>Table</glossterm>
> + or other Relation.

I think it's commonly used to mean "an instance of a row" (in an MVCC sense),
but maybe that's too much detail for here.

> + <glossterm>Referential Integrity</glossterm>
> + <glossdef>
> + <para>
> + The means of restricting data in one <glossterm>Relation</glossterm>

A means

> + <glossentry id="glossary-relation">
> + <glossterm>Relation</glossterm>
> + <glossdef>
> + <para>
> + The generic term for all objects in a <glossterm>Database</glossterm>

"A generic term for any object in a >>database<< that has a name and..."

> + <glossentry id="glossary-result-set">
> + <glossterm>Result Set</glossterm>
> + <glossdef>
> + <para>
> + A data structure transmitted from a <glossterm>Server</glossterm> to
> + client program upon the completion of a <acronym>SQL</acronym>
> + command, usually a <command>SELECT</command> but it can be an
> + <command>INSERT</command>, <command>UPDATE</command>, or
> + <command>DELETE</command> command if the <literal>RETURNING</literal>
> + clause is specified.

I'd remove everything in that sentence after "usually".

> + <glossterm>Revoke</glossterm>
> + <glossdef>
> + <para>
> + A command to reduce access to a named set of

s/reduce/prevent/ ?

> + <glossterm>Row</glossterm>
> + <glossdef>
> + <para>
> + See <link linkend="sql-revoke">Tupple</link>.

tuple

> + <glossentry id="glossary-savepoint">
> + <glossterm>Savepoint</glossterm>
> + <glossdef>
> + <para>
> + A special mark (such as a timestamp) inside a
> + <glossterm>Transaction</glossterm>. Data modifications after this
> + point in time may be rolled back to the time of the savepoint.

I don't think "timestamp" is a useful or accurate analogy for this.
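
As a rough sketch of why: a savepoint is a named marker in the sequence of
operations inside a transaction, and rolling back to it undoes what was done
after the marker, regardless of wall-clock time. The example below assumes
SQLite (via Python's standard sqlite3 module) as a stand-in for PostgreSQL;
the table "t" and the savepoint name "sp1" are made up.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; names are hypothetical.
# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (v INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")      # done before the savepoint: kept
conn.execute("SAVEPOINT sp1")                 # named marker, not a timestamp
conn.execute("INSERT INTO t VALUES (2)")      # done after the savepoint
conn.execute("ROLLBACK TO SAVEPOINT sp1")     # undoes only the work after sp1
conn.execute("INSERT INTO t VALUES (3)")      # the transaction carries on
conn.execute("COMMIT")

rows = [r[0] for r in conn.execute("SELECT v FROM t ORDER BY v")]
print(rows)
```

Only the insert made between the savepoint and the rollback is discarded; the
rest of the transaction commits normally.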

> + <glossterm>Schema</glossterm>
> + <glossdef>
> + <para>
> + A <link linkend="ddl-schemas">schema</link> is a namespace for
> + <glossterm>SQL objects</glossterm>, which all reside in the same
> + <glossterm>database</glossterm>. Each <glossterm>SQL
> + object</glossterm> must reside in exactly one
> + <glossterm>Schema</glossterm>.
> + </para>

> + <para>
> + In general, the names of <glossterm>SQL objects</glossterm> in the
> + schema are unique - even across different types of objects. The lone
> + exception is the case of <glossterm>Unique</glossterm>
> + <glossterm>Constraint</glossterm>s, in which case there
> + <emphasis>must</emphasis> be a <glossterm>Unique Index</glossterm>
> + with the same name and <glossterm>Schema</glossterm> as the
> + <glossterm>Constraint</glossterm>. There is no restriction on having
> + a name used in multiple <glossterm>Schema</glossterm>s.

I think there's some confusion. Constraints are not objects, right ?

But, constraints do have an exception (not just unique constraints, though):
the constraint is only unique on its table, not in its database/schema.

"pg_constraint_conrelid_contypid_conname_index" UNIQUE, btree (conrelid, contypid, conname) CLUSTER

> + <glossterm>Select</glossterm>
> + <glossdef>
> + <para>
> + The command used to query a <glossterm>Database</glossterm>. Normally,
> + <command>SELECT</command>s are not expected to modify the
> + <glossterm>Database</glossterm> in any way, but it is possible that
> + <glossterm>Functions</glossterm> invoked within the query could have
> + side-effects that do modify data. </para>

I think there should be references to the sql-* pages for this and others.

> + <glossentry id="glossary-serializable">
> + <glossterm>Serializable</glossterm>
> + <glossdef>
> + <para>
> + Transactions defined as <literal>SERIALIZABLE</literal> are unable to
> + see changes made within other transactions. In effect, for the
> + initializing session the entire <glossterm>Database</glossterm>
> + appears to be frozen duration such a
> + <glossterm>Transaction</glossterm>.

Do you mean "for the duration of the >>Transaction<<" ?

> + <glossentry id="glossary-session">
> + <glossterm>Session</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Connection</glossterm> to the <glossterm>Database</glossterm>.
> + </para>
> + <para>
> + A description of the commands that were issued in the life cycle of a
> + particular <glossterm>Connection</glossterm> to the
> + <glossterm>Database</glossterm>.

I'm not sure what this <para> means.

> + <glossterm>Sequence</glossterm>
> + <glossdef>
> + <para>
> + <!-- sounds excessively complicated a definition -->
> + An <glossterm>Database</glossterm> object which represents the

A not An

> + mathematical concept of a numerical integral sequence. It can be
> + thought of as a <glossterm>Table</glossterm> with exactly one
> + <glossterm>Row</glossterm> and one <glossterm>Column</glossterm>. The
> + value stored is known as the current value. A
> + <glossterm>Sequence</glossterm> has a defined direction (almost always
> + increasing) and an interval step (usually 1). Whenever the
> + <literal>NEXTVAL</literal> pseudo-column of a
> + <glossterm>Sequence</glossterm> is accessed, the current value is moved
> + in the defined direction by the defined interval step, and that value

say "given interval step"

> + <glossterm>Shared Memory</glossterm>
> + <glossdef>
> + <para>
> + <acronym>RAM</acronym> which is used by the processes common to an
> + <glossterm>Instance</glossterm>. It mirrors parts of
> + <glossterm>Database</glossterm> files, provides an area for
> + <glossterm>WAL Records</glossterm>,

Do we use shared_buffers for WAL ?

> + <glossentry id="glossary-table">
> + <glossterm>Table</glossterm>
> + <glossdef>
> + <para>
> + A collection of <glossterm>Tuples</glossterm> (also known as
> + <glossterm>Rows</glossterm> or <glossterm>Records</glossterm>) having
> + a common data structure (the same number of
> + <glossterm>Attributes</glossterm>s, in the same order, having the same

Attributes has two esses.

> + name and type per position). A <glossterm>Table</glossterm> is the

I don't think you need to say here that the columns of a table all have the
same type and order.

> + <glossterm>Temporary Tables</glossterm>
> + <glossdef>
> + <para>
> + <glossterm>Table</glossterm>s that exist either for the lifetime of a
> + <glossterm>Session</glossterm> or a
> + <glossterm>Transaction</glossterm>, as defined at creation time. The

I would say "as specified at the time of its creation".

> + <glossterm>Transaction</glossterm>
> + <glossdef>
> + <para>
> + A combination of one or more commands that must act as a single

Remove "one or more"

> + <glossterm>Trigger</glossterm>
> + <glossdef>
> + <para>
> + A <glossterm>Function</glossterm> which can be defined to execute
> + whenever a certain operation (<command>INSERT</command>,
> + <command>UPDATE</command>, or <command>DELTE</command>) is applied to
> + that <glossterm>Relation</glossterm>. A <glossterm>Trigger</glossterm>

s/that/a/

> + <glossentry id="glossary-unique">
> + <glossterm>Unique</glossterm>
> + <glossdef>
> + <para>
> + The condition of having no matching values in the same

s/matching/duplicate/

> + <glossterm>Relation</glossterm>. Most often used in the concept of

s/concept/context/

> + <glossentry id="glossary-update">
> + <glossterm>Update</glossterm>
> + <glossdef>
> + <para>
> + A command used to modify <glossterm>Rows</glossterm> that already

or 'may already'

> + <glossterm>WAL File</glossterm>
...
> + <para>
> + The sequence of <glossterm>WAL Records</glossterm> in combination with
> + the sequence of <glossterm>WAL Files</glossterm> represents the

Remove "in combination with the sequence of >WAL Files<"

> + <glossentry id="glossary-wal-log">
> + <glossterm>WAL Log</glossterm>

Can you just say WAL or "write-ahead log".

> + <glossdef>
> + <para>
> + A <glossterm>WAL Record</glossterm> contains either new or changed
> + <glossterm>Heap</glossterm> or <glossterm>Index</glossterm> data or
> + information about a <command>COMMIT</command>,
> + <command>ROLLBACK</command>, <command>SAVEPOINT</command>, or
> + <glossterm>Checkpointer</glossterm> operation. WAL records use a
> + non-printabe binary format.

non-printable
Or just remove it.
Or just remove the sentence.

> + <glossterm>WAL Writer</glossterm>

process

> + <glossentry id="glossary-window-function">
> + <glossterm>Window Function</glossterm>
> + <glossdef>
> + <para>
> + A type of <glossterm>Function</glossterm> similar to an
> + <glossterm>Aggregate</glossterm> in that can derive its value from a

in that IT

> + set of <glossterm>Rows</glossterm> in a <glossterm>Result
> + Set</glossterm>, but still retaining the original source data.

--
Justin
