Re: Add A Glossary

From: Jürgen Purtz <juergen(at)purtz(dot)de>
To: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, pgsql-hackers(at)postgresql(dot)org, Pg Docs <pgsql-docs(at)lists(dot)postgresql(dot)org>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Roger Harkavy <rogerharkavy(at)gmail(dot)com>
Subject: Re: Add A Glossary
Date: 2020-05-20 11:17:29
Lists: pgsql-docs pgsql-hackers

On 19.05.20 08:17, Laurenz Albe wrote:
> On Mon, 2020-05-18 at 18:08 +0200, Jürgen Purtz wrote:
>> cluster/instance: PG (mainly) consists of a group of processes that commonly
>> act on shared buffers. The processes are very closely related to each other
>> and with the buffers. They exist altogether or not at all. They use a common
>> initialization file and are incarnated by one command. Everything exists
>> solely in RAM and therefore has a fluctuating nature. In summary: they build
>> a unit and this unit needs to have a name of itself. In some pages we used
>> to use the term *instance* - sometimes in extended forms: *database instance*,
>> *PG instance*, *standby instance*, *standby server instance*, *server instance*,
>> or *remote instance*. For me, the term *instance* makes sense, the extensions
>> *standby instance* and *remote instance* in their context too.
> FWIW, I feel somewhat like Alvaro on that point; I use those terms synonymously,
> perhaps distinguishing between a "started cluster" and a "stopped cluster".
> After all, "cluster" refers to "a cluster of databases", which are there, regardless
> if you start the server or not.
> The term "cluster" is unfortunate, because to most people it suggests a group of
> machines, so the term "instance" is better, but that ship has sailed long ago.
> The static part of a cluster to me is the "data directory".

cluster/instance: The different nature (static/dynamic) of what I call
"cluster" and "instance" as well as the existence of the two commands
"initdb — create a new PostgreSQL database cluster" and "pg_ctl —
initialize, start, stop, or control a PostgreSQL server" confirms my
opinion that we need two different terms for them. The two terms
should not be synonyms of each other; they label distinct things. If
people prefer "data directory" to "cluster", that is fine with me.

There are situations where we need a single term for both of them.
"Instance and its data directory" or "instance and its cluster" are too
wordy. In many cases we use "database server" or "server" in this sense.
IMO "server" alone is too short and ambiguous. "Database server", the plural
form "databases server", or the new term "cluster server", which is more
accurate, would be fine with me. (Like "server", the term "cluster"
is used in many different contexts, but only outside of the PG
world; within our context "cluster" is not ambiguous.)

>> server/host: We need a term to describe the underlying hardware respectively
>> the virtual machine or container, where PG is running. I suggest to use both
>> *server* and *host*. In computer science, both have their eligibility and are
>> widely used. Everybody understands *client/server architecture* or *host* in
>> TCP/IP configuration. We cannot change such matter of course. I suggest to
>> use both depending on the context, but with the same meaning: "real hardware,
>> a container, or a virtual machine".
> On this I have a strong opinion because of my Unix mindset.
> "machine" and "host" are synonyms, and it doesn't matter to the database if they
> are virtualized or not. You can always disambiguate by adding "virtual" or "physical".
> A "server" is a piece of software that responds to client requests, never a machine.
> In my book, this is purely Windows jargon. The term "client-server architecture"
> that you quote emphasized that.
> Perhaps "machine" would be the preferable term, because "host" is more prone to
> misunderstandings (except in a networking context).

server/host: I agree that we are not interested in whether there is
real hardware or some virtualization container. We are not even
interested in the operating system. Our primary concern is the existence
of a port of the Internet Protocol. But is the term "server" appropriate
for naming an IP port? Additionally, "server" is used with other meanings:
a) the previously mentioned "database server"; b) a (virtual) machine:
"server-side", "... the file ... loaded by the server ..."; c) binaries:
"... the server must be built with SSL support ..."; d) whenever it seems
appropriate: "standby server", "... the server parses query ...",
"server configuration", "server process".

Because of its ambiguous usage, the definition of "server" must clarify
the allowed meanings. What about:


server: Depending on the context, the term *server* denotes:

* An IP port offered by any OS.   ?????
* A (possibly virtualized) machine
* An abbreviation for the slightly longer term "database(s)/cluster
server"  ??? this would help readability, but not clarity ???
* More?


The term "host" is used mainly in IP configuration ("host name", "host
address") and in the context of compiling ("host language", "host
variable"). These are clear situations and can be defined easily.
