Re: Add A Glossary

From: Jürgen Purtz <juergen(at)purtz(dot)de>
To: pgsql-hackers(at)postgresql(dot)org, Pg Docs <pgsql-docs(at)lists(dot)postgresql(dot)org>
Cc: Erik Rijkers <er(at)xs4all(dot)nl>, Fabien COELHO <coelho(at)cri(dot)ensmp(dot)fr>, Corey Huinker <corey(dot)huinker(at)gmail(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Roger Harkavy <rogerharkavy(at)gmail(dot)com>
Subject: Re: Add A Glossary
Date: 2020-05-18 16:08:01
Message-ID: e32802cd-7795-f3c5-db66-0e262b232132@purtz.de
Lists: pgsql-docs pgsql-hackers

On 17.05.20 17:28, Alvaro Herrera wrote:
> On 2020-May-17, Erik Rijkers wrote:
>
>> On 2020-05-17 08:51, Alvaro Herrera wrote:
>>> I don't think that's the general understanding of those terms. For all
>>> I know, they *are* synonyms, and there's no specific term for "the
>>> fluctuating objects" as you call them. The instance is either running
>>> (in which case there are processes and RAM) or it isn't.
>> For what it's worth, I've also always understood 'instance' as 'a running
>> database'. I admit it might be a left-over from my oracle years:
>>
>> https://docs.oracle.com/cd/E11882_01/server.112/e40540/startup.htm#CNCPT601
>>
>> There, 'instance' clearly refers to a running database. When that database
>> is stopped, it ceases to be an instance.
> I've never understood it that way, but I'm open to having my opinion on
> it changed. So let's discuss it and maybe gather opinions from others.
>
> I think the terms under discussion are just
>
> * cluster
> * instance
> * server
>
> We don't have "host" (I just made it a synonym for server), but perhaps
> we can add that too, if it's useful. It would be good to be consistent
> with historical Postgres usage, such as the initdb usage of "cluster"
> etc.
>
> Perhaps we should not only define what our use of each term is, but also
> explain how each term is used outside PostgreSQL and highlight the
> differences. (This would be particularly useful for "cluster" ISTM.)

In fact, we have reached a point where we lack a common understanding
of a group of terms. I'm sure we will run into more situations like this
in the future. Such discussions, the decisions that follow, and their
implementation in the docs are necessary to build a solid foundation,
primarily for newcomers (which is my main motivation) but also for more
complex discussions among experts. Naturally, each of us brings our own
prior understanding of these terms. But we should also be open to
revising old terms from time to time.

Here are my two cents.

cluster/instance: PG (mainly) consists of a group of processes that
jointly operate on shared buffers. The processes are very closely
related to each other and to the buffers. They exist all together or
not at all. They use a common initialization file and are started by a
single command. Everything exists solely in RAM and therefore has a
fluctuating nature. In summary, they form a unit, and this unit needs a
name of its own. On some pages we have used the term *instance*,
sometimes in extended forms: *database instance*, *PG instance*,
*standby instance*, *standby server instance*, *server instance*, or
*remote instance*. To me, the term *instance* makes sense, as do the
extensions *standby instance* and *remote instance* in their respective
contexts.

The next essential component is the data itself. It is organized as a
group of databases plus some common management information (global,
pg_wal, pg_xact, pg_tblspc, ...). The data must be treated as a whole
because the management information concerns all databases. Its nature
differs from that of the processes and shared buffers. Of course, its
content changes, but it is persistent: it even survives a 'power down'.
A single command creates a new incarnation of the directory structure
and all its files. In summary, it is something in its own right and
should have its own name. 'database' is not possible because it
consists of databases and other things. My favorite is *cluster*;
*database cluster* is also possible.

server/host: We need a term to describe the underlying hardware, or
the virtual machine or container, on which PG runs. I suggest using
both *server* and *host*. In computer science, both are legitimate and
widely used. Everybody understands *client/server architecture* or
*host* in a TCP/IP configuration, and we cannot change such established
usage. I suggest using either term depending on the context, but with
the same meaning: "real hardware, a container, or a virtual machine".

--

Jürgen Purtz

(PS: I added the docs mailing list)
