Re: [Patch] Make block and file size for WAL and relations defined at cluster creation

From: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Remi Colinet <remi(dot)colinet(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [Patch] Make block and file size for WAL and relations defined at cluster creation
Date: 2018-01-25 12:26:47
Message-ID: CAPpHfduUvR-HnO2XD=BQu5PvDr=r=oxPJwRmqQsPjcMHufV9+A@mail.gmail.com
Lists: pgsql-hackers

On Wed, Jan 3, 2018 at 12:26 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Dec 31, 2017 at 12:00 PM, Remi Colinet <remi(dot)colinet(at)gmail(dot)com>
> wrote:
> > Below patch makes block and file sizes defined at cluster creation for
> > both the WAL and the relations. This avoids having different server
> > builds for each possible combination of block size and file sizes.
>
> The email thread where we discussed making the WAL segment size
> configurable at initdb time contained a detailed rationale, explaining
> why it was useful to be able to make such a change. The very short
> version is that, if a system is generating WAL at a very high rate,
> being able to group that WAL into fewer, larger files makes life
> easier since, for example, the latency requirements for
> archive_command are not as tight, and "ls pg_wal" doesn't have to go
> into the tank just trying to read the directory contents.
>
> Your email doesn't seem to contain a rationale explaining why the
> block and file sizes should be run-time configurable. There may be a
> very good reason, but can you explain what it is?
>

I'd like to add my 2 cents regarding larger relation file sizes. While
dealing with large multi-terabyte databases, a user may operate with a
number of files greater than max_files_per_process. In this case, fetching
blocks from a relation may turn out to be quite inefficient: fetching
another relation block may frequently require opening one file descriptor
and evicting and closing another. Given that modern servers have multiple
terabytes of RAM, this can happen even when the data fits in the OS cache,
so the overhead is quite noticeable.
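
To make the arithmetic concrete, here is a rough back-of-the-envelope
sketch (the 4 TB figure is just an example I picked, not from the patch;
it only compares segment counts against the default GUC value):

/* Hypothetical illustration: number of 1 GB segment files for a
 * multi-terabyte relation vs. the default max_files_per_process. */
#include <stdio.h>

int
main(void)
{
    long long   relation_bytes = 4LL * 1024 * 1024 * 1024 * 1024;  /* 4 TB, made up */
    long long   segment_bytes = 1LL * 1024 * 1024 * 1024;          /* 1 GB default segment */
    long long   nsegments = (relation_bytes + segment_bytes - 1) / segment_bytes;
    int         max_files_per_process = 1000;   /* default GUC value */

    printf("segments: %lld, max_files_per_process: %d\n",
           nsegments, max_files_per_process);
    /* 4096 segments vs. 1000 VFD slots: the per-backend LRU pool keeps
     * closing and reopening descriptors even for cached data. */
    return 0;
}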

A possible solution might be to increase the max_files_per_process
parameter. However, with hundreds or even thousands of backends running,
the user may easily hit the OS limit on the number of open file
descriptors. Even if the user raises that limit, performance may degrade,
because the kernel doesn't handle such a large number of file descriptors
efficiently.
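
For illustration, a small standalone sketch of how the per-backend setting
multiplies against the descriptor limits (the 2000-files and 500-backends
numbers are made up):

/* Hypothetical illustration: aggregate descriptor demand across backends
 * vs. the per-process limit reported by getrlimit(). */
#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rl;
    long        max_files_per_process = 2000;   /* hypothetical raised value */
    long        n_backends = 500;               /* hypothetical */

    if (getrlimit(RLIMIT_NOFILE, &rl) == 0)
        printf("per-process fd limit: %llu\n",
               (unsigned long long) rl.rlim_cur);

    printf("requested per backend: %ld, aggregate: %ld\n",
           max_files_per_process,
           max_files_per_process * n_backends);
    /* The aggregate (1,000,000 here) also has to fit under the
     * system-wide limit (fs.file-max on Linux), and large fd tables
     * carry kernel bookkeeping costs of their own. */
    return 0;
}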

This problem would go away if we switched to a threaded model with
pread/pwrite, since we wouldn't need a per-backend file descriptor for
every file. However, that doesn't seem to be in the near future. This is
why larger file sizes look like a valid way to mitigate the problem in the
meantime. Experimental research on this subject is required before
considering committing any patches, though.
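
Just to illustrate what I mean by pread: with an explicit offset there is
no shared seek position, so many threads could share a single descriptor
per file instead of one per backend per file (the file path below is
hypothetical):

/* Minimal sketch of reading one block with pread(); no lseek() needed,
 * so concurrent reads on the same fd don't race on the file position. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BLCKSZ 8192

int
main(void)
{
    char        buf[BLCKSZ];
    int         fd = open("base/16384/16385", O_RDONLY);   /* example path */

    if (fd < 0)
        return 1;

    /* Read block 42 directly at its offset. */
    if (pread(fd, buf, BLCKSZ, (off_t) 42 * BLCKSZ) != BLCKSZ)
        perror("pread");

    close(fd);
    return 0;
}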

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
