Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?

From: Alban Hertroys <haramrae(at)gmail(dot)com>
To: HECTOR INGERTO <HECTOR_25E(at)hotmail(dot)com>
Cc: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-general(at)postgresql(dot)org <pgsql-general(at)postgresql(dot)org>" <pgsql-general(at)postgresql(dot)org>
Subject: Re: Are ZFS snapshots unsafe when PGSQL is spreading through multiple zpools?
Date: 2023-01-17 08:05:00
Message-ID: 9CCDF037-BB5F-4369-AE12-37B25C20B0EF@gmail.com
Lists: pgsql-general


> On 16 Jan 2023, at 15:37, HECTOR INGERTO <HECTOR_25E(at)hotmail(dot)com> wrote:
>
> > The database relies on the data being consistent when it performs crash recovery.
> > Imagine that a checkpoint is running while you take your snapshot. The checkpoint
> > syncs a data file with a new row to disk. Then it writes a WAL record and updates
> > the control file. Now imagine that the table with the new row is on a different
> > file system, and your snapshot captures the WAL and the control file, but not
> > the new row (it was still sitting in the kernel page cache when the snapshot was taken).
> > You end up with a lost row.
> >
> > That is only one scenario. Many other ways of corruption can happen.
>
> Can we say then that the risk comes only from the possibility of a checkpoint running inside the time gap between the non-simultaneous snapshots?

I recently followed a course on distributed algorithms and recognised one of the patterns here.

The problem boils down to a distributed snapshot algorithm, in which the two ZFS filesystem processes each initiate their own snapshot independently.

Unless those processes communicate with each other and with the database about which messages (in this case, the traffic between the database and each FS) are part of their snapshots, whether as sent or as received, messages can get lost: either none of the process snapshots knows that a 'message' was sent, or none records having received it.
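To make the lost-message case concrete, here is a toy simulation. It is only a sketch under loud assumptions: the two dictionaries stand in for filesystems on separate zpools, `insert_row` is a made-up write path, and nothing here reflects how PostgreSQL or ZFS actually work internally. It only shows what happens when the two snapshots are not taken at the same instant.

```python
import copy

# Hypothetical stand-ins for two filesystems on separate zpools.
data_fs = {"table": []}  # holds the table's data file
wal_fs = {"wal": []}     # holds the WAL and the control file


def insert_row(row):
    """Toy write path: the row goes to the data filesystem, then the WAL
    filesystem records that the row is safely on disk. Each write plays
    the role of one 'message' from the database to a filesystem."""
    data_fs["table"].append(row)
    wal_fs["wal"].append(("on_disk", row))


insert_row("row-1")

# The data filesystem is snapshotted first...
data_snapshot = copy.deepcopy(data_fs)

# ...the database keeps running in the gap between the two snapshots...
insert_row("row-2")

# ...and only then is the WAL filesystem snapshotted.
wal_snapshot = copy.deepcopy(wal_fs)

# Rows that the WAL snapshot claims are on disk but the data snapshot lacks.
lost = [row for _, row in wal_snapshot["wal"]
        if row not in data_snapshot["table"]]
print(lost)  # prints ['row-2']
```

Each snapshot is consistent on its own; only the combination is not, because the 'message' carrying row-2 is recorded as received on one side and was never captured on the other.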

Snapshot algorithms such as Chandy-Lamport or Lai-Yang solve this by adding communication between those processes about messages in transit.
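As an illustration of what that extra communication looks like, below is a simplified sketch in the spirit of Lai-Yang, as I understand it: every process is white until it records its snapshot and red afterwards, and every message carries the colour of its sender. The `Process` class, the integer "state" and the two-process scenario are all assumptions for illustration, and the bookkeeping of in-transit messages is simplified compared with the published algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Process:
    name: str
    state: int
    red: bool = False                 # False = white: no snapshot taken yet
    snapshot: Optional[int] = None    # local state recorded at snapshot time
    in_transit: List[int] = field(default_factory=list)

    def take_snapshot(self) -> None:
        # Record the local state once and turn red.
        if not self.red:
            self.red = True
            self.snapshot = self.state

    def send(self, amount: int) -> Tuple[int, bool]:
        # Every message is tagged with the sender's current colour.
        self.state -= amount
        return (amount, self.red)

    def receive(self, message: Tuple[int, bool]) -> None:
        amount, sender_red = message
        if sender_red:
            # A red message forces a snapshot BEFORE it is processed.
            self.take_snapshot()
        elif self.red:
            # A white message reaching a red process was in transit
            # when the snapshot was taken, so it is recorded as such.
            self.in_transit.append(amount)
        self.state += amount


p = Process("p", state=10)
q = Process("q", state=5)

m1 = p.send(3)       # sent while p is still white
p.take_snapshot()    # p records state 7 and turns red
m2 = p.send(2)       # sent after the snapshot, so tagged red

# Deliver out of order: no FIFO channel is assumed here.
q.receive(m2)        # red message: q snapshots (state 5) before applying it
q.receive(m1)        # white message at a red process: recorded as in transit

recorded_total = p.snapshot + q.snapshot + sum(p.in_transit) + sum(q.in_transit)
print(recorded_total)  # prints 15, the same as the initial 10 + 5
```

The colour tag is the communication the independent ZFS snapshots lack: it lets the receiver decide whether a message belongs before or after the snapshot, so nothing is counted twice or lost.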

Alban Hertroys
--
There is always an exception to always.
