From: Seref Arikan <serefarikan(at)gmail(dot)com>
To: Pierre Barre <pierre(at)barre(dot)sh>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: PostgreSQL on S3-backed Block Storage with Near-Local Performance
Date: 2025-07-18 12:55:56
Message-ID: CAG1bHGP6WOFYrjvntwQap8gWWsgtZoFHPQsvBeDkORoZFErP+Q@mail.gmail.com
Lists: pgsql-general
Thanks, I learned something else: I didn't know Hetzner offered S3-compatible
storage.
Interestingly, a few searches about its performance return mostly negative
impressions of their object storage compared to the original S3.
It would be interesting to see what performance your benchmarks would yield
on a pure AWS setup. I'm not asking you to do that, but you may get even
better performance in that case :)
Cheers,
Seref
On Fri, Jul 18, 2025 at 11:58 AM Pierre Barre <pierre(at)barre(dot)sh> wrote:
> Hi Seref,
>
> For the benchmarks, I used Hetzner's cloud service with the following
> setup:
>
> - A Hetzner S3 bucket in the FSN1 region
> - A virtual machine of type ccx63 (48 vCPUs, 192 GB memory)
> - 3 ZeroFS NBD devices (backed by the same S3 bucket)
> - A ZFS striped pool across the 3 devices
> - A 200 GB ZFS L2ARC
> - Postgres configured for the available memory, with synchronous_commit =
> off, wal_init_zero = off, and wal_recycle = off (a minimal sketch of these
> settings follows below)
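>
> A minimal sketch of those postgresql.conf settings (the memory values are
> illustrative guesses for a 192 GB machine, not taken from the actual run):
>
> ```
> # Trade durability for throughput: commits return before WAL is flushed
> synchronous_commit = off
>
> # Skip zero-filling and recycling of WAL segments; both are unnecessary
> # work on a copy-on-write filesystem like ZFS
> wal_init_zero = off
> wal_recycle = off
>
> # Assumed memory tuning for 192 GB RAM (illustrative only)
> shared_buffers = '48GB'
> effective_cache_size = '150GB'
> ```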
>
> Best,
> Pierre
>
> On Fri, Jul 18, 2025, at 12:42, Seref Arikan wrote:
>
> Sorry, this was meant to go to the whole group:
>
> Very interesting! Great work. Can you clarify exactly how you're running
> Postgres in your tests? A specific AWS service? What's the test
> infrastructure that sits above the file system?
>
> On Thu, Jul 17, 2025 at 11:59 PM Pierre Barre <pierre(at)barre(dot)sh> wrote:
>
> Hi everyone,
>
> I wanted to share a project I've been working on that enables PostgreSQL
> to run on S3 storage while maintaining performance comparable to local
> NVMe. The approach uses block-level access rather than trying to map
> filesystem operations to S3 objects.
>
> ZeroFS: https://github.com/Barre/ZeroFS
>
> # The Architecture
>
> ZeroFS provides NBD (Network Block Device) servers that expose S3 storage
> as raw block devices. PostgreSQL runs unmodified on ZFS pools built on
> these block devices:
>
> PostgreSQL -> ZFS -> NBD -> ZeroFS -> S3
>
> By providing block-level access and leveraging ZFS's caching capabilities
> (L2ARC), we can achieve microsecond latencies despite the underlying
> storage being in S3.
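>
> A minimal sketch of assembling that stack (addresses, ports, and device
> paths are assumptions, not the exact commands used here):
>
> ```
> # Attach a ZeroFS NBD export as a local block device
> # (assumes a ZeroFS NBD server listening on localhost:10809)
> nbd-client 127.0.0.1 10809 /dev/nbd0
>
> # Build a ZFS pool on it, with a local SSD partition as L2ARC
> zpool create tank /dev/nbd0 cache /dev/nvme0n1p4
>
> # Put the PostgreSQL data directory on the pool
> zfs create -o mountpoint=/var/lib/postgresql tank/pgdata
> ```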
>
> ## Performance Results
>
> Here are pgbench results from PostgreSQL running on this setup:
>
> ### Read/Write Workload
>
> ```
> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 example
> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
> starting vacuum...end.
> transaction type: <builtin: TPC-B (sort of)>
> scaling factor: 50
> query mode: simple
> number of clients: 50
> number of threads: 15
> maximum number of tries: 1
> number of transactions per client: 100000
> number of transactions actually processed: 5000000/5000000
> number of failed transactions: 0 (0.000%)
> latency average = 0.943 ms
> initial connection time = 48.043 ms
> tps = 53041.006947 (without initial connection time)
> ```
>
> ### Read-Only Workload
>
> ```
> postgres@ubuntu-16gb-fsn1-1:/root$ pgbench -c 50 -j 15 -t 100000 -S example
> pgbench (16.9 (Ubuntu 16.9-0ubuntu0.24.04.1))
> starting vacuum...end.
> transaction type: <builtin: select only>
> scaling factor: 50
> query mode: simple
> number of clients: 50
> number of threads: 15
> maximum number of tries: 1
> number of transactions per client: 100000
> number of transactions actually processed: 5000000/5000000
> number of failed transactions: 0 (0.000%)
> latency average = 0.121 ms
> initial connection time = 53.358 ms
> tps = 413436.248089 (without initial connection time)
> ```
>
> These numbers are with 50 concurrent clients and the actual data stored in
> S3. Hot data is served from ZFS L2ARC and ZeroFS's memory caches, while
> cold data comes from S3.
>
> ## How It Works
>
> 1. ZeroFS exposes NBD devices (e.g., /dev/nbd0) that PostgreSQL/ZFS can
> use like any other block device
> 2. Multiple cache layers hide S3 latency (see the sketch after this list):
> a. ZFS ARC/L2ARC for frequently accessed blocks
> b. ZeroFS memory cache for metadata and hot data
> c. Optional local disk cache
> 3. All data is encrypted (ChaCha20-Poly1305) before hitting S3
> 4. Files are split into 128KB chunks for insertion into ZeroFS's LSM-tree
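>
> For example, a quick way to watch those cache layers at work (pool name
> assumed):
>
> ```
> # Per-device I/O statistics, including the L2ARC cache device
> zpool iostat -v tank 5
>
> # ARC and L2ARC hit-rate summary (ships with OpenZFS)
> arcstat 5
> ```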
>
> ## Geo-Distributed PostgreSQL
>
> Since each region can run its own ZeroFS instance, you can create
> geographically distributed PostgreSQL setups (a pool sketch follows the
> diagrams below).
>
> Example architectures:
>
> Architecture 1:
>
>
> PostgreSQL Client
> |
> | SQL queries
> |
> +--------------+
> | PG Proxy |
> | (HAProxy/ |
> | PgBouncer) |
> +--------------+
> / \
> / \
> Synchronous Synchronous
> Replication Replication
> / \
> / \
> +---------------+ +---------------+
> | PostgreSQL 1 | | PostgreSQL 2 |
> | (Primary) |◄------►| (Standby) |
> +---------------+ +---------------+
> | |
> | POSIX filesystem ops |
> | |
> +---------------+ +---------------+
> | ZFS Pool 1 | | ZFS Pool 2 |
> | (3-way mirror)| | (3-way mirror)|
> +---------------+ +---------------+
> / | \ / | \
> / | \ / | \
> NBD:10809 NBD:10810 NBD:10811 NBD:10812 NBD:10813 NBD:10814
> | | | | | |
> +--------++--------++--------++--------++--------++--------+
> |ZeroFS 1||ZeroFS 2||ZeroFS 3||ZeroFS 4||ZeroFS 5||ZeroFS 6|
> +--------++--------++--------++--------++--------++--------+
> | | | | | |
> | | | | | |
> S3-Region1 S3-Region2 S3-Region3 S3-Region4 S3-Region5 S3-Region6
> (us-east) (eu-west) (ap-south) (us-west) (eu-north) (ap-east)
>
> Architecture 2:
>
> PostgreSQL Primary (Region 1) ←→ PostgreSQL Standby (Region 2)
> \ /
> \ /
> Same ZFS Pool (NBD)
> |
> 6 Global ZeroFS
> |
> S3 Regions
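>
> A minimal sketch of the 3-way mirrored pool from Architecture 1 (device
> paths are assumptions; each NBD device is served by a ZeroFS instance in a
> different S3 region):
>
> ```
> # Each leg of the mirror lives in a different region, so a committed
> # write is durable in three regions before ZFS acknowledges it
> zpool create geo mirror /dev/nbd0 /dev/nbd1 /dev/nbd2
> ```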
>
>
> The main advantages I see are:
> 1. Dramatic cost reduction for large datasets
> 2. Simplified geo-distribution
> 3. Effectively unlimited storage capacity
> 4. Built-in encryption and compression
>
> Looking forward to your feedback and questions!
>
> Best,
> Pierre
>
> P.S. The full project includes a custom NFS filesystem too.
>
>
>