Re: design for parallel backup

From: Andres Freund <andres(at)anarazel(dot)de>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: design for parallel backup
Date: 2020-04-21 06:44:20
Message-ID: 20200421064420.z7eattzqbunbutz3@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2020-04-20 22:31:49 -0700, Andres Freund wrote:
> On 2020-04-21 10:20:01 +0530, Amit Kapila wrote:
> > It is quite likely that compression can benefit more from parallelism
> > as compared to the network I/O as that is mostly a CPU intensive
> > operation but I am not sure if we can just ignore the benefit of
> > utilizing the network bandwidth. In our case, after copying from the
> > network we do write that data to disk, so during filesystem I/O the
> > network can be used if there is some other parallel worker processing
> > other parts of data.
>
> Well, as I said, network and FS IO as done by server / pg_basebackup are
> both fully buffered by the OS. Unless the OS throttles the userland
> process, a large chunk of the work will be done by the kernel, in
> separate kernel threads.
>
> My workstation and my laptop can, in a single thread each, get close to
> 20GBit/s of network IO (bidirectional 10GBit, I don't have faster - it's
> a thunderbolt 10gbe card) and iperf3 is at 55% CPU while doing so. Just
> connecting locally it's 45Gbit/s. Or over 8GByte/s of buffered
> filesystem IO. And it doesn't even have that high per-core clock speed.
>
> I just don't see this being the bottleneck for now.

FWIW, I just tested pg_basebackup locally.

Without compression and a stock postgres I get:
unix         tcp          tcp+ssl
1.74GiB/s    1.02GiB/s    699MiB/s

That turns out to be bottlenecked by the backup manifest generation.

Without compression, a stock postgres, and --no-manifest I get:
unix         tcp          tcp+ssl
2.51GiB/s    1.63GiB/s    1.00GiB/s

I.e. all of them are already above what a 10Gbit/s network can deliver.

Looking at a profile, it's clear that our small output buffer is the
bottleneck:
64kB Buffers + --no-manifest:
unix         tcp          tcp+ssl
2.99GiB/s    2.56GiB/s    1.18GiB/s
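
To make that concrete, the send path is roughly shaped like the sketch
below - the chunk size and function name are made up for illustration,
not the actual basebackup.c code, but the pattern is the same: each
chunk becomes one CopyData message, so a small chunk size means many
more protocol messages and syscalls for the same amount of data.

/*
 * Simplified illustration, not the real basebackup.c code: file data is
 * pushed out in fixed-size chunks, one CopyData ('d') message per chunk.
 */
#include "postgres.h"

#include <unistd.h>

#include "libpq/libpq.h"

#define SEND_CHUNK_SIZE (64 * 1024)	/* illustrative; the knob being varied above */

static void
send_file_chunked(int fd)
{
	char		buf[SEND_CHUNK_SIZE];
	ssize_t		nread;

	while ((nread = read(fd, buf, sizeof(buf))) > 0)
		pq_putmessage('d', buf, nread);	/* one protocol message per chunk */
}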

At this point the backend is not actually the bottleneck anymore;
instead it's pg_basebackup. That is partly due to the small buffer
used for output data (i.e. libc's FILE buffering), and partly because
we spend too much time memmove()ing data, owing to the "left-justify"
logic in pqCheckInBufferSpace().
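
Both of those are client-side issues. Roughly the sort of thing I mean
(simplified sketches; the buffer sizes are arbitrary and this is not the
actual pg_basebackup / libpq code):

#include <stdio.h>
#include <string.h>

/*
 * 1) Give the output stream a much larger stdio buffer, so fwrite() of
 *    the tar data isn't flushed in small pieces.  setvbuf() has to be
 *    called before anything else is done with the stream.
 */
static char out_buf[1024 * 1024];	/* size picked arbitrarily */

static void
use_big_output_buffer(FILE *out)
{
	if (setvbuf(out, out_buf, _IOFBF, sizeof(out_buf)) != 0)
		perror("setvbuf");
}

/*
 * 2) The memmove() overhead comes from a pattern like this: whenever the
 *    input buffer needs more room, the not-yet-consumed tail is shifted
 *    ("left-justified") back to the start of the buffer.  With a stream
 *    of large CopyData messages that shifts a lot of bytes, repeatedly.
 *    (Simplified; not the actual pqCheckInBufferSpace() code.)
 */
typedef struct
{
	char	   *buf;		/* allocated buffer */
	size_t		start;		/* first unconsumed byte */
	size_t		end;		/* end of valid data */
} InBuffer;

static void
left_justify(InBuffer *in)
{
	if (in->start > 0)
	{
		memmove(in->buf, in->buf + in->start, in->end - in->start);
		in->end -= in->start;
		in->start = 0;
	}
}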

- Andres
