Re: Inefficiency in parallel pg_restore with many tables

From: Nathan Bossart <nathandbossart(at)gmail(dot)com>
To: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Inefficiency in parallel pg_restore with many tables
Date: 2023-07-20 19:06:44
Message-ID: 20230720190644.GA1724613@nathanxps13
Lists: pgsql-hackers

Here is a work-in-progress patch set for converting ready_list to a
priority queue. On my machine, Tom's 100k-table example [0] takes 11.5
minutes without these patches and 1.5 minutes with them.

One item that requires more thought is binaryheap's use of Datum. AFAICT
the Datum definitions live in postgres.h and aren't available to frontend
code. I think we'll need to either move the Datum definitions to c.h or
adjust binaryheap to use "void *".
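
To make the "void *" option concrete, here is a rough standalone sketch of
a max-heap that stores opaque pointers instead of Datums, so it needs
nothing from postgres.h. All names here (pq_heap, pq_allocate, pq_add,
pq_remove_first, pq_compare_int) are hypothetical and chosen only for
illustration; this is not the binaryheap API and not what the attached
patches do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical sketch: a max-heap of opaque pointers.  The comparator
 * returns <0, 0, or >0, and receives a caller-supplied "arg".
 */
typedef int (*pq_comparator) (const void *a, const void *b, void *arg);

typedef struct pq_heap
{
	size_t		size;		/* number of elements currently stored */
	size_t		capacity;	/* fixed maximum number of elements */
	pq_comparator compare;
	void	   *arg;
	void	  **nodes;
} pq_heap;

/* Allocate an empty heap; returns NULL on out-of-memory. */
pq_heap *
pq_allocate(size_t capacity, pq_comparator compare, void *arg)
{
	pq_heap    *heap = malloc(sizeof(pq_heap));

	if (heap == NULL)
		return NULL;
	heap->nodes = malloc((capacity ? capacity : 1) * sizeof(void *));
	if (heap->nodes == NULL)
	{
		free(heap);
		return NULL;
	}
	heap->size = 0;
	heap->capacity = capacity;
	heap->compare = compare;
	heap->arg = arg;
	return heap;
}

void
pq_free(pq_heap *heap)
{
	free(heap->nodes);
	free(heap);
}

static void
pq_swap(pq_heap *heap, size_t a, size_t b)
{
	void	   *tmp = heap->nodes[a];

	heap->nodes[a] = heap->nodes[b];
	heap->nodes[b] = tmp;
}

/* Add an element and sift it up until its parent is not smaller. */
void
pq_add(pq_heap *heap, void *elem)
{
	size_t		i;

	assert(heap->size < heap->capacity);
	i = heap->size++;
	heap->nodes[i] = elem;
	while (i > 0)
	{
		size_t		parent = (i - 1) / 2;

		if (heap->compare(heap->nodes[i], heap->nodes[parent], heap->arg) <= 0)
			break;
		pq_swap(heap, i, parent);
		i = parent;
	}
}

/* Remove and return the largest element, sifting the new root down. */
void *
pq_remove_first(pq_heap *heap)
{
	void	   *top;
	size_t		i = 0;

	assert(heap->size > 0);
	top = heap->nodes[0];
	heap->nodes[0] = heap->nodes[--heap->size];
	for (;;)
	{
		size_t		left = 2 * i + 1;
		size_t		right = 2 * i + 2;
		size_t		largest = i;

		if (left < heap->size &&
			heap->compare(heap->nodes[left], heap->nodes[largest], heap->arg) > 0)
			largest = left;
		if (right < heap->size &&
			heap->compare(heap->nodes[right], heap->nodes[largest], heap->arg) > 0)
			largest = right;
		if (largest == i)
			break;
		pq_swap(heap, i, largest);
		i = largest;
	}
	return top;
}

/* Example comparator: order pointed-to ints, largest first out. */
int
pq_compare_int(const void *a, const void *b, void *arg)
{
	int			x = *(const int *) a;
	int			y = *(const int *) b;

	(void) arg;
	return (x > y) - (x < y);
}
```

A caller would allocate the heap once, pq_add() each item as it becomes
ready, and pq_remove_first() to get the highest-priority one. Both
operations are O(log n) in the number of stored items.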

[0] https://postgr.es/m/3612876.1689443232%40sss.pgh.pa.us

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment                                                   Content-Type  Size
v2-0001-misc-binaryheap-fixes.patch                          text/x-diff   3.0 KB
v2-0002-make-binaryheap-available-to-frontend.patch          text/x-diff   4.0 KB
v2-0003-expand-binaryheap-api.patch                          text/x-diff   2.0 KB
v2-0004-use-priority-queue-for-pg_restore-ready_list.patch   text/x-diff   11.0 KB
