Re: Inefficiency in parallel pg_restore with many tables

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Inefficiency in parallel pg_restore with many tables
Date: 2023-07-15 18:19:16
Message-ID: 20230715181916.qluqwiuh5ckg4f75@awork3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2023-07-15 13:47:12 -0400, Tom Lane wrote:
> I wonder if we could replace the sorted ready-list with a priority heap,
> although that might be complicated by the fact that pop_next_work_item
> has to be capable of popping something that's not necessarily the
> largest remaining job. Another idea could be to be a little less eager
> to sort the list every time; I think in practice scheduling wouldn't
> get much worse if we only re-sorted every so often.

Perhaps we could keep track of which items were newly inserted, and use
insertion sort or some such when the number of new elements is much smaller
than the number of already-sorted elements?
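Roughly like this (purely a hypothetical sketch, not actual pg_restore code;
ReadyItem, dataLength and insert_sorted are made-up names for illustration):

/*
 * Keep the ready-list sorted by descending dataLength, and absorb a
 * newly ready item by insertion instead of re-sorting the whole array.
 */
typedef struct ReadyItem
{
    int         id;
    long        dataLength;     /* sort key, largest first */
} ReadyItem;

/* Insert one new item into items[0..n-1], which is already sorted. */
static void
insert_sorted(ReadyItem *items, int n, ReadyItem new_item)
{
    int         i = n;

    while (i > 0 && items[i - 1].dataLength < new_item.dataLength)
    {
        items[i] = items[i - 1];
        i--;
    }
    items[i] = new_item;
}

That's O(new * existing) in the worst case, but cheap as long as only a few
items become ready between pops.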

As you say, a straight priority heap might not be easy. But we could just
open-code this using two sorted arrays: one large, and one for recent
additions that need to be newly sorted. The small array would occasionally be
merged into the big array, once it has grown large enough that sorting it
becomes expensive. We could go for a hierarchy of N>2 such arrays, but I
doubt it would be worth much.
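Something along these lines, again only a sketch (TwoArrayList,
RECENT_THRESHOLD etc. are invented names, and the threshold would need
tuning):

#include <stdlib.h>

#define RECENT_THRESHOLD 64

typedef struct TwoArrayList
{
    long   *main_items;     /* sorted descending, main_count entries */
    int     main_count;
    long    recent_items[RECENT_THRESHOLD];     /* unsorted until merge */
    int     recent_count;
} TwoArrayList;

/* qsort comparator for descending order */
static int
cmp_desc(const void *a, const void *b)
{
    long        la = *(const long *) a;
    long        lb = *(const long *) b;

    return (la < lb) - (la > lb);
}

/* Merge the (sorted) recent array into the main array, keeping order. */
static void
merge_recent(TwoArrayList *list)
{
    int         total = list->main_count + list->recent_count;
    long       *merged = malloc(total * sizeof(long));
    int         i = 0, j = 0, k = 0;

    while (i < list->main_count && j < list->recent_count)
        merged[k++] = (list->main_items[i] >= list->recent_items[j]) ?
            list->main_items[i++] : list->recent_items[j++];
    while (i < list->main_count)
        merged[k++] = list->main_items[i++];
    while (j < list->recent_count)
        merged[k++] = list->recent_items[j++];

    free(list->main_items);
    list->main_items = merged;
    list->main_count = total;
    list->recent_count = 0;
}

/* Add a new item to the small "recent" array; merge once it fills up. */
static void
add_item(TwoArrayList *list, long value)
{
    list->recent_items[list->recent_count++] = value;

    if (list->recent_count == RECENT_THRESHOLD)
    {
        /* sort only the small array, then merge it into the big one */
        qsort(list->recent_items, list->recent_count,
              sizeof(long), cmp_desc);
        merge_recent(list);
    }
}

pop_next_work_item would then look at the heads of both arrays (sorting the
recent one lazily if it still holds unmerged entries), so the cost of a full
sort is only paid once the small array has grown past the threshold.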

Greetings,

Andres Freund
