
Re: Experimenting with hash tables inside pg_dump

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject:	Re: Experimenting with hash tables inside pg_dump
Date:	2021-10-22 18:30:38
Message-ID:	6D552E76-EA56-421D-961C-F8781523958A@anarazel.de
Lists:	pgsql-hackers

Hi,

On October 22, 2021 8:54:13 AM PDT, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>Andres Freund <andres(at)anarazel(dot)de> writes:
>> On 2021-10-22 10:53:31 -0400, Tom Lane wrote:
>>> I'm skeptical of that, mainly because it doesn't work in old servers,
>
>> I think we can address that, if we think it's overall a promising approach to
>> pursue. E.g. if we don't need the indexes, we can make it = ANY().
>
>Hmm ... yeah, I guess we could get away with that. It might not scale
>as nicely to a huge database, but probably dumping a huge database
>from an ancient server isn't all that interesting.

I think that, compared to the overhead of locking that many tables and sending O(N) queries, it shouldn't be a huge factor.

One thing that looks like it might be worth doing, and not hard, is to use single-row mode. There's no need to materialize all that data twice in memory.

At a later stage it might be worth sending the array separately as a parameter. Perhaps even binary encoded.

>I'm inclined to think that it could be sane to make getTableAttrs
>and getIndexes use this style, but we probably still want functions
>and such to use per-object queries. In those other catalogs there
>are many built-in objects that we don't really care about. The
>prepared-queries hack I was working on last night is probably plenty
>good enough there, and it's a much less invasive patch.

Yes, that seems reasonable. I think the triggers query would benefit from the batch approach, though: it takes a long time in aggregate on a test database with many tables that I had around (partially due to the self join), and we already materialize it.

>Were you planning to pursue this further, or did you want me to?

It seems too nice an improvement to drop on the floor. That said, I don't really have the mental bandwidth to pursue this beyond the POC stage; it seemed complicated enough that a suggestion accompanied by a prototype was a good idea. So I'd be happy for you to incorporate this into your other changes.

>I'd want to layer it on top of the work I did at [1], else there's
>going to be lots of merge conflicts.

Makes sense. Even if nobody else were doing anything in the area, I'd probably want to split it into one commit that creates the query once, and a separate one that implements the batching.

Regards,

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

In response to

Re: Experimenting with hash tables inside pg_dump at 2021-10-22 15:54:13 from Tom Lane

Responses

Re: Experimenting with hash tables inside pg_dump at 2021-10-22 18:36:44 from Tom Lane
