Re: Unlinking Parallel Hash Join inner batch files sooner

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Andrei Lepikhov <a(dot)lepikhov(at)postgrespro(dot)ru>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>, John Naylor <johncnaylorls(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Unlinking Parallel Hash Join inner batch files sooner
Date: 2024-02-21 23:42:09
Message-ID: CA+hUKGLKu0ZJa0K9o70_Q9iA6X++L8Ldcc402S1VKir7+ciMgA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 21, 2024 at 7:34 PM Andrei Lepikhov
<a(dot)lepikhov(at)postgrespro(dot)ru> wrote:
> I see in [1] that the reporter mentioned a delay between the error
> message in parallel HashJoin and the return control back from PSQL. Your
> patch might reduce this delay.
> Also, I have the same complaint from users who processed gigabytes of
> data in parallel HashJoin. Presumably, they also stuck into the unlink
> of tons of temporary files. So, are you going to do something with this
> code?

Yeah, right. I will aim to get this into the tree next week. First,
there are a couple of minor issues to resolve around freeing that
Heikki mentioned. Then there is the question of whether we think this
might be a candidate for back-patching, given the complaints you
mention. Opinions?

I would add that the problems you reach when you get to a very large
number of partitions are hard (see several very long threads about
extreme skew for one version of the problem, but even with zero/normal
skewness and perfect estimation of the number of partitions, if you
ask a computer to partition 42TB of data into partitions that fit in a
work_mem suitable for a Commodore 64, it's gonna hurt on several
levels) and this would only slightly improve one symptom. One idea
that might improve just the directory entry and file descriptor
aspect would be to scatter the partitions into (say) 1MB chunks
within a single file, and hope that the file system supports holes
(a bit like logtape.c's multiplexing but I wouldn't do it quite like
that).
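
To make the chunk-scattering idea a bit more concrete, here is a rough
sketch of the offset arithmetic it might involve. This is not code from
the patch or from PostgreSQL: the names CHUNK_SIZE, chunk_file_offset()
and partition_write() are hypothetical, and the round-robin slot layout
(chunk k of partition p goes in slot k * nparts + p) is just one
assumed way of doing the scattering. Slots that are never written stay
as holes, provided the file system supports sparse files.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size: the "(say) 1MB" from the mail. */
#define CHUNK_SIZE ((off_t) 1024 * 1024)

/*
 * Map a logical byte position inside one partition to a physical offset
 * in the shared file.  Assumed layout: chunk k of partition p occupies
 * slot (k * nparts + p), so partitions are interleaved round-robin.
 */
off_t
chunk_file_offset(int nparts, int part, off_t logical)
{
    off_t       chunk = logical / CHUNK_SIZE;
    off_t       within = logical % CHUNK_SIZE;

    assert(nparts > 0 && part >= 0 && part < nparts && logical >= 0);
    return (chunk * nparts + part) * CHUNK_SIZE + within;
}

/*
 * Write len bytes at a logical position of one partition, splitting the
 * write wherever it crosses a chunk boundary.  Returns 0 on success and
 * -1 on error (errno is left as set by pwrite).
 */
int
partition_write(int fd, int nparts, int part, off_t logical,
                const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0)
    {
        /* Bytes left before the current chunk of this partition ends. */
        size_t      room = (size_t) (CHUNK_SIZE - logical % CHUNK_SIZE);
        size_t      n = len < room ? len : room;
        ssize_t     written;

        written = pwrite(fd, p, n,
                         chunk_file_offset(nparts, part, logical));
        if (written <= 0)
            return -1;

        /* Advance by what was actually written; short writes just loop. */
        p += written;
        logical += written;
        len -= (size_t) written;
    }
    return 0;
}
```

With this layout, one file descriptor and one directory entry serve all
nparts partitions, at the cost of relying on sparse-file support to
avoid allocating space for slots that a skewed partition never fills.
The sketch does not retry on EINTR and ignores concurrency between
writers.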
