
Re: Avoiding hash join batch explosions with extreme skew and weird stats

From:	David Kimura <david(dot)g(dot)kimura(at)gmail(dot)com>
To:	Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc:	Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesse Zhang <sbjesse(at)gmail(dot)com>, dkimura(at)pivotal(dot)io
Subject:	Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date:	2020-05-04 20:39:36
Message-ID:	CAHnPFjQiYN83NjQ4KvjX19Wti==uzyw8D24va56zJKzOt+B51A@mail.gmail.com
Lists:	pgsql-hackers

On Wed, Apr 29, 2020 at 4:44 PM David Kimura <david(dot)g(dot)kimura(at)gmail(dot)com> wrote:
>
> Following patch adds logic to create a batch 0 file for serial hash join so
> that even in the pathological case we do not need to exceed work_mem.

Updated the patch to spill batch 0 tuples after it is marked as fallback.
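The following minimal C sketch shows the behavior described above. Batch 0 tuples stay in memory until the budget is reached. Batch 0 is then marked as fallback, and every later batch 0 tuple is appended to a batch 0 file instead of growing the in-memory table. All names (`Batch0Sketch`, `batch0_add()`, `batch0_close()`) are hypothetical, and this is not the patch's code. It also assumes that repartitioning has already been tried and has not helped, so the first tuple that does not fit triggers fallback. `tmpfile()` stands in for PostgreSQL's temporary BufFile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified model of batch 0 in a serial hash join. */
typedef struct Batch0Sketch
{
    size_t space_used;    /* bytes of tuples held in memory */
    size_t space_allowed; /* stand-in for work_mem */
    bool   fallback;      /* set once batch 0 may no longer grow in memory */
    FILE  *spill_file;    /* batch 0 file, created lazily on fallback */
    size_t n_in_memory;
    size_t n_spilled;
} Batch0Sketch;

/* Adds one batch 0 tuple; returns false only on an I/O error. */
static bool
batch0_add(Batch0Sketch *b, const void *tuple, size_t size)
{
    /* Normal path: batch 0 has not fallen back and the tuple fits. */
    if (!b->fallback && b->space_used + size <= b->space_allowed)
    {
        b->space_used += size;  /* real code would copy the tuple here */
        b->n_in_memory++;
        assert(b->space_used <= b->space_allowed);
        return true;
    }

    /*
     * ASSUMPTION: repartitioning already failed to shrink batch 0 (e.g. all
     * tuples share one hash value), so mark fallback and open the file.
     */
    if (!b->fallback)
    {
        b->spill_file = tmpfile();
        if (b->spill_file == NULL)
            return false;
        b->fallback = true;
    }

    /* After fallback, every batch 0 tuple goes to the file, length-prefixed. */
    if (fwrite(&size, sizeof(size), 1, b->spill_file) != 1 ||
        (size > 0 && fwrite(tuple, size, 1, b->spill_file) != 1))
        return false;
    b->n_spilled++;
    return true;
}

static void
batch0_close(Batch0Sketch *b)
{
    if (b->spill_file != NULL)
        fclose(b->spill_file);  /* tmpfile() files are deleted on close */
    b->spill_file = NULL;
}
```

With `space_allowed = 100` and ten 16-byte tuples, six tuples stay in memory and the remaining four go to the spill file, so `space_used` never exceeds the budget.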

A couple of questions from looking more closely at the serial code:

1) Does the current pattern of repartitioning batches only *after* a hashtable
insert has already exceeded work_mem still make sense?

With that ordering we allow ourselves to exceed work_mem by one tuple. If that
no longer seems correct, then I think we can move the space-exceeded check in
ExecHashTableInsert() to *before* the actual hashtable insert.
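Here is a C sketch of the two orderings in question 1. `SpaceSketch`, `insert_then_check()` and `check_then_insert()` are hypothetical names. `sketch_increase_batches()` is a crude stand-in for ExecHashIncreaseNumBatches(). It assumes that doubling nbatch evicts exactly half of the in-memory bytes. The sketch also ignores the possibility that the incoming tuple itself moves to a later batch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical memory accounting for the in-memory hashtable. */
typedef struct SpaceSketch
{
    size_t space_used;
    size_t space_allowed; /* stand-in for work_mem */
    size_t space_peak;    /* highest space_used ever observed */
    int    nbatch;
} SpaceSketch;

/* ASSUMPTION: doubling nbatch moves exactly half the bytes to other batches. */
static void
sketch_increase_batches(SpaceSketch *s)
{
    s->nbatch *= 2;
    s->space_used /= 2;
}

static void
note_peak(SpaceSketch *s)
{
    if (s->space_used > s->space_peak)
        s->space_peak = s->space_used;
}

/* Current ordering: insert first, then repartition if over budget. */
static void
insert_then_check(SpaceSketch *s, size_t tuple_size)
{
    s->space_used += tuple_size;
    note_peak(s);               /* may already be one tuple over budget */
    if (s->space_used > s->space_allowed)
        sketch_increase_batches(s);
}

/* Alternative ordering: repartition before an insert that would overflow. */
static void
check_then_insert(SpaceSketch *s, size_t tuple_size)
{
    /* Stops once nothing is left to evict (a lone oversized tuple still overflows). */
    while (s->space_used + tuple_size > s->space_allowed && s->space_used > 0)
        sketch_increase_batches(s);
    s->space_used += tuple_size;
    note_peak(s);
}
```

With a 100-byte budget and ten 16-byte tuples, insert-then-check peaks at 112 bytes, one tuple over budget. Check-then-insert peaks at 96 bytes. A single tuple larger than the whole budget still overflows with either ordering.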

2) After batch 0 is marked as fallback, does the logic to insert into its batch
file fit better in MultiExecPrivateHash() or in ExecHashTableInsert()?

The latter already has logic to decide whether to insert a tuple into the
hashtable or into a batch file.
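For question 2, here is a hypothetical sketch of the routing decision if it stays in an ExecHashTableInsert()-style function. A tuple goes to the in-memory hashtable only when it belongs to the current batch and that batch is not in fallback. Otherwise it goes to its batch file. `RouteSketch`, `curbatch_fallback` and `sketch_route()` are illustrative names. The batch number comes from the low bits of the hash value, which simplifies what ExecHashGetBucketAndBatch() actually does.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum TupleDest
{
    DEST_HASHTABLE,  /* insert into the in-memory hashtable */
    DEST_BATCH_FILE  /* append to innerBatchFile[batchno] */
} TupleDest;

/* Hypothetical subset of hashtable state needed for routing. */
typedef struct RouteSketch
{
    int  nbatch;            /* assumed to be a power of two */
    int  curbatch;          /* batch currently being built in memory */
    bool curbatch_fallback; /* hypothetical flag: current batch fell back */
} RouteSketch;

/* SIMPLIFICATION: batch number taken from the low bits of the hash. */
static int
sketch_batchno(const RouteSketch *r, unsigned int hashvalue)
{
    return (int) (hashvalue & (unsigned int) (r->nbatch - 1));
}

static TupleDest
sketch_route(const RouteSketch *r, unsigned int hashvalue, int *batchno)
{
    *batchno = sketch_batchno(r, hashvalue);

    /* Existing rule: only current-batch tuples go into memory ... */
    if (*batchno == r->curbatch && !r->curbatch_fallback)
        return DEST_HASHTABLE;

    /* ... and, once the current batch has fallen back, its tuples spill too. */
    return DEST_BATCH_FILE;
}
```

With `nbatch = 4` and `curbatch = 0`, a hash value of 8 maps to batch 0 and stays in memory until fallback is set. From then on, it is routed to the batch 0 file.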

Thanks,
David

Attachment	Content-Type	Size
v6-0002-Implement-fallback-of-batch-0-for-serial-adaptive.patch	application/x-patch	5.5 KB

In response to

Re: Avoiding hash join batch explosions with extreme skew and weird stats at 2020-04-29 23:44:53 from David Kimura
