Re: DBT-3 with SF=20 got failed

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndQuadrant(dot)com>
Cc: Kohei KaiGai <kaigai(at)kaigai(dot)gr(dot)jp>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: DBT-3 with SF=20 got failed
Date: 2015-08-22 00:37:02
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801136A5F@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> Hello KaiGai-san,
>
> On 08/21/2015 02:28 AM, Kouhei Kaigai wrote:
> ...
> >>
> >> But what is the impact on queries that actually need more than 1GB
> >> of buckets? I assume we'd only limit the initial allocation and
> >> still allow the resize based on the actual data (i.e. the 9.5
> >> improvement), so the queries would start with 1GB and then resize
> >> once they find out the optimal size (as done in 9.5). The resize is
> >> not very expensive, but it's not free either, and with so many
> >> tuples (requiring more than 1GB of buckets, i.e. ~130M tuples) it's
> >> probably just noise in the total query runtime. But it'd be nice
> >> to see some proof of that ...
> >>
> > The problem here is that we cannot know the exact size until the Hash
> > node has read the entire inner relation. All we can do is rely on the
> > planner's estimation; however, it often computes a crazy number of
> > rows. I think resizing of hash buckets is a reasonable compromise.
>
> I understand the estimation problem. The question I think we need to
> answer is how to balance the behavior for well- and poorly-estimated
> cases. It'd be unfortunate if we lower the memory consumption in the
> over-estimated case while significantly slowing down the well-estimated
> ones.
>
> I don't think we have a clear answer at this point - maybe it's not a
> problem at all and it'll be a win no matter what threshold we choose.
> But it's a separate problem from the bugfix.
>
I agree that this is a separate (and maybe not easy) problem.

If somebody knows of previous academic research in this area, please share it with us.
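
To make the idea concrete, here is a rough standalone sketch of the
clamp-then-resize behaviour we are talking about. This is not the actual
nodeHash.c code; names such as choose_initial_nbuckets(), needs_resize()
and MAX_INITIAL_BUCKET_MEM are made up for illustration, and the 1GB cap
just mirrors the figure discussed above:

/*
 * Illustrative sketch only -- not PostgreSQL internals.
 * Idea: size the bucket array from the planner's row estimate, but cap
 * the initial allocation; if the estimate was too low, double the bucket
 * count at run time once the load factor gets too high (the 9.5 resize).
 */
#include <stdint.h>
#include <stdio.h>

#define NTUP_PER_BUCKET        1           /* target load factor */
#define MAX_INITIAL_BUCKET_MEM (1UL << 30) /* cap initial array at 1GB */

static uint64_t
next_pow2(uint64_t v)
{
    uint64_t p = 1;

    while (p < v)
        p <<= 1;
    return p;
}

/* Pick the bucket count used when the hash table is first built. */
static uint64_t
choose_initial_nbuckets(double planner_rows)
{
    uint64_t wanted = next_pow2((uint64_t) (planner_rows / NTUP_PER_BUCKET) + 1);
    uint64_t cap = MAX_INITIAL_BUCKET_MEM / sizeof(void *);

    return (wanted > cap) ? cap : wanted;
}

/* Decide whether to double the bucket array while loading the inner side. */
static int
needs_resize(uint64_t ntuples_so_far, uint64_t nbuckets)
{
    return ntuples_so_far > nbuckets * NTUP_PER_BUCKET;
}

int
main(void)
{
    /* Planner badly over-estimates: asks for ~1.2 billion rows. */
    uint64_t nbuckets = choose_initial_nbuckets(1.2e9);
    uint64_t ntuples;

    printf("initial buckets: %llu (%.0f MB)\n",
           (unsigned long long) nbuckets,
           (double) nbuckets * sizeof(void *) / (1024 * 1024));

    /* Inner relation turns out larger than the capped array can cover. */
    ntuples = 300000000;
    while (needs_resize(ntuples, nbuckets))
        nbuckets *= 2;          /* run-time resize doubles the array */

    printf("buckets after resize: %llu\n", (unsigned long long) nbuckets);
    return 0;
}

With NTUP_PER_BUCKET = 1 and 8-byte pointers, the cap works out to roughly
134M buckets, close to the ~130M tuples mentioned above; anything beyond
that would be handled by the run-time doubling rather than the initial
allocation.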

> >> I believe the patch proposed by KaiGai-san is the right one to fix
> >> the bug discussed in this thread. My understanding is KaiGai-san
> >> withdrew the patch as he wants to extend it to address the
> >> over-estimation issue.
> >>
> >> I don't think we should do that - IMHO that's an unrelated
> >> improvement and should be addressed in a separate patch.
> >>
> > OK, it might not be a problem we should try to settle within a few
> > days, just before the beta release.
>
> I don't quite see a reason to wait for the over-estimation patch. We
> probably should backpatch the bugfix anyway (although one is much less
> likely to run into it before 9.5), and we can't really backpatch the
> behavior change there (as there's no hash resize).
>
I have no further objections to this bugfix.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
