Re: Macro customizable hashtable / bitmapscan & aggregation perf

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Macro customizable hashtable / bitmapscan & aggregation perf
Date: 2016-10-11 15:56:25
Message-ID: 20161011155625.kexqknqvls5yii3b@alap3.anarazel.de
Lists: pgsql-hackers

On 2016-10-11 04:29:31 +0200, Tomas Vondra wrote:
> On 10/11/2016 04:07 AM, Andres Freund wrote:
> > On 2016-10-10 17:46:22 -0700, Andres Freund wrote:
> > > > TPC-DS (tpcds.ods)
> > > > ------------------
> > > >
> > > > In this case, I'd say the results are less convincing. There are quite a few
> > > > queries that got slower by ~10%, which is well above - for example queries
> > > > 22 and 67. There are of course queries that got ~10% faster, and in total
> > > > the patched version executed more queries (so overall the result is slightly
> > > > positive, but not significantly).
> > >
> > > That's interesting. I wonder whether that's plan changes just due to the
> > > changing memory estimates, or what's causing that. I'll look into it.
> >
> > Hm. Based on an initial look those queries aren't planned with any of
> > the affected codepaths. Could this primarily be a question of
> > randomness? Would it perhaps make sense to run the tests in a comparable
> > order? Looking at tpcds.py and the output files, it seems that the query
> > order differs between the runs, which can easily explain a bigger
> > difference than the above. For me a scale=1 run creates a database of
> > approximately 4.5GB, thus with shared_buffers=1GB execution order is
> > likely to have a significant performance impact.
> >
>
> Yes, I see similar plans (no bitmap index scans or hash aggregates). But the
> difference is there, even when running the query alone (so it's not merely
> due to the randomized ordering).

> I wonder whether this is again due to compiler moving stuff around.

It looks like it. I looked through a significant set of plans where
there were time differences (generated on my machine), and none of them
used bitmap scans or hash aggregation to any significant degree. Comparing
profiles in those cases usually shows a picture like:
24.98% +0.32% postgres [.] slot_deform_tuple
16.58% -0.05% postgres [.] ExecMakeFunctionResultNoSets
12.41% -0.01% postgres [.] slot_getattr
6.10% +0.39% postgres [.] heap_getnext
4.41% -0.37% postgres [.] ExecQual
3.08% +0.12% postgres [.] ExecEvalScalarVarFast
2.85% -0.11% postgres [.] check_stack_depth
2.48% +0.42% postgres [.] ExecEvalConst
2.44% -0.33% postgres [.] heapgetpage
2.34% +0.11% postgres [.] ExecScan
2.14% -0.20% postgres [.] ExecStoreTuple

I.e. pretty random performance changes. This indeed looks like binary
layout changes. Looking at these plans and at profiles spanning a run
of all queries shows that bitmap scans and hash aggregations, while
present, account for a very small amount of time in total. So tpc-ds
doesn't look particularly useful for evaluating these patches, but it is
very interesting for my slot deforming and qual evaluation patches.
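The deltas in a profile like the one above can be scanned mechanically to confirm that no single function moved much. Below is a minimal sketch in Python (the language of the tpcds.py harness). It assumes the `baseline% delta% object [.] symbol` line layout shown above, which appears to be `perf diff` output. The function names and the 1.0 percentage-point threshold are hypothetical choices and not part of any existing tool.

```python
import re
from typing import NamedTuple


class ProfileDelta(NamedTuple):
    baseline: float  # share of samples in the baseline run, in percent
    delta: float     # change in the patched run, in percentage points
    dso: str         # shared object, e.g. "postgres"
    symbol: str      # function name


# Assumed layout: "<baseline>% <+/-delta>% <dso> [.] <symbol>"
LINE_RE = re.compile(r"^\s*([\d.]+)%\s+([+-][\d.]+)%\s+(\S+)\s+\[\.\]\s+(\S+)\s*$")


def parse_profile(text: str) -> list:
    """Parse profile lines; lines not matching the layout are skipped."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            rows.append(ProfileDelta(float(m.group(1)), float(m.group(2)),
                                     m.group(3), m.group(4)))
    return rows


def significant(rows: list, threshold_pp: float = 1.0) -> list:
    """Rows whose absolute delta exceeds a (hypothetical) noise threshold."""
    return [r for r in rows if abs(r.delta) > threshold_pp]


SAMPLE = """\
24.98% +0.32% postgres [.] slot_deform_tuple
16.58% -0.05% postgres [.] ExecMakeFunctionResultNoSets
12.41% -0.01% postgres [.] slot_getattr
6.10% +0.39% postgres [.] heap_getnext
4.41% -0.37% postgres [.] ExecQual
3.08% +0.12% postgres [.] ExecEvalScalarVarFast
2.85% -0.11% postgres [.] check_stack_depth
2.48% +0.42% postgres [.] ExecEvalConst
2.44% -0.33% postgres [.] heapgetpage
2.34% +0.11% postgres [.] ExecScan
2.14% -0.20% postgres [.] ExecStoreTuple
"""

if __name__ == "__main__":
    rows = parse_profile(SAMPLE)
    biggest = max(rows, key=lambda r: abs(r.delta))
    # prints "11 symbols, largest shift: ExecEvalConst +0.42 pp"
    print(f"{len(rows)} symbols, largest shift: {biggest.symbol} {biggest.delta:+.2f} pp")
    # prints "above 1.0 pp: []"
    print("above 1.0 pp:", [r.symbol for r in significant(rows)])
```

On this excerpt every shift stays below half a percentage point and the signs are mixed, which fits binary-layout noise rather than a change concentrated in one code path.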

Btw, query_14.sql as generated by your templates (in pgperffarm) doesn't
seem to work here. And I never had the patience to run query_1.sql to
completion... Looks like we could very well use some planner
improvements here.

Greetings,

Andres Freund
