Re: POC: converting Lists into arrays

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: POC: converting Lists into arrays
Date: 2019-03-04 19:01:33
Message-ID: 20190304190133.vtv7vifuhkaqwh67@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2019-03-02 18:11:43 -0500, Tom Lane wrote:
> On test cases like "pg_bench -S" it seems to be pretty much within the
> noise level of being the same speed as HEAD.

I think that might be because its bottleneck is simply elsewhere
(e.g. the workload is very context-switch heavy, with very few lists of
any length).

FWIW, even just taking context switches out of the equation yields a
~5-6% benefit for a simple statement:

DO $f$BEGIN FOR i IN 1..500000 LOOP EXECUTE $s$SELECT aid, bid, abalance, filler FROM pgbench_accounts WHERE aid = 2045530;$s$;END LOOP;END;$f$;

master:
+ 6.05% postgres postgres [.] AllocSetAlloc
+ 5.52% postgres postgres [.] base_yyparse
+ 2.51% postgres postgres [.] palloc
+ 1.82% postgres postgres [.] hash_search_with_hash_value
+ 1.61% postgres postgres [.] core_yylex
+ 1.57% postgres postgres [.] SearchCatCache1
+ 1.43% postgres postgres [.] expression_tree_walker.part.4
+ 1.09% postgres postgres [.] check_stack_depth
+ 1.08% postgres postgres [.] MemoryContextAllocZeroAligned

patch v3:
+ 5.77% postgres postgres [.] base_yyparse
+ 4.88% postgres postgres [.] AllocSetAlloc
+ 1.95% postgres postgres [.] hash_search_with_hash_value
+ 1.89% postgres postgres [.] core_yylex
+ 1.64% postgres postgres [.] SearchCatCache1
+ 1.46% postgres postgres [.] expression_tree_walker.part.0
+ 1.45% postgres postgres [.] palloc
+ 1.18% postgres postgres [.] check_stack_depth
+ 1.13% postgres postgres [.] MemoryContextAllocZeroAligned
+ 1.04% postgres libc-2.28.so [.] _int_malloc
+ 1.01% postgres postgres [.] nocachegetattr

And even just pgbenching the EXECUTEd statement above gives me a
reproducible ~3.5% gain when using -M simple, and ~3% when using -M
prepared.

Note that when not using prepared statements (a pretty important
workload, especially as long as we don't have a pooling solution that
actually allows using prepared statements across connections), even
after the patch most of the allocator overhead still comes from list
allocations, but it's now near exclusively just the "create a new list"
case:

+ 5.77% postgres postgres [.] base_yyparse
- 4.88% postgres postgres [.] AllocSetAlloc
- 80.67% AllocSetAlloc
- 68.85% AllocSetAlloc
- 57.65% palloc
- 50.30% new_list (inlined)
- 37.34% lappend
+ 12.66% pull_var_clause_walker
+ 8.83% build_index_tlist (inlined)
+ 8.80% make_pathtarget_from_tlist
+ 8.73% get_quals_from_indexclauses (inlined)
+ 8.73% distribute_restrictinfo_to_rels
+ 8.68% RewriteQuery
+ 8.56% transformTargetList
+ 8.46% make_rel_from_joinlist
+ 4.36% pg_plan_queries
+ 4.30% add_rte_to_flat_rtable (inlined)
+ 4.29% build_index_paths
+ 4.23% match_clause_to_index (inlined)
+ 4.22% expression_tree_mutator
+ 4.14% transformFromClause
+ 1.02% get_index_paths
+ 17.35% list_make1_impl
+ 16.56% list_make1_impl (inlined)
+ 15.87% lcons
+ 11.31% list_copy (inlined)
+ 1.58% lappend_oid
+ 12.90% expression_tree_mutator
+ 9.73% get_relation_info
+ 4.71% bms_copy (inlined)
+ 2.44% downcase_identifier
+ 2.43% heap_tuple_untoast_attr
+ 2.37% add_rte_to_flat_rtable (inlined)
+ 1.69% btbeginscan
+ 1.65% CreateTemplateTupleDesc
+ 1.61% core_yyalloc (inlined)
+ 1.59% heap_copytuple
+ 1.54% text_to_cstring (inlined)
+ 0.84% ExprEvalPushStep (inlined)
+ 0.84% ExecInitRangeTable
+ 0.84% scanner_init
+ 0.83% ExecInitRangeTable
+ 0.81% CreateQueryDesc
+ 0.81% _bt_search
+ 0.77% ExecIndexBuildScanKeys
+ 0.66% RelationGetIndexScan
+ 0.65% make_pathtarget_from_tlist

Given how hard it is to improve performance when costs are as flatly
distributed as in the above profiles, I actually think these are quite
promising results.

I'm not even convinced that it makes all that much sense to measure
end-to-end performance here; it might be worthwhile to measure with a
debugging function that allows exercising parsing, parse analysis,
rewriting etc. at configurable loop counts. Given the relatively evenly
distributed profiles, we're going to have to make a few different
improvements to make headway, and it's hard to see the benefits of
individual ones if you only look at the overall numbers.

Greetings,

Andres Freund
