Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)

From: Steve Singer <steve(at)ssinger(dot)info>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree mulation (was xlogreader-v4)
Date: 2013-01-19 22:33:05
Message-ID: BLU0-SMTP78072D679E8BC80C7BD3C3DC110@phx.gbl
Lists: pgsql-hackers

On 13-01-09 03:07 PM, Tom Lane wrote:
> Andres Freund <andres(at)2ndquadrant(dot)com> writes:
>> Well, I *did* benchmark it as noted elsewhere in the thread, but that's
>> obviously just one machine (E5520 x 2) with one rather restricted workload
>> (pgbench -S -jc 40 -T60). At least it's rather palloc heavy.
>> Here are the numbers:
>> before:
>> #101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
>> after:
>> #101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
>> So on my system, if there is a difference, it's positive (0.12%).
> pgbench-based testing doesn't fill me with a lot of confidence for this
> --- its numbers contain a lot of communication overhead, not to mention
> that pgbench itself can be a bottleneck. It struck me that we have a
> recent test case that's known to be really palloc-intensive, namely
> Pavel's example here:
> http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
>
> I set up a non-cassert build of commit
> 78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> reduced the data-copying overhead for that). On my Fedora 16 machine
> (dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> I get a runtime for Pavel's example of 17023 msec (average over five
> runs). I then applied oprofile and got a breakdown like this:
>
>   samples|      %|
> ------------------
>    108409 84.5083 /home/tgl/testversion/bin/postgres
>     13723 10.6975 /lib64/libc-2.14.90.so
>      3153  2.4579 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10960    10.1495  AllocSetAlloc
> 6325      5.8572  MemoryContextAllocZeroAligned
> 6225      5.7646  base_yyparse
> 3765      3.4866  copyObject
> 2511      2.3253  MemoryContextAlloc
> 2292      2.1225  grouping_planner
> 2044      1.8928  SearchCatCache
> 1956      1.8113  core_yylex
> 1763      1.6326  expression_tree_walker
> 1347      1.2474  MemoryContextCreate
> 1340      1.2409  check_stack_depth
> 1276      1.1816  GetCachedPlan
> 1175      1.0881  AllocSetFree
> 1106      1.0242  GetSnapshotData
> 1106      1.0242  _SPI_execute_plan
> 1101      1.0196  extract_query_dependencies_walker
>
> I then applied the palloc.h and mcxt.c hunks of your patch and rebuilt.
> Now I get an average runtime of 16666 ms, a full 2% faster, which is a
> bit astonishing, particularly because the oprofile results haven't moved
> much:
>
>    107642 83.7427 /home/tgl/testversion/bin/postgres
>     14677 11.4183 /lib64/libc-2.14.90.so
>      3180  2.4740 /home/tgl/testversion/lib/postgresql/plpgsql.so
>
> samples  %        symbol name
> 10038     9.3537  AllocSetAlloc
> 6392      5.9562  MemoryContextAllocZeroAligned
> 5763      5.3701  base_yyparse
> 4810      4.4821  copyObject
> 2268      2.1134  grouping_planner
> 2178      2.0295  core_yylex
> 1963      1.8292  palloc
> 1867      1.7397  SearchCatCache
> 1835      1.7099  expression_tree_walker
> 1551      1.4453  check_stack_depth
> 1374      1.2803  _SPI_execute_plan
> 1282      1.1946  MemoryContextCreate
> 1187      1.1061  AllocSetFree
> ...
> 653       0.6085  palloc0
> ...
> 552       0.5144  MemoryContextAlloc
>
> The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> how did that get faster?
>
> I notice that the postgres executable is about 0.2% smaller, presumably
> because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> This makes me wonder if my result is due to chance improvements of cache
> line alignment for inner loops.
>
> I would like to know if other people get comparable results on other
> hardware (non-Intel hardware would be especially interesting). If this
> result holds up across a range of platforms, I'll withdraw my objection
> to making palloc a plain function.
>
> regards, tom lane
>

Sorry for the delay; I only read this thread today.

I just tried Pavel's test on a POWER5 machine with an older version of
gcc (see the grebe buildfarm animal for details).

78a5e738e: 37874.855 (average of 6 runs)
78a5e738e + palloc.h + mcxt.c: 38076.8035

The functions do seem to slow things down slightly on POWER (about 0.5% here). I haven't bothered to run oprofile or tprof to get a per-function breakdown, since Andres has already removed this from his patch.
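
For anyone who wants to compare the three sets of numbers in this thread on the same footing, here is a minimal Python sketch of the arithmetic. The helper name percent_change is made up for illustration, and the figures are simply copied from the messages above. The runtimes are "lower is better", while the pgbench values are treated as throughput ("higher is better"), which is an assumption about what those figures represent.

```python
from statistics import mean


def percent_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


# Runtimes reported in this message for the POWER5 machine.
power5_before = 37874.855   # 78a5e738e
power5_after = 38076.8035   # 78a5e738e + palloc.h + mcxt.c

# Runtimes quoted from Tom's message (msec, Xeon E5503).
xeon_before = 17023
xeon_after = 16666

# pgbench figures quoted from Andres's message (assumed to be throughput).
pgbench_before = [101646.763208, 101350.361595, 101421.425668,
                  101571.211688, 101862.172051, 101449.857665]
pgbench_after = [101553.596257, 102132.277795, 101528.816229,
                 101733.541792, 101438.531618, 101673.400992]

if __name__ == "__main__":
    print(f"POWER5 runtime: {percent_change(power5_before, power5_after):+.2f}%")
    print(f"Xeon runtime:   {percent_change(xeon_before, xeon_after):+.2f}%")
    print(f"pgbench:        "
          f"{percent_change(mean(pgbench_before), mean(pgbench_after)):+.2f}%")
```

A positive runtime change means the patched build was slower; a positive pgbench change means it was faster. With single averages and no variance figures, none of these differences can be checked for significance from the thread alone.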

Steve
