Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Steve Singer <steve(at)ssinger(dot)info>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Re: [PATCH] unified frontend support for pg_malloc et al and palloc/pfree emulation (was xlogreader-v4)
Date: 2013-01-21 07:28:56
Message-ID: 20130121072856.GB6880@awork2.anarazel.de
Lists: pgsql-hackers

On 2013-01-19 17:33:05 -0500, Steve Singer wrote:
> On 13-01-09 03:07 PM, Tom Lane wrote:
> >Andres Freund <andres(at)2ndquadrant(dot)com> writes:
> >>Well, I *did* benchmark it as noted elsewhere in the thread, but that's
> >>obviously just one machine (E5520 x 2) with one rather restricted workload
> >>(pgbench -S -jc 40 -T60). At least it's rather palloc heavy.
> >>Here are the numbers:
> >>before:
> >>#101646.763208 101350.361595 101421.425668 101571.211688 101862.172051 101449.857665
> >>after:
> >>#101553.596257 102132.277795 101528.816229 101733.541792 101438.531618 101673.400992
> >>So on my system if there is a difference, it's positive (0.12%).
> >pgbench-based testing doesn't fill me with a lot of confidence for this
> >--- its numbers contain a lot of communication overhead, not to mention
> >that pgbench itself can be a bottleneck. It struck me that we have a
> >recent test case that's known to be really palloc-intensive, namely
> >Pavel's example here:
> >http://www.postgresql.org/message-id/CAFj8pRCKfoz6L82PovLXNK-1JL=jzjwaT8e2BD2PwNKm7i7KVg@mail.gmail.com
> >
> >I set up a non-cassert build of commit
> >78a5e738e97b4dda89e1bfea60675bcf15f25994 (ie, just before the patch that
> >reduced the data-copying overhead for that). On my Fedora 16 machine
> >(dual 2.0GHz Xeon E5503, gcc version 4.6.3 20120306 (Red Hat 4.6.3-2))
> >I get a runtime for Pavel's example of 17023 msec (average over five
> >runs). I then applied oprofile and got a breakdown like this:
...
> >
> >The number of calls of AllocSetAlloc certainly hasn't changed at all, so
> >how did that get faster?
> >
> >I notice that the postgres executable is about 0.2% smaller, presumably
> >because a whole lot of inlined fetches of CurrentMemoryContext are gone.
> >This makes me wonder if my result is due to chance improvements of cache
> >line alignment for inner loops.
> >
> >I would like to know if other people get comparable results on other
> >hardware (non-Intel hardware would be especially interesting). If this
> >result holds up across a range of platforms, I'll withdraw my objection
> >to making palloc a plain function.
> >
> > regards, tom lane
> >
>
> Sorry for the delay; I only read this thread today.
>
>
> I just tried Pavel's test on a POWER5 machine with an older version of gcc
> (see the grebe buildfarm animal for details)
>
> 78a5e738e: 37874.855 (average of 6 runs)
> 78a5e738 + palloc.h + mcxt.c: 38076.8035
>
> The functions do seem to slightly slow things down on POWER. I haven't
> bothered to run oprofile or tprof to get a breakdown of the functions
> since Andres has already removed this from his patch.

I haven't removed it from the patch afaik, so it would be great to get a
profile here! It's "only" for xlogdump, but that tool helped me immensely
and I don't want to maintain it independently...
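
For reference, the relative differences implied by the figures quoted in
this thread can be checked with a few lines of Python. This is only a
sketch: it assumes the pgbench figures are TPS (higher is better) and that
the POWER5 figures are runtimes in msec like Tom's (lower is better); the
helper name pct_change is made up for illustration.

```python
from statistics import mean

# pgbench -S figures quoted above, assumed to be TPS (higher is better).
tps_before = [101646.763208, 101350.361595, 101421.425668,
              101571.211688, 101862.172051, 101449.857665]
tps_after = [101553.596257, 102132.277795, 101528.816229,
             101733.541792, 101438.531618, 101673.400992]

# POWER5 averages quoted above, assumed to be msec (lower is better).
power_before = 37874.855   # 78a5e738e
power_after = 38076.8035   # 78a5e738 + palloc.h + mcxt.c


def pct_change(old, new):
    """Relative change from old to new, in percent (hypothetical helper)."""
    return (new - old) / old * 100.0


tps_delta = pct_change(mean(tps_before), mean(tps_after))
power_delta = pct_change(power_before, power_after)

print(f"pgbench TPS change:    {tps_delta:+.2f}%")
print(f"POWER5 runtime change: {power_delta:+.2f}%")
```

Under those assumptions this gives roughly +0.12% throughput on the Intel
box and roughly +0.53% runtime on POWER5. Both are small enough that run to
run noise matters, which is why a profile would help.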

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
