Re: generating a large XML document

From: Samuel Gendler <sgendler(at)ideasculptor(dot)com>
To: Julius Tuskenis <julius(at)nsoft(dot)lt>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: generating a large XML document
Date: 2011-06-20 07:21:06
Message-ID: BANLkTi=pLvOz5K=3Voh93D9bMX0V8sUezQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Jun 19, 2011 at 11:36 PM, Julius Tuskenis <julius(at)nsoft(dot)lt> wrote:

> Hello,
>
> I'm sorry to write again, but as I received no answer I wonder if there is
> a better mailing list to address concerning this question? Or is there
> nothing to be done about the speed of xmlagg ?. Please let me as no answer
> is the worst answer to get....

I played around a little after your initial post, but I'm not a postgresql
developer and I have no familiarity with the internals, so everything I'm
about to write is pure conjecture.

Programming languages that treat strings as immutable often suffer from this
kind of problem. WIth each string concatenation, both strings have to be
copied to a new location in memory. I suspect that this is happening in
this case. The only viable fix would be to use a buffer that is mutable and
append into it rather than doing raw string concatenation. If this is truly
the problem, I don't see that you have much choice but to re-implement
xmlagg in one of the available languages such that it uses a buffer instead
of immutable string concatenation. It is probably that the xmlelement
function doesn't need to be rewritten, since it is only concatenating
relatively short strings. It is less efficient than appending to a buffer,
but shouldn't get catastrophically slow. But xmlagg is concatenating many
rows. If each concatenation results in a full copy of the already
concatenated rows, you can see where performance would drop
catastrophically.

Here's the first google result for 'mutable string python' that I found,
which addresses this problem in python.
http://www.skymind.com/~ocrow/python_string/ You could rewrite the aggregate
function in plpython using one of the techniques in that file. I just
attempted to find the source to xm_agg in the postgresql source code and it
is pretty well obfuscated, so I don't see that being much help. I wasn't
even able to determine if the problem actually is immutable string
concatenation. So we don't know if xmlagg is building a DOM tree and then
serializing it once (which would imply that XMLELEMENT returns a single DOM
node, or if it is all workign with strings. Barring answers from someone
who actually knows, I can only suggest that you read through the
documentation on writing an aggregate function and then do some
experimentation to see what you get when you use your own aggregate function
instead of xml_agg. Hopefully, such experimentation won't take long to
determine if re-implementing xml_agg with a mutable buffer is a viable
option.

--sam

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Julius Tuskenis 2011-06-20 07:29:45 Re: generating a large XML document
Previous Message Pavel Stehule 2011-06-20 06:51:44 Re: generating a large XML document