Lots and lots of strdup's (bug #4079)

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Lots and lots of strdup's (bug #4079)
Date: 2008-04-01 20:37:00
Message-ID: 13958.1207082220@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I looked into the complaint here
http://archives.postgresql.org/pgsql-bugs/2008-04/msg00005.php
about 8.3 being a lot slower than 8.2. Apparently what he's
doing is sending a whole lot of INSERT commands in a single
query string. And, sure enough, 8.3 is a lot slower. The
oprofile output is, um, localized:

samples % image name symbol name
749240 48.0999 libc-2.7.so memcpy
392513 25.1986 libc-2.7.so strlen
331905 21.3077 libc-2.7.so memset
5302 0.3404 postgres AllocSetCheck
3548 0.2278 postgres AllocSetAlloc
3507 0.2251 postgres base_yyparse
3160 0.2029 postgres hash_search_with_hash_value
2068 0.1328 postgres SearchCatCache
1737 0.1115 postgres base_yylex
1606 0.1031 postgres XLogInsert

I eventually traced this down to the strdup's that were inserted into
PortalDefineQuery() in this patch:
http://archives.postgresql.org/pgsql-committers/2007-03/msg00098.php
that is, the cost differential is entirely because we started copying
a Portal's source query text into the Portal's own memory. In the
example at hand here, the source query string is big (about a quarter
megabyte for 50000 INSERTSs), and we do that copy 50000 times.
Even though strdup is cheap on a per-byte basis, the O(N^2) law
eventually catches up.

Although you could argue that this example represents crummy SQL
coding style, it's still a performance regression from pre-8.3,
so I think we need to fix it.

It seems to me to be clearly necessary to copy the source text into
the Portal if the Portal is going to be long-lived ... but in the
case of simple-Query execution we know darn well that the original
string in MessageContext is going to outlive the Portal, so the
copying isn't really needed --- and this seems like the only code
path where the problem exists. In other cases a single querystring
isn't likely (or, usually, even able) to contain more than one command
so no repeat copying will occur.

So I'm inclined to revert the decision I made in that patch that
PortalDefineQuery() should copy the strings rather than expecting
the caller to be responsible for providing a suitably long-lived
string. We could handle this two ways:
* Put the strdup operations back into the callers that need them,
ie just revert the logic change.
* Add an additional bool parameter to PortalDefineQuery to tell it
whether the strings need to be copied.

The second option seems a bit cleaner to me but might conceivably break
third party code, if there is any that calls PortalDefineQuery.
OTOH, if anyone out there has started to depend on the assumption
that PortalDefineQuery will copy their strings, the first option
would break their code silently, which is even worse than breaking
it obviously.

Thoughts?

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2008-04-01 21:01:38 Re: Scroll cursor oddity...
Previous Message Gregory Stark 2008-04-01 19:20:51 Re: Scroll cursor oddity...