Re: copy vs. C function

From: Jon Nelson <jnelson+pgsql(at)jamponi(dot)net>
To:
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: copy vs. C function
Date: 2011-12-14 14:06:09
Message-ID: CAKuK5J3VsY-1_4wzRZiYR_ExWVGhnMHYmkBZBxnvBxkMfqsL5w@mail.gmail.com
Lists: pgsql-performance

On Wed, Dec 14, 2011 at 12:18 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Jon Nelson <jnelson+pgsql(at)jamponi(dot)net> writes:
>> The only thing I have left are these statements:
>
>> get_call_result_type
>> TupleDescGetAttInMetadata
>> BuildTupleFromCStrings
>> HeapTupleGetDatum
>> and finally PG_RETURN_DATUM
>
>> It turns out that:
>> get_call_result_type adds 43 seconds [total: 54],
>> TupleDescGetAttInMetadata adds 19 seconds [total: 73],
>> BuildTupleFromCStrings accounts for 43 seconds [total: 116].
>
>> So those three functions account for 90% of the total time spent.
>> What alternatives exist? Do I have to call get_call_result_type /every
>> time/ through the function?
>
> Well, if you're concerned about performance then I think you're going
> about this in entirely the wrong way, because as far as I can tell from
> this you're converting all the field values to text and back again.
> You should be trying to keep the values in Datum format and then
> invoking heap_form_tuple.  And yeah, you probably could cache the
> type information across calls.

The parsing/conversion (except BuildTupleFromCStrings) is only a small
fraction of the overall time spent in the function and could probably
be made slightly faster. It's the overhead that's killing me.

Remember: I'm not converting multiple field values to text and back
again, I'm turning a *single* TEXT into 8 columns of varying types
(INET, INTEGER, and one INTEGER array, among others). I'll rewrite
the code to form the tuple from Datums, but given that 53% of the time
is spent in just two functions (get_call_result_type and
TupleDescGetAttInMetadata, the two I'd like to cache), I'm not sure
how much of a gain it's likely to be.

Regarding caching, I tried caching the type information across calls
by making the TupleDesc static and initializing it only once.
When I tried that, I got:

ERROR: number of columns (6769856) exceeds limit (1664)

I tried to find some documentation or examples that cache the
information, but couldn't find any.
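
A likely explanation for the error above (an assumption, not confirmed
in this thread) is that the static TupleDesc pointed at memory that was
released after the first call, so later calls read garbage as the column
count. The usual way to cache per-call-site state is to hang it off a
slot that lives as long as the call site itself; in PostgreSQL that slot
is fcinfo->flinfo->fn_extra, with the cached data allocated in
fcinfo->flinfo->fn_mcxt. The sketch below is a standalone model of that
pattern only: CallSite, RowMeta, expensive_lookup, parse_row,
process_rows and release_site are hypothetical stand-ins, not PostgreSQL
APIs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for PostgreSQL's FmgrInfo: one per call site,
 * and it outlives every individual call made through it. */
typedef struct CallSite {
    void *fn_extra;   /* cache slot, NULL until the first call */
} CallSite;

/* Hypothetical stand-in for the expensive type metadata. */
typedef struct RowMeta {
    int natts;        /* number of output columns */
} RowMeta;

static int lookups = 0;   /* counts how often the expensive path runs */

/* Models the costly get_call_result_type / TupleDescGetAttInMetadata
 * step. The result is heap-allocated so it survives between calls. */
static RowMeta *expensive_lookup(void)
{
    RowMeta *meta = malloc(sizeof(RowMeta));
    if (meta == NULL)
        abort();
    meta->natts = 8;  /* the function in this thread returns 8 columns */
    lookups++;
    return meta;
}

/* Models the per-row function: look up metadata on the first call only,
 * then reuse the copy stored in the call site's cache slot. */
static int parse_row(CallSite *site)
{
    RowMeta *meta = site->fn_extra;
    if (meta == NULL) {
        meta = expensive_lookup();
        site->fn_extra = meta;
    }
    return meta->natts;
}

/* Calls parse_row n times and returns the total number of columns built. */
static int process_rows(CallSite *site, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += parse_row(site);
    return total;
}

/* Frees the cached metadata once the call site is no longer needed. */
static void release_site(CallSite *site)
{
    free(site->fn_extra);
    site->fn_extra = NULL;
}
```

With a zero-initialized CallSite, process_rows(&site, 1000) runs
expensive_lookup once and reuses the cached RowMeta for the other 999
calls. In a real extension the allocation would go through
MemoryContextSwitchTo(fcinfo->flinfo->fn_mcxt) rather than malloc, so
that the cache is freed together with the call site.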

--
Jon
