Re: Processing very large TEXT columns (300MB+) using C/libpq

From: Cory Nemelka <cnemelka(at)gmail(dot)com>
To: Geoff Winkless <pgsqladmin(at)geoff(dot)dj>
Cc: "pgsql-admin(at)postgresql(dot)org" <pgsql-admin(at)postgresql(dot)org>
Subject: Re: Processing very large TEXT columns (300MB+) using C/libpq
Date: 2017-10-20 15:54:51
Message-ID: CAMe5Gn0T7OtneYReFCvpSqhaTSGvO0Ujf+RjKVX_eeaRRgEPQg@mail.gmail.com
Lists: pgsql-admin

I'll take out all the code that isn't directly related to reading the data
and see if that helps. That was the next step I intended anyway.

Thank you for the reply.

--cnemelka

On Fri, Oct 20, 2017 at 9:43 AM, Geoff Winkless <pgsqladmin(at)geoff(dot)dj> wrote:

> It's probably worth removing the iterating code Just In Case.
>
> Apologies for egg-suck-education, but I assume you're not doing something
> silly like
>
> for (i=0; i < strlen(bigtextstring); i++) {
> ....
> }
>
> I know it sounds stupid, but you'd be amazed how many times that crops up,
> and for small strings it doesn't matter, but for large strings it's
> catastrophic.
>
> Geoff
>
> On 20 October 2017 at 16:16, Cory Nemelka <cnemelka(at)gmail(dot)com> wrote:
>
>> All I am doing is iterating through the characters, so I know it isn't
>> my code.
>>
>> --cnemelka
>>
>> On Fri, Oct 20, 2017 at 9:14 AM, Cory Nemelka <cnemelka(at)gmail(dot)com> wrote:
>>
>>> Yes, but I should be able to read them much faster. The psql client can
>>> display an 11MB column in a little over a minute, while in C using the
>>> libpq library, it takes over an hour.
>>>
>>> Anyone have any experience with the same issue that can help me resolve it?
>>>
>>> --cnemelka
>>>
>>> On Thu, Oct 19, 2017 at 5:20 PM, Aldo Sarmiento <aldo(at)bigpurpledot(dot)com>
>>> wrote:
>>>
>>>> I believe large columns get put into a TOAST table. Max page size is
>>>> 8k, so a value that size spans many pages per row that all need to be
>>>> fetched: https://www.postgresql.org/docs/9.5/static/storage-toast.html
>>>>
>>>> *Aldo Sarmiento*
>>>> President & CTO
>>>>
>>>>
>>>>
>>>> 8687 Research Dr, Irvine, CA 92618
>>>> *O*: (949) 223-0900 - *F: *(949) 727-4265
>>>> aldo(at)bigpurpledot(dot)com | www.bigpurpledot.com
>>>>
>>>> On Thu, Oct 19, 2017 at 2:03 PM, Cory Nemelka <cnemelka(at)gmail(dot)com>
>>>> wrote:
>>>>
>>>>> I have been getting very poor performance using libpq to process very
>>>>> large TEXT columns (300MB+). I suspect it is IO related but can't be sure.
>>>>>
>>>>> Anyone had experience with the same issue that can help me resolve it?
>>>>>
>>>>> --cnemelka
>>>>>
>>>>
>>>>
>>>
>>
>
