Re: overload

From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: lists-pgsql(at)useunix(dot)net
Cc: pgsql-sql(at)postgresql(dot)org, pavel(dot)stehule(at)gmail(dot)com
Subject: Re: overload
Date: 2011-07-11 14:42:32
Message-ID: CAJu1cLbPwGCFuFVUJJRDvKzVqUFZWscvaMCHjpczrdMu8D0T7A@mail.gmail.com
Lists: pgsql-sql

Thanks Wayne and Pavel,

I will try to make this work with cursors. Theoretically there can be more
than 4 billion records in the result, but only 58 million were returned,
which took 3 days. Possibly I will have to create a temporary table, store
results from %patterns into it after every 500k records, and group them at
the end.
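For what it's worth, the per-batch counting and the final grouping could also be pushed entirely into SQL, letting the server do the aggregation. A rough, untested sketch, assuming the target table has two columns (pattern, count) as in the insert statements below, with the pattern length hard-coded to 4 for illustration:

```sql
-- Enumerate every starting offset of each sequence with generate_series
-- and let GROUP BY do the counting server-side (substr is 1-based in SQL).
INSERT INTO patterns
SELECT substr(e.sequence, s.x, 4), count(*)
FROM entry e,
     generate_series(1, length(e.sequence) - 4 + 1) AS s(x)
GROUP BY 1;
```

This avoids holding the hash in the function's memory at all, at the cost of a large aggregation on the server side.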

I didn't manage to find an example where plpgsql uses hash arrays or where
plperl uses cursors.
Links or examples for either of these methods would be helpful to me.

Thanks in advance
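For the plperl side, something along these lines might work. This is an untested sketch: spi_query/spi_fetchrow stream rows one at a time instead of materializing the whole result set, and a prepared plan avoids rebuilding the insert statement for every key. The function and table names mirror the code later in this thread.

```sql
CREATE OR REPLACE FUNCTION pattern_counter2("patLength" integer) RETURNS void
    LANGUAGE plperl
AS $_X$
    my $patLen = $_[0];
    my %patterns;
    # spi_query/spi_fetchrow fetch rows incrementally, so memory use is
    # bounded by the size of %patterns rather than the table.
    my $cur = spi_query("select sequence from entry");
    while (defined(my $row = spi_fetchrow($cur))) {
        my $seq = $row->{sequence};
        for (my $x = 0; $x <= length($seq) - $patLen; $x++) {
            $patterns{substr($seq, $x, $patLen)}++;
        }
    }
    # Prepared insert: planned once, executed per distinct pattern.
    my $plan = spi_prepare('insert into patterns values ($1, $2)',
                           'varchar', 'integer');
    foreach my $patt (keys %patterns) {
        spi_exec_prepared($plan, $patt, $patterns{$patt});
    }
    spi_freeplan($plan);
    return;
$_X$;
```

Note that %patterns itself still lives in memory; if the number of distinct patterns is also huge, flushing it to a temporary table periodically (as described above) would still be necessary.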

On Fri, Jul 8, 2011 at 9:49 PM, <lists-pgsql(at)useunix(dot)net> wrote:

> Hi Viktor,
>
> I'm not sure what your requirements are in terms of performance and
> stability of your result set. See Pavel's response. A cursor issues
> a single query and renders a single result set. The result set is
> static; the cursor just gives you finer control and better performance
> when retrieving rows from the set. Using a transaction will also give
> better performance when %patterns contains a large number of
> keys/values: insert all of them in one transaction, the same one you
> opened for the cursor.
>
> Your method issues many queries, and each successive query takes longer
> than the last. The number of queries will also grow as the table grows.
> It could also return duplicate rows and/or miss rows when other
> transactions commit between your select queries.
>
> If you can tolerate the above issues then so be it, if not you really
> should look at cursors.
>
> Also, there might be a bug in your code if you delete entries from
> 'entry'. You're depending on $rowCountAll to remain static, which is not
> the case if you ever delete entries. You can fix this by skipping the
> "select count(1)" step and just breaking your loop when fewer than
> $windowSize entries are returned from the "select sequence.." query.
>
> Wayne
>
>
> On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojović wrote:
> > Thanx Wayne,
> > at the end i did it that way and it works.
> > The code is below.
> >
> > CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character
> > varying
> >     LANGUAGE plperl
> > AS $_X$
> > my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
> > my $row = $rvCnt->{rows}[0];
> > my $rowCountAll = $row->{cnt};
> > my $windowSize = 500000;
> > my $patLen = $_[0];
> > my %patterns = ();
> > for (my $p = 0; $p < $rowCountAll; $p += $windowSize) {
> >     my $sql = "select sequence from entry limit $windowSize offset $p";
> >     my $rv = spi_exec_query($sql);
> >     my $rowCount = $rv->{processed};
> >     foreach my $rn (0 .. $rowCount - 1) {
> >         my $seq = $rv->{rows}[$rn]{sequence};
> >         # substr offsets are 0-based; starting at 1 would skip the
> >         # first pattern of every sequence
> >         for (my $x = 0; $x <= length($seq) - $patLen; $x++) {
> >             $patterns{substr($seq, $x, $patLen)}++;
> >         }
> >     }
> > }
> > foreach my $patt (keys %patterns) {
> >     my $sql = "insert into patterns values('" . $patt . "'," . $patterns{$patt} . ")";
> >     spi_exec_query($sql);
> > }
> > return '';   # was "return $tmp;" with $tmp never defined
> > $_X$;
> >
> >
> > On Fri, Jul 8, 2011 at 8:50 PM, <lists-pgsql(at)useunix(dot)net> wrote:
> >
> > > I have the same situation with large tables. Take a look at using a
> > > cursor to fetch several thousand rows at a time. I presume what's
> > > happening is that Perl is attempting to create a massive list/array in
> > > memory. If you use a cursor the list should only contain X rows,
> > > where X is the number specified at each fetch execution. You'll
> > > need to define the cursor inside a transaction block.
> > >
> > > - begin transaction
> > > - define the cursor
> > > - fetch rows from cursor
> > > - while row count from previous step > 0, execute previous step
> > > - terminate transaction
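> > > At the SQL level, a minimal sketch of those steps (cursor name and
> > > batch size are only illustrative):
> > >
> > > BEGIN;
> > > DECLARE seq_cur NO SCROLL CURSOR FOR SELECT sequence FROM entry;
> > > FETCH FORWARD 1000 FROM seq_cur;  -- repeat while rows are returned
> > > CLOSE seq_cur;
> > > COMMIT;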
> > >
> > > Or you could use plpgsql instead of plperl; FOR loops over result
> > > sets in plpgsql implicitly use cursors... it's just a little less
> > > code.
> > >
> > > Hope that helps,
> > > Wayne
> > >
> > > On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojović wrote:
> > > > Hi,
> > > > while reading 20GB table through PL/PERL function , it constantly
> grows
> > > in
> > > > RAM.
> > > > I wanted to ask you which is the best way to read table inside that
> > > > function without such memory consumption.
> > > > Thanks in advance
> > > >
> > > > Code is here:
> > > >
> > > > CREATE FUNCTION pattern_counter("patLength" integer)
> > > >     RETURNS varchar AS
> > > > $BODY$
> > > > my $rv = spi_exec_query("select sequence from entry");
> > > > my $rowCount = $rv->{processed};
> > > > my $patLen = $_[0];
> > > > my %patterns = ();
> > > > foreach my $rn (0 .. $rowCount - 1) {
> > > >     my $seq = $rv->{rows}[$rn]{sequence};
> > > >     # substr offsets are 0-based; starting at 1 skips the first
> > > >     # pattern of every sequence
> > > >     for (my $x = 0; $x <= length($seq) - $patLen; $x++) {
> > > >         $patterns{substr($seq, $x, $patLen)}++;
> > > >     }
> > > > }
> > > > foreach my $patt (keys %patterns) {
> > > >     my $sql = "insert into patterns values('" . $patt . "'," . $patterns{$patt} . ")";
> > > >     spi_exec_query($sql);
> > > > }
> > > > return '';
> > > > $BODY$
> > > > LANGUAGE plperl VOLATILE
> > > > COST 100;
> > > >
> > > >
> > > >
> > > > --
> > > > ---------------------------------------
> > > > Viktor Bojović
> > > > ---------------------------------------
> > > > Wherever I go, Murphy goes with me
> > >
> >
> >
> >
> > --
> > ---------------------------------------
> > Viktor Bojović
> > ---------------------------------------
> > Wherever I go, Murphy goes with me
>

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
