Re: overload

From: lists-pgsql(at)useunix(dot)net
To: viktor(dot)bojovic(at)gmail(dot)com
Cc: pgsql-sql(at)postgresql(dot)org
Subject: Re: overload
Date: 2011-07-08 19:49:49
Message-ID: 20110708194949.GM13621@slacker.ja10629.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

Hi Viktor,

I'm not sure what your requirements are in terms of performance and
stability of the your result set. See Pavel's response. A cursor issues
a single query and renders a single result set. The result set is
static, the cursor just gives you finer control/performance when
retrieving rows from the set. Using a transaction will also render better
performance when %patterns contains a large number of keys/values,
insert all of them in one transaction, the same one you opened for the
cursor.

Your method issues many queries and will take longer for each successive
query. And the number of queries will increase as table size increases.
It could also return duplicate rows and/or missed rows due to other
transactions completing between your select query.

If you can tolerate the above issues then so be it, if not you really
should look at cursors.

Also there might be a bug in your code if you delete entries from
'entry'. Your depending on $rowCountAll to remain static which is not the
case if you ever delete entries. You can fix this by skipping the
"select count(1)" step and just breaking your loop when less then
$windowSize entries are returned from the "select sequence.." query.

Wayne

On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojovi?? wrote:
> Thanx Wayne,
> at the end i did it that way and it works.
> The code is below.
>
> CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character
> varying
> LANGUAGE plperl
> AS $_X$
> my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
> #my $rowCountAll = $rvCnt->{processed};
> my $row = $rvCnt->{rows}[0];
> my $rowCountAll = $row->{cnt};
> my $windowSize = 500000;
> my %patterns=();
> for (my $p=0;$p<$rowCountAll;$p+=$windowSize){
> my $sql="select sequence from entry limit $windowSize offset $p";
>
> my $rv = spi_exec_query($sql);
> my $rowCount = $rv->{processed};
> my $patLen = $_[0];
> my $patt = '';
>
> foreach my $rn (0 .. $rowCount -1){
> my $row = $rv->{rows}[$rn];
> my $seq = $row->{sequence};
> for (my $x = 1;$x<=length($seq) - $patLen;$x++){
> $patt=substr($seq,$x,$patLen);
> if (! defined $patterns{$patt}) {
> $patterns{$patt}=1;
> }else{
> $patterns{$patt}++;
> }
> }
> }
> }
>
> foreach $patt (keys %patterns){
> my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
> spi_exec_query($sql);
> }
> return $tmp;
> $_X$;
>
>
> On Fri, Jul 8, 2011 at 8:50 PM, <lists-pgsql(at)useunix(dot)net> wrote:
>
> > I'm have the same situation with large tables. Take a look at using a
> > cursor to fetch several thousand rows at a time. I presume what's
> > happening is that perl is attempting to create a massive list/array in
> > memory. If you use a cursor the list should only contain X number of
> > rows where X in the number specified at each fetch execution. You'll
> > need to define the cursor inside a transaction block.
> >
> > - begin transaction
> > - define the cursor
> > - fetch rows from cursor
> > - while row count from previous step > 0, execute previous step
> > - terminate transaction
> >
> > Or you could use plpgsql instead of plperl, FOR loops over result sets in
> > plpgsql implicitly use cursors... it's just a little less code.
> >
> > Hope that helps,
> > Wayne
> >
> > On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojovi?? wrote:
> > > Hi,
> > > while reading 20GB table through PL/PERL function , it constantly grows
> > in
> > > RAM.
> > > I wanted to ask you which is the best way to read table inside that
> > > function without such memory consumption.
> > > Thanks in advance
> > >
> > > Code is here:
> > >
> > > CREATE FUNCTION pattern_counter("patLength" integer)
> > > RETURNS varchar AS
> > > $BODY$
> > > my $rv = spi_exec_query("select sequence from entry");
> > > my $rowCount = $rv->{processed};
> > > my $patLen = $_[0];
> > > my $patt = '';
> > > my %patterns=();
> > > foreach my $rn (0 .. $rowCount -1){
> > > my $row = $rv->{rows}[$rn];
> > > my $seq = $row->{sequence};
> > > for (my $x = 1;$x<=length($seq) - $patLen;$x++){
> > > $patt=substr($seq,$x,$patLen);
> > > if (! defined $patterns{$patt}) {
> > > $patterns{$patt}=1;
> > > }else{
> > > $patterns{$patt}++;
> > > }
> > > }
> > > }
> > > foreach $patt (keys %patterns){
> > > my $sql="insert into patterns
> > values('".$patt."',".$patterns{$patt}.")";
> > > spi_exec_query($sql);
> > > }
> > > return '';
> > > $BODY$
> > > LANGUAGE plperl VOLATILE
> > > COST 100;
> > >
> > >
> > >
> > > --
> > > ---------------------------------------
> > > Viktor Bojovi??
> > > ---------------------------------------
> > > Wherever I go, Murphy goes with me
> >
>
>
>
> --
> ---------------------------------------
> Viktor Bojovi??
> ---------------------------------------
> Wherever I go, Murphy goes with me

In response to

Responses

Browse pgsql-sql by date

  From Date Subject
Next Message Uwe Bartels 2011-07-10 18:54:10 using explain output within pgsql
Previous Message Pavel Stehule 2011-07-08 19:18:15 Re: overload