Re: overload

From: Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com>
To: lists-pgsql(at)useunix(dot)net
Cc: pgsql-sql(at)postgresql(dot)org, pavel(dot)stehule(at)gmail(dot)com
Subject: Re: overload
Date: 2011-07-11 22:56:25
Message-ID: CAJu1cLa+Bi52cZ0UCZRvcK4C8=Rf3gA07g90t8aNLEXyx1RP3Q@mail.gmail.com
Lists: pgsql-sql

I have found a cursor example in plperl, and now it works fine. The code is
below.

CREATE OR REPLACE FUNCTION uniprot_frekvencije.pattern_counter2("patLength" integer)
  RETURNS character varying AS
$BODY$
    # Stream the table through a cursor instead of loading it all at once:
    # spi_query/spi_fetchrow return one row at a time, so memory use stays flat.
    my $sth = spi_query("select sequence from entry");
    my $patLen = $_[0];
    my $patt = '';
    my $row;

    my %patterns = ();
    while (defined($row = spi_fetchrow($sth))) {
        my $seq = $row->{sequence};
        # count every substring of length $patLen in this sequence
        for (my $x = 0; $x <= length($seq) - $patLen; $x++) {
            $patt = substr($seq, $x, $patLen);
            if (!defined $patterns{$patt}) {
                $patterns{$patt} = 1;
            } else {
                $patterns{$patt}++;
            }
        }
    }
    # write the accumulated counts out once at the end
    foreach $patt (keys %patterns) {
        my $sql = "insert into patterns values('" . $patt . "'," . $patterns{$patt} . ")";
        spi_exec_query($sql);
    }
    return '';
$BODY$
  LANGUAGE plperl VOLATILE
  COST 100;
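
For completeness, here is how one might set up the target table and call the
function. This is only a sketch: the thread never shows the definition of the
patterns table, so its two-column layout below is my assumption.

CREATE TABLE patterns (pattern varchar, cnt bigint);

-- example call: count all 4-character substrings of entry.sequence
SELECT uniprot_frekvencije.pattern_counter2(4);

And for the plpgsql route Wayne mentions below (FOR loops over a query use a
cursor implicitly), a rough sketch of the same counting. plpgsql has no hash
arrays, so a temporary table stands in for %patterns and the counts are grouped
once at the end. The function name, the tmp_patterns table, and the
entry/patterns schemas are assumptions matching the code above, not something
from the thread.

CREATE OR REPLACE FUNCTION pattern_counter_plpgsql("patLength" integer)
RETURNS void AS
$BODY$
DECLARE
    r record;
    x integer;
BEGIN
    -- temporary stand-in for the Perl %patterns hash
    CREATE TEMP TABLE tmp_patterns(pattern varchar) ON COMMIT DROP;

    -- the FOR loop fetches rows through an implicit cursor,
    -- so the whole table is never pulled into memory at once
    FOR r IN SELECT sequence FROM entry LOOP
        FOR x IN 1 .. length(r.sequence) - "patLength" + 1 LOOP
            INSERT INTO tmp_patterns VALUES (substr(r.sequence, x, "patLength"));
        END LOOP;
    END LOOP;

    -- group the raw occurrences once at the end
    INSERT INTO patterns
        SELECT pattern, count(*) FROM tmp_patterns GROUP BY pattern;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;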

On Mon, Jul 11, 2011 at 4:42 PM, Viktor Bojović <viktor(dot)bojovic(at)gmail(dot)com> wrote:

> Thanx Wayne and Pavel,
>
> I will try to make this work with cursors. Theoretically there can be more
> than 4 billion records in the result, but only 58 million were returned, which took 3 days.
> Possibly I will have to create a temporary table, store the results from
> %patterns into it after each 500k records, and group them at the end.
>
> I didn't manage to find an example where plpgsql uses hash arrays or where
> plperl uses cursors.
> Links/examples for either of these methods would be helpful to me.
>
> Thanx in advance
>
>
> On Fri, Jul 8, 2011 at 9:49 PM, <lists-pgsql(at)useunix(dot)net> wrote:
>
>> Hi Viktor,
>>
>> I'm not sure what your requirements are in terms of performance and
>> stability of your result set. See Pavel's response. A cursor issues
>> a single query and renders a single result set. The result set is
>> static; the cursor just gives you finer control/performance when
>> retrieving rows from the set. Using a transaction will also give better
>> performance when %patterns contains a large number of keys/values:
>> insert all of them in one transaction, the same one you opened for the
>> cursor.
>>
>> Your method issues many queries and takes longer for each successive
>> query, and the number of queries will increase as the table size grows.
>> It could also return duplicate rows and/or miss rows if other
>> transactions complete between your select queries.
>>
>> If you can tolerate the above issues then so be it, if not you really
>> should look at cursors.
>>
>> Also, there might be a bug in your code if you delete entries from
>> 'entry'. You're depending on $rowCountAll remaining static, which is not the
>> case if you ever delete entries. You can fix this by skipping the
>> "select count(1)" step and just breaking out of your loop when fewer than
>> $windowSize entries are returned from the "select sequence.." query.
>>
>> Wayne
>>
>>
>> On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojović wrote:
>> > Thanx Wayne,
>> > at the end i did it that way and it works.
>> > The code is below.
>> >
>> > CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character
>> > varying
>> >     LANGUAGE plperl
>> > AS $_X$
>> > my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
>> > #my $rowCountAll = $rvCnt->{processed};
>> > my $row = $rvCnt->{rows}[0];
>> > my $rowCountAll = $row->{cnt};
>> > my $windowSize = 500000;
>> > my %patterns=();
>> > for (my $p = 0; $p < $rowCountAll; $p += $windowSize){
>> >     my $sql = "select sequence from entry limit $windowSize offset $p";
>> >
>> >     my $rv = spi_exec_query($sql);
>> >     my $rowCount = $rv->{processed};
>> >     my $patLen = $_[0];
>> >     my $patt = '';
>> >
>> >     foreach my $rn (0 .. $rowCount - 1){
>> >         my $row = $rv->{rows}[$rn];
>> >         my $seq = $row->{sequence};
>> >         for (my $x = 1; $x <= length($seq) - $patLen; $x++){
>> >             $patt = substr($seq, $x, $patLen);
>> >             if (! defined $patterns{$patt}) {
>> >                 $patterns{$patt} = 1;
>> >             } else {
>> >                 $patterns{$patt}++;
>> >             }
>> >         }
>> >     }
>> > }
>> >
>> > foreach $patt (keys %patterns){
>> >     my $sql = "insert into patterns values('".$patt."',".$patterns{$patt}.")";
>> >     spi_exec_query($sql);
>> > }
>> > return '';
>> > $_X$;
>> >
>> >
>> > On Fri, Jul 8, 2011 at 8:50 PM, <lists-pgsql(at)useunix(dot)net> wrote:
>> >
>> > > I have the same situation with large tables. Take a look at using a
>> > > cursor to fetch several thousand rows at a time. I presume what's
>> > > happening is that perl is attempting to build a massive list/array in
>> > > memory. If you use a cursor, the list should only contain X number of
>> > > rows, where X is the number specified at each fetch execution. You'll
>> > > need to define the cursor inside a transaction block.
>> > >
>> > > - begin transaction
>> > > - define the cursor
>> > > - fetch rows from cursor
>> > > - while row count from previous step > 0, execute previous step
>> > > - terminate transaction
>> > >
>> > > Or you could use plpgsql instead of plperl; FOR loops over result sets in
>> > > plpgsql implicitly use cursors... it's just a little less code.
>> > >
>> > > Hope that helps,
>> > > Wayne
>> > >
>> > > On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojović wrote:
>> > > > Hi,
>> > > > while reading a 20GB table through a PL/Perl function, it constantly
>> > > > grows in RAM.
>> > > > I wanted to ask you which is the best way to read a table inside that
>> > > > function without such memory consumption.
>> > > > Thanks in advance
>> > > >
>> > > > Code is here:
>> > > >
>> > > > CREATE FUNCTION pattern_counter("patLength" integer)
>> > > >   RETURNS varchar AS
>> > > > $BODY$
>> > > > my $rv = spi_exec_query("select sequence from entry");
>> > > > my $rowCount = $rv->{processed};
>> > > > my $patLen = $_[0];
>> > > > my $patt = '';
>> > > > my %patterns=();
>> > > > foreach my $rn (0 .. $rowCount - 1){
>> > > >     my $row = $rv->{rows}[$rn];
>> > > >     my $seq = $row->{sequence};
>> > > >     for (my $x = 1; $x <= length($seq) - $patLen; $x++){
>> > > >         $patt = substr($seq, $x, $patLen);
>> > > >         if (! defined $patterns{$patt}) {
>> > > >             $patterns{$patt} = 1;
>> > > >         } else {
>> > > >             $patterns{$patt}++;
>> > > >         }
>> > > >     }
>> > > > }
>> > > > foreach $patt (keys %patterns){
>> > > >     my $sql = "insert into patterns values('".$patt."',".$patterns{$patt}.")";
>> > > >     spi_exec_query($sql);
>> > > > }
>> > > > return '';
>> > > > $BODY$
>> > > > LANGUAGE plperl VOLATILE
>> > > > COST 100;
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > ---------------------------------------
>> > > > Viktor Bojović
>> > > > ---------------------------------------
>> > > > Wherever I go, Murphy goes with me
>> > >
>> >
>> >
>> >
>> > --
>> > ---------------------------------------
>> > Viktor Bojović
>> > ---------------------------------------
>> > Wherever I go, Murphy goes with me
>>
>
>
>
> --
> ---------------------------------------
> Viktor Bojović
>
> ---------------------------------------
> Wherever I go, Murphy goes with me
>

--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me
