Discussion: overload


overload

From
Viktor Bojović
Date:
Hi,
while reading a 20GB table through a PL/Perl function, it constantly grows in RAM.
I wanted to ask you which is the best way to read the table inside that function without such memory consumption.
Thanks in advance

Code is here:

CREATE  FUNCTION pattern_counter("patLength" integer)
  RETURNS varchar AS
$BODY$
    my $rv = spi_exec_query("select sequence from entry");
    my $rowCount = $rv->{processed};
    my $patLen = $_[0];
    my $patt = '';
    my %patterns=();
    foreach my $rn (0 .. $rowCount -1){
    my $row = $rv->{rows}[$rn];
    my $seq = $row->{sequence};
    for (my $x = 1;$x<=length($seq) - $patLen;$x++){
        $patt=substr($seq,$x,$patLen);
        if (! defined $patterns{$patt}) {
        $patterns{$patt}=1;
        }else{
        $patterns{$patt}++;
        }
    }
    }
    foreach $patt (keys %patterns){
    my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
    spi_exec_query($sql);
    }
return '';
$BODY$
  LANGUAGE plperl VOLATILE
  COST 100;

-- 
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me

Re: overload

From
lists-pgsql@useunix.net
Date:
I'm in the same situation with large tables.  Take a look at using a
cursor to fetch several thousand rows at a time.  I presume what's
happening is that Perl is attempting to create a massive list/array in
memory.  If you use a cursor the list should only contain X rows,
where X is the number specified at each fetch execution.  You'll
need to define the cursor inside a transaction block.

- begin transaction
- define the cursor
- fetch rows from cursor
- while row count from previous step > 0, execute previous step
- terminate transaction

Or you could use plpgsql instead of plperl; FOR loops over result sets in
plpgsql implicitly use cursors... it's just a little less code.
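For illustration, a minimal plpgsql sketch of such an implicit-cursor loop
(the function name is just an example; the "entry" table and "sequence"
column are taken from your function, and the per-row work is left as a stub):

CREATE OR REPLACE FUNCTION scan_sequences() RETURNS void AS $$
DECLARE
    r record;
BEGIN
    -- The FOR loop reads through an implicit cursor, so only a
    -- small batch of rows is held in memory at any time.
    FOR r IN SELECT sequence FROM entry LOOP
        -- process r.sequence here
        NULL;
    END LOOP;
END;
$$ LANGUAGE plpgsql;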

Hope that helps,
Wayne

On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojović wrote:
> Hi,
> while reading 20GB table through PL/PERL function , it constantly grows in
> RAM.
> I wanted to ask you which is the best way to read table inside that
> function without such memory consumption.
> Thanks in advance
> 
> Code is here:
> 
> CREATE  FUNCTION pattern_counter("patLength" integer)
>   RETURNS varchar AS
> $BODY$
>     my $rv = spi_exec_query("select sequence from entry");
>     my $rowCount = $rv->{processed};
>     my $patLen = $_[0];
>     my $patt = '';
>     my %patterns=();
>     foreach my $rn (0 .. $rowCount -1){
>     my $row = $rv->{rows}[$rn];
>     my $seq = $row->{sequence};
>     for (my $x = 1;$x<=length($seq) - $patLen;$x++){
>         $patt=substr($seq,$x,$patLen);
>         if (! defined $patterns{$patt}) {
>         $patterns{$patt}=1;
>         }else{
>         $patterns{$patt}++;
>         }
>     }
>     }
>     foreach $patt (keys %patterns){
>     my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
>     spi_exec_query($sql);
>     }
> return '';
> $BODY$
>   LANGUAGE plperl VOLATILE
>   COST 100;
> 
> 
> 
> -- 
> ---------------------------------------
> Viktor Bojović
> ---------------------------------------
> Wherever I go, Murphy goes with me


Re: overload

From
Viktor Bojović
Date:
Thanx Wayne,
in the end I did it that way and it works.
The code is below.

CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character varying
LANGUAGE plperl
AS $_X$
    my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
    my $row = $rvCnt->{rows}[0];
    my $rowCountAll = $row->{cnt};
    my $windowSize = 500000;
    my $patLen = $_[0];
    my $patt = '';
    my %patterns=();
    for (my $p=0;$p<$rowCountAll;$p+=$windowSize){
        my $sql="select sequence from entry limit $windowSize offset $p";
        my $rv = spi_exec_query($sql);
        my $rowCount = $rv->{processed};
        foreach my $rn (0 .. $rowCount -1){
            my $row = $rv->{rows}[$rn];
            my $seq = $row->{sequence};
            for (my $x = 1;$x<=length($seq) - $patLen;$x++){
                $patt=substr($seq,$x,$patLen);
                if (! defined $patterns{$patt}) {
                    $patterns{$patt}=1;
                }else{
                    $patterns{$patt}++;
                }
            }
        }
    }
    foreach $patt (keys %patterns){
        my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
        spi_exec_query($sql);
    }
    return '';
$_X$;


On Fri, Jul 8, 2011 at 8:50 PM, <lists-pgsql@useunix.net> wrote:
I'm have the same situation with large tables.  Take a look at using a
cursor to fetch several thousand rows at a time.  I presume what's
happening is that perl is attempting to create a massive list/array in
memory.  If you use a cursor the list should only contain X number of
rows where X in the number specified at each fetch execution.  You'll
need to define the cursor inside a transaction block.

- begin transaction
- define the cursor
- fetch rows from cursor
- while row count from previous step > 0, execute previous step
- terminate transaction

Or you could use plpgsql instead of plperl, FOR loops over result sets in
plpgsql implicitly use cursors... it's just a little less code.

Hope that helps,
Wayne




--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me

Re: overload

From
Pavel Stehule
Date:
Hello

using a "window" implemented via LIMIT OFFSET is not good - it is a
workaround on some systems where cursors are not available, but it is a
bad solution on PostgreSQL. Use a cursor instead - it is significantly
more effective, with smaller memory requirements.
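The cursor equivalent of that window loop looks roughly like this (a
sketch; the "entry" table is from the code below, and the fetch size of
1000 is arbitrary):

BEGIN;
DECLARE seq_cur CURSOR FOR SELECT sequence FROM entry;
-- repeat the FETCH until it returns zero rows
FETCH FORWARD 1000 FROM seq_cur;
CLOSE seq_cur;
COMMIT;

Unlike LIMIT/OFFSET, each FETCH continues from the cursor's current
position, so the server never re-scans and skips the rows already read.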

Regards

Pavel Stehule

2011/7/8 Viktor Bojović <viktor.bojovic@gmail.com>:
> Thanx Wayne,
> at the end i did it that way and it works.
> The code is below.
> CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character
> varying
> LANGUAGE plperl
> AS $_X$
> my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
> #my $rowCountAll = $rvCnt->{processed};
> my $row = $rvCnt->{rows}[0];
> my $rowCountAll = $row->{cnt};
> my $windowSize = 500000;
> my %patterns=();
> for (my $p=0;$p<$rowCountAll;$p+=$windowSize){
> my $sql="select sequence from entry limit $windowSize offset $p";
>
> my $rv = spi_exec_query($sql);
> my $rowCount = $rv->{processed};
> my $patLen = $_[0];
> my $patt = '';
>
> foreach my $rn (0 .. $rowCount -1){
> my $row = $rv->{rows}[$rn];
> my $seq = $row->{sequence};
> for (my $x = 1;$x<=length($seq) - $patLen;$x++){
> $patt=substr($seq,$x,$patLen);
> if (! defined $patterns{$patt}) {
> $patterns{$patt}=1;
> }else{
> $patterns{$patt}++;
> }
> }
> }
> }
>
> foreach $patt (keys %patterns){
> my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
> spi_exec_query($sql);
> }
> return $tmp;
> $_X$;
>
>
>
>
> --
> ---------------------------------------
> Viktor Bojović
> ---------------------------------------
> Wherever I go, Murphy goes with me
>


Re: overload

From
lists-pgsql@useunix.net
Date:
Hi Viktor,

I'm not sure what your requirements are in terms of performance and
stability of your result set. See Pavel's response. A cursor issues
a single query and renders a single result set. The result set is
static; the cursor just gives you finer control/performance when
retrieving rows from the set. Using a transaction will also render better
performance when %patterns contains a large number of keys/values:
insert all of them in one transaction, the same one you opened for the
cursor.

Your method issues many queries, and each successive query takes longer.
The number of queries will also increase as the table size increases.
It could also return duplicate rows and/or miss rows entirely due to
other transactions completing between your select queries.

If you can tolerate the above issues then so be it, if not you really
should look at cursors.

Also, there might be a bug in your code if you delete entries from
'entry'. You're depending on $rowCountAll to remain static, which is not
the case if you ever delete entries. You can fix this by skipping the
"select count(1)" step and just breaking your loop when fewer than
$windowSize entries are returned from the "select sequence.." query.
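That fix might look like this in the loop below (a sketch, untested):

my $windowSize = 500000;
my $p = 0;
while (1) {
    my $rv = spi_exec_query("select sequence from entry limit $windowSize offset $p");
    my $rowCount = $rv->{processed};
    # ... count patterns in this window as before ...
    last if $rowCount < $windowSize;   # final (possibly partial) window reached
    $p += $windowSize;
}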

Wayne


On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojović wrote:
> Thanx Wayne,
> at the end i did it that way and it works.
> The code is below.
> 
> CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character
> varying
> LANGUAGE plperl
> AS $_X$
> my $rvCnt = spi_exec_query("select count(1) as cnt from entry");
> #my $rowCountAll = $rvCnt->{processed};
> my $row = $rvCnt->{rows}[0];
> my $rowCountAll = $row->{cnt};
> my $windowSize = 500000;
> my %patterns=();
> for (my $p=0;$p<$rowCountAll;$p+=$windowSize){
> my $sql="select sequence from entry limit $windowSize offset $p";
> 
> my $rv = spi_exec_query($sql);
> my $rowCount = $rv->{processed};
> my $patLen = $_[0];
> my $patt = '';
> 
> foreach my $rn (0 .. $rowCount -1){
> my $row = $rv->{rows}[$rn];
> my $seq = $row->{sequence};
> for (my $x = 1;$x<=length($seq) - $patLen;$x++){
> $patt=substr($seq,$x,$patLen);
> if (! defined $patterns{$patt}) {
> $patterns{$patt}=1;
> }else{
> $patterns{$patt}++;
> }
> }
> }
> }
> 
> foreach $patt (keys %patterns){
> my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
> spi_exec_query($sql);
> }
> return $tmp;
> $_X$;
> 
> 
> -- 
> ---------------------------------------
> Viktor Bojović
> ---------------------------------------
> Wherever I go, Murphy goes with me


Re: overload

From
Viktor Bojović
Date:
Thanx Wayne and Pavel,

I will try to make this work with cursors. Theoretically there can be more than 4 billion records in the result, but only 58 million were returned, which took 3 days.
Possibly I will have to create a temporary table, store the results from %patterns into it after every 500k records, and group them at the end.

I didn't manage to find an example where plpgsql uses hash arrays or where plperl uses cursors.
Links or examples for either method would be helpful to me.

Thanx in advance

On Fri, Jul 8, 2011 at 9:49 PM, <lists-pgsql@useunix.net> wrote:
Hi Viktor,

I'm not sure what your requirements are in terms of performance and
stability of the your result set. See Pavel's response. A cursor issues
a single query and renders a single result set. The result set is
static, the cursor just gives you finer control/performance when
retrieving rows from the set. Using a transaction will also render better
performance when %patterns contains a large number of keys/values,
insert all of them in one transaction, the same one you opened for the
cursor.

Your method issues many queries and will take longer for each successive
query.  And the number of queries will increase as table size increases.
It could also return duplicate rows and/or missed rows due to other
transactions completing between your select query.

If you can tolerate the above issues then so be it, if not you really
should look at cursors.

Also there might be a bug in your code if you delete entries from
'entry'. Your depending on $rowCountAll to remain static which is not the
case if you ever delete entries. You can fix this by skipping the
"select count(1)" step and just breaking your loop when less then
$windowSize entries are returned from the "select sequence.." query.

Wayne





--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me

Re: overload

From
Viktor Bojović
Date:
I have found a cursor example in plperl; now it works fine. The code is below.

CREATE OR REPLACE FUNCTION uniprot_frekvencije.pattern_counter2("patLength" integer)
  RETURNS character varying AS
$BODY$
    my $sth = spi_query("select sequence from entry");
    my $patLen = $_[0];
    my $patt = '';
    my $row;

    my %patterns=();
    while (defined ($row = spi_fetchrow($sth))) {
    my $seq = $row->{sequence};
    for (my $x = 0;$x<=length($seq) - $patLen;$x++){
        $patt=substr($seq,$x,$patLen);
        if (! defined $patterns{$patt}) {
        $patterns{$patt}=1;
        }else{
        $patterns{$patt}++;
        }
    }
    }
    foreach $patt (keys %patterns){
    my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")";
    spi_exec_query($sql);
    }
return '';
$BODY$
  LANGUAGE plperl VOLATILE
  COST 100;

On Mon, Jul 11, 2011 at 4:42 PM, Viktor Bojović <viktor.bojovic@gmail.com> wrote:
Thanx Wayne and Pavel,

i will try to make this work with cursors. Theoretically there can be more than 4 bilion records in result, but only 58mil returned which took 3 days.
Possibly i will have to create temporary table and store results form %patterns into it after each 500k records, and group them at the end.

I didn't manage to find example where plpgsql uses hash arrays or where plperl uses cursors.
Any of these methods links/examples would be helpful to me.

Thanx in advance





--
---------------------------------------
Viktor Bojović
---------------------------------------
Wherever I go, Murphy goes with me