Re: Highly Efficient Custom Sorting

From: Eliot Gable <egable+pgsql-performance(at)gmail(dot)com>
To: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
Cc: Craig James <craig_james(at)emolecules(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Highly Efficient Custom Sorting
Date: 2010-07-03 03:17:47
Message-ID: AANLkTimaUPkOn6OkCspsZ7iTiY9S6MFTrKTgFXKckl2u@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Well, I re-wrote the algorithm in Perl. However, it did not solve the speed
issue. Running time now is a whopping 240+ ms instead of the 31.8ms I was
getting before (15.2 of which is sorting). Here is the Perl code on the
sorting. I won't post the pl/pgsql code, because this is far more clear (in
my opinion) on what the algorithm does:

DROP TYPE IF EXISTS glbtype CASCADE;
CREATE TYPE glbtype AS (
id INTEGER,
"group" TEXT,
priority INTEGER,
weight INTEGER
);

DROP TYPE IF EXISTS glbtype_result CASCADE;
CREATE TYPE glbtype_result AS (
id INTEGER,
priority INTEGER,
weight INTEGER,
"order" BIGINT
);

CREATE OR REPLACE FUNCTION GroupedRandomWeightedLB(glbtype[]) RETURNS SETOF
glbtype_result AS
$BODY$
# Input is an array of a composite type
my ($input) = @_;
my %groups;
$input =~ s/^{|}$//g;
$input =~ s/[)(]//g;
my @rows;
my $count = 0;
while ($input && $count < 10000) {
my ($id, $group, $prio, $weight, @rest) = split(/,/, $input);
push(@rows, {id => $id, group => $group, priority => $prio, weight =>
$weight});
$count++;
$input = join(',', @rest);
}

if(scalar @rows < 1) {
elog(NOTICE, ' No rows sent for sorting.');
return undef;
} else {
elog(NOTICE, ' '.(scalar @rows).' rows sent for sorting.');
}

foreach $rw (@rows) {
if($rw->{group} && $rw->{priority} && $rw->{weight}) {
push( @{ $groups{$rw->{group}}{$rw->{priority}} }, $rw);
elog(NOTICE, ' Pushing '.$rw->{group}.' with prio ('.$rw->{priority}.'),
weight ('.$rw->{weight}.') onto array.');
} else {
elog(NOTICE, ' Invalid sort row: Group ('.$rw->{group}.'), Prio
('.$rw->{priority}.'), Weight ('.$rw->{weight}.')');
}
}

foreach $group (keys %groups) {
elog(NOTICE, ' Sorting group '.$group.'...');
foreach $prio (keys %{$groups{$group}}) {
my @rows = @{ $groups{$group}{$prio} };
elog(NOTICE, ' Sorting '.(scalar @rows).' rows in priority
'.$prio.'...');
my @zeros;
my @nonzeros;
my $total_weight = 0;
my $row_order = 1;
for($row_id = 0; $row_id < scalar @rows; $row_id++) {
my $row = $rows[$row_id];
$total_weight += $row->{weight};
elog(NOTICE, ' Total Weight ('.$total_weight.')');
if($row->{weight} == 0) {
push(@zeros, $row);
} else {
push(@nonzeros, $row);
}
}
my @first_order = (@zeros, @nonzeros);
undef(@zeros);
undef(@nonzeros);
while(scalar @first_order) {
elog(NOTICE, ' '.(scalar @first_order).' items remaining ...');
my $rand = int(rand($total_weight));
elog(NOTICE, ' Random weight ('.$rand.')');
my $running_weight = 0;
for($row_id = 0; $row_id < scalar @first_order; $row_id++) {
my $row = $first_order[$row_id];
$running_weight += $row->{weight};
elog(NOTICE, ' Running weight ('.$running_weight.') Current Weight
('.$row->{weight}.')');
if($running_weight >= $rand) {
elog(NOTICE, ' : Priority ('.($row->{priority}).') Weight
('.($row->{weight}).')');
return_next(
{ id => int($row->{id}),
priority => int($row->{priority}),
weight => int($row->{weight}),
order => int($row_order) }
);
$row_order++;
splice(@first_order, $row_id, 1);
# Recalculate total weight
$total_weight = 0;
foreach $row (@first_order) {
$total_weight += $row->{weight};
}
elog(NOTICE, ' : Remaining Weight ('.$total_weight.')');
break;
}
}
}
}
}
return undef;
$BODY$
LANGUAGE plperl VOLATILE;

5 rows sent for sorting.
Pushing GROUP_7 with prio (1), weight (0) onto array.
Pushing GROUP_7 with prio (1), weight (5) onto array.
Pushing GROUP_8 with prio (1), weight (1) onto array.
Pushing GROUP_8 with prio (1), weight (5) onto array.
Pushing GROUP_8 with prio (1), weight (5) onto array.
Sorting group GROUP_7...
Sorting 2 rows in priority 1...
Total Weight (0)
Total Weight (5)
2 items remaining ...
Random weight (0)
Running weight (0) Current Weight (0)
: Priority (1) Weight (0)
: Remaining Weight (5)
1 items remaining ...
Random weight (0)
Running weight (5) Current Weight (5)
: Priority (1) Weight (5)
: Remaining Weight (0)
Sorting group GROUP_8...
Sorting 3 rows in priority 1...
Total Weight (1)
Total Weight (6)
Total Weight (11)
3 items remaining ...
Random weight (8)
Running weight (1) Current Weight (1)
Running weight (6) Current Weight (5)
Running weight (11) Current Weight (5)
: Priority (1) Weight (5)
: Remaining Weight (6)
2 items remaining ...
Random weight (2)
Running weight (1) Current Weight (1)
Running weight (6) Current Weight (5)
: Priority (1) Weight (5)
: Remaining Weight (1)
1 items remaining ...
Random weight (0)
Running weight (1) Current Weight (1)
: Priority (1) Weight (1)
: Remaining Weight (0)

2 rows sent for sorting.
Pushing GROUP_1 with prio (1), weight (0) onto array.
Pushing GROUP_1 with prio (2), weight (4) onto array.
Sorting group GROUP_1...
Sorting 1 rows in priority 1...
Total Weight (0)
1 items remaining ...
Random weight (0)
Running weight (0) Current Weight (0)
: Priority (1) Weight (0)
: Remaining Weight (0)
Sorting 1 rows in priority 2...
Total Weight (4)
1 items remaining ...
Random weight (2)
Running weight (4) Current Weight (4)
: Priority (2) Weight (4)
: Remaining Weight (0)

Total runtime: 244.101 ms

On Fri, Jul 2, 2010 at 9:44 PM, Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>wrote:

> On 03/07/10 00:36, Craig James wrote:
>
> > Perl itself is written in C, and some of it's operations are extremely
> > fast.
>
> The same is true of PL/PgSQL, though ;-)
>
> The main advantage of PL/Perl is that it doesn't rely on the SPI to do
> everything. It's interpreted not compiled, but it has a much faster
> approach to interpretation than PL/PgSQL.
>
> Really, the right choice depends on exactly what the OP is doing and
> how, which they're not saying.
>
> Where's the code?
>
> --
> Craig Ringer
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>

--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our
children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from
our children, we're stealing from them--and it's not even considered to be a
crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not
live to eat.) ~Marcus Tullius Cicero

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Merlin Moncure 2010-07-03 18:08:35 Re: Highly Efficient Custom Sorting
Previous Message Craig Ringer 2010-07-03 01:44:38 Re: Highly Efficient Custom Sorting