Re: [PATCH] Hooks at XactCommand level

From: Gilles Darold <gilles(at)darold(dot)net>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Nicolas CHAHWEKILIAN <leptitstagiaire(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: [PATCH] Hooks at XactCommand level
Date: 2021-08-13 12:43:01
Message-ID: d3ca6093-156d-e8c3-fb92-bd892ca905ec@darold.net
Lists: pgsql-hackers

On 13/08/2021 at 11:58, Andres Freund wrote:
> Hi,
>
> On 2021-08-10 10:12:26 +0200, Gilles Darold wrote:
>> Sorry for the response delay. I have thought about adding this odd hook to
>> be able to implement this feature through an extension because I don't
>> think this is something that should be implemented in core. There were
>> also patch proposals, which were all rejected.
>>
>> We usually implement the feature on the client side, which is imo enough
>> for the use cases. But the problem is that this is a catastrophe in terms
>> of performance. I have done a small benchmark to illustrate the problem.
>> This is a single-process client on the same host as the PG backend.
>>
>> For 10,000 tuples inserted with 50% failures and statement-level rollback
>> handled at client side:
>>
>>         Expected: 5001, Count:  5001
>>         DML insert took: 13 wallclock secs ( 0.53 usr +  0.94 sys =  1.47 CPU)
> Something seems off here. This suggests every insert took 2.6ms. That
> seems awfully long, unless your network latency is substantial. I did a
> quick test implementing this in the naive-most way in pgbench, and I get
> better times - and there's *lots* of room for improvement.
>
> I used a pgbench script that sent the following:
> BEGIN;
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> RELEASE SAVEPOINT insert_success;
> {repeat 5 times}
> COMMIT;
>
> I.e. 5 failing and 5 succeeding insertions wrapped in one transaction. I
> get >2500 tps, i.e. > 25k rows/sec. And it's not hard to optimize that
> further - the {ROLLBACK TO,RELEASE} SAVEPOINT; SAVEPOINT; INSERT can be
> sent in one roundtrip. That gets me to somewhere around 40k rows/sec.
>
>
> BEGIN;
>
> \startpipeline
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> RELEASE SAVEPOINT insert_success;
> SAVEPOINT insert_fail;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> \startpipeline
> ROLLBACK TO SAVEPOINT insert_fail;
> SAVEPOINT insert_success;
> INSERT INTO testinsert(data) VALUES (1);
> \endpipeline
>
> {repeat last two blocks three times}
> COMMIT;
>
> Greetings,
>
> Andres Freund
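
A side note for anyone who wants to reproduce the pipelined pgbench run
above: the \startpipeline/\endpipeline meta-commands exist in pgbench
since PostgreSQL 14 and, if I recall correctly, require the extended
query protocol, so the run would look something like this (the file
name is my own):

    pgbench -n -M extended -f pipeline.sql -c 1 -T 10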

I have written a Perl script to mimic what I found in an Oracle batch
script that imports data into a table. I had this use case in a recent
migration; the only difference is that the batch was written in Java.

$dbh->do("BEGIN") or die "FATAL: " . $dbh->errstr . "\n";
my $start = new Benchmark;
my $sth = $dbh->prepare("INSERT INTO t1 VALUES (?, ?)");
exit 1 if (not defined $sth);
for (my $i = 0; $i <= 10000; $i++)
{
        $dbh->do("SAVEPOINT foo") or die "FATAL: " . $dbh->errstr . "\n";
        # Generate a duplicate key each two row inserted
        my $val = $i;
        $val = $i-1 if ($i % 2 != 0);
        unless ($sth->execute($val, 'insert '.$i)) {
                $dbh->do("ROLLBACK TO foo") or die "FATAL: " .
$dbh->errstr . "\n";
        } else {
                $dbh->do("RELEASE foo") or die "FATAL: " . $dbh->errstr
. "\n";
        }
}
$sth->finish();
my $end = new Benchmark;

$dbh->do("COMMIT;");

my $td = timediff($end, $start);
print "DML insert took: " . timestr($td) . "\n";

The timings reported are from my personal computer; there is no network
latency since it uses localhost. Anyway, the objective was not to bench
the DML throughput but the overhead of statement-level rollback handled
at client side versus server side. I guess you would see a similar
speed gain, around x3 to x5 or more depending on the number of tuples?

The full script can be found here:
https://github.com/darold/pg_statement_rollbackv2/blob/main/test/batch_script_example.pl
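
For comparison, here is roughly what the loop becomes when statement-level
rollback is handled server-side by the extension. This is only a sketch
(the DSN, credentials and extension setup are assumptions); the point is
that the SAVEPOINT / ROLLBACK TO / RELEASE round trips disappear entirely.

use DBI;

# Hypothetical connection; PrintError and RaiseError are disabled so a
# failed execute() just returns false instead of dying.
my $dbh = DBI->connect("dbi:Pg:dbname=test", '', '',
        { AutoCommit => 0, RaiseError => 0, PrintError => 0 })
        or die "FATAL: cannot connect\n";

my $sth = $dbh->prepare("INSERT INTO t1 VALUES (?, ?)");
for (my $i = 0; $i <= 10000; $i++)
{
        my $val = $i;
        $val = $i - 1 if ($i % 2 != 0);
        # On failure the server rolls back the statement only; the
        # client sends no savepoint commands at all.
        $sth->execute($val, 'insert ' . $i);
}
$sth->finish();
$dbh->commit;
$dbh->disconnect;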

Cheers,

--
Gilles Darold
