
Re: Populating large tables with occasional bad values

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: "John T(dot) Dow" <john(at)johntdow(dot)com>
Cc: "pgsql-jdbc(at)postgresql(dot)org" <pgsql-jdbc(at)postgresql(dot)org>, "tivvpgsqljdbc(at)gtech-ua(dot)com" <tivvpgsqljdbc(at)gtech-ua(dot)com>
Subject: Re: Populating large tables with occasional bad values
Date: 2008-06-13 19:10:01
Lists: pgsql-jdbc
John T. Dow wrote:

> Fifth - now what? I tried to commit at this point, hoping that the
> first insert statements would be commited, but they are not. Actually, I
> was hoping that the entire batch would be processed and then I could see
> which individual statements didn't insert (using the update counts) and
> then redo them individually, but I think that is wishful thinking.

The first error causes a transaction rollback. The only way the JDBC 
driver could skip the failed operation and continue would be to wrap 
each statement in a savepoint, and if there's an error issue a rollback 
to the last savepoint. That's not ideal from a performance point of view.
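That pattern looks roughly like this in JDBC. This is a minimal sketch, not the driver's actual behaviour: the table, columns, and method name are hypothetical, and autocommit is assumed to be off.

```java
import java.sql.*;
import java.util.*;

// Sketch of the savepoint-per-statement pattern: set a savepoint before
// each row, roll back to it on failure, release it on success.
// Table name "items" and its columns are hypothetical.
public class SavepointInsert {
    static final String INSERT_SQL = "INSERT INTO items (id, val) VALUES (?, ?)";

    // Returns the indices of rows that failed; all other rows are committed.
    static List<Integer> insertSkippingFailures(Connection conn, List<Object[]> rows)
            throws SQLException {
        List<Integer> failed = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (int i = 0; i < rows.size(); i++) {
                Savepoint sp = conn.setSavepoint();   // one savepoint per row
                try {
                    ps.setObject(1, rows.get(i)[0]);
                    ps.setObject(2, rows.get(i)[1]);
                    ps.executeUpdate();
                    conn.releaseSavepoint(sp);        // discard it on success
                } catch (SQLException e) {
                    conn.rollback(sp);                // undo only the bad row
                    failed.add(i);
                }
            }
        }
        conn.commit();                                // the good rows survive
        return failed;
    }
}
```

The per-row savepoint round trips are exactly the overhead referred to above.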

> Best possible solution with batches? Submit the batch, learn how many 
> were accepted before the first error, resubmit just those. Process the
> problem statement by itself. Repeat with the statements left in the
> original batch until none are left, then create a new batch and begin
> anew. If a batch, or portion thereof, has no exception, then of course
> commit.

IMO the best solution with batches is to use batches to store your 
planned operations in a staging table using INSERTs that cannot fail 
under normal circumstances. Then issue a single PL/PgSQL call or a bulk 
UPDATE ... SELECT with an appropriate WHERE clause to apply the updates 
and catch problems.
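A sketch of that staging-table approach, assuming hypothetical table and column names and a text-only staging schema so the batched INSERTs cannot fail on bad values:

```java
import java.sql.*;

// Sketch of the staging-table approach: batch all rows into a
// constraint-free TEMP table of text columns (these INSERTs cannot fail
// on bad data), then apply only the valid rows server-side in one
// statement. All table and column names here are hypothetical.
public class StagingLoad {
    static final String CREATE_STAGING =
        "CREATE TEMP TABLE staging (id text, val text)";
    static final String FILL_STAGING =
        "INSERT INTO staging (id, val) VALUES (?, ?)";
    // The WHERE clause screens out rows that would fail the target's types.
    static final String APPLY =
        "INSERT INTO items (id, val) "
      + "SELECT id::int, val::int FROM staging "
      + "WHERE id ~ '^[0-9]+$' AND val ~ '^[0-9]+$'";

    static int load(Connection conn, String[][] rows) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(CREATE_STAGING);
        }
        try (PreparedStatement ps = conn.prepareStatement(FILL_STAGING)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
            }
            ps.executeBatch();              // no constraints, so this cannot fail
        }
        try (Statement st = conn.createStatement()) {
            return st.executeUpdate(APPLY); // number of rows actually applied
        }
    }
}
```

Rows rejected by the WHERE clause stay behind in the staging table, so a follow-up SELECT can report exactly which inputs were bad.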

The approach you describe will work, but it'll be a bit ugly and not 
very fast. If failures are very rare you might find it to be OK, but 
your plan to use only 10 statements per batch suggests that failures 
aren't *that* rare.
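If you do go that route, the update counts carried by the BatchUpdateException tell you where to cut the batch for resubmission. A sketch of the bookkeeping; the driver-specific part is an assumption, since the JDBC spec lets a driver either truncate the counts at the failure or return a full array with EXECUTE_FAILED markers:

```java
import java.sql.BatchUpdateException;
import java.sql.Statement;

// Given the exception thrown by executeBatch(), work out the index of the
// first statement that failed. Handles both styles the JDBC spec allows:
// a counts array truncated at the failure, or a full array in which the
// failed entries are marked Statement.EXECUTE_FAILED.
public class BatchRetry {
    static int firstFailedIndex(BatchUpdateException e) {
        int[] counts = e.getUpdateCounts();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] == Statement.EXECUTE_FAILED) {
                return i;              // marker style: failure recorded in place
            }
        }
        return counts.length;          // truncated style: the next index failed
    }
}
```

Statements before that index succeeded (pending commit); the one at that index needs individual handling, and the rest go into the next batch.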

> In the posting I quote below, the implication is that you can learn
> multiple statements in the batch that failed, but I didn't see that
> happening.

AFAIK that's database specific. The docs I read suggested that DBs are 
permitted to stop processing and return info on the first bad row, or to 
continue processing and return info on all rows.

As PostgreSQL cannot just keep going after an error within a transaction 
unless you use savepoints, it chooses to do the former.

It might be cool if Pg supported an automatic savepoint mode where every 
statement updated an implicit savepoint snapshot. If the statement 
failed you could ROLLBACK TO LAST STATEMENT. If possible the savepoint 
would be replaced with each statement so there'd be no ever-growing set 
of savepoints to worry about. With this capability way you could use 
proper transactional isolation but also achieve some error handling 
within the transaction. Personally, though, I either restructure my SQL 
to avoid the potential errors or use PL/PgSQL to handle them. The app 
has to be capable of reissuing failed transactions anyway, so sometimes 
an app-level transaction do-over is quite enough. There might be 
situations in which that hypothetical feature could be handy, though.

> It seems that posgres just quits at the first problem.

Yes, by design. If a statement in a transaction fails and no savepoint 
is in place, there is, AFAIK, no way to just pretend the bad statement 
was never issued.

I don't know if it'll keep going if you enable autocommit. It might; it 
depends on how the JDBC driver does batches and how the server processes 
them. Your performance will suffer with autocommit on - though not as 
badly as when you're paying high round-trip latencies for individual 
statements - and you won't be able to roll back if something nasty 
happens. If you don't care about things like automatic rollback on 
network dropout, you should probably test batches with autocommit on 
and see how they behave.

Craig Ringer

