Well, thanks a lot for the attention. My main purpose was to reduce the memory footprint. But before I ran the tests, I expected the new method to be slower than the old one, so it would only be better on large files, i.e. where the reduced memory usage mattered more than raw speed. This was because of the extra pass through the array.
On Wed, 23 Aug 2006, Luis Vilar Flores wrote:
To all who already forgot the first emails: I developed a modified version of the toBytes method from the org.postgresql.util.PGbytea class. The old method uses 3 buffers to translate the data from the network to the client, which uses too much memory. My method uses only 2 buffers, but makes one extra pass through the original buffer (to calculate its final size).
I'm not super impressed with these timing results. They are certainly showing some effects due to GC; consider the rise in getBytes time here at 10.5MB.
The new method is very similar to the old one; it just computes the final size before the copy. The old method executes fewer instructions to convert an array, so the new method is only faster when the old one is slowed down by garbage collection/memory allocation. A sketch of the two-pass idea follows the timing results below.
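For readers who haven't seen the patch, here is a minimal sketch of the two-pass technique being discussed. This is my illustration, not the actual PGbytea code; it assumes well-formed input in bytea's text escape format, where "\\" encodes a single backslash and "\nnn" encodes one byte as three octal digits.

import java.util.Arrays;

public class TwoPassDecodeSketch {
    public static byte[] toBytes(byte[] s) {
        // Pass 1: only count how many bytes the escaped input decodes to.
        int len = 0;
        for (int i = 0; i < s.length; i++) {
            if (s[i] == '\\') {
                // "\\" consumes 2 input bytes, "\nnn" consumes 4;
                // either way it produces exactly 1 output byte.
                i += (s[i + 1] == '\\') ? 1 : 3;
            }
            len++;
        }
        // Pass 2: allocate the exact-size result and decode into it,
        // so no oversized intermediate buffer is ever needed.
        byte[] out = new byte[len];
        int p = 0;
        for (int i = 0; i < s.length; i++) {
            if (s[i] == '\\') {
                if (s[i + 1] == '\\') {
                    out[p++] = '\\';
                    i += 1;
                } else {
                    // Three octal digits: value = d1*64 + d2*8 + d3.
                    out[p++] = (byte) (((s[i + 1] - '0') << 6)
                                     + ((s[i + 2] - '0') << 3)
                                     +  (s[i + 3] - '0'));
                    i += 3;
                }
            } else {
                out[p++] = s[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "a\001" escaped as text becomes the bytes a \ 0 0 1.
        byte[] escaped = { 'a', '\\', '0', '0', '1' };
        System.out.println(Arrays.toString(toBytes(escaped))); // [97, 1]
    }
}

The trade-off is exactly what is described above: the escaped input is scanned twice, but the output array is allocated once at its final size instead of being built in an oversized buffer and copied.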
OLD method:
size: 9.5MB execute+next: 804ms getBytes: 377ms used mem: 66169KB
size: 10.5MB execute+next: 634ms getBytes: 546ms used mem: 73112KB
size: 11.5MB execute+next: 689ms getBytes: 450ms used mem: 80057KB
size: 12.5MB execute+next: 748ms getBytes: 482ms used mem: 87001KB
I came up with my own contrived benchmark (attached) that attempts to focus solely on the getBytes() call and avoid the cost of fetching results, but it doesn't give very consistent results, and I haven't been able to come up with a case where the new method is actually faster, even with 30MB of data. This is on Debian Linux / 2x Opteron 246 / JDK 1.5.0-05.
I think the old option should stay for a while, but I hope the new method proves to be as fast as the old one, so we can just discard MAX_3_BUFF_SIZE and always compute the final size - the method's code would be clearer that way.
I've committed this to CVS HEAD with a rather arbitrarily set MAX_3_BUFF_SIZE value of 2MB. Note that this is also the escaped size, so we may actually be dealing with output data a quarter of that size. If anyone could do some more testing to find a good crossover point, that would be a good thing.
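For context, the committed crossover can be pictured like this. This is only a sketch: MAX_3_BUFF_SIZE is the constant mentioned above, but the two helper method names are hypothetical, chosen just to show the dispatch.

// 2MB threshold, measured against the *escaped* length, so the decoded
// output may be only about a quarter of this size.
private static final int MAX_3_BUFF_SIZE = 2 * 1024 * 1024;

public static byte[] toBytes(byte[] s) {
    if (s == null)
        return null;
    // Large inputs take the new two-pass path to keep the memory
    // footprint down; small inputs keep the old, faster path.
    if (s.length > MAX_3_BUFF_SIZE)
        return toBytesTwoPass(s);      // hypothetical helper: new method
    return toBytesThreeBuffer(s);      // hypothetical helper: old method
}

Benchmarking different input sizes around this threshold is exactly the testing being asked for above.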
It's me who should be thanking you, for such a great product ...
Thanks for your patience with this item.
Kris Jurka
import java.sql.*;

public class ByteaTest2 {
    public static void main(String args[]) throws Exception {
        Class.forName("org.postgresql.Driver");
        Connection conn = DriverManager.getConnection(
            "jdbc:postgresql://localhost:5432/jurka", "jurka", "");
        // Five timed rounds, so JIT warmup and GC effects show up
        // as differences between rounds.
        for (int k = 0; k < 5; k++) {
            long t1 = System.currentTimeMillis();
            long total = 0;
            for (int j = 0; j < 10; j++) {
                PreparedStatement pstmt = conn.prepareStatement(
                    "SELECT varcharsend(repeat(?,?))");
                pstmt.setString(1, "a\\001");
                pstmt.setInt(2, 150000);
                ResultSet rs = pstmt.executeQuery();
                rs.next();
                // Decode the same row 100 times to isolate the cost of
                // getBytes() from the cost of fetching the result.
                for (int i = 0; i < 100; i++) {
                    byte b[] = rs.getBytes(1);
                    total += b.length;
                }
                rs.close();
                pstmt.close();
            }
            long t2 = System.currentTimeMillis();
            System.out.println(t2 - t1);
        }
    }
}
Luis Flores
Systems Analyst
Evolute - Consultoria Informática
Email: lflores@evolute.pt
Tel: (+351) 212949689