Discussion: Query ResultSet parsing speedup patch (resend)
Hi,

This is a resend, since I did not get any feedback on this patch when I sent it two months ago. Could someone at least recommend what I should do to get this patch evaluated, or even applied to the official source?

Also, could someone update the CVS info page (http://jdbc.postgresql.org/development/cvs.html) to point to the new pgfoundry?

---

I tried to optimise the parsing of ResultSets from the network stream, especially int and long values, but also a bit for String values.

The attached patch gives up to 40% speedup when parsing larger queries containing equal amounts of int4, int8 or varchar(16) columns if the system is CPU bound. The speedup comes mainly from avoiding the creation of useless objects and avoiding useless byte[] copies. In more real-life scenarios the speedup will be much smaller.

The patch also contains the test code which I used to test the performance (BenchTest.java). The benchmark results are:

unpatched 503 jdbc driver:  speed: 180.18  memory: 36.8MB
patched 503 jdbc driver:    speed: 261.44  memory: 19.6MB
                                   +40%            -40%

The benchmark was run on Java6rc build 100 with postgresql 8.1.4 running on localhost with Athlon64 2x2GHz, 64bit mode.

If the patch is accepted I can try to do similar optimisations for other fields.
---

What the patch does:

Parsing strings:
  old way: read a byte[] containing the exact bytes of a string
  new way: create a string directly from the network buffer with start and end index

Parsing numbers:
  old way: create a string that contains the number and use Integer.parseInt
  new way: parse the number directly from bytes

Exact changes explained:

new class VisibleBufferedInputStream:
- replaces java.io.BufferedInputStream with a faster implementation
  * no synchronisation
  * allows direct access to the buffer byte[] contents, which helps to avoid useless copies when converting to String
  * has a method for scanning the length of the next null-terminated string

PGStream:
- uses VisibleBufferedInputStream
  * faster implementations for ReceiveIntegerR and ReceiveString

AbstractJdbc2ResultSet:
- parses int and long values directly from byte[] -> number instead of first creating a throw-away string
- if the fast conversion fails, or if the current charset is not safe for the optimisation, then uses the old way to parse
- also a very minor optimisation to getFixedString to optimise for non-money string types

Encoding:
- uses HashMap instead of synchronised Hashtable
- does not do two queries to the encodings map when obtaining the database encoding
- added a method to check if the numbers match ASCII charset locations (all currently listed charsets match)
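The two parsing ideas described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patch: the class name `FastParseSketch`, the method names, the 9-digit limit on the fast path, and the use of UTF-8 and US-ASCII are all assumptions made for the example. The real driver uses the connection's encoding and its own bounds.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; names and limits are illustrative, not taken from the patch.
class FastParseSketch {

    /** Returns the length of the null-terminated string starting at offset,
     *  or -1 if no terminator is found before limit. */
    static int scanCStringLength(byte[] buf, int offset, int limit) {
        for (int i = offset; i < limit; i++) {
            if (buf[i] == 0) {
                return i - offset;
            }
        }
        return -1;
    }

    /** Decodes a string straight out of a shared buffer, so no intermediate
     *  byte[] holding "exactly the string" has to be allocated.
     *  UTF-8 is an assumption here. */
    static String readString(byte[] buf, int offset, int length) {
        return new String(buf, offset, length, StandardCharsets.UTF_8);
    }

    /** Parses a decimal int directly from ASCII digit bytes. Anything the
     *  fast path does not recognise is handed to the slow path. */
    static int parseInt(byte[] bytes) {
        int len = bytes.length;
        int i = 0;
        boolean negative = false;
        if (len > 0 && bytes[0] == '-') {
            negative = true;
            i = 1;
        }
        int digits = len - i;
        // Up to 9 decimal digits can never overflow an int, so the loop
        // below needs no overflow checks. Longer inputs use the slow path.
        if (digits < 1 || digits > 9) {
            return slowParseInt(bytes);
        }
        int value = 0;
        for (; i < len; i++) {
            int d = bytes[i] - '0';
            if (d < 0 || d > 9) {
                return slowParseInt(bytes); // not a plain ASCII digit
            }
            value = value * 10 + d;
        }
        return negative ? -value : value;
    }

    /** The "old way": build a throw-away String and let the JDK parse it. */
    static int slowParseInt(byte[] bytes) {
        return Integer.parseInt(new String(bytes, StandardCharsets.US_ASCII));
    }
}
```

The fallback keeps the fast path simple: it only has to be right for the common case, and every unusual input (a leading `+`, whitespace, a value near the int limits) still gets the behaviour of `Integer.parseInt`.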
Attachments
On 29-Sep-06, at 6:01 PM, Mikko Tiihonen wrote:

> Hi,
>
> This is a resend since I did not get any feedback to this patch when I
> sent it two months ago. Could someone at least have some recommendations
> what I should do to get this patch evaluated or even applied to the
> official source.
>
> Also, could someone update the CVS info page
> (http://jdbc.postgresql.org/development/cvs.html) to point to the new
> pgfoundry ?
>

Well, we haven't actually moved it yet, it's there more by accident.

> ---
>
> I tried to optimise the parsing of ResultSets from the network stream,
> especially int and long values, but also a bit for String values.
>
> The attached patch gives upto 40% speedup when parsing larger queries
> containing equal amounts of int4, int8 or varchar(16) columns if the
> system is cpu bound. The speedup comes mainly from avoiding creation of
> useless objects and avoiding useless byte[] copies. In more real-life
> scenarios the speedup will be much smaller.
>
> The patch also contains the test code which I used to test the
> performance (BenchTest.java). The benchmark results are:
>
> unpatched 503 jdbc driver:  speed: 180.18  memory: 36.8MB
> patched 503 jdbc driver:    speed: 261.44  memory: 19.6MB
>                                    +40%            -40%
>
> The benchmark was run on Java6rc build 100 with postgresql 8.1.4 running
> on localhost with Athlon64 2x2GHz, 64bit mode.
>
> If the patch is accepted I can try to do similar optimisations for other
> fields.
On Sat, 30 Sep 2006, Mikko Tiihonen wrote:

> This is a resend since I did not get any feedback to this patch when I
> sent it two months ago. Could someone at least have some recommendations
> what I should do to get this patch evaluated or even applied to the
> official source.
>

I like the change to use VisibleBufferedInputStream. That's a clean way of avoiding copies. I don't particularly like the number parsing changes as they're a pretty ugly hack, but I can't deny the performance improvements (I saw 30-35%). I plan on applying this unless someone objects.

One bug I found was that you have the wrong bytes.length check in getFastInt. It can handle one more byte than you specified.

Kris Jurka
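For readers who have not opened the patch, the copy-avoiding idea behind a "visible" buffered stream can be sketched like this. It is a rough illustration under stated assumptions, not the actual VisibleBufferedInputStream: the class name `VisibleBufferSketch`, its fields, the `readCString` method, and the UTF-8 decoding are all hypothetical. The points it shows are that nothing is synchronised and that the string is decoded in place from the internal buffer.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of an unsynchronised buffered reader whose buffer is
// used directly for decoding. Not the class from the patch.
class VisibleBufferSketch {
    private final InputStream in;
    private byte[] buf;
    private int pos; // index of the next unread byte
    private int end; // one past the last valid byte

    VisibleBufferSketch(InputStream in, int initialSize) {
        this.in = in;
        this.buf = new byte[initialSize];
    }

    /** Makes sure at least n unread bytes are buffered, compacting the
     *  buffer or growing it when needed. */
    private void ensure(int n) throws IOException {
        int unread = end - pos;
        if (unread >= n) {
            return;
        }
        byte[] target = n > buf.length ? new byte[Math.max(n, buf.length * 2)] : buf;
        // Move the unread bytes to the front; arraycopy is safe for overlaps.
        System.arraycopy(buf, pos, target, 0, unread);
        buf = target;
        pos = 0;
        end = unread;
        while (end < n) {
            int r = in.read(buf, end, buf.length - end);
            if (r < 0) {
                throw new EOFException("stream ended before " + n + " bytes were available");
            }
            end += r;
        }
    }

    /** Reads one null-terminated string: scans for the terminator inside the
     *  buffer, then decodes in place without an intermediate byte[] copy.
     *  UTF-8 is an assumption for this sketch. */
    String readCString() throws IOException {
        int len = 0;
        while (true) {
            ensure(len + 1);
            while (len < end - pos) {
                if (buf[pos + len] == 0) {
                    String s = new String(buf, pos, len, StandardCharsets.UTF_8);
                    pos += len + 1; // skip the string and its terminator
                    return s;
                }
                len++;
            }
        }
    }
}
```

Compared with java.io.BufferedInputStream, a reader like this is only safe when a single thread owns the stream, which is the trade-off the "no synchronisation" bullet in the patch description accepts.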
On Fri, 6 Oct 2006, Kris Jurka wrote:

> On Sat, 30 Sep 2006, Mikko Tiihonen wrote:
>
>> This is a resend since I did not get any feedback to this patch when I
>> sent it two months ago. Could someone at least have some recommendations
>> what I should do to get this patch evaluated or even applied to the
>> official source.
>>
>
> I like the change to use VisibleBufferedInputStream. That's a clean way of
> avoiding copies. I don't particularly like the number parsing changes as
> they're a pretty ugly hack, but I can't deny the performance improvements (I
> saw 30-35%). I plan on applying this unless someone objects.
>

Applied.

Kris Jurka