Re: Compression and on-disk sorting
From | Andrew Piskorski |
---|---|
Subject | Re: Compression and on-disk sorting |
Date | |
Msg-id | 20060517001434.GA3222@tehun.pair.com |
In response to | Re: Compression and on-disk sorting ("Jim C. Nasby" <jnasby@pervasive.com>) |
Responses | Re: Compression and on-disk sorting; Re: Compression and on-disk sorting |
List | pgsql-hackers |
On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> On Tue, May 16, 2006 at 12:27:42PM -0400, Andrew Dunstan wrote:
> > Rod Taylor wrote:
> > >>I habitually turn off all compression on my Windows boxes, because it's
> > >>a performance hit in my experience. Disk is cheap ...
> > >
> > >Disk storage is cheap. Disk bandwidth or throughput is very expensive.
> >
> > Sure, but in my experience using Windows File System compression is not
> > a win here. Presumably if it were an unqualified win they would have it
>
> Does anyone have time to hack some kind of compression into the on-disk
> sort code just to get some benchmark numbers? Unfortunately, doing so is
> beyond my meager C ability...

Folks, first of all, I'm in no way an expert on data compression in RDBMSs, but other databases DO include data compression features and claim them as a SIGNIFICANT win in I/O reduction.

Looking at the performance of Windows File System compression, etc., doesn't make much sense when there are actual RDBMS compression implementations to compare to: on the commercial market, in open source code, and in the academic literature.

Oracle has included "table compression" since 9iR2.
They report table size reductions of 2x to 4x as typical, with proportional reductions in I/O and, supposedly, usually low to negligible overhead for writes:

http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14211/build_db.htm#sthref289

Decision Speed: Table Compression In Action, by Meikel Poess and Hermann Baer (2003):
http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/poess_tablecomp.html

Compressing Data for Space and Speed, by Sanjay Mishra (2004):
http://www.oracle.com/technology/oramag/oracle/04-mar/o24tech_data.html

Order For Maximum Compression:
http://oramossoracle.blogspot.com/2005/11/table-compression-order-for-maximum.html

I don't remember whether the current (Open Source) MonetDB includes table compression or not, but they've published papers with lots of interesting detail on the compression and other high-performance OLAP features in their latest (not released) "X100" MonetDB research codebase:

http://monetdb.cwi.nl/
http://homepages.cwi.nl/~mk/MonetDB/
http://sourceforge.net/projects/monetdb/
ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

Now, the docs and papers above all focus on query performance; they say nothing directly about using compression for on-disk sorts. But I'd bet that similar rules of thumb apply in both cases. The main tricks seem to be:

One, EXTREMELY lightweight compression schemes: basically table lookups designed to be as CPU-friendly as possible.

Two, keep the data compressed in RAM as well, so that you can cache more of it, and indeed keep it compressed until as late in the CPU processing pipeline as possible.

A corollary is to forget compression schemes like gzip. gzip reduces data size nicely, but it is far too slow on the CPU to be particularly useful in improving overall throughput.

Note, I have not really tested ANY of the above myself; your mileage may well vary from what I recall from those various articles...
-- Andrew Piskorski <atp@piskorski.com> http://www.piskorski.com/