Re: Compression and on-disk sorting

From: Andrew Piskorski
Subject: Re: Compression and on-disk sorting
Msg-id: 20060517001434.GA3222@tehun.pair.com
In reply to: Re: Compression and on-disk sorting  ("Jim C. Nasby" <jnasby@pervasive.com>)
List: pgsql-hackers

On Tue, May 16, 2006 at 12:31:07PM -0500, Jim C. Nasby wrote:
> On Tue, May 16, 2006 at 12:27:42PM -0400, Andrew Dunstan wrote:
> > Rod Taylor wrote:
> > >>I habitually turn off all compression on my Windows boxes, because it's 
> > >>a performance hit in my experience. Disk is cheap ...
> > >
> > >Disk storage is cheap. Disk bandwidth or throughput is very expensive.

> > Sure, but in my experience using Windows File System compression is not 
> > a win here. Presumably if it were an unqualified win they would have it 

> Does anyone have time to hack some kind of compression into the on-disk
> sort code just to get some benchmark numbers? Unfortunately, doing so is
> beyond my meager C ability...

Folks, first of all, I'm in no way an expert on data compression in
RDBMSs, but other databases DO include data compression features and
claim it as a SIGNIFICANT win in I/O reduction.

Looking at the performance of Windows File System compression, etc.,
doesn't make much sense when there are actual RDBMS compression
implementations to compare to: on the commercial market, in open
source code, and in the academic literature.

Oracle has included "table compression" since 9iR2.  They report table
size reductions of 2x to 4x as typical, with proportional reductions
in I/O and, supposedly, low to negligible overhead for writes in most
cases:

  http://download-east.oracle.com/docs/cd/B19306_01/server.102/b14211/build_db.htm#sthref289

  Decision Speed: Table Compression In Action, by Meikel Poess and Hermann Baer (2003):
  http://www.oracle.com/technology/oramag/webcolumns/2003/techarticles/poess_tablecomp.html

  Compressing Data for Space and Speed, by Sanjay Mishra (2004):
  http://www.oracle.com/technology/oramag/oracle/04-mar/o24tech_data.html

  Order For Maximum Compression:
  http://oramossoracle.blogspot.com/2005/11/table-compression-order-for-maximum.html

I don't remember whether the current (Open Source) MonetDB includes
table compression or not, but they've published papers with lots of
interesting detail on the compression and other high performance OLAP
features in their latest (unreleased) "X100" MonetDB research
codebase:

  http://monetdb.cwi.nl/
  http://homepages.cwi.nl/~mk/MonetDB/
  http://sourceforge.net/projects/monetdb/
  ftp://ftp.research.microsoft.com/pub/debull/A05june/issue1.htm

Now, the docs and papers above are all focused on query performance;
they say nothing directly about using compression for on-disk
sorts.  But I'd bet that similar rules of thumb will apply in both
cases.

The main tricks seem to be:  One, EXTREMELY lightweight compression
schemes - basically table lookups designed to be as CPU friendly as
possible.  Two, keep the data compressed in RAM as well, so that you can
also cache more of the data, and indeed keep it compressed until
as late in the CPU processing pipeline as possible.
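
To make the "table lookup" idea concrete, here is a minimal sketch of
dictionary encoding, probably the simplest scheme of this kind.  It is
NOT taken from Oracle or MonetDB; the function names (dict_encode,
dict_decode, count_equal) and the sample column are made up purely for
illustration.  It also shows the second trick in miniature: an equality
test is evaluated on the small integer codes without decompressing
anything.

```python
from array import array

def dict_encode(values):
    """Replace each value with a small integer code into a lookup table."""
    table = []            # code -> original value
    codes_by_value = {}   # original value -> code
    codes = array("H")    # 16-bit codes; assumes at most 65536 distinct values
    for v in values:
        code = codes_by_value.get(v)
        if code is None:
            code = len(table)
            codes_by_value[v] = code
            table.append(v)
        codes.append(code)
    return table, codes

def dict_decode(table, codes):
    """Decompression is one table lookup per value."""
    return [table[c] for c in codes]

def count_equal(table, codes, wanted):
    """Evaluate an equality predicate directly on the compressed codes."""
    try:
        code = table.index(wanted)   # translate the constant once
    except ValueError:
        return 0                     # value never occurs in the column
    return sum(1 for c in codes if c == code)

# Hypothetical low-cardinality column, the case where this pays off.
column = ["shipped", "pending", "shipped", "shipped", "returned", "pending"]
table, codes = dict_encode(column)
assert dict_decode(table, codes) == column
print(table, list(codes), count_equal(table, codes, "shipped"))
```

A real implementation would work per disk block and pack the codes into
fewer bits, but the CPU cost stays in the same class: an array index per
value rather than an entropy decoder.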

A corollary of that is to forget compression schemes like gzip - it
reduces data size nicely but is far too slow on the CPU to be
particularly useful in improving overall throughput rates.
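
As back-of-envelope arithmetic: the rate at which uncompressed data
reaches the executor is capped both by the disk (times the compression
ratio) and by the decompressor.  The sketch below is a deliberately
simplified model that assumes I/O and decompression overlap perfectly.
The function name and every number in it are illustrative placeholders,
not measurements of gzip or of any real system.

```python
def effective_throughput(disk_mb_s, ratio, decompress_mb_s):
    """Uncompressed MB/s delivered, assuming I/O and CPU work overlap fully.

    disk_mb_s:        raw sequential read rate of the disk
    ratio:            uncompressed size / compressed size
    decompress_mb_s:  uncompressed output rate of the decompressor
    """
    return min(disk_mb_s * ratio, decompress_mb_s)

# All numbers below are hypothetical, chosen only to show the shape of
# the trade-off.
baseline = effective_throughput(60, 1.0, float("inf"))  # no compression
heavy = effective_throughput(60, 4.0, 40)     # great ratio, slow decoder
light = effective_throughput(60, 2.5, 1000)   # modest ratio, fast decoder

print(baseline, heavy, light)
```

With these made-up inputs the heavy scheme ends up CPU-bound and slower
than no compression at all, while the lightweight scheme stays
disk-bound and wins despite its worse ratio.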

Note, I have not really tested ANY of the above myself; your mileage
may well vary from what I recall from those various articles...

-- 
Andrew Piskorski <atp@piskorski.com>
http://www.piskorski.com/

