Saturday, July 21, 2007

Compression ... expensive or cheap?

Given the current super-availability of CPU power and hard disk terabytes, give an answer to that question is not an easy task.

We may take the point of view of information explotion, then we think it's a never ending least that's the looouuuud complain the business and companies we work for, make every given years that they need to double or triplicate our SAN (read: any-storage-device-you own), expend some million bucks... and save for next loop. But it's a fact to be faced: information is being generated almost compulsively, even by devices a couple years ago where merely information 'mute'.

The scenario seems to get more complicated for years to come...

What if we look the recent direction microprocessor development has taken: multicore processors, devices with 2 or 4 or n independent processing units on one package. Visionary projects like Intel Terascale or IBM-Sony-Toshiba Cell will provide tremendous processing power.

And this huge leap forward takes processing growth rate much ahead of storage growth rate. Then, such powerhouses may help mitigate the information avalanche we live on... what do you think?

Before 11g

Yes, we may talk about compression before 11g, of course there isn't much noise about it because it's an incomplete feature on 10g, with restrictions that make it almost useless for regular OLTP applications: only works when data is inserted direct-path mode, that is bulk insert and Import Datapump (sorry, regular import doesn't have it).

But not everything is lost, there are some scenarios where it still can be usefull, very I must say.

1) Cut significantly storage for Historic/Archive and Testing/Development environments
2) Reduce IO (and increase performance) on Datawarehousing/BI applications

I've done comparative tests, showing that 10g compression gives a very good rate, reducing storage usage by 45% - 60%. I'm saying you may double your SAN "space-life" or slice the storage invoice, specially on scenario (1) when you fight fierce battles for budget that won't support direct business' operations.

With 11g

Oracle 11g offers data compression, but what are the benefits? what do we have to pay?

I've repeated the test under Oracle for Linux, and got a 80% compression rate using regular import and Datapump later; you must remember that import does not use direct insert, that's why you should alter your tables for compression under any operation.

My test bed is a Dell 2650 with 2 Xeon Procesors running at 2.4Ghz, 512Mb L2 cache, 4Gb RAM and 400 Gb Ultra SCSI, the OS selected is SuSE 10 SP1 for i386 (no need for above 4Gb memory space...unavailable either!).

I did use a copy of a test environment database, with more than 27,800 tables ranging from millions of rows to zero rows ... you guessed it: BaaN. After table creation, space used marked 1.83Gb for both 10gR2 and 11gR1. Following table shows results for this test, trying all basic scenarios

Additional CPU overhead? Apparently not, just takes more time to load your data. See following graph:

...To be Continued: You shure it takes longer? Look thru these multifocals

Subscribe to Oracle Database Disected by Email
Custom Search