SQL Server 2008 Data Compression Part-1
If most of you think that compression is a brand new feature introduced in SQL Server 2008 then I am afraid, you are wrong. It actually started with SQL Server 2005 when SP2 brought with itself a new storage format called Vardecimal.
Unlike Varchar, Vardecimal was a table-level option and affected the numeric data types stored in the underlying table.
SQL Server 2008 however goes beyond this and has introduced a full blown - Data Compression. Data Compression comes in two flavors:
-
Row Compression
-
Page Compression
Row Compression
Row compression changes the format of physical storage of data. Enabling row compression has the following affects on the underlying data:
-
It minimize the metadata (column information, length, offsets etc) associated with each record.
-
Numeric data types and fixed length strings are stored in variable-length storage format, just like Varchar.
It’s very simple to create a table with row compression enabled:
CREATE TABLE MyTable
To enabled row compression on an existing table:
Alter TABLE MyTable REBUILD WITH (DATA_COMPRESSION=Row, MAXDOP=2);Since compression is a CPU intensive process there is an additional setting MAXDOP which is used to specify the maximum number of processors to be used during compression operation.
Page Compression
The second and the most vital compression method is page compression. Page compression allows common data to be shared between rows for a given page. Its uses the following techniques to compress data:
-
Row compression. (Already discussed above)
-
Prefix Compression. For every column in a page duplicate prefixes are identified. These prefixes are saved in compression information headers (CI) which resides after page header. A reference number is assigned to these prefixes and that reference number is replaced where ever those prefixes are being used.
-
Dictionary Compression. Dictionary compression searches for duplicate values through out the page and stores them in CI. The main difference between prefix and dictionary compression is that prefix is only restricted to one column while dictionary is applicable to the complete page.
The syntax for enabling page compression is same as the one for row:
CREATE TABLE MyTableOr
Alter TABLE MyTable REBUILD WITH (DATA_COMPRESSION=Page, MAXDOP=2);There is however few important points that one needs to understand when using page compression. Page compression works on page level so the amount of benefit that you can get out of this really depends on the data distribution per page. You can get drastically different results by just moving the data around, changing the clustered Index or any other change that modifies the structure of data stored.
Example
It’s time to put every thing that we studied so for to a test. Create a sample table and fill it with some hypo theoretical data.
Set NoCount On
The above TSQL script creates a table CompressionCheck and inserts 100000 randomly generated records. Note that page compression is not yet enabled. To find the current space utilization by the table use sp_spaceused proc.
Exec sp_spaceused 'CompressionCheck';The above stats suggest that 13 MB of storage space is currently being used by CompressionCheck table. Now let’s apply page compression and see the results.
DBCC DROPCLEANBUFFERS --Removes all clean buffers from the buffer poolExec sp_spaceused 'CompressionCheck';
Results:
Comparing both the results we have successfully managed to reduce the space allocated to CompressionCheck table from 13 MB to just 3.5 MB.