Enabling Compression on a Large SQL Server Table With Zero Downtime

Typically with a table in SQL Server if you want to enable compression, you’d have to duplicate the table and have downtime while you get the 2 tables in sync and swap them over. Using computed columns, you can compress a table with no downtime, and without requiring any extra database or disk space.
So you have a table that is along the lines of this:

CREATE TABLE WebRequest
(
	WebRequestID INT IDENTITY(1,1) NOT NULL,
	RequestType INT NOT NULL,
	RequestDate DATETIME NOT NULL,
	RequestXML VARCHAR(MAX) NOT NULL
)

I’m using a table that’s auditing requests to a web service, but this could also be something storing free text notes in your application – basically anything with a LOB data type (the VARCHAR(MAX) RequestXML column in my example).

The data in it looks like this:

Your table has got to the point where it has hundreds of millions of rows in it, and it’s 800GB in size. You want to compress the table rather than have to pay for more disk space, but copying all the data into a new table and compressing it as you do so means you need alot of extra database space (my 800GB table would have needed an extra 180GB of space available), and getting the original and the new table in sync means downtime for the application while you stop new rows going into the original table.
You could add a new column to the table and insert the compressed data into it to, then remove the data from your uncompressed column, and while this means you don’t need the extra disk space, your application then needs to retrieve both the compressed and uncompressed columns each time and check which column it should use.

Here’s how you can compress this table without using any extra space, and with no downtime.

Prepare your application.

First, look at how you are going to compress the data.
SQL Server 2016 onwards, you can use the native COMPRESS() and DECOMPRESS() functions.
For versions of SQL Server prior to 2016, you can implement a CLR function as a wrapper for something like gzip, to mimic the 2016 functions (search for GZipStream SQL CLR for examples).

Next, you need to make sure all references to the table in your application use stored procedures, and that all columns used are specified (no using SELECT * !)
For example, to retrieve a specific row from my table, I have the SP:

CREATE PROCEDURE R_WebRequest
	@WebRequestID INT
AS
BEGIN
	SELECT RequestType, RequestDate, RequestXML
	FROM WebRequest
	WHERE WebRequestID = @WebRequestID
END

And to insert into the table:

CREATE PROCEDURE I_WebRequest
	@RequestType INT,
	@RequestDate DATETIME,
	@RequestXML VARCHAR(MAX)
AS
BEGIN
	INSERT INTO WebRequest
		(
			RequestType,
			RequestDate,
			RequestXML
		)
	VALUES (@RequestType, @RequestDate, @RequestXML)
END

You also need to make your LOB column allow NULL values, if it doesn’t already:

ALTER TABLE dbo.WebRequest
	ALTER COLUMN RequestXML VARCHAR(MAX) NULL

Changing a column to allow NULL values is instant, regardless of the size of the table.

Add columns to your table

We need a column to store our compressed data in:

ALTER TABLE dbo.WebRequest ADD
	RequestXMLCompressed VARBINARY(MAX) NULL

Again, this is instant to add to a table, as it’s adding a NULL value rather than a default data value. (Note that if you have a long running statement accessing the table, that may block you from adding the column. You should be fine with quick running functions, even if they’re very frequent).

Now we add the computed column.

Computed columns are supported in all versions of SQL Server from 2008 onwards (and if you’re still using something earlier than 2008 you really need to upgrade!) and are added to a table in the same way as a regular column, but instead of specifying a data type, you provide a SQL statement to execute.

From Microsoft’s article:
“A computed column is a virtual column that is not physically stored in the table but is computed from an expression that uses other columns in the same table”
You can set a “persisted” flag on computed columns, that will store the calculated value in the database – don’t enable it for this task, as doing so will store the uncompressed as well as the compressed data!

ALTER TABLE dbo.WebRequest ADD
	RequestXMLRead AS CASE WHEN RequestXMLCompressed IS NULL THEN RequestXML ELSE CONVERT(VARCHAR(MAX), DECOMPRESS(RequestXMLCompressed)) END

We’re adding a column with a CASE statement in it, that will return our existing RequestXML column if our new RequestXMLCompressed column doesn’t have any data in it, or it will return the decompressed RequestXMLCompressed column if it does have data in it.
Note the CONVERT(VARCHAR(MAX) around the DECOMPRESS() statement. SQL Server doesn’t know what data type you want your decompressed column to be in, so you need to convert it appropriately here (to the same data type as your uncompressed column).

Our table now looks like this:

You can see the new RequestXMLCompressed column with nothing in it yet, and the new computed column RequestXMLRead, which looks identical to the RequestXML column, but isn’t taking up any extra space in the database.

Update Stored Procedures

We can now update our stored procedures to use the new RequestXMLRead column.

Important: you must update all stored procedures that read from the table first, before any inserts or updates, or you’ll fail to read the data for your new or updated rows.

My read stored procedure now looks like this:

ALTER PROCEDURE R_WebRequest
	@WebRequestID INT
AS
BEGIN
	SELECT RequestType, RequestDate, RequestXMLRead
	FROM WebRequest
	WHERE WebRequestID = @WebRequestID
END

The only change here is that instead of returning the RequestXML column, we’re returning the RequestXMLRead column.

The insert stored procedure needs to be changed to this:

ALTER PROCEDURE I_WebRequest
	@RequestType INT,
	@RequestDate DATETIME,
	@RequestXML VARCHAR(MAX)
AS
BEGIN
	INSERT INTO WebRequest
		(
			RequestType,
			RequestDate,
			RequestXMLCompressed
		)
	VALUES (@RequestType, @RequestDate, COMPRESS(@RequestXML))
END

We’ve changed the column specified in the INSERT to be RequestXMLCompressed, ignoring the RequestXML column, and we’ve wrapped the passed in @RequestXML in the COMPRESS() function.

Now all values going into the table will be compressed, and we’ll start saving disk space over what we would have used previously.

If we call the insert stored procedure and add a new row to the table:

EXEC I_WebRequest 2, '05 Sep 2018 19:30', '<root><upper>people</upper><bet>2028355427</bet><border><riding>139679720.9040308</riding><cloud>382980347</cloud></border><atom>-14478772.229665756</atom><rapidly>instrument</rapidly><pond>pink</pond></root>'

The table now looks like this:

You can see our new row at the bottom – the RequestXML column is NULL, the data is stored in the RequestXMLCompressed column, but the RequestXMLRead column contains the readable XML that we inserted.

An added bonus to using this approach is that the table is still human readable without needing to remember to decompress the data first.

Update the existing data

As all the new data going into the table is now compressed, we’ve slowed the rate at which the table is growing, and we can start to compress the old data to recover space in the database.

Before you start, run this SQL to get the current size of the table, so you can compare it to when it’s all compressed:

EXEC sp_spaceused 'WebRequest'

This is just a test table so it’s tiny, but we can still see the impact after the data has all been compressed.

We need to take the existing RequestXML data and compress it into the RequestXMLCompressed column. When this is done, we need to set the RequestXML column to be NULL to recover the space used by it.
Run something along these lines to update your existing data:

DECLARE @IDToUpdate INT

WHILE EXISTS (SELECT 1 FROM WebRequest WHERE RequestXML IS NOT NULL)
BEGIN
	SET @IDToUpdate = (SELECT TOP 1 WebRequestID FROM WebRequest WHERE RequestXML IS NOT NULL)

	UPDATE wr
	SET wr.RequestXMLCompressed = COMPRESS(wr.RequestXML)
	FROM WebRequest wr
	WHERE wr.WebRequestID = @IDToUpdate

	UPDATE wr
	SET wr.RequestXML = NULL
	FROM WebRequest wr
	WHERE wr.WebRequestID = @IDToUpdate
END

The data in our table now looks like this:

All the rows have data in the RequestXMLCompressed column, and nothing in the RequestXML column, but the RequestXMLRead column returns the same readable XML.

If we check the size of the table again, we can see the data size has halved (16KB to 8KB):

Note that this isn’t a great data saving because my XML data is pretty small. For tables with large amounts of LOB data, you can see the table reduce significantly – an 800GB table I’ve performed this on was 180GB after the compression.

You may need to wait for SQL Server to hit the now unused pages in the database before the sp_spaceused command returns a smaller value – this is due to how SQL Server cleans up deleted pages in the database. You can force this on the table by rebuilding the clustered index, as long as it is safe to do so in your system.

This update SQL can be left running for as long as it takes to complete – even if you only run it over night when the system is quiet and it takes 3 weeks to run, all the time it’s compressing data and making your database smaller and your application is retrieving the data with no extra code required.

One thought on “Enabling Compression on a Large SQL Server Table With Zero Downtime”

Brent Ozar says:

September 13, 2018 at 13:46

Ooo, great idea! It would also work with regular queries (outside of stored procedures) as long as they go through a view. You could simply change the view to point at the new RequestXMLRead field AS RequestXML, and the queries wouldn’t be aware of the change.

VikingLair