Tuesday 17 August 2010

Keeping a hold on your database size

CRM databases have a tendency to grow very quickly. I've seen smaller sites of 50 or so users grow by well over 1 GB per month, and growth is faster still if you have a workflow- and e-mail-heavy solution.


I won't cover the e-mail issue in significant detail here, other than to say that some thought should always be given to the storage of e-mails in CRM and, more specifically, their attachments. This also applies to every other method of saving attachments into the database. Some things to consider:
  • What to set the CRM maximum file size to (this can't be bigger than 8192 kilobytes). Keeping it smaller will slow database growth, but will mean that larger attachments are not saved.
  • When to promote an e-mail to CRM - do you really need every e-mail (and attachment)? A large amount of correspondence is often just noise, with only certain elements of a conversation actually being relevant.
If you don't keep control of this, then your database can get very big very quickly.

The other area, and the one I'm exploring in detail here, is the asynchronous operation tables in CRM. With every system job and/or workflow, new records get written into these tables (specifically the AsyncOperationBase and WorkflowBase tables). Over time, as your system goes through the motions of your various business processes, match code updates, system expansion tasks and all the rest, these tables can get very, very big, sometimes containing millions of records. That causes not only a storage problem but almost certainly some performance headaches too.

Fortunately Microsoft have recognised this and given us a few ways of dealing with the issue as part of the Update Rollup process.

So first off we have two options, both enabled by adding new registry keys:
  1. AsyncRemoveCompletedJobs (http://support.microsoft.com/kb/974896)
    This will remove all completed Async jobs in CRM of the following types:
    CollectSQMData
    PersistMatchCode
    FullTextCatalogIndex
    UpdateContractStates
  2. AsyncRemoveCompletedWorkflows (http://support.microsoft.com/kb/974896)
    This will remove all completed workflow jobs from both the AsyncOperationBase and WorkflowBase tables.
And now to the downside - one of the strengths of CRM workflow is its persistent nature. Because the workflow instances exist as data in the system we can report on them, so we are able to extract meaningful detail and management intelligence from our workflow data. Obviously this ability becomes a bit redundant if we don't keep any of this data.

My general answer to the above challenge is to enable the 'AsyncRemoveCompletedJobs' registry key, but not the 'AsyncRemoveCompletedWorkflows' key. This means that we can reclaim some space from the other system jobs, while leaving the workflow history in place to support our MI requirements.
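
As a rough illustration of that choice, here is a minimal Python sketch that writes out a .reg file enabling only 'AsyncRemoveCompletedJobs'. The registry location (HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSCRM) and the DWORD value of 1 are my assumptions rather than something stated in this post, and render_reg_file is just a hypothetical helper name, so check the KB article before importing anything on a real server.

```python
# Sketch only: builds the text of a .reg file, it does not touch the registry.
# ASSUMPTION: the cleanup switches are DWORD values set to 1 under the key
# below. Confirm the location and value in KB 974896 before using this.
MSCRM_KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSCRM"  # assumed location


def render_reg_file(value_names):
    """Return .reg file text that sets each named value to DWORD 1."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{MSCRM_KEY}]"]
    for name in value_names:
        # .reg files write DWORD data as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{1:08x}')
    return "\r\n".join(lines) + "\r\n"


# Enable system job cleanup only. AsyncRemoveCompletedWorkflows is left out
# on purpose so that completed workflow history stays available for reporting.
reg_text = render_reg_file(["AsyncRemoveCompletedJobs"])
print(reg_text)
```

Generating a file rather than writing to the registry directly means the change can be reviewed before it is applied.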

One last caveat: these registry changes are not retrospective. In other words, once the specific feature is enabled it will only remove jobs/workflows instantiated after the change, not historical data. To get around this the kind folks at Microsoft have provided us with some SQL scripts that will remove historical data from the tables in question.

The article can be found here: http://support.microsoft.com/kb/968520/

A cautionary tale: the script provided by Microsoft deletes all completed jobs, workflows included. So if you want to retain completed workflow history, in line with the registry key choice above, you will need to modify the script so that it does not remove records where OperationType = '10'. If you don't do this you lose everything... and I'm talking from painful experience here :(.
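
To make the rule concrete, here is a small Python sketch of the selection logic. It only models the decision on sample rows and does not touch a database. OperationType 10 for workflows comes from the point above; the other operation type codes, the StateCode of 3 and the StatusCode values of 30 and 32 are my recollection of the KB script and should be treated as assumptions to verify against the article. The should_delete function is a hypothetical name for illustration.

```python
# Sketch only: models which rows a cleanup would remove, nothing is deleted.
# ASSUMPTIONS: every code below except WORKFLOW is recalled from the KB
# script and must be checked against the article before relying on it.
WORKFLOW = 10                                  # workflow OperationType
CLEANUP_TYPES = {1, 9, 12, 25, 27, WORKFLOW}   # assumed types the script targets
COMPLETED_STATE = 3                            # assumed completed StateCode
FINISHED_STATUSES = {30, 32}                   # assumed finished StatusCodes


def should_delete(row, keep_workflows=True):
    """Return True if a completed async row is eligible for removal."""
    if keep_workflows and row["OperationType"] == WORKFLOW:
        return False  # protect workflow history needed for reporting
    return (
        row["OperationType"] in CLEANUP_TYPES
        and row["StateCode"] == COMPLETED_STATE
        and row["StatusCode"] in FINISHED_STATUSES
    )


sample_rows = [
    {"Name": "match code update", "OperationType": 12, "StateCode": 3, "StatusCode": 30},
    {"Name": "completed workflow", "OperationType": 10, "StateCode": 3, "StatusCode": 30},
    {"Name": "job still running", "OperationType": 12, "StateCode": 2, "StatusCode": 20},
]

to_delete = [row["Name"] for row in sample_rows if should_delete(row)]
print(to_delete)
```

If the script does filter on a list of operation types, the equivalent change there would be to take 10 out of that list. Either way, run it against a restored copy of the database first.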
