Monday, December 14, 2009

Denormalisation and transactions

I'm all for denormalising a data model for performance, or for meeting business requirements with less complexity in the model. I'm happy to design a model that stores an aggregated figure from a related child table in a parent table; putting a year to date sales figure into a master customer table is one that seems to crop up regularly.

One of my rules of thumb when I denormalise is to transactionalise units of work to ensure that data integrity is maintained. In the earlier example, this means placing the insert, update or delete against the sales table inside the same transaction as the update to the year to date sales figure in the customer table. It's critical that the two parts of the data change either succeed completely or fail completely. If the sales data is changed but the customer's year to date figure is not (due to some failure), then without a rollback of the transaction the stored aggregate no longer matches its detail rows. I've seen enough denormalised data models used without transactions around units of work to wonder whether some developers simply don't realise this problem exists.
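To make that concrete, here's a minimal sketch in generic SQL. The sales and customer tables and the customer_id, sale_date, amount and ytd_sales columns are illustrative rather than from any real schema, and the exact transaction syntax (BEGIN TRANSACTION, START TRANSACTION and so on) varies by DBMS:

    BEGIN TRANSACTION;

    -- Record the individual sale in the child table.
    INSERT INTO sales (customer_id, sale_date, amount)
    VALUES (42, DATE '2009-12-14', 150.00);

    -- Keep the denormalised aggregate in step with the detail.
    UPDATE customer
       SET ytd_sales = ytd_sales + 150.00
     WHERE customer_id = 42;

    COMMIT;
    -- If either statement fails, issue ROLLBACK instead, so that
    -- neither change persists and the two tables stay in agreement.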

By the way, you could also use triggers to protect your data integrity. It's not my preferred way of doing things, but it does make for a workable solution.
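As a sketch of the trigger approach, here's a MySQL-flavoured row trigger (using the same illustrative tables as above) that keeps the aggregate in step on insert. Trigger syntax differs between products, and a complete solution would also need matching UPDATE and DELETE triggers:

    CREATE TRIGGER sales_after_insert
    AFTER INSERT ON sales
    FOR EACH ROW
      -- Fires inside the inserting statement's transaction, so the
      -- aggregate update succeeds or fails along with the insert.
      UPDATE customer
         SET ytd_sales = ytd_sales + NEW.amount
       WHERE customer_id = NEW.customer_id;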

One other thing I've found useful (especially if you've inherited a denormalised model that doesn't make use of transactions) is to have a means of recalculating any stored aggregate values by summing across the granular child rows. Having components that do this gives you the flexibility to check the stored aggregates from time to time or, in the event that the data does develop problems, to calculate what the stored values should be and make the necessary updates.
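A reconciliation query along these lines (again with illustrative names, and assuming sale_date identifies which rows belong to the current year) lists any customers whose stored figure has drifted from the detail:

    SELECT c.customer_id,
           c.ytd_sales                 AS stored_ytd,
           COALESCE(SUM(s.amount), 0) AS calculated_ytd
      FROM customer c
      LEFT JOIN sales s
        ON s.customer_id = c.customer_id
       AND s.sale_date >= DATE '2009-01-01'
     GROUP BY c.customer_id, c.ytd_sales
    HAVING c.ytd_sales <> COALESCE(SUM(s.amount), 0);

A corresponding UPDATE, using the same sum as a subquery, puts the stored values right:

    -- Repair drifted aggregates from the child rows.
    UPDATE customer
       SET ytd_sales = (SELECT COALESCE(SUM(s.amount), 0)
                          FROM sales s
                         WHERE s.customer_id = customer.customer_id
                           AND s.sale_date >= DATE '2009-01-01');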
