By Buddel
In data warehouse land it is not very common to see constraints in the database. I never felt very comfortable with that, but until now I did not get around to analysing why I felt that way. Until I read this article by Tom Kyte. In the article Tom Kyte shows that the CBO (Cost Based Optimizer) can profit from the information that is derived from the presence of constraints by generating better query plans. Better in this case is defined as ‘producing result sets faster’. The examples in the article are not exactly ‘real world’ data warehouse examples. Following Tom Kyte’s line of reasoning I do agree that constraints are capable of improving the performance of queries.
The reasons for not having constraints in a data warehouse are along the lines of ‘I have checked the integrity when I did my ETL, so why would I need constraints to confirm that? And besides, constraints would only delay my ETL because they have to be checked before they are really enabled’. I see a couple of flaws in this reasoning:
- I suspect that most constraints in a data warehouse cannot be enabled when actually applied. The quality of the ETL might be good, but is it just as good as a constraint would be? I think not.
- Enabling constraints might take time, but how often do you have to check constraints? Only when doing the ETL, of course. I hope that in your DWH, doing ETL will be during a small part of the time your DWH is being used. Otherwise your DWH will have a problem. The rest of the time your DWH will be used for querying and Tom Kyte just showed that querying can be sped up by applying constraints.
Summing up my pros and cons of applying constraints.
Pro:
- it will improve the data quality of the DWH
- it can speed up the queries in your DWH (querying it is the purpose of your DWH anyway)
Con:
- it will take more time to do your ETL (which is only a means to create your DWH)
My conclusion is that I wil try to incorporate as many constraints as possible in my next DWH. It also means I will have to be smart enough to enable the constraints at just the right moment during my ETL to have an acceptable loading performance.
Via:: Klein Oracle Denkraam