Table partitioning to improve stats

In a earlier post about legacy databases I mentioned that dirty data are more likely to enter the database on an earlier stage. One day while I was looking at the selectivity for a few columns I found the values a bit strange. After some digging I noticed that for most of the time the data was OK, even though a proper foreign key constraint was missing for the columns. But at some time many years ago somebody had inserted something completely different , which had clearly affected the selectivity (the columns are indexed).

Table partitioning is better to assist in the management of the data more than for tuning purposes, and I imagined that this would be another example. By range partitioning on transaction time dirty data would be kept on a partition separate from more recent history and thus improve the column stats. Though I haven’t set up a test case (for the lack of a real problem) I believe that a correct selectivity will help the CBO and avoid these border problems where the CBO is tricked to throw off a wrong plan based on wrong statistics. That would be another example where the CBO is blamed for something that is not its fault.

This is why I don’t like hints, it feels like cheating and being lazy. The CBO may have bugs, but I think you have to be as good as the author of this book to prove it; usually there is an underlying problem, like the one I encountered.

Anyway, I did a quick test, by copying the table in question into a partioned version and ran dbms_stats.gather_table_stats on it. In fact the selectivity did not change much between partitions, probably not enough to make an impact on CBO choice of plan. But I had used other parameters on the new table (most importantly I think, estimate_percent=>100), the columns in the two tables had very different values for selectivity. Which reiterated to me previous knowledge; checking the stats and the way they are collected is important, and sometimes a simple solution will do.