I have an excellent career thanks to the Oracle database. For several reasons, I have met many people that support the community by sharing knowledge, and I have learned a lot along the way. One of those things is how to attack troubleshooting and optimization of slow applications. You may know “optimization” as tuning.
The Richness of Oracle Database Instrumentation #JoelKallmanDay
As a consultant, I have helped in many development projects involving the Oracle database. The Java or C# developers frequently relate to the database with merely superficial knowledge. When they run into a problem, they ask Dr. Google for help. There are, however, some issues with this approach. What comes up on top of a Google search may not be the best solution and might be based on a system configuration or hardware setup quite different from the current project. A better approach is to record what is going on and measure what can be measured. “What can be measured” is not always much. If the developer were proactive, the application would have built-in code for logging and tracing. Quite often, though, the program is a black box that only throws an error message when it crashes.
From the developer’s perspective, it is often somebody else’s fault when the application is slow, usually the network or the database. Usually, the burden of proof lies with the plaintiff. But it seems that the database and network administrators must deal with this. The DBA must collect the facts and analyze them to show what is happening. These situations are where my favorite feature in the Oracle database comes in: The rich instrumentation and the possibility to see what is going on and measure how long each operation takes. I can do that by enabling SQL trace, looking at the dynamic views for live observations, or looking at the workload data automatically collected in the database.
Problems fall into two categories: slowness or failures with optional error messages. I attack performance problems by defining the start and end of the task and measuring what is taking time. When the database is not busy processing data on the CPU, it waits for something. Seeing what it is waiting for, how much time it spent waiting, and how my times this happened reveals a lot about the program and logic used. Not always is it a fault in the application. Perhaps the data model is not good, or even the database can have a suboptimal configuration. It helps me enormously to see what the database did during a slow performance or before the failure.
Sometimes the database is not doing much, and nothing can be improved there. But even this is useful to know that the problem is elsewhere. A trace on an operation will reveal excessive network roundtrips or offloading more data to the application than it can process in a reasonable time. The solution to both problems may be moving more of the processing into the database.
My best advice to a new DBA is to learn methods for troubleshooting and optimization. In the Oracle Database documentation set, the Database Performance Tuning Guide and SQL Tuning Guide are great places to start. And, of course, nobody explains this better than Cary Millsap at Method-R .