random technical thoughts from the Nominet technical team

DBMS_MONITOR causing instance hanging

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading ... Loading ...
Posted by jason on Feb 25th, 2008

Recently we had an experience of a completely hung Oracle instance. This was a RAC database running version 10.2.0.3, only 1 instance in the 2 node cluster was affected, but there was no way of logging onto the affected instance either remotely or even as sysdba locally. Perhaps, we should have done a more thorough root cause analysis to find out why the system was hung using some of the techniques from this excerpt. I had never heard of the -prelim option to sqlplus before and it may have been interesting to perform a systemstate dump, but this was a busy production system and we had a really good idea what had caused it and just wanted to get back up and running again.

The system had been running along fine, until a colleague was examining an issue with a 3rd party piece of software and was using DBMS_MONITOR to obtain trace information of the connecting session. He noticed that every time he traced a connection from this application, it would suddenly make a new connection, which he then promptly would run DBMS_MONITOR on. This went on for around 6 sessions until eventually the instance hung to new connections and the helpdesk phones started to ring.

It was pretty clear we had to stop tracing ASAP, and right enough, once we had worked out which pid’s to kill server side the instance returned back to normal, just killing the sqlplus sessions on the client side was not actually doing the trick. Thankfully DBMS_MONITOR puts the pid number into the filename it uses to output the trace data to. However another way of obtaining which processes to kill would have been to run the following (we knew the traces were being run remotely from a specific client) on the still working node:

SQL> select machine, process
from gv$session;

It seems you can even kill a session running on a different node in a RAC cluster, so we probably could have even tried doing this via the sid, serial# information available from gv$session.

It turns out there is a bug with DBMS_MONITOR in Oracle 10.2.0.3, it is Bug 5485914 available in metalink, the notes for this state it may cause the instance to run slowly/hang. There is a patch available, also hopefully fixed in the imminent 10.2.0.4. I’m quite convinced it was the multiple times DBMS_MONITOR was running that caused the instance hang, so if you are on 10.2.0.3 be careful how many sessions you monitor at once.

Leave a Comment

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.

Recent Posts

Highest Rated

Categories

Archives

Meta: