When Oracle ASM goes bad
We have been using ASM with our Oracle 10.2 Linux clusters for over a year now. I have found it to be extremely stable. Until now. One of our RAC database instances crashed with the following:
Error: KGXGN aborts the instance (6)
ORA-29702: error occurred in Cluster Group Service operation
LMON: terminating instance due to error 29702
Trying to restart the instance produced the following errors:
ORA-00202: control file: '+ASM/path_to_file/control02.ctl'
ORA-17503: ksfdopn:2 Failed to open file +asm/path_to_file/control02.ctl
ORA-15001: diskgroup "ASM" does not exist or is not mounted
Clearly this instance could no longer open its files in the ASM disk group, starting with the control file. ASM itself was still running, so I looked at the ASM instance alert log and found the following:
ORA-00600: internal error code, arguments: [kfgFinalize_2], [], [], []
NOTE: cache dismounting group 3/0x6A02BCB9 (ASM)
ERROR: diskgroup ASM was not mounted
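Picking ORA- codes out of a long alert log by eye is tedious. The following is a minimal Python sketch, not part of our actual tooling, that counts every ORA-NNNNN code in an alert log. The helper names and the way the log path is passed in are hypothetical; point it at your own ASM instance's alert log.

```python
import re
import sys
from collections import Counter

# Matches Oracle error codes such as "ORA-00600" or "ORA-15001".
ORA_PATTERN = re.compile(r"\bORA-(\d{5})\b")


def ora_errors(lines):
    """Count every ORA-NNNNN code that appears in the given log lines."""
    counts = Counter()
    for line in lines:
        for code in ORA_PATTERN.findall(line):
            counts["ORA-" + code] += 1
    return counts


def scan_alert_log(path):
    """Read an alert log file (hypothetical path) and count its ORA- codes."""
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as log:
        return ora_errors(log)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        counts = scan_alert_log(sys.argv[1])
    else:
        # With no argument, demo on the ASM alert log excerpt quoted above.
        counts = ora_errors([
            "ORA-00600: internal error code, arguments: [kfgFinalize_2], [], [], []",
            "NOTE: cache dismounting group 3/0x6A02BCB9 (ASM)",
            "ERROR: diskgroup ASM was not mounted",
        ])
    for code, count in counts.most_common():
        print(f"{code}: {count}")
```

Run it with the alert log path as the only argument; the most frequent codes are listed first.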
Looking up the ORA-600 code on MetaLink, I came across document 418063.1, which matched exactly what we were seeing; every call in the stack trace matched up. There was one crucial difference between our environment and the one described in the note: the note states that this bug is fixed in the 10.2.0.3 patchset. We are running 10.2.0.3 on this cluster, so this "fix" should be protecting us. I never encountered this issue in a year of running 10.2.0.2, yet it has now appeared after only 7 weeks on the supposedly fixed 10.2.0.3. We are still waiting for a solution.

