Patchset Problems
This is the definition of an Oracle patchset as given by Oracle: “A patchset is a tested and integrated set of product fixes. Patch sets provide bug fixes only; they do not include new functionality and they do not require certification on the target system”
We have been attempting an Oracle database upgrade from 10.1.0.3 to 10.1.0.4, that is, applying the 10.1.0.4 patchset. I found the statement above to have been well and truly overlooked in the 10.1.0.4 patchset.
As stated elsewhere, we are running a 2-node Oracle RAC cluster as our primary database platform. This went live around March 2004, at version 9.2.0.4. With the 9i version of Oracle you had to run some form of vendor clusterware to perform the clustering of the host servers. We chose to go with Veritas Storage Foundation for Oracle RAC, version 4.0. Around October we upgraded to 10.1.0.3 and at that time had to upgrade Veritas to 4.0 MP1, as this was the supported combination for 10g.
Judging from the definition of a patchset above, I was hoping that installing the 10.1.0.4 patchset would be a relatively painless exercise. Upgrading a RAC cluster first of all involves upgrading the CRS daemons, which are new in 10g. These live in a separate directory structure and have to be upgraded first. You must make sure the servers in the cluster can rsh to each other AND to themselves. You must also run the patchset installer from the node you originally installed from! You can tell which node this is by looking at the order in which the nodes appear in the installer: the first node is in effect the “primary” node, and it is the one on which you must run the patchset installer.
After the CRS upgrade is done you will find that it includes new functionality: a whole new daemon is running. Under 10.1.0.3 we have the following running:
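The rsh requirement above covers every ordered pair of nodes, including each node to itself. A minimal sketch of enumerating those checks follows. The node names node1 and node2 are hypothetical, and using "rsh <target> hostname" as the probe command is my own assumption, not something the installer documentation prescribes:

```python
from itertools import product


def rsh_check_pairs(nodes):
    """Return every (source, target) pair that must work, including node-to-itself."""
    return list(product(nodes, repeat=2))


def rsh_check_commands(nodes):
    """Pair each source node with the rsh command to run on it (assumed probe: hostname)."""
    return [(src, ["rsh", dst, "hostname"]) for src, dst in rsh_check_pairs(nodes)]


if __name__ == "__main__":
    # Hypothetical node names; substitute the real cluster host names.
    for src, cmd in rsh_check_commands(["node1", "node2"]):
        print(f"on {src}: {' '.join(cmd)}")
```

For a 2-node cluster this lists four checks, two of which are the easily forgotten self-to-self cases.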
oracle 6949 6847 0 Feb 24 ? 0:00 /var/opt/oracle/product/crs/bin/evmlogger.bin -o /var/opt/oracle/product/crs/ev
oracle 6847 1 0 Feb 24 ? 0:31 /var/opt/oracle/product/crs/bin/evmd.bin
oracle 6901 6848 0 Feb 24 ? 0:00 su -c /var/opt/oracle/product/crs/bin/ocssd || exit 137
oracle 6906 6901 0 Feb 24 ? 81:47 /var/opt/oracle/product/crs/bin/ocssd.bin
root 6852 1 0 Feb 24 ? 184:37 /var/opt/oracle/product/crs/bin/crsd.bin
But under 10.1.0.4 we have an additional daemon:
oracle 27353 1253 0 Aug 05 ? 0:00 /var/opt/oracle/product/crs/bin/evmlogger.bin -o /var/opt/oracle/product/crs/ev
root 1257 1 0 Aug 03 ? 0:03 /var/opt/oracle/product/crs/bin/crsd.bin
oracle 1253 1 0 Aug 03 ? 0:20 /var/opt/oracle/product/crs/bin/evmd.bin
oracle 27473 27472 0 Aug 05 ? 0:00 /bin/sh -c /var/opt/oracle/product/crs/bin/ocssd || exit $?
oracle 27472 27429 0 Aug 05 ? 0:00 su -c /bin/sh -c '/var/opt/oracle/product/crs/bin/ocssd || exit $?'
oracle 27453 27449 0 Aug 05 ? 0:01 /var/opt/oracle/product/crs/bin/oclsmon.bin
oracle 27474 27473 0 Aug 05 ? 4:00 /var/opt/oracle/product/crs/bin/ocssd.bin
oracle 27449 27426 0 Aug 05 ? 0:00 su -c /var/opt/oracle/product/crs/bin/oclsmon || exit $?
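The two process listings above can be compared mechanically rather than by eye. This is only a sketch: it assumes the daemons of interest are the *.bin executables under the crs/bin directory, and the regular expression and function name are my own:

```python
import re

# Matches CRS daemon binaries such as /var/opt/oracle/product/crs/bin/evmd.bin
DAEMON_RE = re.compile(r"/crs/bin/(\w+\.bin)")


def crs_daemons(ps_output):
    """Return the set of CRS *.bin daemon names found in ps output."""
    return set(DAEMON_RE.findall(ps_output))


# Listing taken under 10.1.0.3
BEFORE = """\
oracle 6949 6847 0 Feb 24 ? 0:00 /var/opt/oracle/product/crs/bin/evmlogger.bin -o /var/opt/oracle/product/crs/ev
oracle 6847 1 0 Feb 24 ? 0:31 /var/opt/oracle/product/crs/bin/evmd.bin
oracle 6901 6848 0 Feb 24 ? 0:00 su -c /var/opt/oracle/product/crs/bin/ocssd || exit 137
oracle 6906 6901 0 Feb 24 ? 81:47 /var/opt/oracle/product/crs/bin/ocssd.bin
root 6852 1 0 Feb 24 ? 184:37 /var/opt/oracle/product/crs/bin/crsd.bin
"""

# Listing taken under 10.1.0.4
AFTER = """\
oracle 27353 1253 0 Aug 05 ? 0:00 /var/opt/oracle/product/crs/bin/evmlogger.bin -o /var/opt/oracle/product/crs/ev
root 1257 1 0 Aug 03 ? 0:03 /var/opt/oracle/product/crs/bin/crsd.bin
oracle 1253 1 0 Aug 03 ? 0:20 /var/opt/oracle/product/crs/bin/evmd.bin
oracle 27473 27472 0 Aug 05 ? 0:00 /bin/sh -c /var/opt/oracle/product/crs/bin/ocssd || exit $?
oracle 27453 27449 0 Aug 05 ? 0:01 /var/opt/oracle/product/crs/bin/oclsmon.bin
oracle 27474 27473 0 Aug 05 ? 4:00 /var/opt/oracle/product/crs/bin/ocssd.bin
oracle 27449 27426 0 Aug 05 ? 0:00 su -c /var/opt/oracle/product/crs/bin/oclsmon || exit $?
"""

# Daemons present after the patchset but not before it
new_daemons = crs_daemons(AFTER) - crs_daemons(BEFORE)
print(sorted(new_daemons))  # ['oclsmon.bin']
```

The only daemon that shows up after the patchset and not before is oclsmon.bin.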
That being said, the 10.1.0.4 version of CRS seems to be more robust and appears to start quite a bit faster than 10.1.0.3. After CRS is upgraded you then patch the Oracle database server home directory. This was straightforward enough, but after you have done this you must start up the database and run some SQL to patch the actual database. This proved to be a problem as the database would not start:
SQL> startup
ORA-32004: obsolete and/or deprecated parameter(s) specified
ORA-27546: Oracle compiled against IPC interface version %s.%s found version %s.%s
While in the alert log I could see:
Oracle instance running with ODM: VERITAS 4.0 ODM Library, Version 1.1
cluster interconnect IPC library is incompatible with this version of Oracle
Oracle interface version information 2.4
cluster IPC library version information 2.3
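Because the ORA-27546 message itself only shows unfilled %s.%s placeholders, the real version numbers have to come from the alert log. A small sketch of pulling them out follows. The function name and regular expressions are my own, and the code only reports whether the two versions differ; it makes no claim about the compatibility rule Oracle actually applies:

```python
import re

ORACLE_RE = re.compile(r"Oracle interface version information (\d+)\.(\d+)")
LIBRARY_RE = re.compile(r"cluster IPC library version information (\d+)\.(\d+)")


def ipc_versions(alert_log_text):
    """Return (oracle_version, library_version) as (major, minor) tuples, or None if absent."""
    oracle = ORACLE_RE.search(alert_log_text)
    library = LIBRARY_RE.search(alert_log_text)
    if not oracle or not library:
        return None
    return tuple(map(int, oracle.groups())), tuple(map(int, library.groups()))


ALERT_LOG = """\
Oracle instance running with ODM: VERITAS 4.0 ODM Library, Version 1.1
cluster interconnect IPC library is incompatible with this version of Oracle
Oracle interface version information 2.4
cluster IPC library version information 2.3
"""

oracle_version, library_version = ipc_versions(ALERT_LOG)
print("Oracle expects IPC interface", oracle_version)
print("Cluster library provides", library_version)
print("Versions differ:", oracle_version != library_version)
```

Here Oracle expects interface version 2.4 while the Veritas library provides 2.3.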
So the 10.1.0.4 patchset has changed the IPC version that it is compatible with. At first I attempted to apply a Veritas patch that was designed for Oracle 9.2.0.6, which changes how stringent the IPC version checking is. This did work and enabled the database to start up, but when I started both nodes with the cluster_database parameter set to TRUE the following ORA-07445 errors appeared in the alert log:
ORA-07445: exception encountered: core dump [ksxpirqh()+4] [SIGSEGV] [Address not mapped to object] [0x0000000BF] [] []
This was happening with monotonous frequency every 5 minutes. The only option is to upgrade to Storage Foundation for Oracle RAC 4.1.
Another interesting feature of the 10.1.0.4 patchset is that it breaks automatic statistics gathering with the following errors in the alert log:
GATHER_STATS_JOB encountered errors. Check the trace file.
Wed Jul 27 22:44:27 2005
Errors in file /opt/oracle/admin/NOM/bdump/bdbold_j000_5513.trc:
ORA-00904: "T2"."SYS_DS_ALIAS_2": invalid identifier
Apparently this is fixed in 10.2. Basically, the 10.1.0.4 patchset not only includes new functionality and features, it also does not appear to be a well-tested set of product fixes.

