We have a significant investment in Oracle database technology running several Oracle RAC clusters. In addition to the cost of an Oracle license, there is an ongoing yearly payment for support that entitles us to upgrades, patches and perhaps most importantly, the ability to log so called service requests stating what problem you have and an Oracle support analyst tries to fix your issue. Unfortunately Oracle support went to a global quite a few years and instead of having an analyst in the UK look at the problem, we seem to have analysts from a variety on countries in EMEA. I’ve found this really leads to a huge variation in the quality of analysts assigned to our cases.
Currently we have 6 outstanding cases, three of which are now 2 months old and still awaiting a fix. Mostly these issues have arisen due to us having applied the latest Oracle patchset, 10.2.0.3. Here is a list of our current issues:
- Flashback database fails with ORA-38761, bug:5914784 and has been with Oracle for 5 weeks.
- ORA-600: [kcrfr_resize2], bug:3306010 Oracle have worked on this for 2 months and counting.
- ORA-00355: change numbers out of order, bug:5935278 another 2 month old case.
- ORA-600: [kcrrupirfs.20], bug: 4767278, there is a patch available but I’m still waiting an Oracle response.
- ORA-600: [kfgFinalize_2], bug: 5393792 which is meant to be fixed, but does not appear to be.
- redo transport failing with ORA-16011 fixed by us but still waiting for Oracle to come up with anything.
It was this final case, the ORA-16011 problem that prompted this posting, as we must have ended up with a truly inept support analyst for this case. As stated this problem was regarding sending redo from our primary to our standby. Connectivity between primary and standby was fine and I could use sqlplus to connect to primary -> standby and standby -> primary. The analyst (after I told them I fixed it) asked for the tnsnames entry for our physical standby that the primary was using to connect to it:
STANDBY =
(DESCRIPTION =
(SDU=32767)
(SEND_BUF_SIZE=375000)
(RECV_BUF_SIZE=375000)
(ADDRESS_LIST =
(ADDRESS = (PROTO = TCP)(HOST = localhost)(PORT = 9024))
(ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT = 9025))
)
(CONNECT_DATA =
(SERVICE_NAME = STANDBY)
)
)
I also posted the following:
oracle 25366 1 0 Apr25 ? 00:00:00 ssh -N -L 9024:standby-node1:1521
oracle 20759 1 0 Mar04 ? 00:02:18 ssh -N -L 9025:standby-node2:1521
Basically we are sending the redo data to our standby over an ssh tunnel and the tnsnames entry says to connect to the localhost on the ports that the ssh tunnels are forwarding on. This works extremely well at securing the transport of the redo and is completely transparent to the database.
After seeing this information, the support analyst on this case asked “what does localhost mean”. I was utterly speechless to think we are paying money to Oracle to explain to their analysts basic computing terminology.