random technical thoughts from the Nominet technical team

OS X: Environment variables don’t work in IntelliJ’s ant support

Posted by matt on Dec 20th, 2006

Running ant tasks from within IntelliJ that rely on environment variables doesn’t seem to work on OS X. I have an ant task that copies static files such as CSS and Freemarker templates to my local tomcat installation – this allows me to see updates much faster than performing a full deploy of the application. Under Mac OS X however, the CATALINA_HOME environment variable it relies upon cannot be seen by the ant task when running in IntelliJ. I have come up with a simple workaround:

On the ant pane, click the Settings icon. Then, on the Properties tab click Add and enter the name of the environment variable prefixed by the namespace that you have given to it in the ant script (in my case, the full property name is env.CATALINA_HOME). Assign the appropriate value (e.g. /opt/tomcat5.5.20). Click OK – you’re done.

Futureproofing the middlebox

Posted by roy on Dec 15th, 2006

A growing number of applications need more from the DNS than simply having names resolved to addresses. Applications that want to update the DNS need a way to find zone cuts (Mark Andrews’ SOA-discovery). DKIM uses the DNS to store cryptographic material. There is also DNSSEC, a whole layer of cryptographic data stored in the DNS to prove that the data stored there is authentic. I’m sure there are many more.

The resolver needs to be as transparent as possible. It should not blindly or unknowingly restrict the application from functioning, and meanwhile it should have some ‘anti-spoof’ checking in place.

What follows is a small list of things needed for a resolver to be future-proof:

1) Support EDNS0 Maximum Payload Size.

DNS messages used to have a maximum payload size of 512 octets when transmitted over UDP. (This is the reason there are 13 root-servers in DNS, 14 would not fit in a response). If the server notices that the payload size is too small to store the necessary RRsets, it must set the TC (truncated) bit in the response. At this point, the resolver should re-query over TCP.
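As a rough illustration of the truncation check described above, here is a minimal Python sketch that tests the TC bit in a raw DNS message. The function name and the sample headers are made up for this example; only the header layout (a 16-bit ID followed by a 16-bit flags word in which TC is bit 0x0200) comes from the DNS wire format.

```python
import struct

TC_FLAG = 0x0200  # truncation bit in the 16-bit flags word of the DNS header


def is_truncated(message: bytes) -> bool:
    """Return True if the TC bit is set in a raw DNS message."""
    if len(message) < 12:
        raise ValueError("a DNS header is 12 octets long")
    # Header layout: ID (2 octets), flags (2 octets), then four 16-bit counts.
    _msg_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


# Hypothetical response headers: QR=1 with and without TC, all counts zero.
truncated_header = struct.pack("!HHHHHH", 0x1234, 0x8200, 0, 0, 0, 0)
complete_header = struct.pack("!HHHHHH", 0x1234, 0x8000, 0, 0, 0, 0)
```

A resolver would call something like this on every UDP response and repeat the query over TCP whenever it returns True.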

To avoid the extra latency involved in this fall-back scenario, the IETF standardized a method to advertise a larger payload size. This advertising is done using EDNS0. EDNS0 defines a whole range of extensions for DNS, one of which is the ability to advertise the maximum payload size. Technically, the resolver adds a record (OPT) to the additional section of a request or response. The payload size is stored in the CLASS field of the record.
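To make the layout concrete, the following Python sketch encodes an OPT record that advertises a payload size. It is only an illustration: the function name and the 4096 default are my own choices, and the record carries no options. The type code 41, the root owner name and the use of the CLASS field for the payload size are how EDNS0 defines the record.

```python
import struct

OPT_TYPE = 41  # type code assigned to the OPT pseudo-record


def build_opt_record(payload_size: int = 4096) -> bytes:
    """Encode an OPT record with empty RDATA that advertises payload_size."""
    if not 512 <= payload_size <= 0xFFFF:
        raise ValueError("payload size must be 512..65535 to fit the CLASS field")
    name = b"\x00"  # the owner name of an OPT record is always the root
    ttl = 0         # extended RCODE, version and flags all left at zero
    rdlen = 0       # no EDNS options in this sketch
    # TYPE (16 bits), CLASS (16 bits), TTL (32 bits), RDLEN (16 bits)
    return name + struct.pack("!HHIH", OPT_TYPE, payload_size, ttl, rdlen)


opt = build_opt_record(4096)
```

The resulting 11 octets are appended to the additional section of the message, and the ARCOUNT field in the header has to be incremented to match.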

This is a tradeoff. If the receiving side (a legacy nameserver) does not understand or expect OPT records or strange CLASS fields, the transaction will fail, causing the resolver to fall back and requery without the OPT record. Note that there might be yet another fallback if the server responds with the TC bit set.

Early deployment of EDNS (the standard which defines the OPT record) has seen a lot of these ‘fallbacks’: resolvers that use OPT, servers that do not understand it. Currently, though, the DNS is in a state where it is safe to assume that the server does. On the server side, Microsoft’s DNS server, ISC’s BIND and NLnet Labs’ NSD understand it. Older versions of some of these do not. Two other well-known servers, PowerDNS (PDNS) and Dan Bernstein’s tinydns, silently ignore the OPT record in requests and respond anyway, so this won’t introduce a fallback either.

2) Ensure transmission of DNSSEC data.

DNSSEC provides for origin authentication and integrity checking of DNS data using strong cryptography. All the cryptographic functionality resides in the zone-signer (code that signs data) and the validator (code that verifies responses). To show why this is, it’s useful to understand that the components of the resolver may be decomposed into five distinct and effectively independent functions: client (sending requests), server (receiving requests), resolver, cache, and validator. The resolver has in principle nothing to do with the cryptographic data. It just resolves and serves it. The server, however, must make some effort to find the proper data in its database (zonefiles or the like), but that is not too complex.

The resolver algorithm needs to do nothing at all with the cryptographic processing. This is done by the validator, preferably independent of the resolver algorithm. This allows an application to do its own DNSSEC validation. Note that this validator does not need to know anything about the resolving process nor the location of authoritative servers (NS, A and AAAA records). It has a trust anchor for some domain, and only cares about the DNSSEC data involved. It can query a resolver for DS and DNSKEY records, while the RRSIG and NSEC records are sent back automatically.

To protect legacy resolvers, i.e., resolvers that cannot handle unknown data (more on that later), the request must include a signal to indicate to the server that the resolver understands the presence of DNSSEC resource records. Here ‘understands’ does not mean ‘able to validate’, merely that the resolver will not be confused by these records.

This signal is called the DNSSEC OK (DO) bit. It is allocated in the TTL field of the EDNS0 OPT record previously mentioned. It is basically the resolver signalling to the server “send me DNSSEC data, if you have it”.
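For readers who want to see where the bit lives, here is a small Python sketch of packing and unpacking the 32-bit TTL field of an OPT record. The helper names are invented for this example. The split of the field into an 8-bit extended RCODE, an 8-bit version and 16 bits of flags, with DO as the top flag bit, is how the EDNS0 and DNSSEC specifications define it.

```python
DO_BIT = 0x8000  # top bit of the 16-bit flags half of the OPT TTL field


def pack_opt_ttl(ext_rcode: int = 0, version: int = 0, dnssec_ok: bool = False) -> int:
    """Combine the three sub-fields into the 32-bit TTL value of an OPT record."""
    flags = DO_BIT if dnssec_ok else 0
    return ((ext_rcode & 0xFF) << 24) | ((version & 0xFF) << 16) | flags


def unpack_opt_ttl(ttl: int) -> dict:
    """Split the 32-bit TTL value of an OPT record back into its sub-fields."""
    return {
        "ext_rcode": (ttl >> 24) & 0xFF,
        "version": (ttl >> 16) & 0xFF,
        "dnssec_ok": bool(ttl & DO_BIT),
    }


ttl_with_do = pack_opt_ttl(dnssec_ok=True)
```

A resolver that wants DNSSEC records sets the bit in its request; a server can check the same bit before deciding whether to include RRSIG and NSEC records.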

Since the resolver is now able to receive and store these DNSSEC records, applications can use it to validate DNS data.

3) Handle Unknown Data.

Without exception, all resource records have the same format: NAME, TYPE, CLASS, TTL, RDLEN and RDATA. There is no need for the resolver to understand that the RDATA of type 33 (SRV) has different semantics than the RDATA of type 44 (SSHFP). The syntax of these records is identical. The type space is 16 bits wide, and only a handful of type codes have been assigned by IANA. There is only a tiny subset of types of which the resolver must know the RDATA semantics: SOA, NS, A, AAAA and CNAME.
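The point about identical syntax can be shown with a short Python sketch that reads any resource record without looking inside its RDATA. This is a simplified illustration, not a full parser: the function names are my own, and it deliberately refuses compressed names, which a real resolver has to follow.

```python
import struct


def read_name(message: bytes, offset: int):
    """Read an uncompressed domain name; return (name, next_offset)."""
    labels = []
    while True:
        length = message[offset]
        offset += 1
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("compression pointers are not handled in this sketch")
        labels.append(message[offset:offset + length].decode("ascii"))
        offset += length
    return ".".join(labels) + ".", offset


def read_record(message: bytes, offset: int = 0):
    """Read one resource record; RDATA is returned as opaque bytes."""
    name, offset = read_name(message, offset)
    rtype, rclass, ttl, rdlen = struct.unpack_from("!HHIH", message, offset)
    offset += 10
    rdata = message[offset:offset + rdlen]
    if len(rdata) != rdlen:
        raise ValueError("record is shorter than its RDLEN claims")
    record = {"name": name, "type": rtype, "class": rclass, "ttl": ttl, "rdata": rdata}
    return record, offset + rdlen


# A made-up type 44 (SSHFP) record with four octets of arbitrary RDATA.
wire = b"\x07example\x03com\x00" + struct.pack("!HHIH", 44, 1, 3600, 4) + b"\x01\x01\xab\xcd"
record, end = read_record(wire)
```

Nothing in read_record depends on the type code, so a record type assigned next year passes through exactly as SRV or SSHFP does today.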

4) Disregard and Remove Extraneous Data.

This section is about proper DNS hygiene. The resolver needs to protect the client from potentially spoofed data. When a resolver queries a server, it is because it has learned that the server is authoritative for some domain in some class (typically the IN class). The resolver should therefore strip (not just ignore) all the resource records that have either (1) a different class, or (2) a name not under the authority of the server. For example: a resolver connects to the authoritative nameserver for ‘.com’. When the resolver receives a response that contains names not ending with ‘.com’ (for instance: ns1.example.org A 127.0.0.1) it should remove them from the response before any other action. Note that the OPT record is only used to signal resolver and server capabilities and has no use outside of the actual transaction. After processing it, it should be removed as well.
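The stripping rule above could look something like the following Python sketch. It assumes records have already been parsed into dictionaries with "name", "type" and "class" keys, which is a representation I have made up for the example, and the function names are equally hypothetical.

```python
def in_bailiwick(name: str, zone: str) -> bool:
    """True if name is the zone itself or lies underneath it."""
    name = name.rstrip(".").lower()
    zone = zone.rstrip(".").lower()
    if zone == "":
        return True  # everything is under the root
    # Compare on a label boundary so that 'notcom' is not taken to be under 'com'.
    return name == zone or name.endswith("." + zone)


def scrub(records, zone, qclass="IN"):
    """Keep only records of the queried class that fall under the server's zone."""
    kept = []
    for rec in records:
        if rec["type"] == "OPT":
            continue  # transaction-only signalling, never passed on
        if rec["class"] != qclass:
            continue
        if not in_bailiwick(rec["name"], zone):
            continue
        kept.append(rec)
    return kept


response = [
    {"name": "example.com.", "type": "NS", "class": "IN"},
    {"name": "ns1.example.org.", "type": "A", "class": "IN"},
    {"name": "example.com.", "type": "TXT", "class": "CH"},
    {"name": "notcom.", "type": "A", "class": "IN"},
    {"name": ".", "type": "OPT", "class": 4096},
]
clean = scrub(response, "com.")
```

Only the first record survives. Note the comparison on a label boundary: a plain “ends with com” string test would wrongly accept a name such as notcom.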

Conclusion.

Some implementors sit on top of bleeding edge technology, others only implement if the demand for some technology is high. The latter approach is really a Catch 22. Applications can’t (and thus won’t) deploy new technology based on DNS if resolvers are not future proof. Resolvers won’t become future proof if there is no demand from the application side.

I believe it’s essential that all resolvers — especially the well-known ones — implement the principles listed above. I’d like to see organisations use future proof technology. This removes barriers to applications using DNS, DNSSEC, DKIM and whatever else is on the horizon.

Monitoring an Oracle Physical Standby

Posted by jason on Dec 11th, 2006

When you are using an Oracle dataguard physical standby as your disaster recovery solution, you want to make sure the standby is as up-to-date as possible with the redo being generated on the primary database. While there is an argument for not actually applying that redo straight away (recovering from human error being one), you do want to make sure your standby has actually received it. There are quite a few views you can look at to see how your standby is doing.

This one checks which redo log sequence number you have applied up to:

STANDBY_SQL> SELECT MAX(SEQUENCE#), APPLIED FROM V$ARCHIVED_LOG GROUP BY APPLIED;
MAX(SEQUENCE#)     APP
--------------     ----
1129                YES

This query will tell you what the background processes that make your standby tick are actually up to:

STANDBY_SQL> select process, client_process, sequence#, status from V$managed_standby;
PROCESS   CLIENT_P  SEQUENCE# STATUS
--------- -------- ---------- ------------
ARCH      ARCH              0 CONNECTED
ARCH      ARCH              0 CONNECTED
RFS       UNKNOWN           0 IDLE
RFS       UNKNOWN           0 IDLE
MRP0      N/A             774 WAIT_FOR_LOG
RFS       UNKNOWN           0 IDLE
RFS       UNKNOWN           0 IDLE
RFS       UNKNOWN           0 IDLE
RFS       LGWR            774 IDLE
RFS       LGWR            236 IDLE
RFS       UNKNOWN           0 IDLE

What you don’t want to see in the above is WAIT_FOR_GAP, because then you know you have hit a problem.

You can tell which recovery mode you are in via the following:

STANDBY_SQL> SELECT RECOVERY_MODE FROM V$ARCHIVE_DEST_STATUS WHERE DEST_ID=2 ;
RECOVERY_MODE
-----------------------
MANAGED REAL TIME APPLY

The V$dataguard_status view is quite useful in highlighting any issues the standby may be having, but I find V$dataguard_stats to be the best view for telling you where you have applied up to:

SQL> select * from v$dataguard_stats;
NAME                             VALUE
-------------------------------- ----------------
apply finish time                +00 00:00:00.0
apply lag                        +00 00:00:13
estimated startup time           24
standby has been open            N
transport lag                    +00 00:00:05

What this tells us is that we are seeing the redo 5 seconds after it has hit the primary. We are using Maximum Performance on the primary; I’d expect this to be 0 with Maximum Availability or Maximum Protection. You can also see from the apply lag how good your standby is at keeping up with the redo being generated. The apply lag can be very useful for monitoring: if you have a gap, the transport lag can still be quite low while the apply lag just grows and grows. If you are missing an archived redo logfile and it has been deleted on your primary, you can end up in a non-recoverable situation, again rendering your standby not much use.
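If you want to feed these values into a monitoring script, something along the lines of this Python sketch would do. It only covers turning the interval strings into seconds and comparing them with limits; fetching the rows from the standby is left out, and the function names and the 60 and 30 second thresholds are purely my own assumptions.

```python
import re

# Matches values such as '+00 00:00:13' or '+00 00:00:00.0' (days hours:minutes:seconds).
INTERVAL = re.compile(r"^\+(\d+) (\d+):(\d+):(\d+)(?:\.\d+)?$")


def interval_to_seconds(value: str) -> int:
    """Convert a day-to-second interval string into whole seconds."""
    match = INTERVAL.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised interval: {value!r}")
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def lag_alerts(stats: dict, max_apply_lag: int = 60, max_transport_lag: int = 30) -> list:
    """Return a message for each lag that exceeds its (assumed) limit."""
    alerts = []
    for name, limit in (("apply lag", max_apply_lag), ("transport lag", max_transport_lag)):
        lag = interval_to_seconds(stats[name])
        if lag > limit:
            alerts.append(f"{name} is {lag}s (limit {limit}s)")
    return alerts


# The values from the query output above, keyed by the NAME column.
stats = {"apply lag": "+00 00:00:13", "transport lag": "+00 00:00:05"}
alerts = lag_alerts(stats)
```

With the values shown above, both lags are under the limits and no alerts are raised.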

The final thing I’d advise monitoring is the mrp background process. If this dies then again you can get a gap building up that would not be bridged by a FAL (fetch archive log) request, as the mrp process is responsible for detecting and resolving gaps. The mrp process looks something like ora_mrp0_SID.
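A check for the mrp process can be as simple as scanning a process listing for that name. Here is a Python sketch; the function name and the SID STBY1 are placeholders I have invented, and the sample listing is not real output.

```python
import re


def mrp_running(ps_output: str, sid: str) -> bool:
    """True if a process named ora_mrp0_<sid> appears in the given process listing."""
    pattern = re.compile(r"\bora_mrp0_" + re.escape(sid) + r"\b")
    return any(pattern.search(line) for line in ps_output.splitlines())


# Made-up listing for a standby instance with the hypothetical SID STBY1.
sample_listing = "ora_pmon_STBY1\nora_mrp0_STBY1\nora_lgwr_STBY1\n"
running = mrp_running(sample_listing, "STBY1")

# On the standby host the listing could be collected with, for example:
#   subprocess.run(["ps", "-eo", "args"], capture_output=True, text=True).stdout
```

The word boundary at the end of the pattern stops an instance called STBY12 from being mistaken for STBY1.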

ARP_ANNOUNCE

Posted by jad on Dec 8th, 2006

There is an option ARP_ANNOUNCE (the arp_announce sysctl) in Linux that allows you to control which source address is put into ARP headers. It can take the following values:
0 (default) Any local address
1 Use an address from the same subnet as the target address
2 Prefer the primary address

This is worth knowing because the default can give some very strange results when routing packets with Linux.

On CentOS you can change the default by adding a line like this to /etc/sysctl.conf:

net.ipv4.conf.all.arp_announce = 1
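To confirm what a running machine is actually using, you can read the value back from /proc/sys, where the sysctl key above appears as a file. A small Python sketch follows; the function name and the proc_root parameter (there only so the function can be pointed at a test directory) are my own, and the descriptions simply repeat the list above.

```python
from pathlib import Path

# Descriptions repeat the list of values given earlier in this post.
ARP_ANNOUNCE_MEANING = {
    0: "any local address",
    1: "address from the same subnet as the target address",
    2: "prefer primary address",
}


def read_arp_announce(interface: str = "all", proc_root: str = "/proc/sys") -> int:
    """Read the current arp_announce setting for an interface (or 'all')."""
    path = Path(proc_root) / "net" / "ipv4" / "conf" / interface / "arp_announce"
    return int(path.read_text().strip())


def describe_arp_announce(value: int) -> str:
    """Return the description for a value, or 'unknown' if it is not 0, 1 or 2."""
    return ARP_ANNOUNCE_MEANING.get(value, "unknown")
```

Calling read_arp_announce() with no arguments on a Linux host returns the setting for “all”; pass an interface name such as eth0 to check a single interface.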

Thanks to Understanding Linux Networking Internals for helping me solve this problem.

I wonder why any local address is the default? I don’t see how this could ever be useful.

Read-Write on a physical standby database?

Posted by jason on Dec 8th, 2006

In theory it should be entirely possible to open your physical standby database read-write, perform some testing and then use Oracle flashback database to get back to being a physical standby again. Unfortunately in a RAC environment things don’t quite go according to the manual:

STANDBY_SQL> CREATE RESTORE POINT pre_test GUARANTEE FLASHBACK DATABASE;

PRIMARY_SQL> ALTER SYSTEM ARCHIVE LOG CURRENT;

STANDBY_SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;

STANDBY_SQL> ALTER DATABASE OPEN;

WOOPS! This is what we see when attempting this:

ALTER DATABASE  OPEN;

ORA-03113: end-of-file on communication channel

ORA-00600:  internal error code, arguments: [3705], [1], [8], [3], [8]

Fortunately (I think) this is a known issue: Bug 4479323 OPEN RESETLOGS can fail with OERI[3705] in RAC. That is to say it is NOT just related to a physical standby but could be a recovery issue in any RAC database. The workarounds suggested in the Metalink note 4479323.8 did not work for me. This issue is meant to be fixed in 10.2.0.3 which is imminent for Linux x86-64. Unfortunately the one-off patch seems to just be for 10.1.0.4.

I should point out that I was unable to restore the database back to a physical standby and had to completely reinstate it from scratch.

Keyboard navigation under Mac OS X

Posted by chris on Dec 7th, 2006

I recently noticed that my work machine (a G4 PowerBook) and my home machine (an Intel Mac Mini) differed in that the Restart/Shutdown dialog was navigable via the keyboard on the former, but not on the latter. By this I mean that pressing the tab key would allow you to move between the buttons so that you could choose the required button with the keyboard alone.

At first I suspected this was some odd quirk of Intel machines, but after some research it turns out that keyboard navigation is simply partially disabled by default. At some point I must have turned it on for my work machine. You can rectify this by visiting the “Keyboard and Mouse” preference pane, going to the keyboard shortcuts section and selecting “All controls” rather than “Text boxes and lists only” at the bottom.

Why is this the default? Seems strange to me.

Lotus notes and Images in HTML email

Posted by jad on Dec 7th, 2006

Some users here have reported that when they open HTML email containing images, the images just appear as red boxes containing a small x. I found the solution to this problem on the Notes forums.

The issue seems to be that the web navigator database (perweb.nsf) doesn’t exist. To fix this issue (I tested it on a Mac) do the following.

  1. Edit your current location (popup menu at bottom right)
  2. Go to the Internet Browser tab
  3. Select Notes as the internet browser
  4. Set retrieve/open pages to “from Notes workstation”
  5. Save the location document (the thing you are editing)
  6. Open an email that contains images. They should now appear.
  7. Try a couple of emails if it doesn’t work first time
  8. Once you have working images you can go back and set your internet browser to Firefox or whatever you use.

Brief guide to ssh tunnels

Posted by jay on Dec 6th, 2006

Suppose you are away on a conference, say in Sao Paulo, and you haven’t set up VPN on your laptop but you need to access a server only accessible inside your corporate network. How do you do it?

Well, all it takes is a host already inside your corporate network that you can ssh to, because ssh has a clever facility built in to enable a tunnel through that computer.

Imagine I have an ssh host inside my network called ‘ssh-host’ and the server I want to access is an intranet web server called ‘target’. Then all I need do from my laptop out in the wild is issue the following command:

ssh -N -L 1234:target:80 myusername@ssh-host

And that will redirect port 1234 on my laptop to tunnel through to port 80 on the target server. To use it all I do is open up a web browser and go to http://localhost:1234 and hey presto the web page from the target server appears.

Just to talk through the command:

  • -N This tells ssh not to execute a command on the remote server. This does mean that the ssh command does not appear to complete after you execute it in a shell but just sits there doing nothing. However it has worked. You will need to control-C to quit the ssh command.
  • -L This tells ssh to create a tunnel by forwarding a local port, as described by the argument that follows.
  • 1234:target:80 This tells ssh that the tunnel should be from port 1234 on the localhost to port 80 on the machine called target.
  • myusername@ssh-host This is the username and host that sits inside the corporate network and provides the tunnel.
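If you find yourself setting up tunnels often, the command can be assembled by a small script. The Python sketch below just builds the same argument list as the command above; the function name is made up, and nothing is run until you hand the list to something like subprocess.

```python
def tunnel_command(local_port: int, target_host: str, target_port: int, gateway: str) -> list:
    """Build the ssh argument list for a local port forward through gateway."""
    forward = f"{local_port}:{target_host}:{target_port}"
    # -N: run no remote command, -L: forward the local port as described
    return ["ssh", "-N", "-L", forward, gateway]


command = tunnel_command(1234, "target", 80, "myusername@ssh-host")
command_line = " ".join(command)

# To open the tunnel from a script (it keeps running until terminated):
#   import subprocess
#   tunnel = subprocess.Popen(command)
#   ...use http://localhost:1234...
#   tunnel.terminate()
```

Passing the arguments as a list rather than a single string means no shell quoting is needed if a host name or user name ever contains unusual characters.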

Lotus Notes 7.0.2 beta 2 for OS X

Posted by jad on Dec 6th, 2006

Here at Nominet we get to apply the power of Lotus Notes to all our communication needs. So it was with great excitement that I discovered the beta release of Lotus Notes 7.0.2 for OS X.

Before starting, don’t forget to back up your ID files and any local address book or databases. I removed the 6.5.5 install from my Mac before starting, and then just ran the Notes installer. This is much improved from previous installs and I had no problems with it.

Running it for the first time gave no surprises. Just the normal configuration steps.

Opening emails gives a JavaScript error! This is not good and appears to be happening on all messages. However, disabling JavaScript error messages in the User Preferences stopped it.

The default font is much better and things feel a bit quicker and smoother, but not a lot.

A worthwhile upgrade I think.

Update: The mouse scroll wheel now works!!

Update2: You can tell Notes to save the state of the window at exit, so all the databases you normally open at start up will be there! Also you can now right click on the sender’s name in a memo and select “create memo to”.

When dataguard goes bad

Posted by jason on Dec 4th, 2006

We have been using Oracle dataguard to provide us with a replicated copy of our production database for disaster recovery. After our pain with logical standby I had hoped using the more robust and mature physical standby would lead to less pulling out of hair etc. Mostly it has been very efficient:

select name, value from v$dataguard_stats;

apply finish time                +00 00:00:00.0
apply lag                        +00 00:00:15
estimated startup time           24
standby has been open            N
transport lag                    +00 00:00:08

However what I was not expecting was this in the alert log on the standby:

ORA-07445: exception encountered: core dump [kcrarmb()+152] [SIGFPE] [Integer divide by zero][0x0085C0]

This killed the managed recovery process (MRP), which is responsible for applying the redo data from the standby redo logs. Thankfully redo continued to be sent after a log switch on the primary. However, with MRP not running, nothing was being applied on the standby, and because MRP is also responsible for spotting and resolving archive gaps, we were left missing an archive log as well. This opened up the potential for data loss: even though we had later redo data, if you can’t resolve a gap you are stuck.

Searching on metalink returns absolutely no hits. What I liked best though was the response of an Oracle engineer when asked about this issue:

“I had a Look into our Knowledge and Report Database, this kind of Error has not been reported before.”

I don’t think we are running anything out of the ordinary, though it is a RAC -> RAC configuration, oh and maybe not everyone is running real time apply which is new from 10.1.

A fix was fairly easy to come by: restarting the managed recovery process makes it perform gap resolution, and once there is no gap new redo can happily be applied again. It is quite an annoyance, however, that MRP can’t just restart itself, so close monitoring of the apply lag and perhaps even the mrp background process is really required. It’s no good paying for a standby only to find on disaster that it’s out of date!
