random technical thoughts from the Nominet technical team

How to build complicated Dynamic SQL within PL/SQL

1 Star2 Stars3 Stars4 Stars5 Stars (4 votes, average: 4.5 out of 5)
Loading ... Loading ...
Posted by chris on Mar 22nd, 2007

It is not unusual to be in a situation where a routine in PL/SQL takes a long series of arguments to use as parameters in a query. For example, suppose we have the following function:

FUNCTION Find_People_By_Address(street_address_in VARCHAR2,
                                locality_in VARCHAR2,
                                city_in VARCHAR2,
                                province_in VARCHAR2,
                                country_in VARCHAR2,
                                postcode_in VARCHAR2) RETURN PERSON_CURSOR;

We pass in parameters and expect to receive back a cursor which will give us people whose address matches those parameters. If we pass NULL, it means that we don’t want to consider that field. One way to do this is to open our cursor with a SQL statement that that looks something like this

SELECT ID, NAME
FROM PEOPLE_TABLE
WHERE (street_address_in IS NULL OR STREET_ADDRESS = street_address_in)
AND (locality_in IS NULL OR LOCALITY = locality_in)
AND (city_in IS NULL OR CITY = city_in)
...

This is fine for the simplest cases, but sometimes the optimizer gets a little confused because you have mentioned every column in the table. It is also a little hard to read and this becomes more and more of a problem as the query gets complicated. It would be better if passing in NULL for everything except postcode produced a query like this

SELECT ID, NAME
FROM PEOPLE_TABLE
WHERE POSTCODE = postcode_in

It is obvious to the optimizer what to do. Assuming postcode is indexed, the query will perform well. The logical way to do this is to build the SQL dynamically, but this can also open the door for the dreaded SQL Injection Attack. This happens if you use concatenation to add the arguments into the query like this

IF city_in IS NOT NULL THEN
   query := query || ' AND CITY = ''' || city_in || '''';
END IF;

Obviously, this is bad practice. So how do we get around this? The answer is of course bind variables, which Oracle supports for dynamic SQL. But it isn’t obvious how we use them here. Surely the number of bind variables we need will depend on the number of non-NULL arguments passed to our routine? Yes and no. You can work around this by keeping the number of bind variables fixed with one per argument, but by effectively throwing away the ones you are not interested in. You build the SQL like this:

query := 'SELECT ID, NAME FROM PEOPLE TABLE WHERE 1=1';

IF street_address IS NOT NULL THEN
   query := query || ' AND STREET_ADDRESS = :1';
ELSE
   query := query || ' AND (1=1 OR :1 IS NULL)';
END IF;

......

and then open the cursor like so

OPEN result_cursor
FOR query
USING street_address_in, locality_in, city_in, province_in, country_in, postcode_in;

So our query will end up with six bind variables in it every time, but the ones which are NULL will effectively be thrown away. Our query may not end up as readable as the postcode example given above, but it should be as fast and doesn’t have a vulnerability to SQL Injection.

Oracle physical standby real time apply & 10.2.0.3

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 4.5 out of 5)
Loading ... Loading ...
Posted by jason on Mar 16th, 2007

We have been happily running an oracle dataguard physical standby for many months with the 10.2.0.2 patchset. We were using real time apply with the physical standby which means the redo generated on the primary are applied to the standby as soon as they are received. After upgrading to 10.2.0.3 it seems we are having real difficulty in getting the standby to run with real time apply:

MRP0: Background Media Recovery terminated with error 355

ORA-00355: change numbers out of order

ORA-00353: log corruption near block 2 change 863653355 time 02/21/2007 11:58:53

ORA-00312: online log 30 thread 1: '+DATA2/nom/standby30.log'

Managed Standby Recovery not using Real Time Apply

Again, you start to worry about hardware issues, but as far as I can tell, there seems to be no hardware problems. I have managed to repeat this issue on 2 completely distinct installations so hardware seems unlikely. We are also seeing quite random behaviour in that sometimes the standby can be made to go into real time apply mode but nothing gets applied, even though redo is being shipped to it.

After Oracle had failed to come up with anything for a few days, I noticed that real time apply was using parallel log recovery, so I tried turning parallelism off, by setting the following parameter:

alter system set parallel_max_servers=0 scope=both;

Upon starting real time apply again I saw the following in the alert log:

parallel recovery setup failed: using serial mode

But it did start in real time apply. Though it seems if after attempting real time apply in parallel a standby redo has been corrupted previously i need to do a few log switches on the primary before it will go into real time apply again in serial mode.

Oracle have said it is similiar to BUG: 3806172 but this applied to 10.1 and logical standby not physical. It is possible to get managed recovery to start without using real time apply but then you have to start switching your log files in a more predictable fashion using archive_lag_target.

We are still awaiting a fix for this issue.

Some issues with Oracle 10.2.0.3 patchset

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading ... Loading ...
Posted by jason on Mar 16th, 2007

We have recently applied the latest Oracle patchset, 10.2.0.3 to our databases. While installing went smoothly enough the new patchset has not been without issues, here are some of the problems (none of which were evident in 10.2.0.2) we are finding:

ORA-07445: EXCEPTION ENCOUNTERED: CORE DUMP [OPIDSA()+386] [SIGSEGV]

This seems to be being caused by Oracle application express, though apparently is more to do with memory management on the server rather than application specific. There has just recently been a metalink note posted about this: 418855.1

This has a bug number 5648872 and there will be a patch available for this (with the same number as the bug) at some point.

Next up we saw the following in our alert log:

ORA-600 [KCRRUPIRFS.20] [4] [368]

This error was accompanied by the arc process dying (to our standby database) and the following being dumped into a trace file:

Corrupt redo block 479421 detected: bad block number

I can find no hardware issues and we were able to get the redo data to the standby ok. This is still ongoing with oracle.

Final error for this blog posting was an issue with our physical standby after being patched to 10.2.0.3:

ORA-00600: INTERNAL ERROR CODE, ARGUMENTS: [KCRFR_RESIZE2], [652614828032]

This error occurred upon trying to start the managed recovery process

Which it failed to do. This error may be caused by a corrupt archivelog file on the standby, though how this corruption came about is as yet unknown. There is more about 10.2.0.3 and physical standby databases in a future posting.

Middleware remoting protocol migration

1 Star2 Stars3 Stars4 Stars5 Stars (9 votes, average: 2.89 out of 5)
Loading ... Loading ...
Posted by miquel on Mar 13th, 2007

At Nominet we have successfully migrated the remoting protocol governing communication between our client applications and the software running in the middleware layer. We have moved from SOAP to Hessian, refactoring the base code on the way. This post explains why we migrated, the advantages of the migration and how we tested different protocols to decide the move.

Starting point

Due to not relevant reasons at the moment, Nominet middleware services were being exposed to client applications internally using Glue, a privative SOAP implementation. Middleware code was quite tied to the use of this library, fact that was not convenient at all because it made difficult to introduce a different protocol.

SOAP protocol is good when you need to expose services in an environment where you do not control the client side and want to provide hight compatibility. SOAP it is also good if you expose small stateless services with fast processing and small amount of data transfer in and out. However, due to the heavy process of XML, SOAP is not appropriate for services which can require a significant amount of information going in and out. Additionally, in an environment where you control server and client code, there are better communication protocol options.

In our environment, client–middleware communication required huge amounts of data going back and forth and using SOAP was doing more harm than good. We were not comfortable with this situation and started to study the possibility to move to an open source light protocol, taking the opportunity of refactoring the code in order to reduce dependency on the chosen communication protocol.

Protocol testing and comparison

In order to decide which protocol we would like to implement we first researched a bit on the available options and after pin picking a small group we tested them in our environment. The tests were done in two phases, first we tested all the initially considered protocols with a light load and compared them. With the first phase test results in hand, we discussed the convenience of using a Java only protocol or one that allowed a mixture of client platforms. Then we decided to test further the best protocol within each category. Therefore, in the second phase we tested just two protocols, a Java only one and another allowing multiple types of client to see how they behaved on heavy loads.

We initially looked at the following protocols:

  • Hessian

Lightweight binary protocol from Caucho, HTTP-based, Custom binary serialization mechanism. Support for several platforms PHP / Python / C++ / C# / Objective C / Ruby / Java.

  • Burlap

XML-based lightweight protocol from Caucho, HTTP-based, Custom XML based serialization mechanism, Do not know support different than for Java

  • Spring HttpInvoker

Spring Java-to-Java remoting, HTTP-based, Java serialization just like RMI, easy to set up.

  • RMI JRMP Protocol

Java remoting standard, each method needs to throw a checked RemoteException and need to generate stubs and skeletons.

  • Glue SOAP

Web-Methods HTTP-based web services (was the current implementation). Support many different platforms.

Burlap was discarted before testing in favour of hessian and the resulting set was tested and compared.

First phase of protocol testing

It was intended to achieve a performance comparison within the current infrastructure just changing the wire protocol so each protocol could be compared with the current implementation using GLUE.

GLUE tests measurements were performed using the current code at the same repository release version in which the code was branched to perform necessary changes to introduce protocol independence.

For the other protocols the same test was run with the additional infrastructure to support protocol independence. Therefore the tests of the rest of the protocols were in the same conditions as the Glue test or even with a slightly overhead due to the additional layer of code.

The server and client for the test run into two JVMs on the same host to avoid network perturbations. JIT and GC are activated in order to obtain the best performance and to evaluate the common case.

The test consisted on repeated calls to the middleware services passing a structure with increasing complexity. The called method just replicates the structure and returns the copy.

The structure passed in each call is a 4 elements arraylist in which the first element is a 50 char fixed string, the second is an n-element integer arraylist, the third is an n-element UserDetails objects arraylist and the last is an n-element TestStructureBean objects arraylist. The array length shown in the data are the number of elements on the arraylists, so for a 200 array length the structure will have:

  • 1 String
  • 200 Integer

Once the structure is built for each iteration, the service is then called repeatedly using the same structure in each call. The service is called for an initial amount of times before taking any time measure to allow caching and afterwards, the elapsed time for 100 calls is measured.

Test results

Then resulting time of 100 iterations for increasing lengths of the structure is shown in the table below and also in two charts, the first one plotting the data for all protocols and the second one plotting the same data but with Glue protocol removed.

.

Protocol Response Time Table per array length (milliseconds)
Protocol Response Time

.

Protocol Response Time Chart
Protocol Response Time Chart

.

Protocol Response Time Chart Without Glue protocol
Protocol Response Time Chart Without Glue

.

Conclusion

As can be read from the data and the first chart, Glue has a strong processing overhead. When the transferred data increases on size, its response time grows much faster than the rest of protocols. The other protocols response time have a moderated grow accompanying the data size increase.

Within the other three protocols, the fastest one is the HTTPInvoker in all cases except with tiny data sizes. Hessian has a good performance rating for small and medium data sizes, performing even better than RMI but its response time degrades when the data size increases.

HTTPInvoker is then a good choice to implement Java to Java remoting and Hessian is adequate if there are other languages involved in the client side.

Second phase of protocol testing

With the results from the first testing round in hand, we decided to focus on the best two suitable options for our purposes, therefore we chose HTTP invoker and Hessian and tested them further with heavier loads.

Two different tests were performed in this phase, the first one is the same performed in the first testing phase but involving longer arrays (thousands of elements) but fewer call iterations for each array length. The second test consists in calling a method with a big String as a parameter and receiving the same String back as a result. For this test we used Strings ranging from 100k characters to 550k characters. In both cases the tests performed 10 call iterations.

Test Results

Then resulting measured time of 10 iterations for different structure lengths and document sizes are shown in the tables below. Data is also shown graphically in two bar charts and for the benefit of comparison two more charts are provided, displaying the response time ratio between Hessian and HTTP Invoker for the heavy structure and big documents tests.

.

Protocol Response Time Table
per array length (milliseconds)
Heavy load response time

.

Protocol Response Time Table
per document size (milliseconds)
Big docs response time

.

Response Time Chart for Large Structures
Heavy load response time chart

.

Response Time for big documents
Big docs response time chart

.

Response time ratio (Hessian to HTTPInvoker) Chart Large Structures
Hessian to HTTPInvoker Ratio heavy load

.

Response time ratio (Hessian to HTTPInvoker) Chart Big Documents
Hessian to HTTPInvoker Ratio big documents

.

Conclusion

HTTPInvoker was, as expected, faster than Hessian, however the idea of this tests was to check how both protocols compare for big data chunks from two points of view: A big structure formed by many small objects or a big chunk of data like a String of thousands of characters.

As can be seen in the ratio charts, Hessian performs better when transferring data in one single piece. Despite the fact that the two ratio charts are not directly comparable, it is worth noting that for structures of many small objects the ratio is almost alway over 1.6 while for single big objects the ratio is mainly below 1.6.

We finally decided to use Hessian because it allows a range of client platforms and we have a small percentage of non Java applications supported by Hessian which will access the middleware services. We could actually expose our middleware services with both protocols at the same time, but the performance gain would not compensate the system over complication that this would represent. Despite the fact that Hessian is in average a 50% slower than Http Invoker it will be for sure a huge improvement when compared to Glue.

Solaris 10/x86 gcc “Segmentation Fault (core dumped)”

1 Star2 Stars3 Stars4 Stars5 Stars (3 votes, average: 4 out of 5)
Loading ... Loading ...
Posted by johng on Mar 13th, 2007

I compiled openssl-0.9.8e on a fully patched Solaris 10 x86 server, but when the resultant binaries were executed they all resulted in “Segmentation Fault (core dumped)”. After wasting some time trying different compiler options I came across a fix.

http://www.openssl.org/~appro/values.c

It seems strange that Sun ship a version of gcc unable to correctly compile code which makes us of the “.init” segement. It also seemed odd that this issue isn’t more widely publicised.

Starting an Oracle instance & Hugepages

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading ... Loading ...
Posted by jason on Mar 12th, 2007

Hugepages are a feature of Linux to increase the memory page size so instead of allocating memory in 4Kb blocks it gets done in 2MB chunks, other OS’s have similiar features. In theory this should enable you to have better scaling with large memory allocations. It is relatively straightforward to set up, just add an entry like the following in the /etc/sysctl.conf file:

vm.nr_hugepages = 5000

This allocates 10GB of memory to be allocated via hugepages. However to get oracle to utilise this memory, I find I have to startup a 10gR2 RAC instance using srvctl:

srvctl start instance -d DATABASE -i INSTANCE

To be clear, when starting up the RAC instance via sqlplus the memory for the instance DOES NOT get allocated from the hugepages pool, only when you startup with srvctl. The trouble starts when you are doing an upgrade, say installing a patchset, you have to startup the database with the migration option to apply the changes to the data dictionary. Except you cannot use srvctl to startup upgrade, though it does accept various options, like mount & nomount.

Be careful if you are using hugepages that if you startup your instance with just using sqlplus you may find yourself out of memory, I found I had to change the memory & the number of processes to get the instance to start for the upgrade.

DNSSEC zone walker

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading ... Loading ...
Posted by alexd on Mar 8th, 2007

We recently released a new tool; here’s how it describes itself :

The Rapid Enumeration Tool (RET) is designed to use DNSSEC NSEC records to enumerate quickly zone data whilst evading detection by systems which might be designed specifically to identify zone enumeration activity.

It does this by using one or more open recursive resolvers to forward queries to the authoritative name servers for the zone. Each resolver is configured with its own `personality’, specifying query rates, query failure/success ratio, proportions of query types, query name decoration, etc. This allows the RET to feed queries to each resolver, that are specifically tailored to match the queries that a resolver might typically send to the authoritative name server.

Unlike other NSEC resource record `walkers’, the RET does not explicitly query for NSEC RRs to walk the zone. Instead, it combines a `walker’ approach with a dictionary attack (combined with a random name generator for more awkward cases). This means that discernible artifacts in the pattern of queries that arrive at the authoritative servers should be minimised.

For anyone who might be interested, the download is here.

We started off by using dnsjava but ran into serious problems with the threading model. We were firing off lots of queries all the time, and it seemed that dnsjava wasn’t really designed for heavier use. Each time a new query was sent, dnsjava would fire up another thread to handle the response (which could take minutes to time out).

Our original design compounded this problem by having two threads for each resolver (one a query sending thread, and the other to fill the token bucket which limited the query rate).

So, with a couple of hundred resolvers all being targetted with a high query rate (some of which would time out), there would eventually be thousands of threads running and the system would grind to a halt.

In order to solve this problem, we first had to fix dnsjava. Rather than modify the library itself, we decided to make an independent extension. This extension extended the Resolver implementation to offer a new NonBlockingResolver, with a ResponseQueue interface to the caller.

Now we could run the DNS library in three threads rather than (potentially) three thousand!

We could also use the new ResponseQueue interface to reduce the number of threads running in our code. Instead of a query-sending thread per resolver, we could now run the RET in three threads : a NameFiller, a QuerySender and a ResponseThread. We now had seven threads in total (one to control the whole thing).
The real work is done by the NameFiller and the ResponseThread, which both make heavy use of the NameManager and HoleManager classes; synchronisation abounds!

The hole and name managers are made particularly complex due to the requirement for the RET to be a simple cross-platform download - there could be no database in use here. Instead, all the data had to be kept in Java Lists - these are not at all efficient for larger data sets. This meant that the RET had to control the number of holes to no more than a few thousand at any point (even when enumerating a zone with millions of domains).

Add to this :

  • the requirement for the majority of queried names to be for names which exist in the zone (but have not been queried for recently)
  • the fact that a lot of domains are “unhelpful” (they won’t yield an NSEC record on direct querying, but instead need a new hole to be opened after them by the guessing of a name which would come soon after)

and the situation becomes quite difficult!

If the RET were for use in earnest, then the List would be replaced with a database and the tight grip on the number of holes could be loosened. This would make it pretty much indetectable.

New release of dnsjnio

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading ... Loading ...
Posted by alexd on Mar 7th, 2007

Those who use the dnsjnio non-blocking I/O extension to dnsjava might be interested in a new release (0.9.6) which fixes a minor issue.

Previously, there was a slight race hazard which could kill dnsjnio if network connectivity was lost at just the wrong time during a bout of heavy DNS traffic. This release removes this issue.

Prototype and script.aculo.us problems in Rails 1.2

1 Star2 Stars3 Stars4 Stars5 Stars (2 votes, average: 2.5 out of 5)
Loading ... Loading ...
Posted by jay on Mar 2nd, 2007

Whilst working through the examples in the “Ajax on Rails” book I had all sorts of problems with Prototype and script.aculo.us. For a start, there is a simple example of a link that toggles a div between visible and invisible, like this:

<%= link_to_function "Toggle DIV", "$('indicator').toggle()" %>

But all I got with this was the error message “toggle is not a function. I even tried replacing toggle() with show() and got another error. Eventually I upgraded from Prototype 1.4 to 1.5 and the problem went away.

Now all through this, script.aculo.us had been working fine. But then I tried another simple example:

Effect.toggle('indicator', 'blind')

Again I got the error message that ‘toggle is not a function’. So this time I upgraded script.aculo.us to 1.7 and it all seems to work fine.

Having said that all I’ve done is upgrade within my current project so now I need to work out how to upgrade the prototype and script.aculo.us built into Rails so that all new projects start with the new versions.

Recent Posts

Highest Rated

Categories

Archives

Meta: