random technical thoughts from the Nominet technical team

How to corrupt your Data Dictionary with Oracle Streams

Posted by arjan on Jun 3rd, 2011

Hopefully this blog post will save you some trouble if you are working with Oracle Streams: if you do what I did, you will end up with a corrupt Data Dictionary. Luckily I hit this issue on a development database rather than in production.

I was preparing our target data warehouse on an 11.2.0.2 development database (Linux x86_64), following Note 753158.1 from “My Oracle Support”, which shows how to configure a database for downstream capture. Basically, you create a tablespace that becomes the default tablespace for the Streams Administrator, and you create the user that becomes the Streams Administrator.

create tablespace bi_streams datafile '&&datafile_dir.bi_streams01.dbf'
size 256m autoextend on next 10m maxsize unlimited;

exec dbms_logmnr_d.set_tablespace('bi_streams');

drop user bi_streams_admin cascade;

create user bi_streams_admin identified by pwd
default tablespace bi_streams
temporary tablespace temp
quota unlimited on bi_streams;
Notice the call to dbms_logmnr_d.set_tablespace('bi_streams'). The documentation says this about the function of that procedure:

The SET_TABLESPACE procedure re-creates all LogMiner tables in an alternate tablespace.

Let’s say you’re scripting this configuration and something goes wrong at a later stage. You want to retry the configuration, so you drop the tablespace you passed to dbms_logmnr_d.set_tablespace('bi_streams') to start with a clean slate. This causes trouble, and potentially a lot of it: the next call to dbms_logmnr_d.set_tablespace('bi_streams') fails with this error.

ERROR at line 1:
ORA-04063: package body "SYS.DBMS_LOGMNR_INTERNAL" has errors
ORA-06508: PL/SQL: could not find program unit being called:
"SYS.DBMS_LOGMNR_INTERNAL"
ORA-06512: at "SYS.DBMS_LOGMNR_D", line 135
ORA-06512: at line 1

On investigation I found a list of over 100 invalid objects, all related to Streams and LogMiner. This probably makes sense, as the tablespace I dropped contained objects on which those Data Dictionary objects depend, so running utlrp.sql didn’t help any more.
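If you want to check for the same symptom, a standard dictionary query along these lines (generic, not taken from the original SR) lists the invalid SYS objects:

-- list invalid objects owned by SYS, e.g. the LogMiner/Streams packages
select object_name, object_type, status
from   dba_objects
where  owner = 'SYS'
and    status = 'INVALID'
order  by object_name;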

I opened an SR with Oracle and they advised dropping all the invalid objects and rerunning catalog.sql and catproc.sql, but that didn’t help either. Oracle then asked whether the database could be recreated or restored. Luckily this was still an empty test environment, so that was no problem. However, imagine doing this on a Live database with 24×7 uptime requirements. You would be lucky to have a backup of that tablespace so you could restore it and revalidate the objects; otherwise you could be looking at restoring your production database.

I decided to protect myself against this issue by leaving the LogMiner Data Dictionary in the SYSAUX tablespace, where it is built by default. I am much less likely to try dropping that tablespace; in fact, dropping SYSAUX is only possible when the database has been opened in MIGRATE mode. This protects me from corrupting the Data Dictionary, and I don’t have to call dbms_logmnr_d.set_tablespace at all.

What if the LogMiner Data Dictionary is already in a custom tablespace that you now want to drop? You can call DBMS_LOGMNR_D.SET_TABLESPACE('SYSAUX'), which moves the existing objects to the SYSAUX tablespace. That enables you to drop the previous tablespace without corrupting the Data Dictionary.
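In SQL*Plus that move is a single call, run as SYS; a minimal sketch (using the bi_streams tablespace from the example above) could look like this:

-- move the LogMiner tables back to SYSAUX first
exec dbms_logmnr_d.set_tablespace('SYSAUX');

-- only once the move has completed is it safe to drop the custom tablespace
drop tablespace bi_streams including contents and datafiles;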

A New Application Development Architecture

Posted by patrick on Mar 20th, 2009

I attended the 2009 Hotsos Symposium, an excellent Oracle database performance tuning conference, in Dallas. The event was a great opportunity to hear world-renowned Oracle performance experts present.

One of the most interesting talks was “The Helsinki Declaration: A set of Principles for the IT Community regarding Application Development” by Toon Koppelaars.

Toon described the expansion of features in the Oracle database over the years. He went on to explain that since the advent of Java, more and more functionality has been implemented outside the database. However, new frameworks, methods and languages appear frequently and often disappear quickly, sometimes within a couple of years. Many developers constantly chase the latest technology because it is cool and will allegedly solve all their current problems. This leads to code quickly becoming legacy, having to be rewritten, or having no developers left with the skills to maintain it. For example, how happy or able would your Java developers be to maintain a system built using Struts, a relatively young framework that is now commonly seen as legacy? Would they first spend ages rewriting it (these days called refactoring) to use Spring, an effort that gives no value to the user?

Although these technologies are changing, what users want has not changed; they still largely want “window on data” applications.

While all this is happening, the database technology remains stable and under-utilised.

Toon recommends replacing this traditional architecture. He has successfully deployed systems using a new architecture, named The Helsinki Declaration (that’s where it was first proposed). This architecture has a thin user-interface layer, deployed in whatever technology/framework is flavour of the month, and business logic and data logic layers implemented in the comparatively very stable database. Only the thin user interface is then vulnerable to the latest fad.

This is described well on his blog. I recommend starting with his first observation and then proceeding to the second, third and fourth observations.

Talking with conference attendees afterwards I was surprised (or maybe I shouldn’t have been) by how many had experienced exactly the issues Toon described on systems development and maintenance projects.

Oracle Online Table Redefinition and Data Transformation

Posted by arjan on Feb 20th, 2009

Introduction

Oracle recognises that these days the uptime of applications is critical, so it has developed ways to reduce application downtime during upgrades. At Nominet we have had software releases that took longer than expected, including table redefinitions during upgrades that ran for several hours. That makes it worthwhile to look at Oracle’s methods for limiting the disruption to application uptime. This article describes a way to redefine an Oracle table’s structure and data while the table remains online and accessible, in its original form, for reads and writes by Live applications.

What can you do

Oracle has created a PL/SQL package called DBMS_REDEFINITION that allows you to restructure a table definition and transform the data in a table. That means you can:

  • Add columns
  • Drop columns
  • Change the data type or length of columns
  • Transform the data inside the table using standard and user-written PL/SQL functions

When is this handy

This capability is very useful when you want to change a big table that is heavily used by an application, and the changes you want to apply would otherwise cause an unacceptable amount of downtime by locking the table or making it inaccessible to the application.

How does it work

You create an empty table, with a different name from the source table, that has the new structure you want to achieve. This table need not have any of the constraints, grants or indexes of the source table. You then start the redefinition using the DBMS_REDEFINITION package, which creates a materialized view and a materialized view log in the background. These two structures capture all the changes made to the source table while you are transforming its data into the empty table you created before the redefinition started. Once the transformation is finished you can let Oracle copy all the indexes, constraints and grants to the new table with one procedure call. The last step is renaming the new table to the original name; this is the only moment the table is locked, and only very briefly.
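Before starting, it is worth checking that the table is a candidate for online redefinition. DBMS_REDEFINITION provides a procedure for this; a minimal check, using the CUSTOMER table from the example that follows and assuming the primary-key based method, looks like this:

begin
   -- raises an error if CUSTOMER cannot be redefined online using its primary key
   dbms_redefinition.can_redef_table(
      uname        => user,
      tname        => 'CUSTOMER',
      options_flag => dbms_redefinition.cons_use_pk);
end;
/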

Example

Let’s first create a source table and fill it with some data. We’ll also add an index and a grant to prove that these are copied to the new, transformed table.

create table customer(
 id               number      primary key
,name             varchar2(30)
,address          varchar2(30)
,phone_number     varchar2(10)
,country_prefix   varchar2(5)
,county_id        number
,credit           number
);

create index cust_name_idx on customer(name);

grant select on customer to public;

truncate table customer;

begin
   for i in 1..3
   loop
      insert into customer values (i, 'ARJAN', 'OXFORD', rpad(i, 10, i), '0044', i, i*1000);
   end loop;
end;
/

commit;

Let’s look at the data before transformation.

select * from customer;

        ID NAME                           ADDRESS                        PHONE_NUMB COUNT  COUNTY_ID     CREDIT
---------- ------------------------------ ------------------------------ ---------- ----- ---------- ----------
         1 ARJAN                          OXFORD                         1111111111 0044           1       1000
         2 ARJAN                          OXFORD                         2222222222 0044           2       2000
         3 ARJAN                          OXFORD                         3333333333 0044           3       3000

3 rows selected.

Now let’s start the redefinition process. First we’ll create a user-defined PL/SQL function to show that you can apply customized transformations to your data.

create or replace function transform_credit(p_credit number)
return number
is

begin
   return ((p_credit/3) + 10);
end;
/

At this point we are ready to go through the redefinition process, which entails three calls to procedures in the DBMS_REDEFINITION package. The first call starts the process and defines the column mapping between the source and target tables. Before making it, we create the interim target table with the structure we want to end up with:

create table customer_tmp(
 id               number
,name             varchar2(30)
,address          varchar2(30)
,phone_number     varchar2(15)
,county_id        number
,credit           number
);
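The start call itself is not shown in the original listing. With the interim table above, a column mapping that produces the output further down would look roughly like this (a sketch; the exact call used in the original run was not recorded):

begin
   dbms_redefinition.start_redef_table(
      uname       => user,
      orig_table  => 'CUSTOMER',
      int_table   => 'CUSTOMER_TMP',
      col_mapping => 'id id, '||
                     'initcap(name) name, '||
                     'initcap(address) address, '||
                     'country_prefix||phone_number phone_number, '||
                     'county_id county_id, '||
                     'transform_credit(credit) credit');
end;
/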

You can see that we are doing the following transformations:

  • Using Oracle built-in functions to change the columns “NAME” and “ADDRESS”
  • Concatenating the country prefix onto the phone number, in effect dropping the column country_prefix
  • Transforming the value in the credit column with our user-defined function

Suppose that this transformation takes several hours, and that during that time a Live application makes changes to the data in the source table that we don’t want to lose. To prove that these changes end up in the transformed table, we’ll insert some more records into the source table from a different session.

begin
for i in 4..6
loop
   insert into customer values (i, 'JAN', 'AMSTERDAM', rpad(i, 10, i), '0031', 1, i*1000);
end loop;
end;
/
commit;
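When the redefinition runs for a long time, the changes accumulating in the materialized view log can be applied to the interim table periodically so that the final switch-over stays short. This is an optional step, sketched here:

begin
   -- apply the changes captured so far to the interim table
   dbms_redefinition.sync_interim_table(
      uname      => user,
      orig_table => 'CUSTOMER',
      int_table  => 'CUSTOMER_TMP');
end;
/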

Once the transformation is finished we can copy all indexes, constraints and grants to the target table with one simple call.

set serveroutput on

declare
   l_errors number;
begin
  dbms_redefinition.copy_table_dependents
  ( user, 'CUSTOMER', 'CUSTOMER_TMP',
    copy_indexes => dbms_redefinition.cons_orig_params,
    num_errors => l_errors );

  dbms_output.put_line('Errors: '||l_errors);
end;
/

During this transformation the source table has been available for reads and writes for the Live application. We are now ready to rename the table so the application can read and write from the new target table.

begin
    dbms_redefinition.finish_redef_table( user, 'CUSTOMER', 'CUSTOMER_TMP' );
end;
/

Let’s look at the result.

select * from customer;
        ID NAME                           ADDRESS                        PHONE_NUMBER     COUNTY_ID     CREDIT
---------- ------------------------------ ------------------------------ --------------- ---------- ----------
         1 Arjan                          Oxford                         00441111111111           1 343.333333
         2 Arjan                          Oxford                         00442222222222           2 676.666667
         3 Arjan                          Oxford                         00443333333333           3       1010
         4 Jan                            Amsterdam                      00314444444444           1 1343.33333
         5 Jan                            Amsterdam                      00315555555555           1 1676.66667
         6 Jan                            Amsterdam                      00316666666666           1       2010

6 rows selected.

As you can see the data have been transformed and all changes made to the source table during the transformation have also been transformed and migrated to the new target table.

Limitations

It would be great if you could add a column and fill it with values selected from a different table (for example when you want to denormalize a table). This is not possible, though, mainly because of the limitations on materialized views in Oracle.

Conclusion

The use of DBMS_REDEFINITION can decrease the downtime of your applications. Its use is quite simple compared to inventing and coding this functionality manually. It cannot be used in all circumstances, but it is certainly a worthwhile addition to the toolkit of any developer or DBA.

Implementing SYSASM privileges in Oracle 11g

Posted by arjan on Jan 21st, 2009

The introduction of Automatic Storage Management (ASM) in Oracle 10g added a new set of skills needed to manage the Oracle technology stack. In some organizations this role is filled not by the DBA but by a different specialist. In its latest release, 11g, Oracle gives organizations the opportunity to split the management of the RDBMS and of ASM securely between different people. For this purpose the new Oracle role SYSASM has been created. Connections can be authenticated as SYSASM in two ways:
1.    Password authentication through the database (a brief example follows this list)
2.    OS authentication via membership of a specific group (by default “dba”)
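For completeness, the password-based route amounts to granting the SYSASM privilege on the ASM instance. A minimal sketch might look like this; the grantee must already exist in the ASM password file, and asm_alias is just a placeholder for your TNS alias:

-- connected to the ASM instance with SYSASM (or SYSDBA) privileges
grant sysasm to sys;

-- afterwards a password-authenticated connection is possible, e.g.:
-- sqlplus sys@asm_alias as sysasm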

For this article I would like to focus on OS authentication. Imagine that you originally did not want to separate the ASM and RDBMS administrator roles, but that a security audit now prescribes that the two roles be separated, and/or a storage specialist called tom has been hired to manage ASM. How do you do it? Well, first it is good to know that the Unix groups that are allowed to connect as SYSDBA and SYSASM are compiled into the oracle executable, normally during installation of the software.
Do you need to do a complete new installation to compile a new “asm” group into the oracle executable? No, you just need to know which file to edit and then relink the oracle executable. Here is how to do it:

First create the OS group “asm”:

  groupadd -g 2003 asm

The OS groups to which the different Oracle roles are assigned are configured in $ORACLE_HOME/rdbms/lib/config.s (on Linux).
Next, add the OS user of the storage administrator, tom, to the group “asm”:

  usermod -a -G asm tom

Shut down all databases running from this ORACLE_HOME with the IMMEDIATE option.
Make a backup of config.s and edit this line in the original.
Change:

  #define SS_ASM_GRP "dba"

To:

  #define SS_ASM_GRP "asm"

Make a backup of config.o:

  mv config.o config.o.orig

Relink the oracle executable (from $ORACLE_HOME/rdbms/lib) with:

  make -f ins_rdbms.mk ioracle

You can now connect as the OS user tom, set your ORACLE_SID to point to the ASM instance and connect as follows to start the ASM instance:

[tom@test1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 11.1.0.6.0 - Production on Wed Jan 21 14:45:57 2009

Copyright (c) 1982, 2007, Oracle.  All rights reserved.
Connected to an idle instance.

SQL> startup;
ASM instance started
Total System Global Area  267227136 bytes
Fixed Size                  2143544 bytes
Variable Size             239917768 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted
SQL>

Hopefully this clarifies how Oracle has implemented OS authentication for SYSASM. It may also save you the hassle of doing a new installation and of having to preserve configuration files such as tnsnames.ora and possibly the init.ora files.

Using gqlplus with Instant Client on Mac OS X

Posted by chris on Sep 15th, 2008

I need to use Oracle’s SQL*Plus quite often from my Mac, but Oracle’s support for OS X is sketchy to say the least. You can’t install the proper client on Intel machines, so you have to use the Instant Client instead. I also want to use gqlplus, which adds things like a command history to SQL*Plus. (You may be amused to know that SQL*Plus was once called UFI, meaning “User Friendly Interface”; I’m not sure how it could be made less friendly, really.) After a bit of trial and error, I got it working. This is how I have it set up:

  • I installed instantclient into its own directory
  • I added this directory to PATH and to both LD_LIBRARY_PATH and DYLD_LIBRARY_PATH environment variables
  • I put tnsnames.ora and login.sql (for commands to be run at login time) in my home directory
  • I pointed the TNS_ADMIN and SQLPATH environment variables to my home directory
  • I built and installed gqlplus in /usr/local/bin

I have also set up an alias so that gqlplus runs with the -d flag.  This turns off the auto-completion which can make gqlplus slow to load.

Test database creation with SAN / NAS snapshots

Posted by arjan on Aug 27th, 2008

Any organization using databases needs many copies of the live database for development, training, beta and performance testing. As a DBA, you can spend a considerable amount of time provisioning these databases. For the organization it often means that large parts of a SAN or NAS storage array are taken up by storing all the copies. Traditionally a DBA might back up the live database to tape and, when asked for a test database, restore the backup from tape to a different host.

However, the current snapshot technology in many storage arrays offers an effective way to save a lot of space and time in this provisioning process. A snapshot can be described as a virtual copy of a logical source volume. Behind the scenes, the storage array lets the snapshot point to the same disk blocks as the source volume for reads. Only when either the source volume or the snapshot data changes does the storage array write the values that the snapshot needs to see to a separate place on disk. This way the storage array maintains a consistent view of a whole volume while using much less disk space. The amount of disk space saved depends, of course, on the amount of changes applications make against the source volume and the snapshot. For databases the changes are, on average, small relative to the size of the database.

Saving disk space

The advantage here is clear: we can save a lot of disk space by using snapshots rather than full copies of a database. Suppose you have a Live database of 200 GB and need ten copies of it. You could spend nearly 2 terabytes on that (10 × 200 GB). If you use snapshots and reserve 20% of the database size for each copy to allow for changes, you will use:

200 GB + ((0.2 × 200 GB) × 10) = 600 GB

That means a saving of roughly 1.4 terabytes.

Saving time

The other advantage of using snapshots is in saved time. Creating a snapshot is usually a matter of seconds or perhaps a few manual actions in the GUI to the storage array. Restoring a backup of 200 Gb in the traditional way from tape can take up to half a day or more. And you’ll have to repeat that for every copy needed again.

In summary, I think snapshots put less pressure on your disk space management and allow you to provision databases faster, so that developers and testers can get on with their jobs sooner.

Performance gains of the TABLE operator

Posted by arjan on Aug 19th, 2008

Following up from my previous post on using SQL on collections using the TABLE operator, I did a little test on my assertion that the TABLE operator will be faster. I removed the spooling to screen in the code to make sure that the actual gains are better illustrated by the elapsed time. Remember the original table X (see the previous post for the PL/SQL procedures):

 Name                                                              Null?    Type
 ----------------------------------------------------------------- -------- --------------------------------------------
 ID                                                                         NUMBER
 NAME                                                                       VARCHAR2(30)

I filled the original table X with 100,000 records and made a collection filled with 50,000 IDs. See the results below:

declare
   test_coll test_coll_type := test_coll_type();
begin
   test_coll.extend(50000);
   for i in 1..50000
   loop
      test_coll(i) := i;
   end loop;
   who_walks_the_dog1(test_coll);
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:03:36.57

declare
   test_coll test_coll_type := test_coll_type();
begin
   test_coll.extend(50000);
   for i in 1..50000
   loop
      test_coll(i) := i;
   end loop;
   who_walks_the_dog2(test_coll);
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.21

That is an amazing difference. But of course, we haven’t indexed the ID column on the table yet, so the first procedure performs 50,000 full table scans on a 100,000-row table.

explain plan for select name from x where id = 1;
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     8 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| X    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

Let’s make this comparison fairer and see what happens when we create an index on the ID column and analyze the table.

create index x_idx on x(id);
analyze table x compute statistics;
explain plan for select name from x where id = 1;
-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |    12 |     1   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| X     |     1 |    12 |     1   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | X_IDX |     1 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

We are now using the index for the statement that doesn’t use the TABLE operator. What happens when we execute the procedures again?

declare
   test_coll test_coll_type := test_coll_type();
begin
   test_coll.extend(50000);
   for i in 1..50000
   loop
      test_coll(i) := i;
   end loop;
   who_walks_the_dog1(test_coll);
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:02.05
declare
   test_coll test_coll_type := test_coll_type();
begin
   test_coll.extend(50000);
   for i in 1..50000
   loop
      test_coll(i) := i;
   end loop;
   who_walks_the_dog2(test_coll);
end;
/

PL/SQL procedure successfully completed.

Elapsed: 00:00:00.17

Using the TABLE operator is still about ten times faster in this case. The more records you process, the greater the gains; with fewer records the difference shrinks accordingly.

Query collections with SQL as if a normal table

Posted by arjan on Aug 18th, 2008

You may have worked with collections in SQL and PL/SQL already. Collections are basically arrays holding columns or records of a user-defined data type. They come in various flavors: they can be defined as PL/SQL variables, but also as data types within the database, for example like this:

create type test_coll_type is table of number;
/

Imagine you have a procedure receiving such an array filled with IDs as an IN parameter. Within the procedure we need to check whether any of the values in this array exist in a table in the database. One’s first idea to solve this would probably be something like this:

create table x(id number, name varchar2(30));
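To make the example reproducible, the table can be populated with a few sample rows (data chosen to match the listing below):

insert into x values (1, 'James');
insert into x values (2, 'Mark');
insert into x values (3, 'John');
commit;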

ID NAME
---------- ------------------------------
1 James
2 Mark
3 John

create or replace procedure who_walks_the_dog1(p_coll test_coll_type)
is
   l_name varchar2(30);
begin
   for i in p_coll.first..p_coll.last
   loop
      if p_coll(i) is not null
      then
         begin
            select name into l_name from x where id = p_coll(i);
            -- only print when a matching row was actually found
            dbms_output.put_line(l_name||' walks the dog');
         exception
            when no_data_found then null;
         end;
      end if;
   end loop;
end who_walks_the_dog1;
/

Of course that is not very pretty code, and it took me longer to write than I initially thought it would. It also means a lot of calls to the database, which causes a lot of context switches between the SQL and PL/SQL engines and degrades performance.

It would be ideal if we could just take the list of IDs and use it in a WHERE clause, saying “WHERE id in (list of IDs)”.

Here is how you do it: you use the TABLE operator, which basically presents a collection as a normal table so you can use the collection in SQL statements. That looks like this:

create or replace procedure who_walks_the_dog2(p_coll test_coll_type)
is
   type name_tabtyp is table of varchar2(30) index by binary_integer;
   name_tab name_tabtyp;
begin
   select name bulk collect into name_tab
   from   x
   where  id in (select column_value from table(p_coll));

   if name_tab.count > 0
   then
      for i in name_tab.first..name_tab.last
      loop
         dbms_output.put_line(name_tab(i)||' walks the dog');
      end loop;
   end if;
end who_walks_the_dog2;
/

The TABLE operator allows you to write shorter, simpler code that is more intuitive to read, and it should be faster, especially when you process a lot of rows. Test it like this:

declare
   test_coll test_coll_type := test_coll_type();
begin
   test_coll.extend(5);
   test_coll(1) := 1;
   test_coll(2) := 2;

   who_walks_the_dog2(test_coll);
end;
/

Collections and the TABLE operator have lots more interesting applications. Perhaps more in the next post.

Finding 2 letter domains with Oracle regular expressions

Posted by jason on Aug 15th, 2008

I had cause recently to search for domains that contain two and only two letters. One way of doing this is to use the relatively new Oracle support for querying with regular expressions: regexp_like. The Oracle implementation of regular expressions follows the POSIX standard with some enhancements.

First off, how not to do it. You might attempt something like the following:

select * 
from domains
where key like '__';

This will indeed return all domains containing 2 characters, but it also returns domains with numbers in them, and that was not what we were interested in. It really had to be a regular expression:

select * 
from domains
where regexp_like (key, '^[a-z]{2}$');

So how do we decode this expression?

  • ^ denotes the beginning of the data. We need this because we want to find exactly 2 occurrences, so we start counting from the beginning of the data rather than at some arbitrary point.
  • [a-z] is the standard “find me any alphabetic character”; we know the data is lowercase.
  • {2} states that we need to find 2 occurrences of the previous expression, i.e. 2 alphabetic characters.
  • $ signifies the end of the data. Since domain names can contain more than 2 characters, we do not want to match keys that are longer than 2 characters.
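As a quick sanity check, you can run the pattern against a few hand-picked sample values (hypothetical data, not the real domains table) to see why the LIKE version is too loose:

select key,
       case when regexp_like(key, '^[a-z]{2}$') then 'match' else 'no match' end as result
from (
   select 'ab'  as key from dual union all
   select 'a1'  from dual union all    -- matches LIKE '__' but contains a digit
   select 'abc' from dual              -- too long
);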

I’m sure there is more than one way of writing this little bit of SQL, but regular expressions certainly make it easier, though they can take a bit of getting used to.

Nameservers and very large zones

Posted by jay on Jun 2nd, 2008

There comes a point in a zone’s life when it gets too big to be held in memory. For TLDs we really only have .com that has reached this scale, but ENUM zones with this problem are numerous.

So if you want to run an authoritative nameserver for these zones you basically have these options:

  1. Use a database back-end plugin for an existing nameserver such as BIND. However, the performance of this approach is so poor that it is not a reasonable option in most scenarios.
  2. Buy an off-the-shelf nameserver like ANS from Nominum. However, this is only a real option for the fantastically rich or those with such a controlled network that they can make do with just two nameserver instances.
  3. Use a service provider that runs the zones for you and has its own database back-end technology. We use UltraDNS, who have this functionality.
  4. Write your own. Of course, this is all that Nominum and UltraDNS (and more recently CommunityDNS) have done, so how hard can it be?

Common misconceptions about a database back end

So how exactly do you go about designing the database back end for a nameserver? Well, in my opinion most people start back to front and continue that way.

Three things get most people excited about using databases as a back end. Each of them is quite wrong and can be dismissed in turn.

Use the main registration database

If you start using databases, then why not have the nameserver run off the main registration database? It is already a fault-tolerant cluster, and it would mean that the nameserver was always instantly up to date.

This is madness for several reasons. First, no database has 100% uptime, but a nameserver cluster should. To be clear, that does not mean every nameserver is available 100% of the time, but that at least one nameserver can be accessed 100% of the time.

Then there is the issue of zone file serial numbers. Are you going to update the serial number for every single update of the database (yes you have to)? What happens when things go wrong and need to be unwound? What happens if you need to restore from a backup?

There are also performance issues arising from the way the data is stored: it is likely to be optimised for the registration system, not for the nameserver that pulls the data off it.

Finally this limits you to nameservers that are connected to the database by a reliable, fast and secure channel. Maybe possible in a single enterprise but not to be tried across the public Internet.

In essence running it this way is just too brittle and should not be considered. To be honest I don’t think any experienced people would think of that, but I wanted to make sure I covered all the options.

Multi-headed nameserver

The second cause for excitement is the possibility of a multi-headed nameserver; in other words, lots of front ends all dealing with the same database back end (which we assume is separate from the source database). The reasoning is that databases are large, expensive beasts optimised for handling data requests from many clients, whereas the front ends are much lighter, simpler beasts optimised around network processing, so putting the two together seems natural.

It also means that updates are only processed by one machine, the database, not directly impacting the others. Fewer updates need to be sent and there is less chance of inconsistency of the data.

In order to see why this is wrong (in most cases) we need to think about what kind of database we really need for a nameserver. It turns out that we don’t actually need most of the features found in a modern RDBMS: we don’t need views, stored procedures, pluggable indexes and so on. All we need is a simple, fast and efficient database, which doesn’t require hardware of a different nature from the front end.

The next point is to examine where the performance bottlenecks are in that setup. The most obvious one is whenever the front end needs to ask the database for the data rather than use its cache: there is the network transfer, then the contention for the lookup, possibly a trip to disk, and then the network transfer back from the back end to the front end. I would contend that with a multi-headed setup these bottlenecks are much worse than with a simple 1-1 configuration of one front end and one database on the same machine.

Note that the processing of updates is not one of the bottlenecks. The time spent processing updates is about one thousandth of the time spent handling requests (or less), and so it has no impact on overall performance unless it is very badly implemented. Having more updates to distribute as a result of the 1-1 configuration is not going to outweigh the benefits.

Database replication

The third cause for excitement is the possibility of using database replication. It brings up a vision of nameservers magically being up to date across a whole cluster without anyone doing anything. After all databases are supposed to be good at replication.

This is the easiest point to rebut. Database replication is general-purpose and proprietary, whereas DNS already has an open, tried-and-tested, DNS-specific replication mechanism: AXFR and IXFR. Yes, I hear you say, but using database replication means that you don’t need to serialise the data into DNS format and back again, or translate between the DNS packet structure and the database structure.

Well, for a start, the database structure should be very close to the DNS packet structure; there is not much point in having it any other way. And of course even database replication has to do the serialisation at some point or another, so is anything actually gained by it?

The real killer though is the impact on caches. A sensible nameserver implementation will have multiple levels of data store, with a small, fast, pre-compiled cache first in line to check, then the database next which will aim for an in-memory hit first and finally a disk hit if all else fails. If you allow data to enter the nameserver directly at the database level then you need a mechanism for the database to signal the cache that it may have invalid data, and then have the cache refresh itself.

In my view this is simply too complex and will just slow down performance. It is far easier to do away with it and have all updates come in through the front door, where the nameserver knows about them and can update its cache accordingly.

Ideal nameserver design

So putting this all together gives me an ideal design for the database back end of a nameserver. As you have probably guessed, the big emphasis for me has been on performance, but that is because adding a database is bound to be slower than an in-memory representation, so everything possible needs to be done to compensate for that.

In this design then the database is a lightweight embedded database in every single instance of the nameserver. It is stripped of all unnecessary features and the data tables and indexes are optimised solely for great DNS performance.

All access to this database is through the front end, whether IXFR, AXFR, DDNS or a control process. No round-the-back access to the database is possible; it is hermetically sealed.

You may even be lucky and find that a database used like this obviates the need for a separate cache.
