random technical thoughts from the Nominet technical team

Oracle data skew & statistics

1 Star2 Stars3 Stars4 Stars5 Stars (No Ratings Yet)
Loading ... Loading ...
Posted by jason on Mar 28th, 2008

I recently encountered a classic example of the Oracle Optimizer being fooled by data skew. James Morle, presented at Hotsos recently on how the vast majority of issues are caused by skew and latency, certainly I have seen many bad plans chosen by the Oracle optimizer that were the result of the optimizer not being aware of the distribution of the data.

So I have a view looking like the following:

 Name                       Null?    Type
--------------------------  -------   --------------
 KEY                        NOT NULL VARCHAR2(255)
 SUFFIX                     NOT NULL VARCHAR2(255)
 INSTANCE                   NOT NULL NUMBER(10)
 NS_ID                      NOT NULL NUMBER(10)
 CREATED                    NOT NULL DATE
 CREATED_BY                          VARCHAR2(200)
 REMOVED                             DATE
 REMOVED_BY                          VARCHAR2(200)

The type of query that we were having issues was of the form:

SELECT COUNT(*)
FROM NS_ON_DOMAINS
WHERE KEY = :B4
AND SUFFIX = :B3
AND INSTANCE = :B2
AND NS_ID = :B1
AND REMOVED IS NULL

First thing to bear in mind, is that key, suffix is pretty selective. ns_id, can be highly selective, but it can also be extremely unselective. I used the technique described by Greg Rahn to determine why the optimizer was choosing a particular plan that was providing a response time of the order of 3 minutes.

---------------------------------------------------------------------------------------------
| Id  | Operation   |Name|Starts|E-Rows|A-Rows|A-Time|Buffers|Reads                         |
---------------------------------------------------------------------------------------------
| 1 |SORT AGGREGATE  |  |  1 |     1 |     1 |00:02:07.29 | 19668 | 17459                   |
|*2 | TABLE ACCESS BY INDEX ROWID| NS_ON_DOMAINS|1 | 1 | 0 |00:02:07.29 |19668 |17459       |
|*3 |  INDEX RANGE SCAN          | IX_NS_ON_DOMAINS_NS | 1 |1 |49495 |00:00:00.32 |178 |177 |
---------------------------------------------------------------------------------------------

So the optimizer has chosen to use the IX_NS_ON_DOMAINS_NS with the NS_ID value, and thinks this will be highly selective. Very often this turns out to be the case, however in a particular case, this value is completely non-selective and the actual numbers of rows is quite huge.

Forcing the optimizer to use the other obvious index, the one on key, suffix, and instance we have the following:

-----------------------------------------------------------------------------
| Id  | Operation| Name |Starts|E-Rows|A-Rows|A-Time|Buffers|Reads          |
------------------------------------------------------------------------------
|  1|SORT AGGREGATE             |    |  1 |  1 | 1 |00:00:00.01 |4 |      1 |
|*2| TABLE ACCESS BY INDEX ROWID| NS_ON_DOMAINS |1 |1 |0 |00:00:00.01 |4 |1 |
|*3|  INDEX RANGE SCAN  | IX_NS_ON_DOMAINS_DOMAIN |1|3|0 |00:00:00.01 |4|1  |
-----------------------------------------------------------------------------

Now the optimizer is over estimating the number of rows it thinks it will find via this access path. This tells us why for the particular values of the bind variables we were looking at, that the plan chosen by the optimizer is not the most optimal path to the data. The optimizer is being tricked by the fact we have a very large skew on this data.

In the end we decided to use a hint for this particular query to force it to use the better index.

One Response

  1. Alexis Gil Gonzales Says:

    Nice case.
    It would be interesting to see what an event 10053 said…

Leave a Comment

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.

Recent Posts

Highest Rated

Categories

Archives

Meta: