Wednesday, September 9, 2009

More about ORDER BY clause


Every one knows the purpose and the use of ORDER BY clause in SQL Server, but most may not know how ORDER BY works, in-side SQL Server. The implementation of T-SQL is based on set theory. Each table is considered as a set. Let’s take an example of Customer and Order tables. They are actually considered as two sets. The records of the table are identical to members or elements in sets. The T-SQL statements will not work as sequential language execution. Consider the T-SQL statement below.


use AdventureWorks
GO

SELECT TOP 100 SH.CustomerID,SH.SalesOrderNumber,SD.OrderQty,SD.UnitPrice FROM Sales.SalesOrderHeader SH
INNER JOIN Sales.SalesOrderDetail SD
ON SH.SalesOrderID=SD.SalesOrderID
ORDER BY SH.CustomerID


When you execute the query above, it will not execute sequentially within the SQL Server. Even though the SELECT statement is appeared first it will not execute as the first statement. Instead it will execute at a later part of the query (The execution order of a T-SQL query is beyond the scope of the article). During execution, T-SQL query will go through several virtual tables. The final result set of the query also displays as a virtual table.



But when you include the ORDER BY clause in your query the final result will not be a table, instead it returns as a cursor output. Because, a set does not have predetermined order of its members. It is a logical collection of members and the order of the members should not matter. In other words you can not perform an ORDER BY in a set simply by using set concepts. This is the reason the SQL Server uses cursors for this operation. ORDER BY is a costly operation to SQL Server (The use of cursors is anyway a costly operation).


Consider the T-SQL query below.



--Query1
USE AdventureWorks
GO

SELECT SalesOrderID,OrderQty,ProductID,UnitPrice FROM Sales.SalesOrderDetail
ORDER BY SalesOrderID
GO


Look at the execution plan of this query.



You may wonder why the execution plan does not contain the SORT operator. The obvious reason is SalesOrderID is the clustered index of the table, Sales.SalesOrderDetail which means the data is physically ordered by using the key of SalesOrederID column. So there will not be any additional cost for this sorting.

Let’s look at the query below;


--Query2

USE AdventureWorks
GO

SELECT SalesOrderID,OrderQty,ProductID,UnitPrice FROM Sales.SalesOrderDetail
ORDER BY UnitPrice
GO
Look at the execution plan of this query.



Now you can see the SORTing is happening here and it consumes 71% percent from the total cost of the query.
Consider this query;



--Query3
USE AdventureWorks
GO

SELECT SalesOrderID,OrderQty,ProductID,UnitPrice FROM
(SELECT SalesOrderID,OrderQty,ProductID,UnitPrice FROM Sales.SalesOrderDetail
ORDER BY UnitPrice) A




I have used ORDER BY clause withing the sub-query, let’s execute this and see the result.

What do you expect as the result of this query?

Think before it executes.

Below is the output;



Msg 1033, Level 15, State 1, Line 3
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.


Were you able to figure out why we are getting this error?

As I already explained, the query will execute by creating several virtual tables, keeping the intermediate results and final result also a virtual table. But ORDER BY clause works quite differently. It returns a cursor. So the sub-query is returning a cursor result while outer query is implementing a virtual table. This is the reason why you can not use the ORDER BY clause in sub-queries, views, inline functions, etc (as mentioned in the error message).

The error message gives a hint to resolve the issue.

Consider the query below;


--Query4

USE AdventureWorks
GO

SELECT SalesOrderID,OrderQty,ProductID,UnitPrice FROM
(SELECT TOP 100 PERCENT SalesOrderID,OrderQty,ProductID,UnitPrice FROM Sales.SalesOrderDetail
ORDER BY UnitPrice) A





Now you can see the result of the query and the execution plan.

The difference of query 3 & 4 is the TOP 100 PERCENT statement. When you use the TOP operator (which is not a relational operator rather it is a T-SQL specific) the ORDER BY clause will not use a cursor when implementing. It uses a virtual table thus it prevents the above error and also the cost is very low with compared to the query 2.



Conclusion



As a rule of thumb it is a good practice to avoid ORDER BY clause unless you really need the rows sorted. Use ORDER BY clause in sub-queries with TOP 100 PERCENT if you really need it. A query with ORDER BY clause can not be used as a table expression that is, a view, inline table-valued function, sub query, derived table, or common table expression (CTE).

SQL version used for this article is, Microsoft SQL Server 2005 - 9.00.3073.00 (Intel X86) Copyright (c) 1988-2005 Microsoft Corporation Enterprise Edition on Windows NT 5.2 (Build 3790: Service Pack 2)

No comments:

Post a Comment

How to interpret Disk Latency

I was analyzing IO stats in one of our SQL Servers and noticed that IO latency (Read / Write) are very high. As a rule of thumb, we know tha...