I'm Keyvan Nayyeri, a 25 years old Ph.D. student at
the Computer Science department of
the University of Texas at San Antonio.
I'm also
a Software Architect and Developer and previously held a B.Sc.
degree in Applied Mathematics.
This is my blog where I publish content about various topics specifically Programming Languages and Compilers, Software
Engineering and Programming.
One of my recent blog posts was about a tip to use TOP clause to reduce locks for DELETE and UPDATE commands in SQL Server 2005. Such performance tips are very interesting for myself and I love to learn them because they can't be find easily on many official resources such as books or articles.
Another performance tip (or concept) that I'm going to cover in this post is the usage of Common Table Expressions in SQL Server 2005 that may improve the performance of your code significantly.
As you may know, Common Table Expression (CTE) is a new feature introduced in T-SQL since SQL Server 2005 and they come handy when you want to work with a subset of data in your queries.
CTE can be a great replacement for many sub-queries in T-SQL to improve their performance a lot and this is one of the main reasons to use them. The other reason is the simplicity that they bring to T-SQL query syntax. The syntax to use Common Table Expressions in your code is simple (you can find it on MSDN) and you fetch a set of data via CTE to use it in your queries.
Here I don't want to talk about the usage of this feature but want to compare the performance difference between CTE and older usage of sub-queries in some cases where they don't have any performance effect! Yes, they don't have any performance effect in some cases. You may wonder about this sentence because many developers think that CTE should have a better performance than sub-queries but this isn't a rule and it depends on your case and your data so you need to compare different approaches to find the best.
To illustrate this, let me show you an example. Like the other post I use a stored procedure to create a sample table with 1000 rows.
CREATE PROCEDURE dbo.InsertData
AS
DECLARE @Counter int
SET @Counter = 1000
WHILE (@Counter > 0)
BEGIN
INSERT INTO SampleTable
(Value) VALUES (@Counter * 2)
SET @Counter = @Counter - 1
END
RETURN
Now suppose that I want to write a stored procedure to select data from this table in several pages based on a page size and page index. One common solution would be using a sub-query to find row identifiers to select and then select items based on this sub-query. This stored procedure can look like this:
CREATE PROCEDURE dbo.SelectData1
@PageSize int,
@PageIndex int
AS
-- Set the page bounds
DECLARE @PageLowerBound int;
DECLARE @PageUpperBound int;
SET @PageLowerBound = @PageSize * @PageIndex;
SET @PageUpperBound = @PageSize - 1 + @PageLowerBound;
WITH RowsToSelect AS
(
SELECT
ID
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS Row, ID
FROM SampleTable
)
AS ItemsWithRowNumbers
WHERE
(Row - 1 >= @PageLowerBound) AND (Row - 1 <= @PageUpperBound)
)
SELECT
ID, Value
FROM SampleTable
WHERE
ID IN
(SELECT ID FROM RowsToSelect)
RETURN 0;
The other solution based on the CTE is presented below.
CREATE PROCEDURE dbo.SelectData2
@PageSize int,
@PageIndex int
AS
-- Set the page bounds
DECLARE @PageLowerBound int;
DECLARE @PageUpperBound int;
SET @PageLowerBound = @PageSize * @PageIndex;
SET @PageUpperBound = @PageSize - 1 + @PageLowerBound;
SELECT
ID, Value
FROM SampleTable
WHERE
ID IN
(
SELECT
ID
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS Row, ID
FROM SampleTable
)
AS ItemsWithRowNumbers
WHERE
(Row - 1 >= @PageLowerBound) AND (Row - 1 <= @PageUpperBound)
)
RETURN 0;
As I stated above and can prove it here, the second stored procedure doesn't have any performance difference in comparison with the first one. The only difference is in the simpler syntax of the query.
Running the first stored procedure and the second one in a single batch and monitoring their execution plan, I get the following result.

As you see there is no significant effect between the performance of the second approach in comparison with the first one while you might expect something else.
CTE may have a better performance in some scenarios because it provides more information for SQL Server engine that improves the performance. But as a general statement and based on my own experiences it has a better performance when you deal with multiple subsets of data and not a single one. As long as you use more data sets you notice a better performance for CTE-based queries.
As the final conclusion of this post I want to say that CTE doesn't always come with a better performance and you need to test its performance with your data before considering this.
SQL Server 2005/2008 Common Table Expression
Feb 06, 2009 2:00 AM
#
Now this is something neat. if you use many subset of data and subquery, your performance may hit. Select
AJ
Jul 10, 2009 3:28 AM
#
Nice article really very helpful but please mention the scenarios where CTE is better than SubQuery.
Rahul
Aug 05, 2009 7:18 AM
#
Can somebody will explain performance comparision between Views(Not indexed) and CTE? I still have confusion over it.
Leave a Comment