There are no comments yet...Kick things off by filling out the form below.
Performance Effect of Common Table Expressions in SQL Server 2005
One of my recent blog posts was about a tip to use TOP clause to reduce locks for DELETE and UPDATE commands in SQL Server 2005. Such performance tips are very interesting for myself and I love to learn them because they can't be find easily on many official resources such as books or articles.
Another performance tip (or concept) that I'm going to cover in this post is the usage of Common Table Expressions in SQL Server 2005 that may improve the performance of your code significantly.
As you may know, Common Table Expression (CTE) is a new feature introduced in T-SQL since SQL Server 2005 and they come handy when you want to work with a subset of data in your queries.
CTE can be a great replacement for many sub-queries in T-SQL to improve their performance a lot and this is one of the main reasons to use them. The other reason is the simplicity that they bring to T-SQL query syntax. The syntax to use Common Table Expressions in your code is simple (you can find it on MSDN) and you fetch a set of data via CTE to use it in your queries.
Here I don't want to talk about the usage of this feature but want to compare the performance difference between CTE and older usage of sub-queries in some cases where they don't have any performance effect! Yes, they don't have any performance effect in some cases. You may wonder about this sentence because many developers think that CTE should have a better performance than sub-queries but this isn't a rule and it depends on your case and your data so you need to compare different approaches to find the best.
To illustrate this, let me show you an example. Like the other post I use a stored procedure to create a sample table with 1000 rows.
CREATE PROCEDURE dbo.InsertData
AS
DECLARE @Counter int
SET @Counter = 1000
WHILE (@Counter > 0)
BEGIN
INSERT INTO SampleTable
(Value) VALUES (@Counter * 2)
SET @Counter = @Counter - 1
END
RETURN
Now suppose that I want to write a stored procedure to select data from this table in several pages based on a page size and page index. One common solution would be using a sub-query to find row identifiers to select and then select items based on this sub-query. This stored procedure can look like this:
CREATE PROCEDURE dbo.SelectData1
@PageSize int,
@PageIndex int
AS
-- Set the page bounds
DECLARE @PageLowerBound int;
DECLARE @PageUpperBound int;
SET @PageLowerBound = @PageSize * @PageIndex;
SET @PageUpperBound = @PageSize - 1 + @PageLowerBound;
WITH RowsToSelect AS
(
SELECT
ID
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS Row, ID
FROM SampleTable
)
AS ItemsWithRowNumbers
WHERE
(Row - 1 >= @PageLowerBound) AND (Row - 1 <= @PageUpperBound)
)
SELECT
ID, Value
FROM SampleTable
WHERE
ID IN
(SELECT ID FROM RowsToSelect)
RETURN 0;
The other solution based on the CTE is presented below.
CREATE PROCEDURE dbo.SelectData2
@PageSize int,
@PageIndex int
AS
-- Set the page bounds
DECLARE @PageLowerBound int;
DECLARE @PageUpperBound int;
SET @PageLowerBound = @PageSize * @PageIndex;
SET @PageUpperBound = @PageSize - 1 + @PageLowerBound;
SELECT
ID, Value
FROM SampleTable
WHERE
ID IN
(
SELECT
ID
FROM
(
SELECT ROW_NUMBER() OVER (ORDER BY ID) AS Row, ID
FROM SampleTable
)
AS ItemsWithRowNumbers
WHERE
(Row - 1 >= @PageLowerBound) AND (Row - 1 <= @PageUpperBound)
)
RETURN 0;
As I stated above and can prove it here, the second stored procedure doesn't have any performance difference in comparison with the first one. The only difference is in the simpler syntax of the query.
Running the first stored procedure and the second one in a single batch and monitoring their execution plan, I get the following result.
As you see there is no significant effect between the performance of the second approach in comparison with the first one while you might expect something else.
CTE may have a better performance in some scenarios because it provides more information for SQL Server engine that improves the performance. But as a general statement and based on my own experiences it has a better performance when you deal with multiple subsets of data and not a single one. As long as you use more data sets you notice a better performance for CTE-based queries.
As the final conclusion of this post I want to say that CTE doesn't always come with a better performance and you need to test its performance with your data before considering this.
[advertisement] Axosoft OnTime 2008 is four developer tools in one: bug tracking, project wiki, feature management, and help desk. It manages your development process so developers can focus on coding. Installed or Hosted – Free Single-user license -- Free 30-day team trial.
No Comments : 03.20.08