Wednesday, August 12, 2009

Microsoft SQL Server 2008 R2 CTP

The first community technology preview (CTP) of Microsoft SQL Server 2008 R2 is now available for download to MSDN and TechNet subscribers.

For more details about SQL Server 2008 R2 and other related links, see: http://www.microsoft.com/sqlserver/2008/en/us/R2.aspx

If you have any questions about SQL Server 2008 R2, you can visit the forums: http://social.msdn.microsoft.com/Forums/en-US/category/sqlserverprerelease

Tuesday, August 11, 2009

ROLLUP and ORDER BY

In one of my previous posts, I discussed how useful the GROUPING function is when writing a ROLLUP/CUBE query. The GROUPING function can help you in one more way: ordering the results returned by ROLLUP/CUBE queries.

The reason I’m writing this post is that some time back I saw somebody write a weird ORDER BY clause to get the desired ordering for a query that used the ROLLUP operator. First, he didn’t use the GROUPING function in the SELECT statement, and second, his ORDER BY was something like this:

ORDER BY CASE ColumnName
            WHEN 'Total Of ColumnName'
            THEN 'Zzzzzzzz'
            ELSE ColumnName
         END

All this just to get the subtotal rows returned by ROLLUP at the bottom of the result set. Of course, if he had known about GROUPING, he wouldn’t have written such a CASE expression in the ORDER BY.

As you may know, the GROUPING function returns 1 when the row is added by either the CUBE or ROLLUP operator, and 0 when the row is not the result of CUBE or ROLLUP. You can easily use this property of GROUPING to order the result set: because 0 sorts before 1 in ascending order, putting GROUPING(column) in the ORDER BY pushes the summary rows below the detail rows.

Let’s have a look at the following example. First, create some sample data:

-- create sample table Sales

CREATE TABLE Sales
(
ID INT,
FName VARCHAR(30),
Zone VARCHAR(30),
Sale INT
)
GO

-- Load sample data

INSERT INTO Sales SELECT
1, 'Mangal', 'East', 20 UNION ALL SELECT
2, 'Mangal', 'East', 150 UNION ALL SELECT
3, 'Mangal', 'West', 50 UNION ALL SELECT
4, 'Ram', 'East', 45 UNION ALL SELECT
5, 'Ram', NULL, 80 UNION ALL SELECT
6, 'Ram', NULL, 40 UNION ALL SELECT
7, 'Sachin', 'West', 50 UNION ALL SELECT
8, 'Sachin', 'West', 40
GO

-- Test sample data

SELECT Id, FName, Zone, Sale
FROM Sales
GO

The sample data:

ID  FName   Zone  Sale
1   Mangal  East  20
2   Mangal  East  150
3   Mangal  West  50
4   Ram     East  45
5   Ram     NULL  80
6   Ram     NULL  40
7   Sachin  West  50
8   Sachin  West  40

And here is the expected output:

FName      Zone      Total
Mangal     East      170
Mangal     West      50
Mangal     All Zone  220
Ram        East      45
Ram        Unknown   120
Ram        All Zone  165
Sachin     West      90
Sachin     All Zone  90
All Names  All Zone  475

As you can see in the expected output, the FName values are sorted in ascending order with their grand total at the bottom, and the same goes for the Zone values within each name. To order the result that way, just put GROUPING(column_name) in the ORDER BY just before column_name. See the following query, especially the ORDER BY clause:

SELECT CASE GROUPING(FName)
           WHEN 1 THEN 'All Names'
           ELSE ISNULL(FName, 'Unknown')
       END AS FName,
       CASE GROUPING(Zone)
           WHEN 1 THEN 'All Zone'
           ELSE ISNULL(Zone, 'Unknown')
       END AS Zone,
       SUM(Sale) AS Total
FROM Sales
GROUP BY FName, Zone WITH ROLLUP
-- GROUPING() is 0 for detail rows and 1 for rows added by ROLLUP,
-- so each subtotal and the grand total sort after the rows they summarize
ORDER BY GROUPING(FName), FName, GROUPING(Zone), Zone

Simple, isn’t it? Now you don’t need to write a CASE expression in the ORDER BY; just use the GROUPING function. If you will be doing the ordering in the application layer, then you will need to include the GROUPING(FName) and GROUPING(Zone) columns in the SELECT list as well.
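For ordering in the application layer, one way to give the client what it needs is to return the GROUPING values as ordinary columns. This is only a sketch under some assumptions: it uses a table variable @Sales holding the same sample rows (so it does not clash with the Sales table created above), and FNameIsTotal and ZoneIsTotal are hypothetical column names chosen for this example.

```sql
-- Sketch: @Sales is a table variable with the same rows as the Sales table.
-- FNameIsTotal and ZoneIsTotal are hypothetical names for the GROUPING flags.
DECLARE @Sales TABLE
(
    ID    INT,
    FName VARCHAR(30),
    Zone  VARCHAR(30),
    Sale  INT
);

INSERT INTO @Sales (ID, FName, Zone, Sale)
SELECT 1, 'Mangal', 'East',  20 UNION ALL
SELECT 2, 'Mangal', 'East', 150 UNION ALL
SELECT 3, 'Mangal', 'West',  50 UNION ALL
SELECT 4, 'Ram',    'East',  45 UNION ALL
SELECT 5, 'Ram',    NULL,    80 UNION ALL
SELECT 6, 'Ram',    NULL,    40 UNION ALL
SELECT 7, 'Sachin', 'West',  50 UNION ALL
SELECT 8, 'Sachin', 'West',  40;

-- No ORDER BY here: the client sorts, using the two flag columns.
SELECT CASE GROUPING(FName)
           WHEN 1 THEN 'All Names'
           ELSE ISNULL(FName, 'Unknown')
       END AS FName,
       CASE GROUPING(Zone)
           WHEN 1 THEN 'All Zone'
           ELSE ISNULL(Zone, 'Unknown')
       END AS Zone,
       SUM(Sale)       AS Total,
       GROUPING(FName) AS FNameIsTotal,  -- 1 only on the grand-total row
       GROUPING(Zone)  AS ZoneIsTotal    -- 1 on subtotal and grand-total rows
FROM @Sales
GROUP BY FName, Zone WITH ROLLUP;
```

The client can then sort on FNameIsTotal, FName, ZoneIsTotal, Zone, which mirrors the ORDER BY of the query above. Note that Ram’s NULL zone still comes back with ZoneIsTotal = 0, so the client can tell it apart from a ROLLUP subtotal.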

Mangal

Friday, August 7, 2009

TSQL Challenges

I’d just like to thank Jacob Sebastian, a fellow SQL Server MVP and the founder of www.tsqlchallenges.com, for offering me the opportunity to be part of the TSQL Challenges team. I’m really happy to be part of a team that consists of people like Jacob, Alejandro Messa, Peter Larsson (all three are SQL Server MVPs), Adam Haines (a moderator of the MSDN SQL Server forums), Rui Carvalho and many other talented people.

Here is a brief description of the TSQL Challenges site: TSQL Challenges constantly aims at helping people enhance their SET-based query writing skills. With TSQL Challenges, sometimes you learn things you didn’t know, sometimes you see better ways of doing things you already know, and sometimes you can use your expertise to help others learn TSQL querying skills. Even SQL Server experts love TSQL Challenges because every challenge inspires them to come up with new and better ways of solving the given problem.

The Mission: The entire “TSQL Challenge” team will focus on fulfilling our mission: “helping people to enhance their SET based query writing skills”. We will come up with more and more interesting TSQL Challenges that encourage you to look for alternate logic and inspire you to think outside the regular thought process.

I would like to invite my readers to participate in a TSQL Challenge: www.tsqlchallenges.com

I’d also like to thank Jacob again for the warm welcome and the kind words he wrote in the introduction post, Introducing new “TSQL Challenge” Team Members.

So I hope I will come up with some interesting SQL puzzles that will challenge your SQL skills, and that you will have fun solving them.

Mangal

Tuesday, August 4, 2009

SQL Server 2008 Service Pack (SP) 1 on Microsoft Update as a Required Automatic Update

SQL Server 2008 Service Pack 1 will soon be available through Automatic Update, starting in September.

For the latest information, read the SQL Server Setup blog post: SQL Server 2008 Service Pack (SP) 1 on Microsoft Update as a Required Automatic Update

For a better understanding of Automatic Update, see: Update Your PC Automatically

Wednesday, July 22, 2009

UNION Vs UNION ALL

You may have heard this advice many times: “Use UNION ALL over UNION whenever possible.” The question is, why? The one-sentence answer: UNION ALL performs faster than UNION.
That raises more questions: why does UNION ALL perform faster? And why only “whenever possible”; why not always?

Let me answer the second question first. Though both UNION and UNION ALL combine the results of two or more queries into a single result set, there is a fundamental difference between the two: UNION returns only distinct rows, while UNION ALL returns all rows, including duplicates.

Let’s look at the following example:

-- create 2 tables A and B.
CREATE TABLE A
(
ID INT,
Names VARCHAR(10)
)
GO
CREATE TABLE B
(
ID INT,
Names VARCHAR(10)
)
GO
-- insert data into table A
INSERT INTO A VALUES(1,'Mangal');
INSERT INTO A VALUES(5,'Sham');
INSERT INTO A VALUES(2,'Ram');

-- insert data into table B
INSERT INTO B VALUES(2,'Ram');
INSERT INTO B VALUES(3,'Shiv');
INSERT INTO B VALUES(4,'John');

-- test sample data
SELECT id, Names
FROM A
GO
SELECT id, Names
FROM B
GO

Here is what the data in tables A and B looks like:

Table A:
ID  Names
1   Mangal
5   Sham
2   Ram

Table B:
ID  Names
2   Ram
3   Shiv
4   John

Note that the row with ID = 2 and Names = 'Ram' is present in both tables. That will help us understand the difference between UNION and UNION ALL. Now let’s execute the following two queries, the first with UNION and the second with UNION ALL.

-- with UNION
SELECT id, Names
FROM A
UNION
SELECT id, Names
FROM B
GO

-- with UNION ALL
SELECT id, Names
FROM A
UNION ALL
SELECT id, Names
FROM B
GO

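The result:

UNION:
ID  Names
1   Mangal
2   Ram
3   Shiv
4   John
5   Sham

UNION ALL:
ID  Names
1   Mangal
5   Sham
2   Ram
2   Ram
3   Shiv
4   John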

Observations:
1. The first query, with UNION, returns 5 rows, while the UNION ALL query returns 6 rows.
2. The row for ID = 2 (Ram) appears twice in the UNION ALL result set.
3. The UNION result set came back sorted on the ID column. In the UNION ALL result, all the rows of table A appear first, followed by the rows of table B (no sort).

As you can see, UNION eliminates duplicate rows from the final result set, while UNION ALL returns all rows, including duplicates. That is what makes UNION slower: to find duplicates, SQL Server has to compare entire rows across all the queries involved. In this example the optimizer did that by sorting the combined rows, which is why the UNION result came back sorted on the ID column even though I didn’t specify any ORDER BY clause. Notice that “Sham” (from table A) appears last in the UNION result because it has the highest ID, 5, while it appears in the second row of the UNION ALL result. Don’t rely on that order, though: the optimizer is free to remove duplicates another way (for example, with a hash-based operator), so add an ORDER BY whenever you need a particular order. A look at the query execution plan can help you visualize this better:

[Screenshot: execution plans of the UNION and UNION ALL queries]

As you can see, the UNION query accounts for 73% of the batch cost, compared to 27% for UNION ALL. The major reason is the “Distinct Sort” operator, which the UNION query uses to sort the rows and eliminate the duplicates, while UNION ALL doesn’t bother with sorting or duplicates at all. That is why UNION is slower than UNION ALL.
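Two points from this discussion can be checked with a small sketch: UNION returns the same rows as a SELECT DISTINCT over a UNION ALL, and only an ORDER BY guarantees the order of the combined result (the sort seen in the plan is the optimizer’s choice, not a promise). The sketch assumes table variables @A and @B holding the same rows as tables A and B; those names are just for this example.

```sql
-- Sketch: @A and @B are table variables with the same rows as tables A and B.
DECLARE @A TABLE (ID INT, Names VARCHAR(10));
DECLARE @B TABLE (ID INT, Names VARCHAR(10));

INSERT INTO @A VALUES (1, 'Mangal');
INSERT INTO @A VALUES (5, 'Sham');
INSERT INTO @A VALUES (2, 'Ram');

INSERT INTO @B VALUES (2, 'Ram');
INSERT INTO @B VALUES (3, 'Shiv');
INSERT INTO @B VALUES (4, 'John');

-- UNION removes the duplicate (2, 'Ram'); the ORDER BY after the last query
-- applies to the combined result and is what guarantees its order.
SELECT ID, Names FROM @A
UNION
SELECT ID, Names FROM @B
ORDER BY ID;

-- Same 5 rows: UNION ALL keeps all 6 rows, then DISTINCT removes the duplicate.
SELECT DISTINCT ID, Names
FROM (SELECT ID, Names FROM @A
      UNION ALL
      SELECT ID, Names FROM @B) AS AllRows
ORDER BY ID;
```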

So, going back to the question: why not always use UNION ALL? And one more question to add: when should you use which one?

- You should use UNION when you don’t want duplicates in your final result set and duplicate rows exist, or might exist, across the queries involved in the UNION.

- You should use UNION ALL when:
1. You don’t care about duplicate rows in the result.
2. You are sure there are no duplicates across the queries involved in the UNION. For example, you are combining sales orders from two or more different years, where each query returns the rows for one year and each row has a unique ID; or you are combining results for two or more different departments.
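Point 2 can be sketched with the A and B data: tagging each branch with a literal column makes rows from different branches distinct by construction. This is a hypothetical illustration: the Source column is made up for the example, and the PRIMARY KEY on ID stands in for the “unique ID for each row” assumption, so neither table variable can hold duplicates of its own.

```sql
-- Sketch: Source is a hypothetical tag column. The PRIMARY KEY rules out
-- duplicates inside each table, and the tag rules out duplicates across them.
DECLARE @A TABLE (ID INT PRIMARY KEY, Names VARCHAR(10));
DECLARE @B TABLE (ID INT PRIMARY KEY, Names VARCHAR(10));

INSERT INTO @A VALUES (1, 'Mangal');
INSERT INTO @A VALUES (5, 'Sham');
INSERT INTO @A VALUES (2, 'Ram');

INSERT INTO @B VALUES (2, 'Ram');
INSERT INTO @B VALUES (3, 'Shiv');
INSERT INTO @B VALUES (4, 'John');

-- No duplicates are possible, so UNION ALL is safe and skips the
-- duplicate-removal work that UNION would do.
SELECT 'A' AS Source, ID, Names FROM @A
UNION ALL
SELECT 'B', ID, Names FROM @B
ORDER BY Source, ID;
```

Here UNION and UNION ALL return the same 6 rows (Ram appears once per source), so UNION ALL gives the same answer without the extra sort.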

All along I’ve been talking about UNION and UNION ALL as if they were two different things altogether. Are they? Not exactly. I’m saying this because when a friend of mine asked me about UNION ALL, I advised him to look in Books Online, and he came back complaining that “Books Online doesn’t say anything about UNION ALL”. The reason: he assumed Books Online must have a separate section dedicated to UNION ALL, as if it were different from UNION.

Actually, ALL is just an optional argument in the UNION syntax. For more on UNION, refer to Books Online: http://msdn.microsoft.com/en-us/library/ms180026.aspx

Mangal