Sunday, November 1, 2015

Fastest way to get the Row Count of a table

Finding the row count of a table is probably one of the most frequent tasks for SQL developers.  So the obvious question is: what is the fastest way to find the row count of a table?  There are a few ways, and I’ll discuss them in this post.  However, if you ask me, my immediate response, without blinking an eye, is to use the built-in stored procedure sp_spaceused.  It is very easy to use, with a simple syntax:
--------------------------------------------------------------------------------------------------------
sp_spaceused your_table_name
--------------------------------------------------------------------------------------------------------
and within a second you will get the row count of the given table, no matter how big the table is.  To know more about sp_spaceused, please see: sp_spaceused (Transact-SQL)

Remarks:

There are certain things you need to keep in mind while using sp_spaceused:
1. sp_spaceused relies on the DMV sys.dm_db_partition_stats to get the row count.  So there is a chance that you will get an approximate row count if the row and page count metadata is out of date.  To correct the counts you can run DBCC UPDATEUSAGE with the COUNT_ROWS option.  However, this can take a long time if the table is really huge.

2. If the table is in the default dbo schema, you don’t need to pass the schema name.  However, if the table was created under a different or custom schema, you need to pass the schema name as well.  But if you try to execute sp_spaceused schema_name.your_table_name you will get the error:
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near '.'.

This becomes even more awkward if the schema name contains your domain name as well.  For example, if your table name is domain\user_name.table_name and you try to use sp_spaceused domain\user_name.table_name, you will get the error:
Msg 102, Level 15, State 1, Line 1
Incorrect syntax near '\'.

In that case you just need to pass the table name within single quotes:
--------------------------------------------------------------------------------------------------------
sp_spaceused 'schema_name.your_table_name'
go
-- or
sp_spaceused 'domain\schema_name.your_table_name'
--------------------------------------------------------------------------------------------------------

Now, coming back to some positives: this is my favorite method despite the fact that sp_spaceused can sometimes give an approximate row count.  The first reason I like sp_spaceused is that it is extremely fast, the syntax is simple, and I don’t need to remember any DMV names.  Another reason is that I can use it even while DML operations are being performed on the table.  For example, say you are inserting a large number of rows into a table, the query is still executing, and you want to know how many rows have been inserted so far.  sp_spaceused can give you the row count of the table even in the middle of an INSERT query.  This is very useful for tracking the progress of a big INSERT query.

Having said that the row count we get from sp_spaceused or the DMV sys.dm_db_partition_stats is not always accurate, personally I have never observed any difference.  Even if there is a difference, I guess it is negligible.
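If you do suspect the counts have drifted, here is a minimal sketch of two ways to refresh them.  The table name dbo.your_table_name is only a placeholder, and on a very large table either option can run for a long time, so treat this as something to try on a test system first.

```sql
-- Placeholder table name: replace dbo.your_table_name with a real table.

-- Option 1: ask sp_spaceused to correct the usage information first.
EXEC sp_spaceused @objname = N'dbo.your_table_name', @updateusage = N'TRUE';
GO

-- Option 2: correct the counts explicitly.
-- 0 means "the current database"; COUNT_ROWS also refreshes the row count.
DBCC UPDATEUSAGE (0, 'dbo.your_table_name') WITH COUNT_ROWS;
GO
```

After either option, running sp_spaceused again should report the corrected row count.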

Now let's quickly discuss a couple of other methods to find the row count of a table.

Using DMV sys.dm_db_partition_stats:

This is another very quick method to get the rowcount of a table.
--------------------------------------------------------------------------------------------------------
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('table_name')
AND index_id < 2;

--------------------------------------------------------------------------------------------------------
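Interestingly, even sp_spaceused relies on the same DMV to get the row count.  The index_id < 2 filter keeps only the heap (index_id 0) or the clustered index (index_id 1), so the rows are not counted again for every nonclustered index.  But of course writing and remembering sp_spaceused table_name is much easier than this, right?  You can also get the row count using sysindexes, but I’m skipping that.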
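The same DMV can also list the row count of every user table in one go.  The following is only a sketch: the column aliases are my own choice, and the counts carry the same "possibly approximate" caveat as sp_spaceused.

```sql
-- Row counts for all user tables in the current database.
SELECT s.name            AS schema_name,
       t.name            AS table_name,
       SUM(ps.row_count) AS row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables  AS t ON t.object_id = ps.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)      -- heap or clustered index only
GROUP BY s.name, t.name
ORDER BY row_count DESC;
```

SUM is needed because a partitioned table has one row per partition in this DMV.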

Using COUNT function:

This is the most reliable and best-known method, but also a slow one, especially if the table has millions or billions of rows.  I don’t really need to talk about the COUNT function, because everyone knows about it, but I’m mentioning it in this post because of a myth.  There is a misconception going around that if you write COUNT(1) instead of COUNT(*) you will get a faster result.  That is a total myth.  Many experts have tested it and written about it, and you can find many articles online comparing the performance of COUNT(1) and COUNT(*) and concluding that there is no performance difference between them.  Here is one such article by Beaulin Twinkle: SQL Server: count(*) or count (1) or count('') Which is better?
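If you want to check this on your own data, one simple way is to run both forms with I/O and timing statistics switched on and compare the numbers in the Messages tab.  This is just a sketch; dbo.your_table_name is a placeholder for one of your own tables.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
GO

-- Placeholder table name: replace with a real table.
SELECT COUNT(*) AS cnt_star FROM dbo.your_table_name;
SELECT COUNT(1) AS cnt_one  FROM dbo.your_table_name;
GO

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
GO
```

Run it a couple of times so that the second query does not benefit from pages the first one has just cached.  Comparing the two execution plans is another quick check.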

Thanks
Mangal Pardeshi

Thursday, October 29, 2015

LPAD and RPAD in SQL Server using FORMAT

In SQL Server 2012 Microsoft introduced a new T-SQL function, FORMAT.  This function returns a value formatted with the specified format and optional culture.  It can be very useful for formatting numeric and date values.  In this post I’ll show you a simple use of the FORMAT function to do padding similar to the LPAD and RPAD functions in Oracle.

First, let's create some sample data:
--------------------------------------------------------------------------------------------------------
CREATE TABLE #Temp(Id INT)
GO

-- Now load the #Temp table with some sample data:
INSERT INTO #Temp (Id)
VALUES (1), (2), (12), (123), (1234), (12345)
GO

SELECT ID
FROM #TEMP

--------------------------------------------------------------------------------------------------------

First let's see how to do the LPAD; I’ll talk about RPAD separately because it needs some extra effort.  We’ll pad to 5 digits, which means that if there is only 1 digit in the column we need to add 4 extra zeros to the left.  With the FORMAT function this becomes very easy.  If you know the syntax of the FORMAT function =>> FORMAT ( value, format [, culture ] )  then you just need to put ‘00000’ (one zero for each digit of the padded width) in place of format.
--------------------------------------------------------------------------------------------------------
SELECT ID ,
FORMAT(ID, '00000') AS LPAD_EXAMPLE
FROM #TEMP

--------------------------------------------------------------------------------------------------------
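For comparison, here is a sketch of the string-based approach commonly used before FORMAT was available, shown next to FORMAT.  To keep it self-contained it uses an inline list of the same sample values instead of #Temp; the column aliases are only illustrative.

```sql
SELECT v.Id,
       FORMAT(v.Id, '00000')                                   AS LpadWithFormat,
       RIGHT(REPLICATE('0', 5) + CAST(v.Id AS VARCHAR(10)), 5) AS LpadWithRight
FROM (VALUES (1), (2), (12), (123), (1234), (12345)) AS v(Id);
-- Id = 1     -> 00001 in both columns
-- Id = 12345 -> 12345 in both columns
```

One difference to keep in mind: for a value longer than 5 digits, RIGHT cuts off the leading digits, while FORMAT returns the full number.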

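For RPAD you need to do it a little differently.  In the format string we add ‘#’ (the placeholder that represents the value/column) before the zeros, and we put a quote character in front of the zeros so that they are treated as literal text rather than digit placeholders.  Note that, as far as I can tell, this appends five zeros after every value (1 becomes 100000) rather than padding to a fixed total width, so it is not an exact equivalent of Oracle's RPAD: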
--------------------------------------------------------------------------------------------------------
SELECT ID ,
FORMAT(ID, '#''00000') AS RPAD_EXAMPLE
FROM #TEMP

--------------------------------------------------------------------------------------------------------
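If what you need is a fixed total width (Oracle's RPAD pads up to a given length), one possible sketch is to append the padding and then keep only the leftmost characters.  This is an alternative technique, not the FORMAT approach shown above, and it uses an inline list of the sample values so that it runs on its own.

```sql
SELECT v.Id,
       LEFT(CONCAT(v.Id, REPLICATE('0', 5)), 5) AS RpadFixedWidth
FROM (VALUES (1), (2), (12), (123), (1234), (12345)) AS v(Id);
-- Id = 1     -> 10000
-- Id = 123   -> 12300
-- Id = 12345 -> 12345
```

As with the RIGHT-based LPAD, a value longer than 5 digits gets truncated here, so pick the width with your data in mind.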

See the combined result of the above two queries:

[Image: combined LPAD and RPAD output]

For more about the function see: FORMAT (Transact-SQL)

Also check my post about the new T-SQL functions introduced in SQL Server 2012: New T-SQL programming enhancements in SQL Server 2012

In my future posts I’ll try to show more ways we can use the FORMAT function.

Thanks
Mangal Pardeshi

Monday, October 26, 2015

Maximum number of Tables in a SQL Database

Recently somebody asked this question on Stack Overflow: “How many tables can we create in a SQL Server database?”  I thought I should write about it on my blog.  For normal SQL programmers this shouldn’t matter; mostly some weird people ask this question in interviews, and I don’t know why.

Anyway, since we are talking about it, let me answer this question.  In SQL Server, Microsoft has not defined any limit on the number of tables as such, but there is a limit on the number of objects in a database.

According to Maximum Capacity Specifications for SQL Server, tables per database are limited by the number of objects in a database.  Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints.  The sum of the number of all objects in a database cannot exceed 2,147,483,647.

So if there are NO other objects in a database, you can create up to 2,147,483,647 tables in a SQL Server database.

And if I’m right, this limit has stayed the same from SQL Server 2005 through SQL Server 2014.
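To get a rough idea of how many objects a database already holds, you can count the rows in the sys.objects catalog view by type.  This is only a sketch: sys.objects lists schema-scoped objects, and I am assuming (not claiming) that it is a reasonable approximation of what the documented limit counts.

```sql
-- Object counts by type for the current database.
SELECT type_desc,
       COUNT(*) AS object_count
FROM sys.objects
GROUP BY type_desc
ORDER BY object_count DESC;
```

In practice the totals will be nowhere near the limit, which is why this question matters more in interviews than in real projects.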

To know more about the maximum sizes and numbers of various objects defined in SQL Server components you can check out: Maximum Capacity Specifications for SQL Server

Thanks
Mangal Pardeshi

Saturday, October 24, 2015

How to Calculate Running Totals in SQL Server

Back in 2009 I wrote a post about How to Calculate the Running Totals in SQL Server.  Back then (before SQL Server 2012) this was one of the “difficult” queries to write.  We used to do it by writing a correlated sub-query, and sometimes even a cursor was used for calculating running totals.

Then Microsoft came up with a much-awaited enhancement to the OVER() clause in SQL Server 2012.  The OVER() clause now fully supports aggregate functions like SUM, COUNT and AVG.  In simple words, we can now write ORDER BY in the OVER() clause for aggregate functions as well.  This feature has made running total calculations extremely easy and efficient.
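For readers who have not seen the old approach, here is a small sketch of the correlated sub-query style.  The three rows of data are made up purely for illustration and are not the sample data used later in this post.

```sql
-- Pre-2012 style: for each row, re-sum every row up to and including it.
WITH Sales (ShipId, Quantity) AS
(
    SELECT v.ShipId, v.Quantity
    FROM (VALUES (1, 10), (2, 15), (3, 20)) AS v(ShipId, Quantity)
)
SELECT s.ShipId,
       s.Quantity,
       (SELECT SUM(s2.Quantity)
        FROM Sales AS s2
        WHERE s2.ShipId <= s.ShipId) AS RunningTotal
FROM Sales AS s
ORDER BY s.ShipId;
-- RunningTotal: 10, 25, 45
```

The inner query runs once per outer row, which is why this pattern tends to get slow on large tables.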

Along the same lines as my previous post, the scenarios I’ll cover in this post are:

  • Running Total for all records.
  • Running Total on each Group of Records.
  • Running Total on each Date.

Create Sample Data:
--------------------------------------------------------------------------------------------------------

-- Create Table
CREATE TABLE #Sales
( ShipId INT,
OrderId INT,
ShipDate DateTime,
Quantity INT
)
GO
-- Load Sample Data
INSERT INTO #Sales
VALUES
(1,1, '2015-10-01 12:06:16:820PM',10),
(2,1, '2015-10-02 04:16:16:820PM',15),
(3,2, '2015-10-01 01:26:16:820PM',20),
(4,2, '2015-10-03 02:40:16:820PM',12),
(5,2, '2015-10-04 01:05:16:820PM',15),
(6,3, '2015-10-03 05:25:16:820PM',25),
(7,3, '2015-10-04 03:50:16:820PM',50),
(8,3, '2015-10-05 07:40:16:820PM',30),
(9,4, '2015-10-06 03:30:16:820PM',20),
(10,4, '2015-10-07 04:20:16:820PM',10),
(11,5, '2015-10-05 05:05:16:820PM',35)

--------------------------------------------------------------------------------------------------------
This is what our sample data looks like:
[Image: sample data]

Case I: Running Total for all records.  This is the simplest case, where I’ll show you how to calculate the running total for all the records in the order of ShipDate.  If you remember the old ways, we needed to write a weird-looking correlated sub-query to achieve that.  Now it can be done simply with SUM(Quantity) OVER(ORDER BY ShipDate).
Note: I’m assuming you know the basics of the OVER() clause.

--------------------------------------------------------------------------------------------------------
SELECT *
,SUM(Quantity) OVER(ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY ShipDate
--------------------------------------------------------------------------------------------------------
Here is the result returned by the above query:
[Image: running total for all records]
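One detail worth knowing: when ORDER BY is used in OVER() without an explicit frame, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with the same ShipDate as one group.  The sketch below spells out a ROWS frame instead and adds ShipId as a tie-breaker (my own choice, not something the query above requires).  It assumes the #Sales table created earlier still exists.

```sql
SELECT ShipId,
       OrderId,
       ShipDate,
       Quantity,
       SUM(Quantity) OVER (ORDER BY ShipDate, ShipId
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM #Sales
ORDER BY ShipDate, ShipId;
```

With this sample data every ShipDate is unique, so the result should match the query above; the explicit frame matters once duplicate dates appear, and ROWS is usually the cheaper option on larger tables.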

Case II: Running Total on each Group of Records.
Now, instead of calculating one running total across all records, we will calculate the running total within each OrderId.  To do that we just need to add PARTITION BY OrderId in the OVER clause.

--------------------------------------------------------------------------------------------------------
SELECT *
,SUM(Quantity) OVER(PARTITION BY OrderId ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY OrderId, ShipDate
--------------------------------------------------------------------------------------------------------
Output:
[Image: running total for each OrderId]

Case III: Running Total on each Date.
Now we will see how to calculate the running total for each day.  For that we of course need to put ShipDate in the PARTITION BY, but we also need to remove the time portion from the date.  We can do that by simply converting ShipDate to the DATE data type.  See the query below:

--------------------------------------------------------------------------------------------------------
SELECT *
,SUM(Quantity) OVER(PARTITION BY CONVERT(DATE, ShipDate) ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY ShipDate
--------------------------------------------------------------------------------------------------------
Output:
[Image: running total for each date]
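A related variation, in case what you actually need is one row per day with a cumulative total across days: group by the date first and then apply the window function to the daily sums.  This is a sketch of a different scenario from Case III, not a replacement for it, and it assumes the #Sales table created earlier still exists.  The aliases ShipDay and DailyQuantity are only illustrative.

```sql
SELECT CONVERT(DATE, ShipDate) AS ShipDay,
       SUM(Quantity)           AS DailyQuantity,
       SUM(SUM(Quantity)) OVER (ORDER BY CONVERT(DATE, ShipDate)
                                ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM #Sales
GROUP BY CONVERT(DATE, ShipDate)
ORDER BY ShipDay;
```

The inner SUM(Quantity) is the per-day total produced by GROUP BY; the outer SUM ... OVER accumulates those daily totals in date order.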

I hope you like this.  Do give me your feedback.

Thanks
Mangal Pardeshi

Thursday, October 22, 2015

New T-SQL programming enhancements in SQL Server 2012

In SQL Server 2012 and 2014 Microsoft introduced many new features and enhancements.  In fact, there were 14 new built-in functions introduced in SQL Server 2012.  However, I have noticed that for some reason I don’t see many people using them.  In my company we upgraded all our servers from 2008 R2 to 2014 almost a year back, but I see none of the developers using them or even being aware of them.  Also, in the last 6 months I have interviewed many people, and one of the questions I asked candidates was: are you aware of any new features of SQL Server 2012/14?  While most people talked about the changes in SSIS, project deployment, etc., very few knew about the 14 new built-in functions introduced in SQL Server 2012.

So I thought of writing about these new built-in functions.  In this post I’m not going to go into the details of these functions.  This is just going to be a reminder that we now have 14 more built-in functions that can help us in writing SQL code.

Conversion functions

1. PARSE (Transact-SQL): Converts a string expression to the requested data type, and raises an error if the translation fails.

2. TRY_CONVERT (Transact-SQL): Returns a value cast to the specified data type if the cast succeeds; otherwise, returns null.

3. TRY_PARSE (Transact-SQL): Returns the result of an expression, translated to the requested data type, or null if the cast fails.

You might wonder what the difference is between PARSE and CONVERT, or maybe between PARSE and TRY_PARSE; I’ll try to explain that in my next few posts.
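Until then, here is a tiny sketch of how the three conversion functions behave with good and bad input.  The literal values are made up for illustration.

```sql
SELECT PARSE('123' AS INT)         AS ParseResult,       -- 123
       TRY_PARSE('abc' AS INT)     AS TryParseResult,    -- NULL
       TRY_CONVERT(INT, 'abc')     AS TryConvertResult;  -- NULL

-- By contrast, the following statement raises an error instead of returning NULL:
-- SELECT PARSE('abc' AS INT);
```

The TRY_ versions are handy when cleaning up imported data, because one bad value no longer fails the whole query.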

 

Date and time functions

4. DATEFROMPARTS (Transact-SQL): Returns a date value for the specified year, month, and day.

5. DATETIME2FROMPARTS (Transact-SQL): Returns a datetime2 value for the specified date and time and with the specified precision.

6. DATETIMEFROMPARTS (Transact-SQL): Returns a datetime value for the specified date and time.

7. DATETIMEOFFSETFROMPARTS (Transact-SQL): Returns a datetimeoffset value for the specified date and time and with the specified offsets and precision.

8. EOMONTH (Transact-SQL): Returns the last day of the month that contains the specified date, with an optional offset.

9. SMALLDATETIMEFROMPARTS (Transact-SQL): Returns a smalldatetime value for the specified date and time.

10. TIMEFROMPARTS (Transact-SQL): Returns a time value for the specified time and with the specified precision.
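A quick sketch of a few of these date and time functions in action; the dates are arbitrary examples.

```sql
SELECT DATEFROMPARTS(2015, 10, 22)                   AS DateValue,      -- 2015-10-22
       DATETIMEFROMPARTS(2015, 10, 22, 13, 30, 0, 0) AS DateTimeValue,  -- 2015-10-22 13:30:00.000
       TIMEFROMPARTS(23, 59, 59, 0, 0)               AS TimeValue,      -- 23:59:59
       EOMONTH('2015-10-22')                         AS EndOfMonth,     -- 2015-10-31
       EOMONTH('2015-10-22', 1)                      AS EndOfNextMonth; -- 2015-11-30
```

The last two arguments of TIMEFROMPARTS are the fractional seconds and the precision, so 0, 0 means no fractional part.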

 

Logical functions

11. CHOOSE (Transact-SQL): Returns the item at the specified index from a list of values.

12. IIF (Transact-SQL): Returns one of two values, depending on whether the Boolean expression evaluates to true or false.
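Both logical functions are easy to try with literal values, as in this small sketch:

```sql
SELECT CHOOSE(2, 'Red', 'Green', 'Blue') AS ChooseResult,  -- Green (the index is 1-based)
       IIF(10 > 5, 'Yes', 'No')          AS IifResult;     -- Yes
```

CHOOSE returns NULL when the index is outside the list, and IIF is a shorthand for a simple CASE expression.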

 

String functions

13. CONCAT (Transact-SQL): Returns a string that is the result of concatenating two or more string values.

14. FORMAT (Transact-SQL): Returns a value formatted with the specified format and optional culture.
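And a small sketch for the two string functions, again with made-up values:

```sql
SELECT CONCAT('SQL', NULL, ' Server')  AS ConcatResult,  -- SQL Server (NULL is treated as an empty string)
       FORMAT(1234.5, 'N2', 'en-US')   AS FormatResult;  -- 1,234.50
```

The NULL handling is the main reason to prefer CONCAT over the + operator, where a single NULL would normally make the whole result NULL.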

To know more about each of these new functions you can click on the names of the functions.  In my future posts I’ll try to explain each of these functions in detail, with some examples and situations where they can be useful.

 

To know more about the new features introduced in SQL Server, please follow the links below:

What's New in SQL Server 2012
What's New in SQL Server 2014

A Sneak Peek into the Future: What's New in SQL Server 2016

Thanks
Mangal Pardeshi