Monday, October 26, 2015

Maximum number of Tables in a SQL Database

Recently somebody asked this question on the Stack Overflow forums: “How many tables can we create in a SQL Server database?”  And I thought I should write about it on my blog.  Though for normal SQL programmers this shouldn’t really matter, people still ask this question in interviews, and I don’t know why. 

Anyways, since we are talking about it, let me answer the question.  Actually, Microsoft has not defined any limit on the number of tables in SQL Server, but there is a limit on the number of objects in a database. 

According to the Maximum Capacity Specifications for SQL Server, tables per database are limited by the number of objects in a database.  Database objects include objects such as tables, views, stored procedures, user-defined functions, triggers, rules, defaults, and constraints.  The sum of the number of all objects in a database cannot exceed 2,147,483,647.

So if there are NO other objects created in a database, you can create up to 2,147,483,647 tables in a SQL Server database.
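
If you are curious how many objects your database already has, a quick sketch like the one below (using the sys.objects catalog view, which lists the schema-scoped objects) will tell you; user tables show up with type 'U':

--------------------------------------------------------------------------------------------------------
-- Count all schema-scoped objects in the current database,
-- and how many of those are user tables (type 'U')
SELECT COUNT(*) AS TotalObjects,
       SUM(CASE WHEN type = 'U' THEN 1 ELSE 0 END) AS UserTables
FROM sys.objects
--------------------------------------------------------------------------------------------------------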

And if I’m right, this limit has remained pretty much constant from SQL Server 2005 through SQL Server 2014.

To know more about the maximum sizes and numbers of various objects defined in SQL Server components you can check out: Maximum Capacity Specifications for SQL Server

Thanks
Mangal Pardeshi

Saturday, October 24, 2015

How to Calculate Running Totals in SQL Server

Back in 2009 I wrote this post about How to Calculate the Running Totals in SQL Server.  Back then (before SQL Server 2012) this was one of the “difficult” queries to write.  We used to do it by writing a correlated sub-query, or sometimes even a cursor was used for calculating running totals.
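
Just as a reminder of how we used to do it, here is a minimal sketch of the old correlated sub-query approach, written against the same #Sales sample table that we will create below:

--------------------------------------------------------------------------------------------------------
-- Old style (pre-2012): running total via a correlated sub-query
SELECT s1.ShipId, s1.OrderId, s1.ShipDate, s1.Quantity,
       (SELECT SUM(s2.Quantity)
        FROM #Sales s2
        WHERE s2.ShipDate <= s1.ShipDate) AS RunningTotal
FROM #Sales s1
ORDER BY s1.ShipDate
--------------------------------------------------------------------------------------------------------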

Then Microsoft came up with this much-awaited enhancement to the OVER() clause in SQL Server 2012.  Now OVER() is fully extended to aggregate functions like SUM, COUNT, and AVG.  In simple words, we can now write ORDER BY in the OVER() clause for aggregate functions as well.  This feature has made running-total calculations extremely easy and efficient.

Along the same lines as my previous post, the different scenarios I’ll be covering in this post are -

  • Running Total for all records.
  • Running Total on each Group of Records.
  • Running Total on each Date.

Create Sample Data:
--------------------------------------------------------------------------------------------------------

-- Create Table
CREATE TABLE #Sales
( ShipId INT,
OrderId INT,
ShipDate DateTime,
Quantity INT
)
GO
-- Load Sample Data
INSERT INTO #Sales
VALUES
(1,1, '2015-10-01 12:06:16:820PM',10),
(2,1, '2015-10-02 04:16:16:820PM',15),
(3,2, '2015-10-01 01:26:16:820PM',20),
(4,2, '2015-10-03 02:40:16:820PM',12),
(5,2, '2015-10-04 01:05:16:820PM',15),
(6,3, '2015-10-03 05:25:16:820PM',25),
(7,3, '2015-10-04 03:50:16:820PM',50),
(8,3, '2015-10-05 07:40:16:820PM',30),
(9,4, '2015-10-06 03:30:16:820PM',20),
(10,4, '2015-10-07 04:20:16:820PM',10),
(11,5, '2015-10-05 05:05:16:820PM',35)

--------------------------------------------------------------------------------------------------------
This is how our sample data looks:

Case I : Running Total for all records.  This is the simplest case, where I’ll show you how to calculate the running total for all the records in the order of ShipDate.  If you remember the old ways, we needed to write a weird-looking correlated sub-query to achieve that.  Now it can be achieved simply with SUM(Quantity) OVER(ORDER BY ShipDate).
Note: I’m assuming you know the basics of the OVER() clause.

--------------------------------------------------------------------------------------------------------
SELECT *
,SUM(Quantity) OVER(ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY ShipDate
--------------------------------------------------------------------------------------------------------
Here is the result returned by the above query:

Case II: Running Total on each Group of Records.
Now instead of calculating the running total across all records, we will calculate the running total for each OrderId.  To do that we just need to add PARTITION BY OrderId in the OVER() clause.

--------------------------------------------------------------------------------------------------------
SELECT *
,SUM(Quantity) OVER(PARTITION BY OrderId ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY OrderId, ShipDate
--------------------------------------------------------------------------------------------------------
Output:

Case III: Running Total on each Date.
Now we will see how to calculate the running total for each day.  For that, of course, we need to put ShipDate in the PARTITION BY, but we also need to remove the time portion from the date.  We can do that by simply converting ShipDate to the DATE data type.  See the query below:

--------------------------------------------------------------------------------------------------------
SELECT *
,SUM(Quantity) OVER(PARTITION BY CONVERT(DATE, ShipDate) ORDER BY ShipDate) AS RunningTotal
FROM #SALES
ORDER BY ShipDate
--------------------------------------------------------------------------------------------------------
Output:

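One small side note: when you write ORDER BY inside OVER() without specifying a frame, SQL Server uses the default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats rows with the same ShipDate as peers (they get the same running total) and is usually a little slower.  If you want a strictly row-by-row running total, you can spell out the ROWS frame, something like this:

--------------------------------------------------------------------------------------------------------
SELECT *
,SUM(Quantity) OVER(ORDER BY ShipDate
                    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal
FROM #SALES
ORDER BY ShipDate
--------------------------------------------------------------------------------------------------------
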
I hope you like this.  Do give me your feedback.

Thanks
Mangal Pardeshi

Thursday, October 22, 2015

New T-SQL programming enhancements in SQL Server 2012

In SQL Server 2012 and 2014 Microsoft introduced many new features and enhancements.  In fact, 14 new built-in functions were introduced in SQL Server 2012.  However, I have noticed that for some reason not many people use them.  In my company we upgraded all our servers from 2008 R2 to 2014 almost a year back, but I see none of the developers using them, or even aware of them.  Also, in the last 6 months I interviewed many people, and one of the questions I asked candidates was: are you aware of any new features of SQL Server 2012/14?  While most people talked about the changes in SSIS, project deployment, etc., very few knew about the 14 new built-in functions introduced in SQL Server 2012. 

So I thought of writing about these new built-in functions.  In this post I’m not going to write about the details of these functions.  This is just going to be a reminder that we now have 14 more built-in functions that can help us when writing SQL code. 

Conversion functions

1. PARSE (Transact-SQL) : PARSE converts a string expression to the requested data type, and raises an error if the translation fails.

2. TRY_CONVERT (Transact-SQL): Returns a value cast to the specified data type if the cast succeeds; otherwise, returns null.

3. TRY_PARSE (Transact-SQL): Returns the result of an expression, translated to the requested data type, or null if the cast fails.

You might wonder what the difference is between PARSE and CONVERT, or between PARSE and TRY_PARSE; I’ll try to explain that in my next few posts.
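
Just to give a quick feel for them (a small sketch with made-up literal values), notice how the TRY_ variants return NULL instead of raising an error:

--------------------------------------------------------------------------------------------------------
SELECT PARSE('26 October 2015' AS DATETIME2 USING 'en-US') AS ParsedDate
      ,TRY_PARSE('abc' AS INT)                             AS TryParseBad   -- NULL, no error
      ,TRY_CONVERT(INT, '123')                             AS TryConvertOk  -- 123
      ,TRY_CONVERT(INT, 'abc')                             AS TryConvertBad -- NULL, no error
--------------------------------------------------------------------------------------------------------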

 

Date and time functions

4. DATEFROMPARTS (Transact-SQL): Returns a date value for the specified year, month, and day.

5. DATETIME2FROMPARTS (Transact-SQL): Returns a datetime2 value for the specified date and time and with the specified precision.

6. DATETIMEFROMPARTS (Transact-SQL): Returns a datetime value for the specified date and time.

7. DATETIMEOFFSETFROMPARTS (Transact-SQL): Returns a datetimeoffset value for the specified date and time and with the specified offsets and precision.

8. EOMONTH (Transact-SQL): Returns the last day of the month that contains the specified date, with an optional offset.

9. SMALLDATETIMEFROMPARTS (Transact-SQL): Returns a smalldatetime value for the specified date and time.

10. TIMEFROMPARTS (Transact-SQL): Returns a time value for the specified time and with the specified precision.
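
Again, just a quick illustrative sketch with literal values to show the flavor of the FROMPARTS functions and EOMONTH:

--------------------------------------------------------------------------------------------------------
SELECT DATEFROMPARTS(2015, 10, 22)                   AS SomeDate       -- 2015-10-22
      ,DATETIMEFROMPARTS(2015, 10, 22, 18, 30, 0, 0) AS SomeDateTime   -- 2015-10-22 18:30:00.000
      ,EOMONTH('2015-10-22')                         AS MonthEnd       -- 2015-10-31
      ,EOMONTH('2015-10-22', 1)                      AS NextMonthEnd   -- 2015-11-30
--------------------------------------------------------------------------------------------------------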

 

Logical functions

11. CHOOSE (Transact-SQL): Returns the item at the specified index from a list of values.

12. IIF (Transact-SQL): Returns one of two values, depending on whether the Boolean expression evaluates to true or false.
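
A one-line sketch of both:

--------------------------------------------------------------------------------------------------------
SELECT IIF(10 > 5, 'TRUE', 'FALSE')                              AS IifResult    -- TRUE
      ,CHOOSE(3, 'SQL 2005', 'SQL 2008', 'SQL 2012', 'SQL 2014') AS ChooseResult -- SQL 2012
--------------------------------------------------------------------------------------------------------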

 

String functions

13. CONCAT (Transact-SQL): Returns a string that is the result of concatenating two or more string values.

14. FORMAT (Transact-SQL): Returns a value formatted with the specified format and optional culture.
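
And a small sketch of CONCAT and FORMAT; note that CONCAT treats NULL as an empty string, and FORMAT accepts a .NET-style format string with an optional culture:

--------------------------------------------------------------------------------------------------------
SELECT CONCAT('Mangal', ' ', NULL, 'Pardeshi')   AS ConcatResult    -- NULLs become empty strings
      ,FORMAT(GETDATE(), 'dd-MMM-yyyy')          AS FormattedDate
      ,FORMAT(1234567.891, 'N2', 'en-US')        AS FormattedNumber -- 1,234,567.89
--------------------------------------------------------------------------------------------------------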

To know more about each of these new functions you can click on the names of the functions.  In my future posts I’ll try to explain each of these functions in detail, with some examples and situations where they can be useful.

 

To know more about the new features introduced in SQL Server, please follow the links below:

What's New in SQL Server 2012
What's New in SQL Server 2014

A Sneak Peek into the Future: What's New in SQL Server 2016

Thanks
Mangal Pardeshi

Saturday, July 27, 2013

Different methods to execute large query in small batches

Today I’ll show you some different methods to run a DML operation like UPDATE in small batches for a large table.  Though I’m going to show only UPDATE statements, the methods are applicable to DELETE and INSERT as well.  When you want to run an UPDATE query (or another DML operation) over a very large table, having over a billion rows or even hundreds of millions, it is recommended that you do not update them in one go.  DML queries over a very large number of rows can cause performance issues: the transaction log might get full, concurrent users are affected, a lot of server resources are consumed, the query runs for hours, and so on.  Good practice is to break the records into small batches of a few thousand and update them batch by batch.  This way you use minimum server resources and, most importantly, you prevent the transaction log file from getting full.
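
For example, here is a minimal sketch of the same idea applied to a DELETE; the table and column names here are just placeholders, not part of the sample data we create below:

------------------------------------------------------------------------------------------------------
-- Delete old rows in batches of 5000 until nothing is left to delete
WHILE 1 = 1
BEGIN
    DELETE TOP (5000)
    FROM dbo.BigTable                  -- placeholder table name
    WHERE CREATED_DATE < '20100101'    -- placeholder filter

    IF @@ROWCOUNT = 0 BREAK
END
------------------------------------------------------------------------------------------------------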

First let’s create a table and insert some sample data.

--------------------------------------------------------------------------------------------------------

-- Create the table
CREATE TABLE #A
(
ID INT,
CREATED_DATE  DATE,
METHOD VARCHAR(10)
)

-- Insert 10000 rows in the table
;WITH MANGAL AS
(
SELECT 1 AS ID, CONVERT(DATE,'20000101') AS CREATED_DATE
UNION ALL
SELECT ID+1, DATEADD(DD, 1, CREATED_DATE)
FROM MANGAL
WHERE ID <= 9999

)
INSERT INTO #A(ID, CREATED_DATE)
SELECT ID, CREATED_DATE
FROM MANGAL

OPTION (MAXRECURSION 10000);

-- Check sample data
SELECT *
FROM #A

--------------------------------------------------------------------------------------------------------

This is not the most efficient way to generate sample data, but I like the recursive CTE, so I’m using it.

I. Using TOP and GO:

This is the simplest method to run a query in small batches.  I frequently use this method in development when I want to update some records quickly without thinking much.  Note the GO 10 in the query.  An integer after GO executes the preceding batch the specified number of times.  In this case the UPDATE statement will execute 10 times (I know there are 10,000 rows in the table and I’m using TOP 1000, simple math).

------------------------------------------------------------------------------------------------------
UPDATE TOP (1000) #A
SET METHOD = 'I'
WHERE METHOD IS NULL
GO 10

-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

All the queries are quite simple and self-explanatory, so I’m not going to explain them in detail.

II. Using ROWCOUNT and GO:

The same query, but without the TOP operator.  Here I’m using the SET ROWCOUNT option to limit the number of rows in each batch.  SET ROWCOUNT causes SQL Server to stop processing the query after the specified number of rows is affected.

------------------------------------------------------------------------------------------------------
SET ROWCOUNT 1000
UPDATE #A
SET METHOD = 'II'
WHERE METHOD <> 'II'
GO 10

SET ROWCOUNT 0

-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

IMPORTANT:  Using SET ROWCOUNT will not affect DELETE, INSERT, and UPDATE statements in a future release of SQL Server. Avoid using SET ROWCOUNT with DELETE, INSERT, and UPDATE statements in new development work, and plan to modify applications that currently use it. For a similar behavior, use the TOP syntax.

III. Using TOP and GOTO:

Now instead of GO I’ll use GOTO to run the batch multiple times.  The GOTO statement alters the flow of execution to a label: the statements following GOTO are skipped and processing continues at the label.  Here I’ll use GOTO to keep re-running the labeled UPDATE until its @@ROWCOUNT becomes zero.

------------------------------------------------------------------------------------------------------
UPDATE_BATCH:

UPDATE TOP (3000) #A
SET METHOD = 'III'
WHERE METHOD <> 'III'
IF @@ROWCOUNT > 0  GOTO UPDATE_BATCH

-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

IV. Using ROWCOUNT and GOTO:

Same GOTO concept with ROWCOUNT.

------------------------------------------------------------------------------------------------------

SET ROWCOUNT 3500
UPDATE_BATCH:

UPDATE  #A
SET METHOD = 'IV'
WHERE METHOD <> 'IV'
IF @@ROWCOUNT > 0  GOTO UPDATE_BATCH

SET ROWCOUNT 0
-- verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

V. Using WHILE and BREAK:

Here I’m taking the help of an infinite WHILE loop (WHILE 1=1) to run the batch multiple times, and using BREAK to exit the loop when @@ROWCOUNT becomes zero.

------------------------------------------------------------------------------------------------------

WHILE (1=1)
BEGIN

    BEGIN TRANSACTION
       
        UPDATE TOP(3333) #A
        SET METHOD = 'V'
        WHERE METHOD <> 'V'
   
        IF @@ROWCOUNT = 0
            BEGIN
            COMMIT TRANSACTION
            BREAK
        END
    COMMIT TRANSACTION
END

-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

The same can be implemented using ROWCOUNT, as already shown.

VI. Using a Sequence column and WHILE:

Now an entirely different method, not using GO, GOTO, or BREAK.  This method is more systematic, and you have more control over how the query executes.  Note that when you use TOP or ROWCOUNT you actually have no control over which rows get updated; the order in which rows are picked is completely dependent on the query plan created by the query engine.

------------------------------------------------------------------------------------------------------

DECLARE @C INT      -- current lower bound of the batch
DECLARE @I INT      -- batch size (increment)
DECLARE @L INT      -- upper bound of the batch (exclusive)
DECLARE @MAX INT    -- highest ID in the table

SET @C = 1
SET @I = 2200
SET @L = @C + @I
SET @MAX = (SELECT MAX(ID) FROM #A)

SELECT @C AS C, @I AS I, @L AS L, @MAX AS MAX

WHILE @C < @MAX
BEGIN 

    UPDATE #A
    SET METHOD = 'VI'
    WHERE ID >= @C
    AND ID < @L

SET @C = @L -- OR SET @C = @C + @I
SET @L = @L + @I

END

-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

VII. Using a date range and WHILE:

Along similar lines, if we don’t have a reliable sequence column, or simply prefer it, we can use a date column to update the records by month or year.

------------------------------------------------------------------------------------------------------

DECLARE @C DATE     -- current lower bound (start of the batch)
DECLARE @I DATE     -- upper bound (start of the next year)
DECLARE @MAX DATE   -- latest date in the table

SELECT    @C = MIN(CREATED_DATE),
        @MAX = MAX(CREATED_DATE)        
FROM #A

SET @I =  DATEADD(YEAR, DATEDIFF(YEAR, 0, @C)+1, 0)

--SELECT @C AS C, @I AS I , @MAX AS MAX

WHILE @C <= @MAX

BEGIN
    UPDATE #A
    SET METHOD = 'VII'
    WHERE CREATED_DATE >= @C
    AND CREATED_DATE < @I

SET @C = @I
SET @I = DATEADD(YEAR, 1, @I)

--SELECT @C, @I
END

-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

VIII.  Using WHILE without GO, GOTO or BREAK:

One last method from me, without using GO, GOTO, or BREAK, but relatively simpler compared to the last two methods.  Some people will advise you not to use GO, GOTO, or BREAK in SQL.

------------------------------------------------------------------------------------------------------

DECLARE @ROWCOUNT INT = 1
DECLARE @C INT = 2800

WHILE (@ROWCOUNT > 0)
BEGIN
    UPDATE TOP (@C) #A
    SET METHOD = 'VIII'
    WHERE METHOD <> 'VIII'

SET @ROWCOUNT = @@ROWCOUNT
END

-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

The same can be done using ROWCOUNT.

IX. Final

Well, this is a query I found some 4-5 years back when I was searching for the same thing on Google.  I don’t remember where I saw it or who wrote it.

------------------------------------------------------------------------------------------------------
DECLARE @C INT

SET ROWCOUNT 1000
WHILE (1=1)

BEGIN
   BEGIN TRANSACTION
  
   UPDATE #A
   SET METHOD = 'IX'
   WHERE METHOD <> 'IX'

   SET @C = @@ROWCOUNT

   IF @@Error <> 0
    BEGIN
    Print 'Problem Updating the records'
    ROLLBACK TRANSACTION
    BREAK
    END

   IF @C = 0
   BEGIN
      COMMIT TRANSACTION
      BREAK
   END
   COMMIT TRANSACTION
END
SET ROWCOUNT 0
-- Verify result
SELECT *
FROM #A

------------------------------------------------------------------------------------------------------

Some notes:  Both TOP and ROWCOUNT can be parameterized; we can pass a variable instead of a literal number.  But of course, since SET ROWCOUNT’s effect on DML statements is deprecated in future SQL Server versions, one should avoid relying on it.  I was being lazy and did not use transactions (COMMIT and ROLLBACK) everywhere possible in the above examples, but one should use them to add more control and clarity to the code, and also for better understanding. 

I have not really tested which query performs best; I leave that up to you.  The idea was to share different methods to execute a query in small batches.  But I would go for method VI or VII in a production environment.  Do let me know your comments, suggestions, and what you think of all these methods.

Namaste
Mangal

Sunday, November 14, 2010

ORDER BY Enhancements in SQL Server

Today I am going to talk about one new feature that has been introduced in SQL Server Denali, and that is an interesting enhancement in the ORDER BY clause.  Now with the ORDER BY clause you can also specify the OFFSET and FETCH options. 

From Books Online of Denali:
OFFSET: Specifies the number of rows to skip before it starts to return rows from the query expression. The value can be an integer constant or expression that is greater than or equal to zero.

FETCH: Specifies the number of rows to return after the OFFSET clause has been processed. The value can be an integer constant or expression that is greater than or equal to one.

This is how the new syntax of the ORDER BY clause looks in SQL Server Denali:
------------------------------------------------------------------------------------------------------
ORDER BY order_by_expression
    [ COLLATE collation_name ]
    [ ASC | DESC ]
    [ ,...n ]
[ <offset_fetch> ]
<offset_fetch> ::=
{
    OFFSET { integer_constant | offset_row_count_expression } { ROW | ROWS }
    [
      FETCH { FIRST | NEXT } {integer_constant | fetch_row_count_expression } { ROW | ROWS } ONLY
    ]
}

------------------------------------------------------------------------------------------------------

Take a look at following example:

------------------------------------------------------------------------------------------------------
USE AdventureWorks
GO
SELECT
VendorID
    ,Name
FROM Purchasing.Vendor
ORDER BY VendorID OFFSET 5 ROWS
GO

------------------------------------------------------------------------------------------------------

Output:


In the example I wrote 5 as the OFFSET.  You can see in the output that the first five rows are skipped and we get VendorIDs starting from 6.  SQL Server first orders the data on the column specified in the ORDER BY clause (i.e. VendorID), and then the OFFSET 5 ROWS clause skips the first 5 rows and returns all remaining rows.

Now let’s see an example of FETCH:

------------------------------------------------------------------------------------------------------
USE AdventureWorks
GO
SELECT
VendorID
    ,Name
FROM Purchasing.Vendor
ORDER BY VendorID
    OFFSET 5 ROWS
    FETCH NEXT 3 ROWS ONLY
GO

------------------------------------------------------------------------------------------------------

Output:


As you can see, SQL Server has fetched only 3 rows, and that too after skipping the first five rows.  That is because I specified 5 as OFFSET and 3 as FETCH NEXT.  Here FETCH NEXT 3 ROWS ONLY limits the rows returned to 3 rows from the sorted result set.

Also:
1. offset_row_count_expression can be a variable, parameter, or constant scalar subquery. When a subquery is used, it cannot reference any columns defined in the outer query scope. That is, it cannot be correlated with the outer query.

2. ROW and ROWS are synonyms and are provided for ANSI compatibility.

3. In query execution plans, the offset row count value is displayed in the Offset attribute of the TOP query operator.
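
Probably the most common use of this enhancement is paging.  Here is a small sketch (the page number and page size values are just examples):

------------------------------------------------------------------------------------------------------
DECLARE @PageNumber INT = 3
DECLARE @PageSize   INT = 20

SELECT
VendorID
    ,Name
FROM Purchasing.Vendor
ORDER BY VendorID
    OFFSET (@PageNumber - 1) * @PageSize ROWS
    FETCH NEXT @PageSize ROWS ONLY
GO
------------------------------------------------------------------------------------------------------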

You can find all this information and more on: ORDER BY Clause (Transact-SQL)

Thanks
Mangal