I watched a terrific webcast yesterday afternoon from Microsoft on SQL Server 2008 (Katmai) programmability. It didn’t go in depth for all changes in 2k8, but it did go over the highlights I suppose you could say. It was very informational to me. New data types There are four new datatypes in SQL Server 2008, although they did not deprecate any of the existing types. They are all date and time centric, one is now timezone aware, and all of them have much greater ranges and precision than were available before. It is a little ironic because VistaDB 2.x had a date and a time as two separate types, and now SQL Server does as well.
Date – Range 1/1/1 – 9999/12/31. A much higher range for dates is allowed.
Time(n) – Time only with N digits of precision after the seconds.
DateTimeOffset(n) – Date, time, and timezone information. All are stored in UTC with an offset to the TZ setting. This is supposed to become a new system type as well.
DateTime2(n) – No timezone support, but large date range and precision increases over the standard DateTime.
NOTE: All of (n) maxes out at 7 digits. This gives you 100 nanoseconds of precision and is actually the highest resolution Windows supports anyway.
With the new types also comes new extentions to SQL functions like DATE_PART. You can now specify MICROSECOND, TZOFFSET, and ISO_WEEK to get that information out of a DateTimeOffset type.
TSQL changes
MERGE keyword added from SQL 2006 standard (with some extensions added). MERGE takes multiple DML operations and combines them to be run as a single set operation in the database. It also allows you to perform some new types of operations on the MERGE’d data. You can specific what is to happen when data is MATCHED, NOT MATCHED, and SOURCE NOT MATCHED. The presenter gave one example of this in merging two tables of data DML changes (Update operations). One table was stock sales, the other was a stock portfolio. The SOURCE NOT MATCHED was used to delete any records in the stock portfolio whose stock had a zero balance.
Table-Value Params (TVP)
This is a pretty major change. In recognition of how many people pass arrays to Stored Procs (SP) and then put the data into temp tables, Microsoft has basically made a simpler way of performing these steps. You can now create a User Defined Type (UDT) for the structure you want to pass into an SP. Then you can take a large amount of data and pass it in a single variable (and in a single trip to the server). When the SP has the data it can then perform select, update, delete on the variable like a temp table. The variable and all it’s changes are only kept in RAM and are deleted with the SP returns. There were a lot of examples of how to take existing temp table code and move to TVPs instead. I would say it was an interesting change, but makes the management of the custom types more difficult since you have to create a new type for each temp table you would have created before. And these UDTs must exist at the dbo level (not internal to the SP). I think a lot of people will not want to add that level of permissions to everyone, and will stick with temp tables in SPs instead.
New sys functions
New sys.functions to determine dependencies and objects in the database. From the new need to create so many custom types a new feature was added to show every object that is a dependency for a given object. You can pass in a stored proc and it will tell you that it needs custom type X in order to run properly, it also references the following tables, etc. It is mostly an admin tool to figure out why something is not running, or for SQL management reporting tools like Red Gate who build reports for DBAs showing them information about their server.
UDT and CLR changes
SQL Server 2005 has a maximum length of 8k bytes for all user defined types (UDTs). SQL Server 2008 has upped this limit to varbinary(max) – 2 GB. This was needed for the TVP custom types. You can now build up the structure on the client and pass it to the server in a single call as a Table Value Parameter. Of course older clients will not know this, and they will be told the type is a varbinary(max) rather than the new TVP. The CLR runtime (the Dot Net runtime hosted inside SQL Server) was previously limited to 8k of state space as well. This has been changed to also be 2GB.
The idea being that more stored procs will be written in C#/VB.Net in the future and the complex needs could not be met with the smaller state. Now through TVPs you can pass GB’s of data to a stored proc (is that REALLY such a good idea?). Static methods in managed assemblies are now treated as User Defined Functions (UDFs). This is something we have been working on within VistaDB as well. And users can now create multi column user defined aggregates; you were previously limited to a single column.
Large data from client to server
How is all this large data going to get from the client to the server? Well, obviously there is a new version of SQL Client, and new versions of the native libraries as well. Microsoft is shipping a new ODBC driver for 2k8 that includes all the new types, but is not backward compatible with older clients. So you will have to install the old ODBC and new driver side by side if you have clients that need to take advantage of both. But the biggest change is how data is transferred from the client to the server. It no longer has to happen in a single packet. Data can now be streamed between the two. This is not a filestream as we think of it in Dot Net, it is a more generic stream that is internal to the drivers and SQL Server. But the effect is the same, you can pass large objects to the server and they are split up and read by the server a block at a time.
No discussion was included for what happens if multiple clients are streaming 2 GB datasets to the server as temp tables at the same time – who gets bandwidth priority, what about RAM / VM? Questions in that direction were told to “ask later by email”.
DML Group By Changes / DDL Changes
GROUP BY is getting a new syntax to ease the syntax required for UNION ALL operations. You can now specific a GROUP BY GROUPING SETS ( (year,quarter,country), (year), (country), ()) to get grouped sets of each of these columns. Previously it would have taken 4 queries to get the same results, or a complex set of UNION ALL with subqueries. DDL is also being updated to include better event trigger support. Apparently the current events will not fire a trigger if the delete/insert/update operation are being performed as a part of a system proc (sp_rename for example). This has been corrected to force those events to send triggers as well. A new set of XSD files is also being included to expose the events as XML Triggers, but no examples were given about how this will work.
TSQL changes and the 5 stages of Deprecation
The biggest change is that you can now perform multi row inserts! This has been a part of the spec for a long time, but SQL Server never supported it. Now you can perform an INSERT VALUES INTO table (val1), (val2), (val3); to get them all inserted in a single execute. Set operators have been expanded to include += -= *= and /= for variables. Variables can now also be initialized when declared. This is a nice way to shorten up your TSQL code that tends to look choppy when you have to declare a variable and then initialize it on the next line.
DECLARE var1 as varchar(10) = ‘123’;
Deprecation has been a hot topic within Microsoft since 2k5 came out. They are getting a lot more serious about deprecating old features, and forcing them to be removed. Now there are 5 stages of deprecation varying from proposed, to no longer supported. Triggers now fire for any event that is flagged in any stage of deprecation making it easier to just run your code and see if you are using deprecated calls. But you will now also start to receive exceptions in your code for calls to features that are in DEPRECATION_FINALSUPPORT – meaning this is going away very, very soon.
And you can’t get around these by just setting your SQL level back to 80 like so many current programmers do because they don’t want to learn the new syntax. Setting the compatibility level back ALSO triggers an event that you are using something deprecated, and it appeared as though might require special privileges to do so now. I personally find this is a good thing. I see so many SQL Procs that set a compatibility level back to 70 (SQL 7!) at the start and then go on doing horrendous things that should be causing errors. I think it will have the unwanted side effect of slowing down the adoption of SQL 2008 though. Many IT shops will fight the change because they have to learn something new, or update broken procs.
Beyond referential data
Ok, this is a little weird to me. Microsoft talked for a while about how they are adding new spatial geometry to the core engine. This is to allow point, line, GPS type operations on data. They are creating system types to allow you to measure distance and speed, etc. Does this sound like a way to beat Google and their Google Earth SQL demo in some artificial benchmark? Microsoft had (might still have) the Terradata project to show maps of the entire earth from with SQL Server as a huge demo of how far SQL Server could scale. The complete system cost millions and ran on something like 400+ machines. It was an impressive demo, and then Google came out and did it with Open Source and Google Maps. I think Microsoft is building in internal functions to make their own map apps run faster for a benchmark. This is one of the reasons I hate benchmarks – everyone loads the deck in their favor before they run. This went on to include stories about storing semi structured data like binary trees as a Hierarchy type, including the ability to sort the tree, insert, shift, etc. Sounds like another special purpose need that very, very few people are ever going to need but will boost some TPC benchmark. Maybe I misunderstood the purpose of these slides, but it sure seemed this way to me.
Filtered Indexes – cool new feature
One new feature I really liked was the ability to build partial indexes. Say you have an index on a table that has millions of rows. But you know you only need to query data for the past 30 days. Normally you would move the rest of the data out of the table to keep your indexes smaller. Now you can specify when you build the index what rowset you want to be used. The following will be legal.
- CREATE INDEX PastMonth on Products(ID, SALE) WHERE SalesDate > ‘2007/11/01’;
Your index will then be created based upon the where clause. All lookups can still be done on the table, but the non indexed data will result in row scans. An example was given where this can result in an index savings of 50% or more in a lot of scenarios where users know the ranges of their queries in advance. Another internal cool feature is that SPARSE data fields are no declarable. If you have a column in your database that is normally <null> you can declare it as SPARSE and no storage is taken up in the datarow for that column. All of the columns that do contain data are stored outside the datarow. This can result in big space savings in certain situations.
Wrapping up
Obviously this didn’t cover everything Microsoft has done in SQL Server 2008 (formerly known as Katmai). It was a nice high lever overview though. The webcast was recorded and should be up on MSDN webcast on demand in the next few days. I personally don’t feel any of this is groundbreaking, it is mostly incremental changes for specific needs. This is what you expect from a very mature and stable product. They are making optimizations for usage cases they find a lot of customers use, and tweaking the engines with features where it makes sense.