Jun 29 2008
 

Enterprise applications store their data in a relational database. Our code reads the data stored in tables with many complex joins and business-rule-laden queries. We take the results of those queries and construct an equally complex business entity that is used by our application logic. Most developers, myself excluded, hate working with the database. Writing, modifying, or even seeing T-SQL causes some developers to itch. LINQ to SQL serves as a partially effective hydrocortisone to relieve the itch. But they still need to maintain the schema, write SQL-mindful LINQ queries, and deal with the constant DataContext updates.

 

Imagine a world where you no longer need to translate your complex business entities to and from relational tables. A world where there is no database backing store. A world where we create our business entities and store them in memory. Even better, in memory on a shared resource. Does it sound like an inconceivable, futuristic developer heaven? Well, it probably is, but this is really cool stuff in the works.

 

Enter the Microsoft project code-named “Velocity.” The blurb on the overview page reads:


“Velocity” is a distributed in-memory application cache platform for developing scalable, high-performance applications. “Velocity” can be used to cache any CLR object and provides access through simple APIs. The primary goals for “Velocity” are performance, scalability and availability.


I have been working with the Digipede Network, the leading grid computing software solution, for a few months. The Velocity architecture sounds remarkably similar to Digipede’s. I have seen the great benefits of the Digipede Network and have high expectations for Velocity.


The Digipede Network, for those of you who haven’t seen it yet, consists of a central Digipede Server and one or many Digipede Agents. The server receives client requests and assigns tasks to the agents. The client uses the Digipede API to communicate with the server; the API essentially wraps client-to-server and server-to-client WSE2 web service calls. This architecture allows you to take almost any CPU-intensive process and spread the workload among tens or hundreds of commodity or server-grade machines. The result is a very high-performing and easily scaled system, with few code changes from what you do today.


Digipede Network Diagram


Digipede only works in this configuration, while Velocity has two proposed deployment models. You can have a “caching tier”, similar to Digipede’s Server and Agent configuration, or you can house Velocity as a Caching Service directly in IIS7. I don’t know how communication will be handled between the client API and the “caching tier”, but I assume it will be some sort of service call (WCF perhaps). All CLR objects stored in the Velocity cache must be marked [Serializable], just as task worker classes must be to work with Digipede.


The Velocity API looks simple enough, too. It exposes intuitive Get() and Put() methods, and you address each cache by name. I can see how versioning of the cached objects might get tricky. Your application will also need a new configSection that specifies the deployment mode and locality, and contains the list of cache hosts. As this is a distributed solution, the standard single-virtual-machine playground doesn’t work too well for really testing this out.
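To make that concrete, here is a rough sketch of what the Get()/Put() pattern might look like in code. I’m going from memory of the CTP 1 samples here, so treat the CacheFactory/Cache class names and the GetCache() call as assumptions rather than the final API:

using System;

// Cached objects must be serializable, just like Digipede task worker classes.
// This class is purely illustrative.
[Serializable]
public class OrderSummary
{
    public int OrderId { get; set; }
    public decimal Total { get; set; }
}

public class VelocityExample
{
    public void Run()
    {
        // The factory reads the deployment mode and cache host list
        // from the new configSection in the config file.
        CacheFactory factory = new CacheFactory();
        Cache cache = factory.GetCache("default");

        // Put() stores any serializable CLR object under a string key.
        cache.Put("OrderSummary:42", new OrderSummary { OrderId = 42, Total = 99.95m });

        // Get() hands back object, so you cast on the way out.
        OrderSummary summary = (OrderSummary)cache.Get("OrderSummary:42");
    }
}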


This looks promising, and I’ll be following the progress of the project closely.


Download Velocity


Download the Velocity CTP 1 samples

Jun 22 2008
 

A few months after SQL 2005 was released and hit the production servers, some people started experiencing odd behavior in their stored procedures. Simple stored procedures that normally return in 0 seconds would take upwards of a minute to return. Even more strange was the fact that the same query, outside of a stored procedure, would still return in 0 seconds.


It never affected me personally… until today. Three years late to the party. It’s funny how much more interested I am in the causes and solutions for this apparent problem when it affects me. “Parameter Sniffing” is the term Microsoft uses to describe the feature that causes this odd behavior. While it appeared as an issue when I encountered it today, I found that the feature is not only well-intentioned but quite useful.


The execution plan is generated and cached the first time your stored procedure is called. When the execution plan is being created, SQL Server reads the input parameters and uses them to optimize the execution plan for those parameters. This is called “parameter sniffing.” If the input parameters used in the first call to the stored procedure are atypical for the overall use of the stored procedure, a less than ideal execution plan will be cached for all subsequent calls.


Simply dropping and recompiling the stored procedure does not seem to affect the cached execution plan. Updating statistics on the tables used in the stored procedure will cause the execution plan to be regenerated on the next call of the stored procedure. However, if the same or similar atypical parameters are used on the first execution of the stored procedure, an equally sub-optimal execution plan will be cached.


You can turn off parameter sniffing. This is accomplished by assigning the input parameter values to local variables inside the stored procedure and then using the local variables within the stored procedure. When the execution plan is created, SQL Server will look at the table statistics to optimize the query for the “average” use. It does this by looking at the tables used in the query and analyzing row counts, etc. to find a reasonable plan that will likely suit a majority of situations.


My stored procedure was bringing back multiple resultsets to be used to create a hierarchical structure in code. It works essentially like the following:


CREATE PROCEDURE [dbo].[usp_Order_GetOrderDetails]
(
   @StartOrderId INT,
   @EndOrderId INT
)
AS
BEGIN

   SELECT *
   FROM [Order]   -- Order is a reserved word, so the table name needs brackets
   WHERE OrderId BETWEEN @StartOrderId AND @EndOrderId
 
   SELECT *
   FROM OrderLineItem
   WHERE OrderId BETWEEN @StartOrderId AND @EndOrderId
END
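On the code side, consuming those two result sets and building the hierarchy boils down to walking the data reader and calling NextResult(). The sketch below is hypothetical (the class names, columns, and connection handling are mine, not the real application’s), but it shows the shape of it:

using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

public class OrderLineItem
{
    public int OrderLineItemId { get; set; }
}

public class Order
{
    public int OrderId { get; set; }
    public List<OrderLineItem> LineItems = new List<OrderLineItem>();
}

public static class OrderRepository
{
    public static List<Order> GetOrderDetails(string connectionString, int startOrderId, int endOrderId)
    {
        var ordersById = new Dictionary<int, Order>();

        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("usp_Order_GetOrderDetails", connection))
        {
            command.CommandType = CommandType.StoredProcedure;
            command.Parameters.AddWithValue("@StartOrderId", startOrderId);
            command.Parameters.AddWithValue("@EndOrderId", endOrderId);

            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                // First result set: one row per Order.
                while (reader.Read())
                {
                    int orderId = (int)reader["OrderId"];
                    ordersById[orderId] = new Order { OrderId = orderId };
                }

                // Second result set: the OrderLineItem rows, hung off their parent orders.
                reader.NextResult();
                while (reader.Read())
                {
                    var lineItem = new OrderLineItem { OrderLineItemId = (int)reader["OrderLineItemId"] };
                    ordersById[(int)reader["OrderId"]].LineItems.Add(lineItem);
                }
            }
        }

        return new List<Order>(ordersById.Values);
    }
}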


I was testing the stored procedure for a full day using the same ID for @StartOrderId and @EndOrderId. Since the intended use of this stored procedure is almost always @EndOrderId = @StartOrderId + 1000, this makes a big difference when calculating the estimated number of rows returned. I had forced SQL Server to assume that my execution plan should be based on an ID range of 1 instead of 1000. Turning off parameter sniffing lessens these effects.


To turn off parameter sniffing, it would look like this:


CREATE PROCEDURE [dbo].[usp_Order_GetOrderDetails]
(
   @StartOrderId INT,
   @EndOrderId INT
)
AS
BEGIN
   -- Copy the sniffed parameters into local variables. SQL Server cannot
   -- sniff local variable values, so the plan is built from the tables'
   -- statistics instead of the first caller's parameter values.
   DECLARE @Start INT
   DECLARE @End INT
   SET @Start = @StartOrderId
   SET @End = @EndOrderId
 

   SELECT *
   FROM [Order]
   WHERE OrderId BETWEEN @Start AND @End
 
   SELECT *
   FROM OrderLineItem
   WHERE OrderId BETWEEN @Start AND @End
END


This immediately improved the performance of my stored procedure. The time to complete dropped from ~2 minutes to ~2 seconds for my typical 1000-ID range (I know 2 seconds is a lot, but these tables have millions and millions of rows). But only one piece of code in the application calls this stored procedure, and 99 out of 100 times it will use a range of 1000 IDs. Why would I want SQL Server to guess how many Orders I will typically bring back when I know the exact number?


I should get the optimal execution plan if I update statistics on Order and OrderLineItem and then call usp_Order_GetOrderDetails 1, 1000 as the first execution after the stored procedure is recompiled. This sounds like a lot of work to me, and I did not notice any performance boost by doing this. I chose to leave parameter sniffing off.


The only drawbacks to turning off parameter sniffing are the weird-looking SQL and the inevitable questions during code review about the crazy input-parameter-to-variable mapping. But when you school the doubters on the causes and effects of parameter sniffing, it will put another notch in your guru stick.


From what I have read, this was not a new feature in SQL 2005. I can’t, however, find any mention of it in the SQL 2000 Books Online, and this feature never showed its face in SQL 2000.

Jun 19 2008
 

Juval Löwy mentioned the Microsoft Service Trace Viewer in a webcast today. If you ever wondered exactly what WCF does under all of those covers, check this out.

First things first. Enable tracing on the client and host applications using the WCF Configuration Editor. Enable the verbose trace level and check all of the listener settings. This will add all of the necessary <system.diagnostics> settings in your config file. The next time you start each of the applications, a .svclog file will be created that will be used by the Service Trace Viewer.
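For reference, the generated section in the config file ends up looking roughly like the snippet below. This is a hand-written sketch rather than the editor’s exact output, and the log file path is just an example:

<system.diagnostics>
  <sources>
    <!-- Verbose plus ActivityTracing is what lights up the activity graph in the viewer. -->
    <source name="System.ServiceModel"
            switchValue="Verbose, ActivityTracing"
            propagateActivity="true">
      <listeners>
        <add name="xmlTraceListener"
             type="System.Diagnostics.XmlWriterTraceListener"
             initializeData="C:\Logs\Host.svclog" />
      </listeners>
    </source>
  </sources>
  <trace autoflush="true" />
</system.diagnostics>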

Start your host, start your client, and run through the test cases that you want to analyze in the viewer. After your test run is complete, open the viewer, located at C:\Program Files\Microsoft SDKs\Windows\v6.0A\bin\SvcTraceViewer.exe. “Open” the host.svclog file, and then “Add” the client.svclog file. Both “Open” and “Add” are menu items under “File”.

Start on the Activity tab and look through the host and client activities that occurred. Everything from ServiceHost construction through ServiceHost closing shows up. This is very cool, especially when analyzing the differences between different security, session, and reliability settings.

When you are done looking through the activities, check out the Graph tab. Here you can look at the interactions between the client and host, as well as the details of each activity (at the top right). At the bottom right, you will also see the formatted and XML views of the selected activity.

This is a very cool tool for both debugging and training. Below are my lame test projects, if you want to skip past the configuration and check out the tool. My .svclog files are located in the Client and Host folders.

SvtTest.zip (190.32 KB)

Enjoy! Thanks to Juval for the direction.
