Tuesday, January 25, 2011

3 fast & easy ways to accelerate database operations when using LINQ & Entity Framework

Hi all,
Lately I had to handle a very large database.
I have been using Entity Framework model and LINQ queries, and encountered some performance difficulties. I didn't had time to start writing caching wrappers, so I came up with 3 quick main solutions:

Reading table rows was slow:
1. First thing was pretty obvious and probably known to you all: Indexing my table.
Indexing improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.
Indexing is based on sorting the table, so INSERT (or UPDATE of the index key) will cause the database engine to search the right place for the new record.
Use it when your application performs massive read operations against a few write operations.
Although indexing can be useful, it's important to follow those configuration guidelines (also considering whether to define Clustered or Non-Clustered index):
http://msdn.microsoft.com/en-us/library/ms179560.aspx


2. Most people don't know about a great .NET class named "CompiledQuery".
As you probably know, each time you run a LINQ query it is being translated to a SQL Statement. So if the translation engine (SQOT - Standard Query Operator Translation) translates the same query each time, why not doing it only once?
That's when "Compiled Query" comes into play.
Suppose you need to get a city by city ID many times, This is how it gets done:
public static readonly Func<MoviesContext, string, IQueryable<City>> CityByID =
CompiledQuery.Compile((MoviesContext context, string cityID) =>
context.Cities.Where(p => p.CityID == cityID));
First, you declare the method delegate as 'static' because all of the threads are using the same compiled query. Second, it's better to declare it as readonly - because the declaration does not suppose to change across the program.
Let's take a look at the Func arguments. The first argument is the type of the Entity Framework model, the second represents the city ID type, and the last one is the return type (query result type).
"CityByID" will be used to call the compiled query method.
(MoviesContext context, string cityID) - Sending the EF context and city id to the compiled query method.
context.Cities.Where(p => p.CityID == cityID) - Performing the query
CityByID(new MoviesContext(), id); - Calling the compiled query
Important: Using methods that cause a change in the query (For example: First(), FirstOrDefault and so on) will produce a not compiled query, and will take more time rather than using a regular LINQ with those kind of methods. You can solve it by doing: compiledQuery.AsEnumerable().First();
That way, the compiled query is being performed on the database (LINQ to SQL), and only afterwards selecting the first element (LINQ To Entities).

Inserting table rows was slow:
3. Try to minimize use of the famous ObjectQuery.SaveChanges() method (Reminder: This method persists all changes in the EF object context to the database) .
Suppose you have to add some new cities records to your database, It will be slower doing it that way:
foreach (City city in cities)
{
context.Cities.AddObject(city);
context.SaveChanges();
}
rather than keeping the "SaveChanges()" method out of the foreach loop:
foreach (City city in cities)
{
context.Cities.AddObject(city);
}
context.SaveChanges();
When SaveChanges() gets called it creates an IEntityAdapter object if one doesn't already exist. Each call will initiate database connection.
Now a little bit off-topic in order to explain how SaveChanges method is working:
After the provider creation, it generates many SQL Commands with the changes.
For example: after inserting one row to city table, I've looked at the SQL Profiler and found out the generated SQL Command:
"exec sp_executesql N'insert [dbo].[City]([CityID], [CityName])
values (@0, @1)
',N'@0 nchar(10),@1 nchar(10)',@0=N'8b3f07 ',@1=N'fde39c3 '"
In conclusion: Use less calls to SaveChanges in order to decrease the amount of round trips to the database.
You can read about SaveChanges() performance comprasion over here:

To sum up I have made performance tests (considering 2,3 paragraphs on a non-indexed table):
1. Inserting 10,000 city rows: SaveChanges() after each insert to the object context VS. SaveChanges() after insert all cities to the object context.

Inside Loop: 00:01:04.3431638
Outside Loop: 00:00:17.4012899

2. Getting 10,000 different cities (CityID by CityName): Using CompiledQuery VS. Not using CompiledQuery
Compiled: 00:56.025
Not Compiled: 00:01:19.989

Thank you Blogger, hello Medium

Hey guys, I've been writing in Blogger for almost 10 years this is a time to move on. I'm happy to announce my new blog at Med...