VistaDB.Net Logo

The right SQL goes a long way

by Jason Short 22 May 2009

I got a query from a customer this week that was taking a long time to run.    When I first looked at the query my first thought was that the indexes were wrong, or just the joins across 4 tables were killing the performance.  But almost all of the lookup tables were really simple value lookups.

The submitted query took 1 minute 55 seconds on my machine with his data.  After rewriting the query, and adding a few indexes I was able to get it to run in 0.86 seconds.  That is an AMAZING performance change for a fairly simple rewrite of the query.

Problem Query using multiple tables with no joins

I have rewritten the table and column names to make this more generic.

SELECT  
  VisitID as ID, 
  VisitDate, 
  Minutes, 
  RegionName, 
  Reason, 
  Name, 
  City 
FROM 
  Visits, 
  Clients, 
  Regions, 
  VisitReasons 
WHERE 
VisitReasons.VisitReasonID = Visits.VisitReasonID  AND 
Clients.ClientID = Visits.ClientID AND 
Clients.RegionID = Regions.RegionID AND 
Visits.VisitDate >= '2009/01/01' AND Visits.VisitDate < '2010/01/01' AND 
Visits.Available = 1 
ORDER BY Visits.VisitID

Rewrite to use JOINs

Notice the FROM clause is grabbing from 4 different tables.  This is a common Access style query.  But what that is in effect going to do is SELECT * from each of those tables.  The WHERE clause restricts the constraints, but by that point we have already loaded the complete schema for the table, indexes etc.

SELECT 
  VisitID as ID, 
  VisitDate, 
  Minutes, 
  RegionName, 
  Reason, 
  Name, 
  City 
FROM 
  Visits 
    LEFT JOIN VisitReasons on VisitReasons.VisitReasonID = Visits.VisitReasonID 
    LEFT JOIN Clients on Clients.ClientID = Visits.ClientID 
    LEFT JOIN Regions on Regions.RegionID = Clients.RegionID 
WHERE 
  Visits.VisitDate >= '2009/01/01' AND Visits.VisitDate < '2010/01/01' AND 
  Visits.Available = 1 
ORDER BY Visits.VisitID

Notice the joins.  Basically by only selecting FROM 1 table we are telling the engine it is the most important thing for us to load.  Everything else falls from there.  The other tables are joined, but with tight constraints so they only load the correct amount of data from a primary key.

Indexes too

The other change was to add an index on the columns in the WHERE clause.  Those two columns were not indexed, and only the date one really made a big difference, but it is still a good idea to have the indexes there.

Why?

Why is this such a big deal?  Well, SQL Server and Access both rewrite your queries for you when they notice things like this.  That query optimization is one thing that we are still working on.  It is really hard to get it right and not mess up the original intent of the query.  This is an area I feel we still have a long way to go.  But you can acheive these results through careful writing of your queries.  If you have a query that seems to perform poorly, try to do it another way.

 

Simplified Database

I did simplify the database and schema to make sense for the purposes of this demo, but here is the database with just a few rows in it.  The customers had about 100,000 rows total.

Sample Database

ExampleMultiJoin.vdb3 (316.00 kb)

Sample Queries

Blog-SampleQueries.vsql3 (902.00 bytes)

Comments

23 May 2009 #

trackback

The right SQL goes a long way

You've been kicked (a good thing) - Trackback from DotNetKicks.com

DotNetKicks.com

23 May 2009 #

js_vistadb

I did not include the customers schema or data in this sample.  It was totally rebuilt just for this sample.  I only included a few rows in the sample, so the performance difference is not really visible.  But when the row count starts to climb it makes a big difference.

js_vistadb United States

23 May 2009 #

Kendall Miller

This mirrors my general experience with SQL Queries (whether using SQL Server or not):  You're either fine, or hosed. At my last company we had a massive production SQL server - 8 processors, 32GB of RAM, EMC SAN...  And it could scale like nobody's business, but we'd have queries that took 10 minutes on our cert environment (fairly weedy little server) that would take 5 minutes on the big boy.  In short, our experience was that if you messed up the query, no amount of hardware would save you.  If you had the query written so it was reasonable on a normal server it would be awesome on the big server.  That's one of the really interesting aspects of set-based systems (like SQL).

Because of this "good or dead" aspect, I always preferred to write queries with explicit joins and otherwise be as clear as possible about what was expected to make the optimizer's job as easy as possible.  It just isn't worth getting a stored query plan that reflects a bad guess.

Kendall Miller United States

23 May 2009 #

Yianni Bourkelis

Thank you Jason for sharing this with us.

Before some months I was also trying to optimize a complex SQL query.

In the WHERE clause I had 4 columns.

Setting only one column as an index made the query perform better than setting all 4 (or 3 or 2) columns as indexes.

Yianni Bourkelis Greece

Comments are closed

Powered by BlogEngine.NET and VistaDB

Log in