I got a query from a customer this week that was taking a long time to run. When I first looked at the query my first thought was that the indexes were wrong, or just the joins across 4 tables were killing the performance. But almost all of the lookup tables were really simple value lookups.
The submitted query took 1 minute 55 seconds on my machine with his data. After rewriting the query, and adding a few indexes I was able to get it to run in 0.86 seconds. That is an AMAZING performance change for a fairly simple rewrite of the query.
Problem Query using multiple tables with no joins
I have rewritten the table and column names to make this more generic.
SELECT
VisitID as ID,
VisitDate,
Minutes,
RegionName,
Reason,
Name,
City
FROM
Visits,
Clients,
Regions,
VisitReasons
WHERE
VisitReasons.VisitReasonID = Visits.VisitReasonID AND
Clients.ClientID = Visits.ClientID AND
Clients.RegionID = Regions.RegionID AND
Visits.VisitDate >= '2009/01/01' AND Visits.VisitDate < '2010/01/01' AND
Visits.Available = 1
ORDER BY Visits.VisitID
Rewrite to use JOINs
Notice the FROM clause is grabbing from 4 different tables. This is a common Access style query. But what that is in effect going to do is SELECT * from each of those tables. The WHERE clause restricts the constraints, but by that point we have already loaded the complete schema for the table, indexes etc.
SELECT
VisitID as ID,
VisitDate,
Minutes,
RegionName,
Reason,
Name,
City
FROM
Visits
LEFT JOIN VisitReasons on VisitReasons.VisitReasonID = Visits.VisitReasonID
LEFT JOIN Clients on Clients.ClientID = Visits.ClientID
LEFT JOIN Regions on Regions.RegionID = Clients.RegionID
WHERE
Visits.VisitDate >= '2009/01/01' AND Visits.VisitDate < '2010/01/01' AND
Visits.Available = 1
ORDER BY Visits.VisitID
Notice the joins. Basically by only selecting FROM 1 table we are telling the engine it is the most important thing for us to load. Everything else falls from there. The other tables are joined, but with tight constraints so they only load the correct amount of data from a primary key.
Indexes too
The other change was to add an index on the columns in the WHERE clause. Those two columns were not indexed, and only the date one really made a big difference, but it is still a good idea to have the indexes there.
Why?
Why is this such a big deal? Well, SQL Server and Access both rewrite your queries for you when they notice things like this. That query optimization is one thing that we are still working on. It is really hard to get it right and not mess up the original intent of the query. This is an area I feel we still have a long way to go. But you can acheive these results through careful writing of your queries. If you have a query that seems to perform poorly, try to do it another way.
Simplified Database
I did simplify the database and schema to make sense for the purposes of this demo, but here is the database with just a few rows in it. The customers had about 100,000 rows total.
Sample Database
ExampleMultiJoin.vdb3 (316.00 kb)
Sample Queries
Blog-SampleQueries.vsql3 (902.00 bytes)