VistaDB.Net Logo

Transactions can hurt performance

by Jason Short 24 April 2009

Sometimes I scratch my head at conventional wisdoms for database applications.  Often they are based upon old concepts from xbase systems, and sometimes they are programming tasks that people just learned to do one way and never want to change.

Transactions are like that

Here are some situations to ask about transactions.  Would you do any of the following?

  1. transaction around a select * from a table?
  2. transaction around a single insert to a single table
  3. transactions around single insert with related child to second table
  4. transactions deposit and laon modifications
  5. Schema changes

Answers to the above scenarios:

  1. Never - does nothing buy you anything because a lot of spin up and tear down happens for no reason.
  2. Never - Can't think of any scenario where a single failure can't be caught by a try catch (concurrency) and you can retry or log the error.  Nothing actually went wrong.
  3. Yes - If the second table could have bad referential integrity due to the first insert failure.  
  4. Yes - Classic example if the deposit fails the loan table needs to get that information.  
  5. No - Schema changes either succeed or fail.  Do not put them transactions.  You can't make 3 sets of schema changes and then roll them back.  The engine definately does not support that scenario.

Some ORM tools don't think about what they are doing

A recent sample from a user using an ORM tool showed me all of the above scenarios wrapped in transcations.  And of course the user doesn't want to think about these issues because they have an ORM.  They expect the ORM to deal with them.  Doesn't always work that way.

An interesting twist in the ORM tool output was the scenario number 4 above.  The insert had a transaction, but it was split into two of them (one for each table).   The first transaction opened  and set a flag if it failed to commit.  Then opened a second one and inserted anyway, and at the end checked a variable to see if it should commit or rollback.  The entire chain of events could have stopped at the first failed insert.    Not to mention two transactions in this just killed the entire purpose of them being in a transaction.  One operation was already committed before the other even started.  What would happen if the second insert failed?  The first could no longer be rolled back.

I think the lesson here is that ORM tools abstract you from your database, but that does not mean they have thought through all the business case scenarios.  You still need to check that output and how it operates for failure cases.  Don't assume they know how to handle it for you magically, it will come back to haunt you later.

Performance

In this particular case I did some testing and turned off transactions everywhere in the engine with a new flag on the ConnectionString.  (I think we will add this in 3.6 for some other reasons I will mention below)

The operation involved 9 threads each doing 100 inserts into a database.  Each loop through the inserts are to be worst case spin up per loop (everything is torn down and rebuilt on purpose).

My initial test from the sample was to just run it.  

  • Locking and transactions on :   3 mins 31 sec
  • Second test with no transactions shared access Pooling=false:  29 sec
  • Test with no Transactions shared access Pooling = True:  17 sec
  • Test with no Transactions exclusive mode:  8 seconds

That means the locking and transactions introduced a huge amount of overhead to the system.  Especially when the transactions were almost never needed due to the way the code was written.

Ouch

Interesting to note that only scenarios 1 and 2 could have been possible with SQL CE due to multi process locking issues.  In fact most embedded database engines would have just locked the entire database at many of these to avoid the complexity of multi user updates.

Multi user file locking hurts performance badly.  In this case I am making a change for this user to allow him to set all transactions off and continue running.  It will probalby speed up his app 50% of more from where it is today.  We may introduce this as an option in 3.6 to allow others to run in this mode as well.

Entity Framework has similar problem

Our work with the Entity Framework has similar problems.  They do wrap everything in transactions (even when not needed), but sometimes they are needed.  Performance in EF has been slow, and I am wondering how much of it is due to this transaction issue.

Some general rules of thumb for VistaDB

Go EXCLUSIVEREADWRITE whenever you can.  This is the fastest mode.  When you do need a transaction, make sure it is for balanced inserts in multiple tables.  Do not put things like schema changes in transactions - they will either succeed or fail, there is no way to rollback schema changes. 

Locking changes coming from 4.x line might give us some relief at this level, but we have a long ways to go before we will push those changes live.

EDIT: 

This post went live before I was done, sorry.  I often make multiple edits to a post before hitting Publish, but it is a confusing step in Graffiti to save as a draft when you are editing.

Comments

24 April 2009 #

Good article and very true, transactions management does hurt performance and it should not be used carelessly.


25 April 2009 #

It is always a pleasure to read you Jason.


In EXCLUSIVEREADWRITE mode, in case the application quits unexpectedly, leaving the connection open, does the lock remains?


25 April 2009 #

admin

Hello!


In Exclusive mode the locks are all handled as soft locks in RAM.  So you have no issue with the locks being left as they are not kept on disk.  This makes it easier to recover in some situations.


admin

25 April 2009 #

admin

How would it be wrong?  And how could the trigger with a failure not be considered an error period.  In our case the insert would not complete if the trigger is a part of the insert (On Insert) and it fails.  I would have to see some edge case to make the case one way or the other.  But saying to wrap everything in a transaction "just in case" is like saying to put try catches in every function in your entire app - EVERYWHERE just in case.  Your app will suffer, no inlining, poor stack performance, lots of heap allocations, etc.  You put try catch blocks around points of failure.  Not everywhere just in case.


If the trigger fails what is the EF going to do any differently?  


Throw and error to your app saying "Something failed, good luck".  The EF doesn't have any knowledge about what you intend to do with a failure either.  It throws them all up and lets your app determine the corrective measure.


admin

25 April 2009 #

What about triggers? The EntityFramework has no way of knowing if there are triggers in your DB, so even a single insert statement has to be inside a transaction to ensure consistency. Of course most of the time it's not required, but better be slow than wrong.


Comments are closed

Powered by BlogEngine.NET and VistaDB

Log in