Sometimes I scratch my head at conventional wisdoms for database applications. Often they are based upon old concepts from xbase systems, and sometimes they are programming tasks that people just learned to do one way and never want to change.
Transactions are like that
Here are some situations to ask about transactions. Would you do any of the following?
- transaction around a select * from a table?
- transaction around a single insert to a single table
- transactions around single insert with related child to second table
- transactions deposit and laon modifications
- Schema changes
Answers to the above scenarios:
- Never - does nothing buy you anything because a lot of spin up and tear down happens for no reason.
- Never - Can't think of any scenario where a single failure can't be caught by a try catch (concurrency) and you can retry or log the error. Nothing actually went wrong.
- Yes - If the second table could have bad referential integrity due to the first insert failure.
- Yes - Classic example if the deposit fails the loan table needs to get that information.
- No - Schema changes either succeed or fail. Do not put them transactions. You can't make 3 sets of schema changes and then roll them back. The engine definately does not support that scenario.
Some ORM tools don't think about what they are doing
A recent sample from a user using an ORM tool showed me all of the above scenarios wrapped in transcations. And of course the user doesn't want to think about these issues because they have an ORM. They expect the ORM to deal with them. Doesn't always work that way.
An interesting twist in the ORM tool output was the scenario number 4 above. The insert had a transaction, but it was split into two of them (one for each table). The first transaction opened and set a flag if it failed to commit. Then opened a second one and inserted anyway, and at the end checked a variable to see if it should commit or rollback. The entire chain of events could have stopped at the first failed insert. Not to mention two transactions in this just killed the entire purpose of them being in a transaction. One operation was already committed before the other even started. What would happen if the second insert failed? The first could no longer be rolled back.
I think the lesson here is that ORM tools abstract you from your database, but that does not mean they have thought through all the business case scenarios. You still need to check that output and how it operates for failure cases. Don't assume they know how to handle it for you magically, it will come back to haunt you later.
Performance
In this particular case I did some testing and turned off transactions everywhere in the engine with a new flag on the ConnectionString. (I think we will add this in 3.6 for some other reasons I will mention below)
The operation involved 9 threads each doing 100 inserts into a database. Each loop through the inserts are to be worst case spin up per loop (everything is torn down and rebuilt on purpose).
My initial test from the sample was to just run it.
- Locking and transactions on : 3 mins 31 sec
- Second test with no transactions shared access Pooling=false: 29 sec
- Test with no Transactions shared access Pooling = True: 17 sec
- Test with no Transactions exclusive mode: 8 seconds
That means the locking and transactions introduced a huge amount of overhead to the system. Especially when the transactions were almost never needed due to the way the code was written.
Ouch
Interesting to note that only scenarios 1 and 2 could have been possible with SQL CE due to multi process locking issues. In fact most embedded database engines would have just locked the entire database at many of these to avoid the complexity of multi user updates.
Multi user file locking hurts performance badly. In this case I am making a change for this user to allow him to set all transactions off and continue running. It will probalby speed up his app 50% of more from where it is today. We may introduce this as an option in 3.6 to allow others to run in this mode as well.
Entity Framework has similar problem
Our work with the Entity Framework has similar problems. They do wrap everything in transactions (even when not needed), but sometimes they are needed. Performance in EF has been slow, and I am wondering how much of it is due to this transaction issue.
Some general rules of thumb for VistaDB
Go EXCLUSIVEREADWRITE whenever you can. This is the fastest mode. When you do need a transaction, make sure it is for balanced inserts in multiple tables. Do not put things like schema changes in transactions - they will either succeed or fail, there is no way to rollback schema changes.
Locking changes coming from 4.x line might give us some relief at this level, but we have a long ways to go before we will push those changes live.
EDIT:
This post went live before I was done, sorry. I often make multiple edits to a post before hitting Publish, but it is a confusing step in Graffiti to save as a draft when you are editing.