Generics and Dictionaries are fast data caches
There is a point when we would all love to go back and fix old things we have written. Maybe you learn a new language and wish you could go back and rewrite that old project in it, or how to use a new library that would have improved an old project. Generics and the generic collection objects are like that for me.
I have been spending a lot of time lately refactoring (err, rewriting) pieces of code that have been in production for a long time. Some of this code has been in production for 5+ years with no problems. Hardware has gotten better, performance has always been good enough for the hardware. Then the generics bug hit me.
Scaling up hurt this class bad
I have a class that tracks most recently used item in a cache. It is fairly simple and works fine for the size cache I needed. Now suddenly due to a change in requirements I needed to scale that cache to about 5x the current size, the old class didn't meet the need any longer. In fact it scaled horribly once I had more than doubled the current cache size. Was this the case of a bad algorithm for the need, or was it the implementation holding me back?
Generics to the rescue
Armed with my newfound respect for generics from the engine core of VistaDB I decided to rework that class to use generics. It was using an IList and HashTables internally to store the keys and values. After about three hours of work I had a bright new shiny class all written in a generic fashion that used the Dictionary internally and a LinkedList<> generic for the objects.
Testing the old and new head to head (caching the exact same objects and performing the same lookups) was quite an amazing comparison. This is using the same algorithm, just implemented using a different programming methodology.
23x improvement!
Yep. 23 times faster. Both in C#, but one using what I now call 1.1 style coding, the other written using generics and generic collections. The test involved caching 10,000 items (custom class that contains complex fields), and then performing 20,000 lookups across the data with some present and some missing. This stressed all parts of the class, loading, expiring, lookup.
Old MRU: 176639 ms
New MRU: 7580 ms
Wow. That is about all I can say. I was already a huge fan of generics and generic collections, but they are so far ahead of the older collection objects I think you should stop using them entirely.
VistaDBTypes are very similar to this
This does not directly relate to anything in the engine, it is a single class from another project that I was bringing up to date. But it did make me think long and hard about our current VistaDBTypes class. Currently we have a series of custom classes that are implemented from interfaces and base classes. Those columns are then put into collections (non generic) and looked up heavily as the engine runs through the rows.
I think if we were to re-write the VistaDBTypes to be true generics (from a struct as their base to avoid reference count problems) we could then use generic collections internally to allow much faster looping and access to the data.
The only problem is that changing that part of the code is a super major big deal. Not something we will do for 3.4 (too much risk). But I think we may fork a branch in the revision control system to test those theories. Maybe for 4.0....
Similar Posts
- Strongly typed performance gains
- Looking ahead to 2008, and back at 2007
- Set any 2008 Resolutions or goals?
