To Collect or not to Collect
One of the first things most people seem to do when they get the code to VistaDB is remove the GC.Collect() call we have in the MinimizeMemory code for the Tree class. This class is used to hold portions of the database, indexes, etc. in RAM. Periodically it is combed in an attempt to minimize the number of nodes kept in RAM. Does it run faster without the call? Yep. But not as much as you think. Does it use more memory? Yep. A LOT more than you may think.
Rule #1 – never call GC.Collect()
That is sort of like saying never use unmanaged code, or never use DLL Import. It is not a concrete rule, and Microsoft agreed with us that we should use it in this instance.
The 2.0 GC is confused
The GC under Dot Net 2 is quite easily confused by complex objects. In fact the SP 1 for the Dot Net 2 framework deals with some specific memory pinning issues we reported early this year (I am sure others did as well). Microsoft acknowledged the problem to us, and it took almost 7 months before we got a service pack from them (and you guys complain about having to wait two weeks for a new build).

Microsoft does not say you should never call GC.Collect. In fact it is a best practice if you know you just released a large number of objects that are still in Gen 0 and can be cleaned up quickly, rather than waiting until they hit Gen 1 or 2, which is a lot more expensive to clean. In our case, when we call the Collect function it is usually right after we have released potentially hundreds of complex objects out of the node tree structure. These objects are exactly the type of thing you want to get cleaned quickly.

We used to run out of RAM quite easily on 1 GB machines. Was it due to us leaking memory? No, it was just a simple matter of a lazy GC combined with the fact that by the time the objects were getting to Gen 2 they were pinned forever in RAM. Under the initial release of the Dot Net 2.0 framework we would see situations where RAM was exhausted and the GC would still fail to release objects that were eligible.
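To make the pattern concrete, here is a minimal sketch of "release a batch of nodes, then collect". It is not the VistaDB code: the Node class, the NodeCacheSketch class and the keep parameter are all made up for illustration, and only the GC and List calls are real framework APIs.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a cached tree node; NOT VistaDB's actual Tree class.
sealed class Node
{
    public readonly byte[] Page = new byte[4096];
}

static class NodeCacheSketch
{
    // Drops every cached node beyond `keep`, then asks the GC to reclaim them
    // right away. Returns the number of nodes released.
    public static int MinimizeMemory(List<Node> cache, int keep)
    {
        int released = cache.Count - keep;
        if (released <= 0) return 0;

        cache.RemoveRange(keep, released); // the removed nodes are now unreachable
        cache.TrimExcess();                // let the list give back its spare capacity
        GC.Collect();                      // the call this post is arguing about
        return released;
    }

    static void Main()
    {
        List<Node> cache = new List<Node>();
        for (int i = 0; i < 10000; i++) cache.Add(new Node());

        long before = GC.GetTotalMemory(false);
        int released = MinimizeMemory(cache, 100);
        long after = GC.GetTotalMemory(false);

        Console.WriteLine("released {0} nodes, managed heap {1:N0} -> {2:N0} bytes",
            released, before, after);
    }
}
```

GC.Collect() with no arguments collects every generation. There is also an overload that takes a generation number if you only want to sweep the young objects, but measure before assuming either one helps your workload.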
Proof – or close enough
Ok, so how did we test this and prove it was a Garbage Collection issue? Well, the actual proof took a LOT of time to nail down and get them to believe us. But we basically wrote some NUnit tests to churn through objects and properly dispose them. The Garbage Collector would lose track of large complex objects, even when they were correctly assigned to null and cleared. I have a summary for the test below.
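If you want to try something similar yourself, a churn test might look roughly like the sketch below. This is not our NUnit suite: the ChurnSketch class, the batch sizes and the plain byte arrays are assumptions for illustration, and it reports the managed heap size from GC.GetTotalMemory, which is not the same thing as the process memory figures in our results.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

static class ChurnSketch
{
    // Builds and drops batches of objects, optionally collecting after each
    // release. Returns the largest managed heap size (bytes) seen after a batch.
    public static long Run(int batches, int objectsPerBatch, bool collectAfterRelease)
    {
        long peak = 0;
        for (int b = 0; b < batches; b++)
        {
            List<byte[]> batch = new List<byte[]>(objectsPerBatch);
            for (int i = 0; i < objectsPerBatch; i++) batch.Add(new byte[1024]);

            batch = null; // release the whole batch
            if (collectAfterRelease) GC.Collect();

            long now = GC.GetTotalMemory(false); // false = do not force a collection
            if (now > peak) peak = now;
        }
        return peak;
    }

    static void Main()
    {
        foreach (bool collect in new bool[] { false, true })
        {
            Stopwatch timer = Stopwatch.StartNew();
            long peak = Run(200, 5000, collect);
            timer.Stop();

            Console.WriteLine("collect={0} runtime={1} ms peak heap={2:N0} KB",
                collect, timer.ElapsedMilliseconds, peak / 1024);
        }
    }
}
```

Simple byte arrays like these will not reproduce the pinning problem we hit with large complex objects. The sketch only shows the shape of a with/without comparison, so run it against your own object graphs on the runtime you actually ship on.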
Dot Net 2 Runtime Results

Without Collect
Runtime: 259 Seconds
Peak Memory: 289,742 Kb
Ending Memory: 165,756 Kb

With Collect
Runtime: 290 Seconds
Peak Memory: 68,788 Kb
Ending Memory: 58,088 Kb
So we take a modest runtime hit of about 12% (290 seconds versus 259), but we cut our memory usage by a factor of roughly four at peak (about 76% less), and by a factor of nearly three at the end of the application run (about 65% less). I think you will agree that the runtime hit is well worth the decreased memory usage. Even if you think that is too high a price, remember that without the call the memory would never be freed, even if you were paging like crazy.

I decided to re-run the test with the SP1 for the Dot Net 2 framework.
Dot Net 2 SP1 Results

Without Collect
Runtime: 249 Seconds
Peak Memory: 106,928 Kb
Ending Memory: 61,144 Kb
Hmm, now things are a lot closer. The speed is faster, and the memory usage is much, much better than it was before. This made me wonder about the performance of the GC under 3.5 using Visual Studio 2008. Same test, just now it is running a 3.5 version of VistaDB, and a 3.5 version of the test app.
Dot Net 3.5 Runtime Results

Without Collect
Runtime: 233 Seconds
Peak Memory: 62,880 Kb
Ending Memory: 70,824 Kb

With Collect
Runtime: 278 Seconds
Peak Memory: 53,992 Kb
Ending Memory: 56,300 Kb
Wow, it appears the GC under 3.5 is much improved over the 2.0 runtime! The runtime is also slightly faster. This makes me very happy indeed. Now maybe we can think about removing the calls to Collect under the 3.5 runtime, since the memory difference is pretty minor.
Test, test, and retest
I am a very pragmatic programmer. I tend to disbelieve any absolute statement about something until I have tested it myself. I can remember in 1.0 the foreach loops were almost four times slower than making the for() loop yourself. I banned them and never used them in my apps. Then under 2.0 a close friend told me they were identical in speed, but I wouldn’t believe him until I found my old test app and ran it against the runtime. Sure enough, the foreach loop had been sped up to be almost identical to the hand-coded loop.

The next time you read a post from someone telling you to never do something in your code, stop and ask yourself why. Did the person actually test it? Or are they passing along something that “everyone” knows but has not tested themselves? There are so many ways to do things in programming. That is one reason why programming is not a science; there is no one right answer. Take time to experiment occasionally and see if there are better ways to do the things you know.

I think we will probably start to make a 3.5 framework build available starting with VistaDB 3.3. The runtime appears to be faster, and the Garbage Collector appears to be much improved as well.
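The foreach story above is an easy one to re-test yourself. The sketch below is not my old test app: the LoopTimingSketch class, the array size and the choice of a plain int array are assumptions for illustration, and a different collection type may well behave differently.

```csharp
using System;
using System.Diagnostics;

static class LoopTimingSketch
{
    // Sums the array with a hand-coded index loop.
    public static long SumWithFor(int[] data)
    {
        long sum = 0;
        for (int i = 0; i < data.Length; i++) sum += data[i];
        return sum;
    }

    // Sums the same array with foreach.
    public static long SumWithForeach(int[] data)
    {
        long sum = 0;
        foreach (int value in data) sum += value;
        return sum;
    }

    static void Main()
    {
        int[] data = new int[10000000];
        for (int i = 0; i < data.Length; i++) data[i] = i % 7;

        // Warm-up calls so JIT compilation is not counted in the timings.
        SumWithFor(data);
        SumWithForeach(data);

        Stopwatch timer = Stopwatch.StartNew();
        long forSum = SumWithFor(data);
        long forMs = timer.ElapsedMilliseconds;

        timer = Stopwatch.StartNew();
        long foreachSum = SumWithForeach(data);
        long foreachMs = timer.ElapsedMilliseconds;

        Console.WriteLine("for: {0} ms, foreach: {1} ms, sums match: {2}",
            forMs, foreachMs, forSum == foreachSum);
    }
}
```

Run it a few times in a release build on each runtime you care about. A single pass over one array is a rough indication at best, which is rather the point: test it before you ban it.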
