The GC does not solve all memory leaks

written by Jason Short on Thursday, October 18 2007

Using statements and memory leaks in database code

This question was recently asked in the forum, and I see the problem so often in code samples sent in by users that I thought I would take a little time to discuss using statements and how they help prevent memory leaks in database code. I remember when I first started learning C# (the platform was called Next Generation Windows Services, or NGWS, back then) I thought it was great that I would no longer have to worry about memory leaks. Being a long-time C and then C++ programmer, I was delighted to learn that delete was a thing of the past. I was more than willing to pay a small runtime cost for a garbage collector if I could stop worrying about memory leaks. Just a few years before that, I had literally spent four man-months tracking down a memory leak in a Win32 C++ app, and I think we would still be looking for it if someone had not shown us BoundsChecker. It was such a subtle leak, only about 8 bytes per minute under load. The problem was that it was also leaking a handle, and in any version of Windows, once all the handles are gone it does not matter how much RAM you have: the app crashes. Enter the GC.

I was delighted to write code and not worry about the objects I created. You could still put a destructor (finalizer) on your classes, but it really didn’t seem necessary to me unless I had unmanaged resources. Around this time I wrote the first version of the Emerald Spam Shield filtration engine in C#. The same code had taken almost six months to write in C++ and was vulnerable to buffer overrun attacks and all sorts of code management problems. The C# code took just over three weeks. The rewrite was partly a way to get away from that fragile code and start clean, and the fact that it was in an environment free of the delete operator was an added bonus. I have seen enough off-by-one and other pointer errors in my career to be happy they were gone. No more worrying about a rookie coming in behind my code (after all, I would never leak memory) and forgetting to clean up after himself.

Reality sets in

Preliminary tests went well. The C# version of the filter engine ran at about 75% of the speed of the C++ version. Memory usage was about four times as much, but I was willing to pay that price for fewer headaches and faster development. After much internal testing, we deployed the first filter on a test machine and let it run over a weekend with actual spam flowing through it. When I came in on Monday, the machine had crashed with an out of memory error! How could this be? I thought we were free from these memory concerns. After spending the entire day tracking down why it was happening, we found the archnemesis of Dot Net: arrays of strings were the culprit.

Strings in Dot Net are immutable. They can never, ever change. So each time you modify one for any reason (concatenate, trim, replace, change case) you are actually creating an entirely new object and copying the old contents into it. That sounds fine in theory, until you start looking at how many strings a mail filtering app uses. We were creating over 60,000 temporary string objects for a single email! Since then I have learned many tricks to avoid creating temporary objects (StringBuilder is your friend, and so are regular expressions). We eventually got the number of temporary objects down to under 300 per email, but the single largest factor in releasing memory was setting things to null. If you build an array of strings and fail to go through the complete array and set each entry to null, the memory can hang around for a VERY long time. Allegedly much of this has been fixed in Dot Net 2, but I have never taken the time to recreate our experiments. The problem is object references. Take an array of objects and reference them in another object like this:

SpamReasons sr = new SpamReasons();
sr.add( ReasonsFound[i] );

Looks simple enough, right? Not so fast: I just stored a reference to an element of the ReasonsFound array inside a SpamReasons object. Objects of reference types are never copied when you pass them to a method; the method receives a reference to the same object. So by calling that add() function I thought I was safely making a copy of the reasons, but I was actually building a chain of references. Later on, when I let the ReasonsFound array fall out of scope, the memory I expected to get back stayed in RAM, because the items I had handed to SpamReasons were still reachable, along with everything those items referenced.
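A small C# sketch of the trap, using hypothetical Reason and SpamReasons classes (the names and members are made up for illustration, not our actual filter code). Adding an object to a list stores a reference to the same object, so the list keeps it alive and sees any later change to it; if you only need the data, copy the values out instead.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for one entry of the ReasonsFound array.
public class Reason
{
    public string Text;
    public Reason(string text) { Text = text; }
}

// Hypothetical stand-in for the SpamReasons class in the snippet above.
public class SpamReasons
{
    private readonly List<Reason> items = new List<Reason>();
    public IReadOnlyList<Reason> Items => items;

    // Stores a reference: the caller's object and this list now share one
    // instance, and it stays reachable for as long as this SpamReasons does.
    public void Add(Reason r) { items.Add(r); }

    // Stores a copy: a new Reason is created, so nothing here points back at
    // the caller's Reason object once the caller is done with it.
    public void AddCopy(Reason r) { items.Add(new Reason(r.Text)); }
}
```

Add is cheaper, but it ties the original objects' lifetime to SpamReasons; AddCopy costs one allocation per reason but lets the originals be collected when the array goes away.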

Database objects are just as bad

Take a look at the piece of code below. It uses over 800MB of RAM; with a few simple changes we can cut that to only 28MB.

public static void LeakMemory(string dbname)
{
    VistaDBConnection sqlconn = new VistaDBConnection();
    sqlconn.ConnectionString = "Data source=" + dbname;
    sqlconn.Open();
    // Loop through users' purchases and print out the PDF for each purchase
    VistaDBCommand cmd = new VistaDBCommand("select userid from purchases", sqlconn);
    VistaDBDataReader OuterDR = cmd.ExecuteReader();
    while( OuterDR.Read() )
    {
        CustomerObj customer = new CustomerObj();
        customer.CustomerID = (int)OuterDR["userid"];
        // Find all the customer's invoices for their registrations.
        // Cmd1 and Rdr1 are created once per customer and never disposed: this is the leak.
        DbCommand Cmd1 = sqlconn.CreateCommand();
        Cmd1.CommandText = "SELECT * from sales where salesid = " + customer.CustomerID;
        DbDataReader Rdr1 = Cmd1.ExecuteReader();
        while( Rdr1.Read() )
        {
            InvoiceObj inv = new InvoiceObj();
            inv.SKU = (string)Rdr1["SKUSOLD"];
            inv.Serial = (string)Rdr1["SERIALNUM"];
            inv.RegDate = (DateTime)Rdr1["SALESDATE"];
            inv.InvoiceID = (int)Rdr1["INVOICEID"];
            inv.CustomerID = customer.CustomerID;
            customer.Invoices.Add(inv);
        }
        // Other logic for printing out PDFs...
    }
}

FIX: Add using statements to the VistaDBCommand and to each of the data reader objects. What is happening here? The inner Cmd1 and Rdr1 are created once per customer and never disposed, so they leak memory like crazy. Wrapping each of them in a using statement guarantees that Dispose (which also closes the reader) is called as soon as the block exits, instead of leaving them for the GC to clean up whenever it gets around to it. And just so you know, this is actual code sent in on a bug report for VistaDB leaking memory. It sure looks simple, doesn’t it? The app was running out of RAM on a 1 GB machine and took over 25 minutes to complete on a machine with more RAM. After adding the using statements, the code never used more than 28MB of RAM and completed in less than 25 seconds.
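Here is a sketch of what the corrected loop might look like. To keep it self-contained it is written against the provider-neutral System.Data.Common base classes (the bug report's code already calls sqlconn.CreateCommand() and gets back a DbCommand), uses CreateCommand() for the outer command as well, and takes an already-open connection as a parameter instead of building one from dbname. CustomerObj, InvoiceObj and PurchaseReport.NoLeak are minimal hypothetical stand-ins; the table and column names are copied from the bug report.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;

// Hypothetical minimal versions of the bug report's classes.
public class InvoiceObj
{
    public string SKU;
    public string Serial;
    public DateTime RegDate;
    public int InvoiceID;
    public int CustomerID;
}

public class CustomerObj
{
    public int CustomerID;
    public List<InvoiceObj> Invoices = new List<InvoiceObj>();
}

public static class PurchaseReport
{
    // sqlconn must already be open; the caller owns (and disposes) it.
    public static void NoLeak(DbConnection sqlconn)
    {
        // Outer command and reader: disposed once, when the loop finishes.
        using (DbCommand cmd = sqlconn.CreateCommand())
        {
            cmd.CommandText = "select userid from purchases";
            using (DbDataReader outerDR = cmd.ExecuteReader())
            {
                while (outerDR.Read())
                {
                    var customer = new CustomerObj();
                    customer.CustomerID = (int)outerDR["userid"];

                    // Inner command and reader: disposed at the end of every
                    // outer iteration instead of piling up until a GC.
                    using (DbCommand cmd1 = sqlconn.CreateCommand())
                    {
                        cmd1.CommandText =
                            "SELECT * from sales where salesid = " + customer.CustomerID;
                        using (DbDataReader rdr1 = cmd1.ExecuteReader())
                        {
                            while (rdr1.Read())
                            {
                                var inv = new InvoiceObj();
                                inv.SKU = (string)rdr1["SKUSOLD"];
                                inv.Serial = (string)rdr1["SERIALNUM"];
                                inv.RegDate = (DateTime)rdr1["SALESDATE"];
                                inv.InvoiceID = (int)rdr1["INVOICEID"];
                                inv.CustomerID = customer.CustomerID;
                                customer.Invoices.Add(inv);
                            }
                        }
                    }
                    // Other logic for printing out PDFs...
                }
            }
        }
    }
}
```

The caller should wrap its connection in a using statement as well; disposing a connection closes it, so nothing is left open when the work is done.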

What you write impacts how well VistaDB performs

This is probably the number one type of support ticket we get on an ongoing basis. I hope you can see that what you write in your app can dramatically affect how well VistaDB runs. Why doesn’t this happen with SQL Server or SQL CE? Neither is managed code, so their internal memory is not left waiting on your app’s GC, and SQL Server does not even run in your process. Many users send code like this and then point to another database as “proof” it is not their code. Yet the same example above will use up to 290MB of RAM when connected to SQL Server without the using statements. The difference is that VistaDB runs in the same process as the application, so the undisposed objects on both sides, yours and VistaDB’s, pile up in the same managed heap until the GC gets to them. SQL Server is a separate process and manages memory in a totally different way.

The GC does not solve all your memory problems

Would the GC eventually clean up the memory above? Yes; if the thread gives up the CPU with a Thread.Sleep() or some other pause, it might get a chance to clean up. But you are forcing it to work a lot harder, and your memory usage will climb and drop in a sawtooth pattern. If you see memory usage in your app climb steadily and then suddenly drop to a much lower level, there is a good chance you are missing an opportunity to clean up a resource.

Tools to watch your memory

Just because we run in a managed runtime does not mean we can ignore memory usage today. In fact, I now think it is harder for junior programmers to learn the memory tricks because they have no concept of how the heap works, or how references can keep things alive in RAM. The GC is only as good as the programmer using it. Today I personally use Red Gate’s ANTS profiler to monitor memory usage for my applications. It is a great tool for watching temporary objects, the number of objects allocated, and the number of objects still live in the system. Interestingly, ANTS often cannot determine who leaked the memory, but you can usually figure it out with a little detective work against your app.
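Alongside a profiler, a rough in-process check is to log the managed heap size while your loop runs. This is a minimal sketch built on the standard GC.GetTotalMemory and GC.CollectionCount methods; the MemoryWatch class name and the log format are made up for illustration. It only sees the managed heap, not unmanaged memory or handles.

```csharp
using System;

public static class MemoryWatch
{
    // Managed heap size in bytes. Passing false means no collection is
    // forced, so taking a sample does not change the pattern being observed.
    public static long ManagedBytes() => GC.GetTotalMemory(false);

    // One log line: heap size plus how many gen 0/1/2 collections have run
    // so far. Logged over time, these numbers show the climb-then-drop
    // pattern described above.
    public static string Sample(string label)
    {
        return string.Format("{0}: {1:N1} MB, GCs gen0={2} gen1={3} gen2={4}",
            label,
            ManagedBytes() / (1024.0 * 1024.0),
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2));
    }
}
```

For example, calling MemoryWatch.Sample("row " + count) every few hundred rows inside the outer while loop and writing the result to your log makes a missing using statement show up as a steady climb.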
