Tuesday, September 23, 2008

Conserving Resources When Writing BLOB Values to SQL Server and Streaming BLOB Values back to the Client

Reading and writing BLOB data to and from your database can be a resource hog, because it normally involves holding the entire stream in memory. Holding 10 KB, 20 KB, or even 100 KB in memory might not be an issue, but as you start allowing larger file uploads, your application starts to feel the stress and can easily run out of resources.

In particular, say you allow users to upload a relatively small file (2 MB) from your pages. User A uploads; it's only 2 MB, so you have not exceeded your memory allocation yet. However, as more users upload simultaneously, you're allocating more memory at a time: 5 simultaneous users means 2 * 5 = 10 MB, 100 users means 2 * 100 = 200 MB.

This was all too stressful on your web server, or at least it was until ASP.NET 2.0. Since then, things have changed: ASP.NET now supports a disk threshold (the requestLengthDiskThreshold attribute on the httpRuntime element) that lets you specify how much of the request is buffered in server memory. Once an upload grows beyond that threshold, ASP.NET starts writing the file stream to disk instead of keeping it all in memory. This is just great, because I remember back in the day this was quite annoying; glad it's fixed. It also means we have to do nothing special for the upload on its way from the client to our web server.
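For reference, both knobs live on the httpRuntime element in web.config. A minimal sketch (the numbers are only an illustration; both attributes are expressed in kilobytes):

<!-- web.config: allow uploads up to 10 MB, but buffer at most 256 KB
     of the request in memory before spooling the rest to disk -->
<system.web>
  <httpRuntime maxRequestLength="10240" requestLengthDiskThreshold="256" />
</system.web>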

Still, if we are storing the stream in SQL Server, we have a whole new scenario, and the question remains: how do we stream this data to SQL Server while conserving memory? Should we send the entire data in one lump? Won't that allocate memory on our database server before it is written to the database field? This is what I want to address. Basically, what we want to do is save our file stream to SQL Server in smaller chunks, so that the database server only allocates enough memory for one little chunk at a time before writing it to the database field, conserving memory at the cost of write speed (no free lunch, I'm afraid).

However, the price you pay in latency for streaming chunks of data to SQL Server may well be worth it, because the alternative is to tax your database server's memory and possibly crash it (I assume it may crash when it runs out of memory; I have not tried it personally. You may try that yourself if you're feeling curious. Do not forget to share your findings if you do :P)

Take SQL Server Express Edition as an example, which has an imposed limit of 1 GB of memory. That's not a whole lot, is it? Streaming the data to SQL Server in chunks is actually not as taxing as it may seem. True, write performance is degraded, but you can regulate it by increasing or decreasing the size of your chunks to find a compromise.

So, how do we do this? SQL Server 2005 has a new feature that lets you update a column with the .WRITE clause; as you stream your chunks, it just appends them to the existing value. You can read about it here: http://msdn.microsoft.com/en-us/library/ms177523.aspx
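To give a feel for it before we get to the stored procedures: the clause is invoked on the column itself, and appending a chunk boils down to something like the following (table and column names here are purely illustrative, and the column has to be one of the (max) types). When @offset equals the current length of the stored value, the statement is a pure append.

-- illustrative only: write @chunk into a varbinary(max) column at @offset
UPDATE someTable
SET someData.WRITE(@chunk, @offset, @chunkLength)
WHERE id = @id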

To quote the documentation from the above link:
Because the .WRITE clause cannot be used to modify a NULL column, the column is first populated with temporary data. This data is then replaced with the correct data by using the .WRITE clause

Now, that's not cool! We can work around it nicely, though; in effect, it doesn't even feel like we're working around anything. I've basically followed Microsoft's own documentation at http://msdn.microsoft.com/en-us/library/aa719771.aspx, which has the right code, except that instead of .WRITE it uses an older method that is not compatible with the newer varbinary(max), varchar(max), and nvarchar(max) data types supported by SQL Server 2005, only with the older, no-longer-recommended image, text, and ntext types.

What follows is an extract of the code I use to stream chunks of data to SQL Server using the .WRITE clause. The purpose of the sample code is simply to show how you can go about writing your own, of course. I have ripped this piece out of my DAL with a few minor adjustments, so a straight copy/paste of the code below may or may not get you far depending on your level of expertise. I am just way too lazy to write a completely self-contained example, so bear with me; as it is, this article has gotten longer than I wanted it to be.

public void InsertFile(string applicationName, Stream data,
    string mimeType, string fileName)
{
    int fileId = -1;
    using (SqlConnection myConnection = GetSqlConnection())
    {
        myConnection.Open();
        SqlCommand cmd = new SqlCommand(DbOwner +
            ".DataFile_Insert", myConnection);
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@applicationName",
            SqlDbType.NVarChar, 256).Value = applicationName;
        cmd.Parameters.Add("@mimeType",
            SqlDbType.NVarChar, 50).Value = mimeType;
        cmd.Parameters.Add("@length",
            SqlDbType.Int, 4).Value = data.Length;
        cmd.Parameters.Add("@fileName",
            SqlDbType.NVarChar, 256).Value = fileName;
        SqlParameter fileIdParam =
            cmd.Parameters.Add("@fileId", SqlDbType.Int, 4);
        fileIdParam.Direction = ParameterDirection.Output;
        // inserts the row with a placeholder in the data column and
        // returns the new row's id so we can stream the chunks into it
        cmd.ExecuteNonQuery();
        fileId = (int)fileIdParam.Value;
    }
    if (fileId > -1)
    {
        // the size of the "chunks": 128 bytes, regulate at will
        InsertFileByChunks(fileId, data, 128);
    }
}

Note how we call InsertFileByChunks from the insert method above? Here it is:
public void InsertFileByChunks(int fileId, Stream data, int bufferLen)
{
    using (SqlConnection myConnection = GetSqlConnection())
    {
        myConnection.Open();
        SqlCommand cmd = new SqlCommand(DbOwner +
            ".DataFile_InsertChunk", myConnection);
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@fileId",
            SqlDbType.Int, 4).Value = fileId;
        // size the @data parameter to the chunk size instead of hard-coding 128
        SqlParameter paramData = cmd.Parameters.Add("@data",
            SqlDbType.VarBinary, bufferLen);
        SqlParameter paramOffset = cmd.Parameters.Add("@offset",
            SqlDbType.BigInt);
        cmd.Parameters.Add("@length",
            SqlDbType.Int, 4).Value = bufferLen;
        using (BinaryReader br = new BinaryReader(data))
        {
            byte[] buffer = br.ReadBytes(bufferLen);
            int offset = 0;
            while (buffer.Length > 0)
            {
                // each round trip appends one chunk via .WRITE in the proc
                paramData.Value = buffer;
                paramOffset.Value = offset;
                cmd.ExecuteNonQuery();
                offset += bufferLen;
                buffer = br.ReadBytes(bufferLen);
            }
        }
        data.Close();
    }
}

As you can see, the above method just writes one chunk of data at a time (the accompanying stored procedure, DataFile_InsertChunk, uses .WRITE to append each chunk to the column as it comes in).
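Before we get to the SQL, here is a minimal sketch of how InsertFile might be called from an upload page. The FileUpload control, the application name, and the way the DAL is exposed are hypothetical and only there for illustration:

protected void UploadButton_Click(object sender, EventArgs e)
{
    // hypothetical code-behind: fileUpload is an <asp:FileUpload> control on the page
    if (fileUpload.HasFile)
    {
        // hand the uploaded stream straight to the DAL; thanks to the disk
        // threshold it is not necessarily sitting in web server memory in full
        InsertFile("MyApplication",
            fileUpload.PostedFile.InputStream,
            fileUpload.PostedFile.ContentType,
            fileUpload.FileName);
    }
}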

And now the accompanying stored procedures, DataFile_Insert and DataFile_InsertChunk:

CREATE PROCEDURE [dbo].[DataFile_Insert]
    @applicationName nvarchar(256),
    @mimeType nvarchar(50),
    @length int,
    @fileName nvarchar(256),
    @fileId int OUTPUT
AS
BEGIN
    DECLARE @applicationId UNIQUEIDENTIFIER
    SELECT @applicationId = ApplicationId FROM dbo.aspnet_Applications
    WHERE LoweredApplicationName = LOWER(@applicationName)

    -- 0x0 is placeholder data: the .WRITE clause cannot modify a NULL column
    INSERT INTO dataFile (applicationId, data, mimeType, [length],
        [fileName], dateCreated, lastDateModified)
    VALUES (@applicationId, 0x0, @mimeType, @length, @fileName,
        GETDATE(), GETDATE())
    SET @fileId = SCOPE_IDENTITY()
END

Note above how we insert the value 0x0 as placeholder data into the data column, as per the MSDN article? "data" is the name of the column that will be holding our stream, and it is a varbinary(max) column. The reason we seed it with 0x0 is that .WRITE cannot write into a NULL column. This is the workaround I was talking about earlier, and it does not affect the appended chunks in any way.

CREATE PROCEDURE [dbo].[DataFile_InsertChunk]
    @data varbinary(MAX),
    @length int,
    @fileId int,
    @offset bigint
AS
BEGIN
    -- write the incoming chunk into the existing value at the given offset
    UPDATE dataFile SET data.WRITE(@data, @offset, @length)
    WHERE fileId = @fileId
END

As you can see, it all boils down to this; quite simple indeed. The .WRITE clause takes your chunk, the offset, and the length as arguments. Not much for me to add here, except one thing you may also want to note: .WRITE is invoked on the column it writes to, so the syntax takes the form columnName.WRITE(...); in my case the column is called "data", as you can see in the stored procedure above.

We have now successfully stored our file stream in chunks. Next we want to retrieve the file stream in a similar manner, so that we do not hog resources on our web server when binary content is requested; the data has to be streamed from the database server to the web server, where our code streams it out again to the client. I hope you are already familiar with HTTP handlers; if not, you can look them up on Google.

An HttpHandler/Module is general knowledge, so I'm not going to get into it. I've mapped somefile.ashx to my HttpHandler, so every request for it is handled by my custom handler, which requests the file stream from the database and streams it back out in chunks to the client. My HttpHandler looks like this:

public class BinaryDataHandler : IHttpHandler
{
    public void ProcessRequest(HttpContext context)
    {
        int fileId;
        bool success = int.TryParse(
            context.Request.QueryString["fileId"],
            out fileId);
        if (success)
        {
            IDataReader r = FileManager.GetFileById(fileId);
            StreamData(r, context);
        }
    }

    void StreamData(IDataReader r, HttpContext context)
    {
        if (r.Read())
        {
            DataFileItem dfi = new DataFileItem();
            dfi.Length = (r["length"] is DBNull) ?
                -1 : (int)r["length"];
            dfi.MimeType = (r["mimeType"] is DBNull) ?
                string.Empty : (string)r["mimeType"];
            if (dfi.MimeType != string.Empty)
                context.Response.ContentType = dfi.MimeType;
            dfi.FileName = (r["fileName"] is DBNull) ?
                string.Empty : (string)r["fileName"];
            dfi.Extention = (r["extention"] is DBNull) ?
                string.Empty : (string)r["extention"];
            context.Response.AddHeader("Content-Disposition",
                string.Format("inline;filename={0}{1}",
                    dfi.FileName, dfi.Extention));
            context.Response.AddHeader("Content-Length",
                dfi.Length.ToString());
            int dataOrdinal = r.GetOrdinal("data");
            if (!r.IsDBNull(dataOrdinal))
                StreamByteArrayInChunks(r, dataOrdinal, context);
        }
        // closes the data reader + the underlying db connection
        r.Close();
    }

    public void StreamByteArrayInChunks(IDataReader r, int ordinal,
        HttpContext context)
    {
        // 102400 bytes is a 100 KB buffer; regulate at will
        byte[] buffer = new byte[102400];
        long index = 0;
        while (true)
        {
            long count = r.GetBytes(ordinal,
                index, buffer, 0, buffer.Length);
            if (count == 0)
            {
                break;
            }
            // write out only the bytes actually read; the last chunk is
            // usually smaller than the buffer
            context.Response.OutputStream.Write(buffer, 0, (int)count);
            context.Response.Flush();
            index += count;
        }
    }

    public bool IsReusable
    {
        get { return false; }
    }
}
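For completeness, somefile.ashx has to be mapped to the handler somewhere; one way is a registration along these lines in web.config (the namespace and assembly names are placeholders):

<!-- web.config: route requests for somefile.ashx to the custom handler -->
<system.web>
  <httpHandlers>
    <add verb="GET" path="somefile.ashx"
      type="MyApp.BinaryDataHandler, MyApp" />
  </httpHandlers>
</system.web>

A file can then be requested as somefile.ashx?fileId=42, for example from an image tag or a plain download link.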

One thing I will explain quickly about the above code is why we retrieve an IDataReader from our DAL instead of a Stream: I want to read the file out of the DataReader in chunks and, as each chunk arrives from the database, write it straight out to the client without holding the whole thing in memory on the web server where my application resides.

Another thing you may want to notice is the use of the Content-Disposition and Content-Length HTTP headers. It was important for me to send the file back with its original file name when the user chooses to save it, and Content-Disposition helped there. Content-Length helps the client verify the file size on the receiving end; without it I was getting errors unzipping a downloaded file, for example, where WinZip complained that the file was corrupt :-)

So that's pretty much it. Oh, and here is the method in my DAL that returns the IDataReader:

public override IDataReader GetFileById(int fileId)
{
   SqlConnection myConnection = GetSqlConnection();
   myConnection.Open();
   SqlCommand cmd = new SqlCommand(DbOwner +
      ".DataFile_GetFileById", myConnection);
   cmd.CommandType = CommandType.StoredProcedure;
   cmd.Parameters.Add("@fileId",
      SqlDbType.Int, 4).Value = fileId;
   SqlDataReader r = cmd.ExecuteReader(
      CommandBehavior.CloseConnection);
   return r;
}
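The DataFile_GetFileById procedure isn't shown here; a minimal sketch, assuming the dataFile table from earlier and the column names the handler reads, would be:

CREATE PROCEDURE [dbo].[DataFile_GetFileById]
    @fileId int
AS
BEGIN
    -- sketch only: columns assumed from the handler code; keeping the data
    -- column last in the SELECT plays nicely with sequential reads
    SELECT [length], mimeType, [fileName], extention, data
    FROM dataFile
    WHERE fileId = @fileId
END

Note that the reader is created with CommandBehavior.CloseConnection, so closing the reader in the handler also closes the connection. If you additionally pass CommandBehavior.SequentialAccess, the reader streams the data column instead of buffering the entire row, as long as the columns are then read in the order they appear in the SELECT.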


With that, I think we have succeeded in conserving resources on both our database server and our web server. One last thing: once you have this all working, experiment with the buffer size and regulate it until the write speed is acceptable by your standards. From my tests on my dev machine, I was getting a download speed of 500-517 KB/sec with a buffer size of 100 KB (102400 bytes), while with the same buffer set to 10 KB (10240 bytes) I was getting 50-60 KB/sec.

These tests are of course based on what I see when requesting a file through my custom HTTP handler above. I reckon the values are more or less the same, or at least a good approximation, when streaming the same data to SQL Server in chunks, so use them as a starting point to regulate the speed/latency your app needs based on how many resources your web server and database server have. Upload speeds are usually capped at a fairly low rate by most ISPs, which means you can set a much smaller chunk size when streaming from your application to SQL Server, and a larger chunk size when streaming from your web server to the client (since your clients' download speeds are significantly higher). It will also depend on the bandwidth available to your web server and other factors, of course, but that's off topic.

Update, 23 September 2008:

One last thing I forgot to mention, which my friend Filip Duyck brought up: since we are now sending the data to SQL Server in chunks (even while reusing the same connection), every chunk is still an additional round trip to the database server. Consider that too when you decide how big you want your chunks to be.

3 comments:

  1. Petar! That totally slipped my mind. I know I was considering it at some point, just wasn't there yet :-)

    Many thanks

  2. Petar from http://ppetrov.wordpress.com/ made a comment that got lost during the migration process to Blogger, so here it is again:

    I think you should consider using a SqlTransaction.

    Probably something like this is more robust :

    public static void InsertFile(string applicationName, Stream data,
        string mimeType, string fileName)
    {
        SqlTransaction transaction = null;
        try
        {
            using (SqlConnection myConnection = GetSqlConnection())
            {
                myConnection.Open();
                transaction = myConnection.BeginTransaction();
                using (SqlCommand cmd = new SqlCommand("DbOwner" + ".DataFile_Insert",
                    myConnection, transaction))
                {
                    cmd.CommandType = CommandType.StoredProcedure;
                    cmd.Parameters.Add("@applicationName", SqlDbType.NVarChar, 256).Value = applicationName;
                    cmd.Parameters.Add("@mimeType", SqlDbType.NVarChar, 50).Value = mimeType;
                    cmd.Parameters.Add("@length", SqlDbType.Int, 4).Value = data.Length;
                    cmd.Parameters.Add("@fileName", SqlDbType.NVarChar, 256).Value = fileName;
                    SqlParameter fileIdParam = cmd.Parameters.Add("@fileId", SqlDbType.Int, 4);
                    fileIdParam.Direction = ParameterDirection.Output;
                    cmd.ExecuteNonQuery();
                    var fileId = (int)fileIdParam.Value;
                    InsertFileByChunks(cmd, fileId, data, 128);
                }
                transaction.Commit();
            }
        }
        catch (SqlException sqlException)
        {
            if (transaction != null)
            {
                transaction.Rollback();
            }
            // ... Log the exception
        }
        finally
        {
            if (transaction != null)
            {
                transaction.Dispose();
            }
        }
    }

    private static void InsertFileByChunks(SqlCommand cmd, int fileId, Stream data, int bufferLen)
    {
        cmd.CommandText = "DbOwner" + ".DataFile_InsertChunk";
        cmd.CommandType = CommandType.StoredProcedure;
        cmd.Parameters.Add("@fileId", SqlDbType.Int, 4).Value = fileId;
        SqlParameter paramData = cmd.Parameters.Add("@data", SqlDbType.VarBinary, 128);
        SqlParameter paramOffset = cmd.Parameters.Add("@offset", SqlDbType.BigInt);
        cmd.Parameters.Add("@length", SqlDbType.Int, 4).Value = bufferLen;
        using (BinaryReader br = new BinaryReader(data))
        {
            byte[] buffer = br.ReadBytes(bufferLen);
            int offset = 0;
            while (buffer.Length > 0)
            {
                paramData.Value = buffer;
                paramOffset.Value = offset;
                cmd.ExecuteNonQuery();
                offset += bufferLen;
                buffer = br.ReadBytes(bufferLen);
            }
        }
    }
