Wednesday, July 25, 2007

Using Excel documents as a data source in ASP.Net

This is something that I mentioned last blog, finally I've gotten around to posting it, on a side note, I'm also trying out the new 'blogging' feature in Word 2007, go to the office button, click new, and start a new blog post, it'll run through setting it up, and then away you go.. hopefully it works!

The scenario I had, was that we have a spreadsheet setup for collecting payments for the 'bar tab' we have for our social committee, who sell chocolates, chips, beer etc at a low cost. This is tallied up on a spreadsheet, and an email is sent around with amounts. I thought it would be good to have this data on each individual users intranet homepage. So I set about making a web part for it.

We didn't want to change the spreadsheet to an ASP form, as it was working fine as it is, and I didn't have the time to code one and teach the social committee treasurer how to use it. and I didn't have the time to code one and teach the social committee treasurer how to use it. So I thought, surely there is a way to extract the data out of an excel spreadsheet, and sure enough there is.

First off, I setup my cache as mentioned in my previous post 'using cache to determine a time for content reloading' so it only queries the excel spreadsheet once every 12 hours, so if the cache is still there, it doesn't get the new data, if not then it gets new data and adds something to cache. I decided that instead of having the data in cache, which could be deleted by IIS, and would require regular expressions to query properly, I would store it in a SQL table. So effectively I was synchronizing a couple of columns out of an excel spreadsheet with a SQL table.

I setup my SQL table, and then set about talking to the excel sheet. I stored it under the a folder on the server, and just gave the social committee people permissions to access and edit it – which probably isn't the most secure approach, but it was fine for what I am doing, and fairly idiot proof.

To talk to the table, you set up an 'OleDbConnection', I setup mine as follows:

Dim DBConnection = New OleDbConnection( _

"Provider=Microsoft.Jet.OLEDB.4.0;" & _

"Data Source=" & Server.MapPath("~/SocCom/collection.xls") & ";" & _

"Extended Properties=""Excel 8.0;HDR=Yes""")

There I've got my data source as a mapped path to the xls file, and had its properties as an excel file. It connects using the Microsoft jet engine – similar to Access.

From then, I added a Try/Catch statement to catch errors which occur when someone is editing the file while it is trying to be accessed.

Try

DBConnection.Open()

Dim SQLString As String = "SELECT [Name], [Balance] FROM [Collection Summary$] WHERE [Name] is not null and [Balance] is not null"

Dim DBCommand = New OleDbCommand(SQLString, DBConnection)

Dim DBReader As Data.IDataReader = DBCommand.ExecuteReader()

There I've opened the connection, and selected the columns with 'Name' and 'Balance' in the topmost cell – they need to be enclosed in square brackets. The From part of the SQL statement has [Collection Summary$] as that is the sheet name, the $ sign is required at the end (though my sheet name is only 'Collection Summary').

Also take note that I have added a where statement that does not show Null values of each column, this is important, as if you do not do this, then it will take more than 5 seconds to do the query, and will return a whole heap of null values, as it queries all the values in the spreadsheet, null cell or not.

That will return like a normal SQL query, you could bind it to a data grid, or whatever you like, in my case, I first deleted all the old values from my SQL table (I could have updated them – however employees leave and join, and spelling mistakes are corrected etc, so its better just to start a fresh each time) then did a 'While DBReader.Read' to loop through the values, writing them to variables, and inserting each to the database as it went.

At the end of that loop, I closed the DBReader, and DBConnection. Then added my catch statement:

Catch ex As System.Data.OleDb.OleDbException

Finally End Try

ReadFromDB()

The exception was the System.Data.OleDB.OleDbException – which I tested was when the spreadsheet was already in use. As you can see I did nothing on the event of an error – as it doesn't matter if you can't read the data, I just didn't want the page to spit out an error – I wanted it to continue, which it does, and goes to the next line of code which is a call to my 'ReadFromDB()' sub procedure, which queries the database to get the data from the SQL table. So if there is an error, it goes to the ReadFromDB Sub, if not, it still goes there.

It then reads from that and prints my data. Done!

Edit: As a side note on my blogging with Word experiment, it worked great, however the formatted text I copied from Visual Studio 2005 added extra spaces, which is because it shows up as huge text in my editing menu, so each line break is size 72 or whatever it is. Other than that, worked brilliantly and was very easy.

Saturday, July 7, 2007

ASP.Net: Using cache to determine a time for content reloading (rather than reading data from cache)

I have been experimenting with this recently, ASP.Net has a good, simple, quick caching model. Cache.Insert("PageReloadTimer", "Yes", Nothing, Datetime.Now.Addminutes(5), Timespan.Zero) will give you a cache of the word "Yes", named PageReloadtimer, with no dependencies, for 5 minutes until it is deleted. Caching is of course great for optimizing performance, as you can cache data instead of getting it from the source, which is great for screen scrapes and other data feeds. However, it could also be used as a simple timer, I'm sure there are other ways to do this, but this seems simple enough (comments are appreciated though). The scenario I had, is on my companies local intranet, we want to display data from an excel file, as you will see in my next post, excel files can be used in ASP.net as a datasource, just as if it were an SQL database. However the performance is obviously reduced (if you don't use a where statement pages take a very long time to load - so a where statement is essential, even if it just eliminates nulls). I decided we would like this to be stored in our SQL database, for easy querying, and so it will be backed up, as it was financial data. So I made the page query the excel sheet, then insert all of the data into our SQL database, I decided this only needs to be done every say, 12 hours, so, to tell it to do this I simply did a Cache.Insert("PageReloadTimer", "Yes", Nothering, Datetime.Now.Addhours(12), Timespan.zero) then added an if statement to the start of the excel query, If Not Cache("PageReloadTimer") is nothing then..... This skips the excel loading, and goes straight to reading the data from the SQL database. Works perfectly. I could even change the "Yes" which doesnt really mean anything, to a timestamp of when it was updated, allowing me to add "this information was last updated on ..." to the page. I could of course, have cached all the actual data, and used regular expressions to query it, but that wouldn't have given me the advantages of an SQL database, and would have been extra unnecessary coding. You can read more on caching at 4 Guys from Rolla, and OnDotnet.