Transforming TXT Files into XML Using Linq to Xml (XLinq)

 

Some weeks ago I started working on a little code sample to demonstrate the Xml transformation capabilities of Linq to Xml (aka XLinq). The code was originally intended for a demo at a major developer conference here in Brazil (http://www.baboo.com.br/absolutenm/templates/content.asp?articleid=25189&zoneid=224). I decided to post the sample here along with some explanations, mainly because it seemed to catch the attention of some folks who attended my session. So please bear with me and give yourself a chance to fall in love with this wonderful technology, as I did as soon as I started working with it.

 

Our goal here is to take a group of log files from IIS (Internet Information Services) and extract some analytical page access information from them. The log files generated by IIS are stored in the %WINDIR%\System32\LogFiles\W3SVC1 directory.

 

 

 

 

  

Well, it seems like a lot of work, right? Yes, it is. But since we are going to use LINQ to accomplish our goal, most of the complexity will be abstracted away from us, the developers. The query semantics we are going to employ make the code much simpler than its procedural counterpart would be. Besides that, the Linq to Xml API makes the transformation to the Xml format a very natural task.

 

Before showing you the code, let me say that although my language of choice is C#, I decided to write this transformation in VB 9 for the purposes of this demo. The main motivation behind that decision was that VB 9 will support Xml literals and Xml axis members, which still have no counterpart in C# 3.0. Maybe those concepts will be incorporated into C# 3.0 as well, but that decision is up to Microsoft and there is no definitive position so far. That said, let’s see the code:

 

 

 

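    ' Requires: Imports System.IO, System.Linq and System.Xml.Linq
    Private Function GenerateXmlLog() As XElement

        ' Phase 1: build one <Date> element per log file under a single <IISLog> root
        Dim xmlContent As XElement = _
            <IISLog>
                <%= From logFile In New DirectoryInfo(Me.LogFilesDirectory).GetFiles() _
                    Select GetXmlFromLogFile(logFile) %>
            </IISLog>

        ' Phase 2: keep the successful requests (status 200) and count the hits per URL
        Dim summary As XElement = _
            <Summary>
                <%= From entry In xmlContent...<Entry> _
                    Where entry.<Status>.Value = "200" _
                    Group By Url = entry.<Url>.Value Into Hits = Count() _
                    Select _
                        <Entry>
                            <Url><%= Url %></Url><Hits><%= Hits %></Hits>
                        </Entry> %>
            </Summary>

        Return summary

    End Function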

 

    Private Function GetXmlFromLogFile(ByVal logFile As FileInfo) As XElement

        ' File.ReadAllLines splits on any line ending and closes the file when done
        Dim lines As String() = File.ReadAllLines(logFile.FullName)

        ' One <Date> element per log file, one <Entry> per request line.
        ' Fields: 0 = time, 1 = client IP, 2 = HTTP method, 3 = URL, 4 = status code
        Dim logIis = _
            <Date Id=<%= logFile.CreationTime %>>
                <%= From line In lines _
                    Where line.Length > 0 AndAlso Not line.StartsWith("#") _
                    Let fields = line.Split(" "c) _
                    Where fields.Length >= 5 _
                    Select _
                        <Entry>
                            <Time><%= fields(0) %></Time>
                            <Ip><%= fields(1) %></Ip>
                            <Url><%= fields(3) %></Url>
                            <Status><%= fields(4) %></Status>
                        </Entry> %>
            </Date>

        Return logIis

    End Function

 

This code is all we need to get the work done. Impressive, isn’t it? The magic lies in the set semantics we get from the Language Integrated Query features. The Xml literal features of VB 9 also make it easy to create the final Xml document. In a future post I will show how this code could be written in C# 3.0, which has some conceptual differences that were brilliantly pointed out by Anders Hejlsberg and Amanda Silver in this thread that I started at the XLinq MSDN Forum: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=574140&SiteID=1.

 

Notice that the code above accomplishes its task in two phases. The first phase creates a plain Xml document from the TXT files and stores it in the xmlContent variable, which is of type XElement. The second phase takes this Xml fragment and transforms it into another document (summary), which contains the summary information in the final Xml format.
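For readers who prefer C#, the same two phases can be sketched with the functional construction style of Linq to Xml in place of VB's Xml literals. This is only an illustration under a few assumptions: the names IisLogSummary, ToXml and Summarize are made up for this sketch, the log lines are invented sample data, and the lines are passed in memory instead of being read from the IIS log directory.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class; the names are not part of the original sample.
public static class IisLogSummary
{
    // Phase 1: turn the raw lines of one log file into a <Date> element.
    // Field positions (0 = time, 1 = IP, 3 = URL, 4 = status) follow the VB code.
    public static XElement ToXml(string id, IEnumerable<string> lines)
    {
        return new XElement("Date",
            new XAttribute("Id", id),
            from line in lines
            where line.Length > 0 && !line.StartsWith("#", StringComparison.Ordinal)
            let fields = line.Split(' ')
            where fields.Length >= 5
            select new XElement("Entry",
                new XElement("Time", fields[0]),
                new XElement("Ip", fields[1]),
                new XElement("Url", fields[3]),
                new XElement("Status", fields[4])));
    }

    // Phase 2: count the successful (status 200) requests per URL.
    public static XElement Summarize(XElement log)
    {
        return new XElement("Summary",
            from entry in log.Descendants("Entry")
            where (string)entry.Element("Status") == "200"
            group entry by (string)entry.Element("Url") into hits
            select new XElement("Entry",
                new XElement("Url", hits.Key),
                new XElement("Hits", hits.Count())));
    }

    public static void Main()
    {
        // Invented sample lines in the layout described in this post.
        var lines = new[]
        {
            "#Fields: time c-ip cs-method cs-uri-stem sc-status",
            "10:00:01 10.0.0.1 GET /default.aspx 200",
            "10:00:02 10.0.0.2 GET /default.aspx 200",
            "10:00:03 10.0.0.1 GET /missing.aspx 404"
        };

        var log = new XElement("IISLog", ToXml("sample-day", lines));
        Console.WriteLine(Summarize(log));
    }
}
```

With the sample lines above, the summary holds a single Entry for /default.aspx with 2 hits, because the comment line and the 404 request are filtered out.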

 

At the end of the demo, I showed a Windows Forms application written in C# that queries the Xml document created by the previous code and plots a bar chart of the selected pages and their hit counts. For example, the following queries select the data for the chart:

 

// Requests for .aspx pages, leaving out the login page
var successfulRequests =
   from entry in log.Elements("Entry")
   where entry.Element("Url").Value.EndsWith("aspx") &&
         !entry.Element("Url").Value.EndsWith("login.aspx", StringComparison.CurrentCultureIgnoreCase)
   select new
   {
      Url = entry.Element("Url").Value,
      Hits = (int) entry.Element("Hits")
   };

// The 10 pages with the highest hit counts
var top10 = (from request in successfulRequests
             orderby request.Hits descending
             select request).Take(10);

 

 

 

 

The first Linq query above gets all the aspx pages from the Xml document, excluding the Login.aspx page. The result of this query is then used in the second query, which retrieves only the top 10 most accessed pages. This result set is finally plotted onto the chart. Notice that the first query projects an anonymous type with two properties: Url and Hits. This clearly shows the flexibility we will have with Linq and Linq to Xml in the near future, when this technology is finally released.
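To try this kind of query without the Windows Forms application, a small console sketch can run it against a hand-written Summary document. Everything here is an assumption made for illustration: the TopPages class, the Top method and the sample hit counts are invented, and the result is returned as key/value pairs only so that it can leave the method (anonymous types cannot).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class; not part of the demo application.
public static class TopPages
{
    // Returns the most requested .aspx pages, excluding the login page.
    public static List<KeyValuePair<string, int>> Top(XElement log, int count)
    {
        var requests =
            from entry in log.Elements("Entry")
            let url = (string)entry.Element("Url")
            where url.EndsWith(".aspx", StringComparison.OrdinalIgnoreCase)
               && !url.EndsWith("login.aspx", StringComparison.OrdinalIgnoreCase)
            select new { Url = url, Hits = (int)entry.Element("Hits") };

        // Highest hit counts first, then keep only the requested number of pages.
        return requests
            .OrderByDescending(r => r.Hits)
            .Take(count)
            .Select(r => new KeyValuePair<string, int>(r.Url, r.Hits))
            .ToList();
    }

    public static void Main()
    {
        // Invented sample data in the Summary format produced by the VB code.
        var log = XElement.Parse(@"<Summary>
  <Entry><Url>/default.aspx</Url><Hits>42</Hits></Entry>
  <Entry><Url>/Login.aspx</Url><Hits>99</Hits></Entry>
  <Entry><Url>/products.aspx</Url><Hits>17</Hits></Entry>
  <Entry><Url>/logo.gif</Url><Hits>250</Hits></Entry>
  <Entry><Url>/about.aspx</Url><Hits>5</Hits></Entry>
</Summary>");

        foreach (var page in Top(log, 2))
            Console.WriteLine(page.Key + ": " + page.Value);
    }
}
```

With this sample data the two pages kept are /default.aspx (42 hits) and /products.aspx (17 hits): the login page and the image are filtered out before the ordering is applied.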

 

The chart was drawn with plain GDI+ code. Shame on me: I didn't know WPF well enough to build it that way. Maybe in the future I can take some time and do it.

 

That’s all for this post. I hope it has sparked some interest in Linq and Linq to Xml, and that it gave you an interesting example of how these technologies will change the way we write (and read) code in the future.

 

Thanks for your time!

 

    Private Function GenerateXmlLog() As XElement

 

 

Each file stores information about the requests received by IIS for a given web site on a given day.

 

 

Lines beginning with the # character are just comments. Every other line represents a single hit on a web server resource and records the time the request occurred, the IP address of the requesting computer, the HTTP method used (GET, POST, etc.), the resource location and finally the HTTP status code of the request (200 for a successful request, 404 for a page that was not found, etc.).
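The line layout described above can be captured in a small C# type. This is a sketch, not code from the demo: the LogEntry class and its Parse method are hypothetical, the sample line is invented, and the sketch assumes the five space-separated fields appear in exactly the order listed (real IIS logs can be configured with other field sets).

```csharp
using System;

// Hypothetical type that mirrors the five fields described in the text.
public sealed class LogEntry
{
    public string Time { get; private set; }
    public string Ip { get; private set; }
    public string Method { get; private set; }
    public string Url { get; private set; }
    public int Status { get; private set; }

    // Returns null for comment lines, blank lines and lines with too few fields.
    public static LogEntry Parse(string line)
    {
        if (string.IsNullOrEmpty(line) || line[0] == '#') return null;

        string[] fields = line.Split(' ');
        int status;
        if (fields.Length < 5 || !int.TryParse(fields[4], out status)) return null;

        return new LogEntry
        {
            Time = fields[0],
            Ip = fields[1],
            Method = fields[2],
            Url = fields[3],
            Status = status
        };
    }

    public static void Main()
    {
        // Invented sample line: time, client IP, method, resource, status code.
        LogEntry entry = Parse("10:00:01 10.0.0.1 GET /default.aspx 200");
        Console.WriteLine(entry.Url + " -> " + entry.Status);

        // Comment lines are skipped.
        Console.WriteLine(Parse("#Software: Microsoft Internet Information Services") == null);
    }
}
```

Keeping the status code as an integer makes the later "successful requests only" filter a simple comparison against 200.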

 

Our intent is to read all the lines of every file in the directory and count the number of successful requests for each resource, producing an Xml document with a Summary root element that holds one Entry per resource, each with its Url and its Hits count.

 

 

 
