
Using the IsMatch method in Regular Expressions to screen scrape a webpage

by Steve Schofield

I discovered this code tip while developing a web service that 'screen scrapes' a web page to determine whether a certain text phrase is present. Regular expressions are well suited to this task, although they aren't the easiest thing to learn. In .NET 2.0, the Regex class in the System.Text.RegularExpressions namespace has a handy method called IsMatch that does what I wanted: it returns True if the pattern is found anywhere in the input string.

The code snippet below accepts two arguments (the URL to monitor and the text to search for), makes an HTTP request, and reads the web page as a stream. The stream is searched one line at a time for the text passed into the method, so a phrase that is split across a line break in the page's HTML will not be found. The method returns "Listed", "Not Listed", or an "Errored:" message.

One thing I discovered when using IsMatch is that the match is case- and space-sensitive, and the text you pass in is treated as a regular expression pattern rather than as plain text. For example, if you are monitoring 'http://www.iislogs.com' for text in the page title, searching for 'IIS Logs - ' looks for exactly that string: the same capitalization and the same spaces, including the trailing one. I hope this example helps in your regular expressions adventure. Happy coding!

Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

' Place the two functions below inside your web service class.
' Returns "Listed", "Not Listed" or "Errored:<message>".
Public Function URLListed(ByVal URL As String, ByVal strArgument As String) As String
    Return readWebPage(URL, strArgument)
End Function

Private Function readWebPage(ByVal strSource As String, ByVal strArgument As String) As String
    Try
        ' Create is inside the Try so a malformed URL is reported, not thrown
        Dim objRequest As WebRequest = WebRequest.Create(strSource)

        ' Using blocks close the response and reader on every exit path,
        ' including the early Return when the text is found
        Using objResponse As WebResponse = objRequest.GetResponse()
            ' StreamReader decodes as UTF-8 (or per a byte order mark);
            ' plain ASCII pages read the same under UTF-8
            Using objSR As New StreamReader(objResponse.GetResponseStream())
                Do While Not objSR.EndOfStream
                    Dim strLine As String = objSR.ReadLine()
                    ' strArgument is a regex pattern; matching is case-sensitive
                    If Regex.IsMatch(strLine, strArgument) Then
                        Return "Listed"
                    End If
                Loop
            End Using
        End Using

        Return "Not Listed"
    Catch f As Exception
        Return "Errored:" & f.Message
    End Try
End Function
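The case- and space-sensitivity described above is easy to check in isolation. The sketch below is a hypothetical helper, ContainsPhrase (not part of the original web service), that wraps Regex.IsMatch. It assumes you want to search for literal text, so it passes the phrase through Regex.Escape, and it optionally adds RegexOptions.IgnoreCase. Spaces are still matched exactly either way.

```vbnet
Imports System.Text.RegularExpressions

' Hypothetical helper for illustration; not part of the original web service.
Public Module PhraseMatcher

    ' Returns True if phrase appears anywhere in line.
    ' Regex.Escape turns the phrase into a literal pattern, so characters such
    ' as "." or "(" are matched as themselves rather than as regex syntax.
    ' ignoreCase adds RegexOptions.IgnoreCase; spaces must still match exactly.
    Public Function ContainsPhrase(ByVal line As String, ByVal phrase As String, _
                                   ByVal ignoreCase As Boolean) As Boolean
        Dim options As RegexOptions = RegexOptions.None
        If ignoreCase Then
            options = RegexOptions.IgnoreCase
        End If
        Return Regex.IsMatch(line, Regex.Escape(phrase), options)
    End Function

End Module
```

For a hypothetical line such as <title>IIS Logs - Example</title>, ContainsPhrase(line, "IIS Logs - ", False) returns True, ContainsPhrase(line, "iis logs - ", False) returns False, and ContainsPhrase(line, "iis logs - ", True) returns True. Escaping matters for patterns containing metacharacters: Regex.IsMatch("version 2x0", "2.0") is True because "." matches any character, while ContainsPhrase("version 2x0", "2.0", False) is False. Inside readWebPage, the same idea amounts to calling Regex.IsMatch(strLine, Regex.Escape(strArgument)) when strArgument should be treated as plain text.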

©2005-2006 IISLogs.com. All rights reserved.