Using the IsMatch method in Regular Expressions to screen scrape a webpage
by Steve Schofield
I discovered this code tip while developing a web service to 'screen scrape' a
webpage and determine whether a certain text phrase was present. Regular
expressions are well suited to this task, although they aren't the easiest
thing to learn. The Regex class in the System.Text.RegularExpressions
namespace in .NET 2.0 has a handy method called IsMatch that does what I
wanted. The code snippet below accepts two arguments (the URL to monitor and
the text to search for), makes an HTTP request, and reads the response through
a stream reader one line at a time. Each line is tested against the text
passed into the method. The one thing I discovered when using the 'IsMatch'
method is that the match is case and space sensitive. For example, if you are
searching 'http://www.iislogs.com' for text in the title of the page and pass
in 'IIS Logs - ', the page is matched only if it contains exactly those
characters, with the same capitalization and spacing. Keep in mind that the
search text is interpreted as a regular expression pattern, so characters such
as '.' or '?' have special meaning. I hope this example helps in your Regular
Expressions adventure, happy coding!
' These Imports statements belong at the top of the source file.
Imports System.IO
Imports System.Net
Imports System.Text.RegularExpressions

' Returns "Listed", "Not Listed", or "Errored:<message>" for the given URL.
Public Function URLListed(ByVal URL As String, ByVal strArgument As String) As String
    Return readWebPage(URL, strArgument)
End Function

' Requests strSource and tests each line of the response against strArgument.
' strArgument is used as a regular expression pattern; matching is case-sensitive.
Private Function readWebPage(ByVal strSource As String, ByVal strArgument As String) As String
    Dim strLine As String
    Dim objSR As StreamReader = Nothing
    Dim objResponse As WebResponse = Nothing
    Dim objRequest As WebRequest = WebRequest.Create(strSource)
    Try
        objResponse = objRequest.GetResponse()
        objSR = New StreamReader(objResponse.GetResponseStream(), System.Text.Encoding.ASCII)
        Do While Not objSR.EndOfStream
            strLine = objSR.ReadLine()
            If Regex.IsMatch(strLine, strArgument) Then
                Return "Listed"
            End If
        Loop
        Return "Not Listed"
    Catch f As Exception
        Return "Errored:" & f.Message
    Finally
        ' Release the reader and response on every path, including the early "Listed" return.
        If objSR IsNot Nothing Then objSR.Close()
        If objResponse IsNot Nothing Then objResponse.Close()
    End Try
End Function
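
The case and space sensitivity mentioned above can be checked without making any HTTP request. The sketch below is only an illustration: the sample line of HTML is made up (it is not the actual title of any site), and ContainsPhrase is a hypothetical helper that is not part of the original tip. It assumes you want the search text treated as literal characters and matched regardless of case, which is one possible way to loosen the behavior using Regex.Escape and RegexOptions.IgnoreCase.

```vb
Imports System
Imports System.Text.RegularExpressions

Module IsMatchDemo
    ' Hypothetical helper (an assumption, not from the original tip):
    ' escapes the phrase so it is matched literally, and ignores case.
    Function ContainsPhrase(ByVal text As String, ByVal phrase As String) As Boolean
        Return Regex.IsMatch(text, Regex.Escape(phrase), RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Made-up sample line standing in for one line of a downloaded page.
        Dim line As String = "<title>IIS Logs - Home</title>"

        Console.WriteLine(Regex.IsMatch(line, "IIS Logs - "))   ' exact case and spacing
        Console.WriteLine(Regex.IsMatch(line, "iis logs - "))   ' different case
        Console.WriteLine(Regex.IsMatch(line, "IIS  Logs"))     ' two spaces instead of one
        Console.WriteLine(ContainsPhrase(line, "iis logs - "))  ' case ignored

        ' An unescaped "." is a wildcard in a pattern, so "1.0" also matches "1x0".
        Console.WriteLine(Regex.IsMatch("v1x0", "1.0"))
        Console.WriteLine(ContainsPhrase("v1x0", "1.0"))
    End Sub
End Module
```

With the default options, only the first call matches among the first three; the helper accepts the lowercase phrase, and it rejects "1x0" because the period is escaped. Whether to ignore case or escape the search text depends on how strict you want the monitoring check to be.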