Check a page for broken links using VBScript
Posted by rajivkumarnandvani on January 5, 2010
Hi,
I created a script that checks a page for broken links, that is, whether each link's URL is valid, using the Microsoft.XMLHTTP object.
rem create link object description
Set alllinkob = Description.Create()
alllinkob("micclass").value = "Link"
alllinkob("html tag").value = "A"
rem get all link objects
Set objAlllinkObj = Browser("Google").Page("Google").ChildObjects(alllinkob)
rem loop over every link on the page
For a = 0 To objAlllinkObj.count - 1
    rem get link url
    url = objAlllinkObj(a).getroproperty("url")
    rem call function
    Call geturlstatus(url)
Next
Set objAlllinkObj = Nothing
rem Clear browser cache
Public Function ClearBrowserCache()
    On Error Resume Next
    rem special-folder constant for Temporary Internet Files
    Const TEMPORARY_INTERNET_FILES = 32 ' decimal 32 is equivalent to hex &H20
    rem create file system object
    Set objCacheFSO = CreateObject("Scripting.FileSystemObject")
    rem create shell application object
    Set objShell = CreateObject("Shell.Application")
    rem get the Temporary Internet Files folder object
    Set objFolder = objShell.Namespace(TEMPORARY_INTERNET_FILES)
    rem delete all files directly under the temporary folder
    objCacheFSO.DeleteFile objFolder.Self.Path & "\*.*"
    rem get cache folder path
    sPath = objCacheFSO.GetFolder(objFolder.Self.Path) & "\Content.IE5\"
    rem create cache folder object
    Set objFolders = objCacheFSO.GetFolder(sPath)
    rem delete every subfolder of the cache folder
    For Each objFName In objFolders.SubFolders
        'WScript.Echo sPath & objFName.Name
        objCacheFSO.DeleteFolder sPath & objFName.Name, True
    Next
    ClearBrowserCache = True
    Set objFolder = Nothing
    Set objShell = Nothing
    Set objCacheFSO = Nothing
    Err.Clear
End Function
Public Function geturlstatus(url)
    On Error Resume Next
    Call ClearBrowserCache()
    Set webService = Nothing
    Set webService = CreateObject("Microsoft.XMLHTTP")
    rem open a synchronous GET request (third argument False = not asynchronous)
    webService.open "GET", url, False
    webService.Send
    rem HTTP status code returned by the server
    pagestatus = webService.status
    If pagestatus < 200 Or pagestatus > 399 Then
        print "In valid request " & pagestatus & " " & url
        geturlstatus = 0
    Else
        geturlstatus = 1
        print "valid request " & pagestatus & " " & url
    End If
    Set webService = Nothing
    Err.Clear
End Function
rem __________________________________________
url = "http://en.wikipedia.org/wiki/List_of_HTTP_status_codes"
Call geturlstatus(url)
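Because the function runs under On Error Resume Next, a request that never reaches a server (for example an unresolvable host name) leaves pagestatus empty instead of raising an error. The following is only a sketch of one way to report that case separately. The function name GetUrlStatusChecked is hypothetical and not part of the script above, and Print is assumed to be the QTP Print statement, so outside QTP it would have to be replaced (for example with WScript.Echo).

```vbscript
' Hypothetical variant of geturlstatus (name is an assumption, not from the post).
' Returns 1 for a 2xx/3xx response and 0 otherwise, including failed requests.
Public Function GetUrlStatusChecked(url)
    Dim http, status
    GetUrlStatusChecked = 0

    On Error Resume Next
    Set http = CreateObject("Microsoft.XMLHTTP")
    http.Open "GET", url, False   ' False = synchronous: Send blocks until done
    http.Send

    ' If creating, opening or sending failed, no HTTP status is available.
    If Err.Number <> 0 Then
        Print "Request failed (" & Err.Description & ") " & url
        Err.Clear
        On Error GoTo 0
        Set http = Nothing
        Exit Function
    End If

    status = http.Status
    On Error GoTo 0

    If status >= 200 And status <= 399 Then
        GetUrlStatusChecked = 1
        Print "valid request " & status & " " & url
    Else
        Print "In valid request " & status & " " & url
    End If
    Set http = Nothing
End Function
```

It can be called the same way as the original, for example Call GetUrlStatusChecked("http://en.wikipedia.org/wiki/List_of_HTTP_status_codes").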
Jayachandra said
Hi Rajiv,
Nice post, but when I tried to execute it, it did not work. Can you explain each term so that the code is easier to understand?
Thanks in advance.
rajivkumarnandvani said
Hi jay,
Sure, I will update the description of each step within a few days, because this week's schedule is very tight.
sumantu said
It's a nice program, but can you please explain each of the steps in the ClearBrowserCache() function so that I can understand the code better? Further, may I ask whether there is any other way to identify broken links?
Thanks in advance.
Sumantu
rajivkumarnandvani said
Hi sumantu,
I am just deleting the cache folder after finding its path. Instead of that, you can use the QTP accessibility checkpoint, which automatically checks for broken links. For more information you can refer to the QTP help file.
Thanks and best of luck.
chinnu said
Hi,
Good to see this. I have one doubt here:
Set webService = CreateObject("Microsoft.XMLHTTP")
webService.open "GET", url, False
webService.Send
pagestatus = webService.status
What do these lines do? I executed the script with a URL, but I got an invalid report.
When I checked, pagestatus was empty.
Please tell me how I need to execute it.
thanks,
chinnu
rajivkumarnandvani said
Hi chinnu,
This object just sends the request to the URL and checks the status code returned in the response.
Are you passing a value in the url variable?
Like this:
Public Function geturlstatus(url)
    On Error Resume Next
    Call ClearBrowserCache()
    Set webService = Nothing
    Set webService = CreateObject("Microsoft.XMLHTTP")
    webService.open "GET", url, False
    webService.Send
    pagestatus = webService.status
    If pagestatus > 399 Then
        print "In valid request " & pagestatus & " " & url
        geturlstatus = 0
    Else
        geturlstatus = 1
        print "valid request " & pagestatus & " " & url
    End If
    Set webService = Nothing
    Err.Clear
End Function
rem __________________________________________
url = "http://en.wikipedia.org/wiki/List_of_HTTP_status_codes"
Call geturlstatus(url)
For more information you can refer to the links below:
http://en.wikipedia.org/wiki/XMLHttpRequest
http://www.jibbering.com/2002/4/httprequest.html
chinnu9999 said
Thanks a lot,
I have one more doubt here. Suppose I am running my application in hidden mode and I need to get all the child objects from the application. Will it work?
I tried to run my application using IE in hidden mode. It was working, but I am not able to get the child objects count; it gives the error "General error".
Is there any way to identify the objects from the application?
rajivkumarnandvani said
Hi chinnu,
I am not clear about this. Can you explain how you are running the application in hidden mode using IE? I think that without creating the Browser/Page object you cannot get the child objects.
chinnu9999 said
Sorry Rajiv,
SystemUtil.run "iexplorer","myurl","","",0
set child=Browser().page().childobjects()
msgbox(child.count)
This is the code. I checked using Task Manager and IE is running in hidden mode, but line 2 gives a general error.
Thanks,
chinnu
rajivkumarnandvani said
hi chinnu,
Are you writing this?
set child=Browser().page().childobjects()
msgbox(child.count)
Is there something in the brackets (), or are they blank? If blank, you have to mention the object properties:
Set alllinkob = Description.Create()
alllinkob("micclass").value = "Link"
alllinkob("html tag").value = "A"
rem get all link objects
Set objAlllinkObj = Browser("Google").Page("Google").ChildObjects(alllinkob)
chinnu9999 said
hi rajiv,
In the brackets I mentioned the object properties:
set child=Browser("title:=myname").page("title:=myname").childobjects()
msgbox(child.count)
My intention here is to find out: while the application is running in hidden mode, is there any way to get all the child objects from the application?
rajivkumarnandvani said
hi,
If you create the Browser and Page objects as you mentioned in your previous reply, then you will get all the child objects. When are you getting the general error?
Are you providing the child object properties?
Please post your full script so I can see what you are doing.
Refer to this link on how to use/pass a child object description
chinnu9999 said
Hi Rajiv,
SystemUtil.run "iexplorer","myurl","","",0
Dim mch:Set mch=Description.Create
mch("micclass").Value="Link"
mch("html tag").Value="A"
set child=Browser("title:=myname").page("title:=myname").childobjects(mch)
msgbox(child.count)
chinnu9999 said
I am getting the error here:
set child=Browser("title:=myname").page("title:=myname").childobjects(mch)
kalsy said
The right syntax is
Set mch=Description.Create()
You forgot the parentheses.
rajivkumarnandvani said
thanks.
chinnu9999 said
Hi kalsy,
That is not the problem. Even if I try it your way, I get the same error.
Thanks,
chinnu
chinnu9999 said
I think you are not understanding the problem. The actual problem is that the application is running in hidden mode; check the '0' in the SystemUtil.run call:
SystemUtil.run "iexplorer","myurl","","",0
rajivkumarnandvani said
Oh, sorry, I finally understood your problem (communication gap 😉). Yes, I also checked: in hidden mode you cannot get the child objects, and you cannot perform any operation either.
But the documented meaning of the mode is:
Mode Description
0 Hides the window and activates another window.
I have no idea about this; thanks for letting me know. I will look into whether there is another way to do this.
vijay bhagwat said
Hi Rajiv,
Can you explain this line?
webService.open "GET", url, False
rajivkumarnandvani said
Just read the comments on this post and you will get your answer. 😉
For more information you can refer to the links below:
http://en.wikipedia.org/wiki/XMLHttpRequest
http://www.jibbering.com/2002/4/httprequest.html
vijay bhagwat said
Thanks Rajiv
Emily said
Hi Rajiv,
Thanks a lot for the code! I copied it and ran it from QTP. I added one line of code of ” print “PageStatus is ” & pagestatus “, but it shows empty for pagestatus variable. Would you please help me? Thanks again!!
rajivkumarnandvani said
Hi Emily,
Thanks. In that case you have to debug the code.
First check whether you are getting all the child links (the URLs of the links) by checking the value of the url variable.
Also check whether your script runs fine for a single given URL.
Best of luck 🙂
Emily said
Thanks for the response, Rajiv! I did get all the child links , it is running fine for single given url. But if I passed an invalid url such as “http://www.abcde.fff”, it did not trigger “print “In valid request “& pagestatus &” ” & url” condition, that was why I added the debug code of “print “PageStatus is ” & pagestatus” but it returned empty for pagestatus. I am puzzled….
Ravi said
Hi Rajiv,
Great article, and thanks for sharing the code. I made some minor tweaks to the code to identify broken links, and it works fine for some of the websites. It does not help when the link is created with JavaScript or is a dynamic link. I see the result as "In valid request 0 javascript:void();"
Can you please help me in this regard?
Thanks in advance.
Ravi
rajivkumarnandvani said
Hi Ravi,
Thanks. In that case you can add a check for the URL text javascript:void(0), like this:
url = objAlllinkObj(a).getroproperty("url")
If url = "javascript:void(0)" Then
    rem javascript link: do not call the function
Else
    Call geturlstatus(url)
End If
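An exact comparison only catches one spelling of the link text (the result reported above was "javascript:void();", for example). As a rough sketch, assuming the objAlllinkObj collection and the geturlstatus function from the post, the loop could instead request only URLs that start with "http" and log everything else. Print is again assumed to be the QTP Print statement.

```vbscript
' Sketch only: relies on objAlllinkObj and geturlstatus defined in the post.
For a = 0 To objAlllinkObj.Count - 1
    url = objAlllinkObj(a).GetROProperty("url")
    ' Left(url, 4) = "http" matches both http:// and https:// links.
    If LCase(Left(url, 4)) = "http" Then
        Call geturlstatus(url)
    Else
        ' javascript:, mailto: and similar links cannot be requested this way.
        Print "skipped (not an http/https URL) " & url
    End If
Next
```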
Ravi said
Hi Rajiv,
Thanks for the quick response and the code snippet for the exception handling 🙂
Actually, I was looking for a way to retrieve the URL for these kinds of links (dynamic/JavaScript links) using the geturlstatus function.
Also, can you please share your contact number so I can call you in this regard, if that's OK with you?
Thanks in advance.
Ravi
rajivkumarnandvani said
Hi Ravi,
Extracting the URL from dynamic/JavaScript links will not work with this function. I think in that case you can use QTP with the DOM object: get all the links, click on each link, check the page, and then go back in the browser.
rahul said
Thanks Rajiv for the code snippet.
I was looking for something like this, where I don't have to click on every link and then use the back button to return to the original page. I still don't know much about Microsoft XMLHTTP, so I will go through it properly.
Otherwise, I was running the code for the Yahoo page and I found that most links returned 0 (zero), which can't be right. So where is the code failing? Can you check?
Thanks Again ..
Rahul
rajivkumarnandvani said
Hi Rahul,
Thanks for reading the blog. It gives zero when the link URL contains JavaScript, because that is the value we get from url = objAlllinkObj(a).getroproperty("url"). You can use a print statement to see for which url value it reports the error. You can then exclude those URLs with an If/Else statement when JavaScript is found in the url value.
Thanks
Archana said
Thanks Rajiv for the snippet. It was very useful. I had a couple of queries.
1) This reports an invalid URL if we do not specify the protocol (e.g. if I give www.google.com instead of http://www.google.com). Have you checked this, or am I doing something incorrect at my end?
2) When you navigate from an http to an https URL (i.e. unsecure to secure), this script reports the secure URL as invalid. Again, have you observed this?
Thanks in advance,
Archana.
rajivkumarnandvani said
Hi Archna,
Thanks for reading the blog.
1.) Yes, you have to provide the protocol (http://www.google.co.in/), else it will not work.
2.) No, it is working fine for me; I have checked.
Maybe Native XMLHTTP is not enabled in your IE browser. Refer to the link below.
http://msdn.microsoft.com/en-us/library/ms537505%28v=vs.85%29.aspx#what
Public Function geturlstatus(url)
    On Error Resume Next
    Call ClearBrowserCache()
    Set webService = Nothing
    Set webService = CreateObject("Microsoft.XMLHTTP")
    webService.open "GET", url, False
    webService.Send
    pagestatus = webService.status
    If pagestatus < 200 Or pagestatus > 399 Then
        msgbox "In valid request " & pagestatus & " " & url
        geturlstatus = 0
    Else
        geturlstatus = 1
        msgbox "valid request " & pagestatus & " " & url
    End If
    Set webService = Nothing
    Err.Clear
End Function
Call geturlstatus("https://inet.idbibank.co.in/")
Thanks 🙂
Archana said
Hi Rajiv,
Thanks for the quick response. Point 1 is OK. In point 2, what I was trying to say was: assume your URL was "http://www.google.com",
so you call geturlstatus("http://www.google.com/").
Now assume that inside Google's page there is a secure link, e.g. https://inet.idbibank.co.in/
Does the script now print "valid request" for the IDBI link? We were getting an "invalid request" in such a scenario.
rajivkumarnandvani said
Hi,
It makes no difference where you call it from, because for each URL we call the function and then release the object. It will work.
You can check which URL you are getting for the https link using msgbox, as below. Maybe you are getting an invalid URL. Please check it again.
url = objAlllinkObj(a).getroproperty("url")
rem show the url, then call the function
msgbox url
Call geturlstatus(url)
Archana said
Hi,
Will try this out and see. Thanks for your useful inputs.
Kotesh said
Hi Mohan,
I need your help with QTP. I have many reports that are generated in different formats (.html, XLS, PDF and CSV). I have to compare these reports with each other and summarize the result in an XLS in the following way:
1) How many reports have conflicts ##
2) Report1 has conflicts ##
3) Report2 —do———- ##
When I click on the number ##, it should go to the specific report.
Baranidharan said
Hi Rajiv,
Thanks for your code snippet. I need help with this, as it is not working for me: when QTP executes the step webService.Send, it throws the error message "No such Interface is supported". I'm using IE 7.0 and QTP 11. Kindly support me on this.
Thanks,
Barani
rajivkumarnandvani said
Hi,
It could be a problem with the XMLHTTP object not being supported by your environment.
Please try another HTTP object:
Set webService = CreateObject("MSXML2.XMLHTTP.3.0")
Please have a look at:
http://www.jibbering.com/2002/4/httprequest.html
http://www.visualbasicscript.com/CreateObjectquotMicrosoftXMLHTTPquot-m53045.aspx
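Trying several ProgIDs in turn could be sketched as below. The helper name CreateHttpObject is hypothetical, and the ProgID list is only an assumption about what may be installed on a given machine. This covers object creation only; whether it helps with an error raised at Send is not confirmed in this thread.

```vbscript
' Hypothetical helper: returns the first XMLHTTP object that can be created,
' or Nothing if none of the listed ProgIDs is available on this machine.
Function CreateHttpObject()
    Dim progIds, i, obj
    progIds = Array("MSXML2.XMLHTTP.6.0", "MSXML2.XMLHTTP.3.0", "Microsoft.XMLHTTP")
    Set CreateHttpObject = Nothing

    On Error Resume Next
    For i = 0 To UBound(progIds)
        Err.Clear
        Set obj = CreateObject(progIds(i))
        If Err.Number = 0 Then
            Set CreateHttpObject = obj
            Exit For
        End If
    Next
    Err.Clear
    On Error GoTo 0
End Function
```

In geturlstatus, the line that creates the object would then become Set webService = CreateHttpObject(), followed by a check such as If webService Is Nothing Then before calling open.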
Baranidharan said
Hi Rajiv,
Thanks for your updates..
I found that the issue still remains the same.
I found that the above code works good if the the URL is pretty straight forward and starts with “http or https..” But my URL is a dynamic one. for eg., URL = javascript:DisplayDocument(“02A6A252C52394ABD67C4C3B8C3F0A9804CD4CC6B50040F4536708CE527A261E4A968EFA862B0ECF8E48160420650482A456AAA0EDB0D77E8CB0FAA40D743D6A6C5BF48745FBDD2608D35919E34C34A46668198D63BE3511B0A758E4B7D50C287D8B2B791C74E258″,”../”).
If this is the case, I'm not sure how to use the function.
Kindly let me know of any updates.
Thanks,
Barani
rajivkumarnandvani said
Yes, you are right. You cannot check dynamic URLs this way.
Baranidharan said
Hi Rajiv,
I got it resolved.
Just to update you:
I contacted the dev team to learn more about this, and finally came to know that the above-mentioned GUID needs to be appended to the following URL, which resolved the issue:
“http://qa-atvantage.lexisnexis.com/DisplayNewsArticle.aspx?documentid=…”
Thanks,
Barani
sandeep said
Thanks everyone, especially Rajiv, for the code snippet. This forum is really helpful. I would love to read more on this.
rajivkumarnandvani said
Thanks sandeep for reading the blog 🙂
loungr said
I have a question: what is the importance of clearing the cache?