Get All URLs on a Page

In this article, I show a class that can be used to find and display all of the URLs on a web page. What for, you may ask? In my experience as a web developer, I have found a class like this to be very useful. Sometimes you may want to use it as a basis for a more complex application that crawls your site checking for bad or broken links. In other cases, you may simply want to check an individual page to make sure its links are formatted correctly and don’t point to obsolete pages. You could also easily change this class to look for other items within your page, such as specific text or tags. Who knows, this may be the start of a specialized spider that crawls sites on the Internet looking for something specific.

I think you get the picture. Of course, to make this class do all those wonderful things, you would have to expand on what I am presenting here. However, I believe this is a good start. The class has one public method, RetrieveUrls, which calls two private methods. RetrieveContents issues a request to the web page and retrieves its contents. GetAllUrls then uses a regular expression to find all of the URLs in those contents, writing each match to the screen and saving it in a log file. If you prefer, you could modify GetAllUrls to save the matches somewhere else, such as an array or a database table.
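The author's class listing does not survive in this copy of the article, so what follows is only a minimal Python sketch of the structure just described, not the original code. The class name `UrlRetriever`, the snake_case method names that mirror RetrieveUrls, RetrieveContents and GetAllUrls, the default `urls.log` path, and the regular expression (which captures only quoted `href` attribute values) are all assumptions made for illustration.

```python
import re
import urllib.request


class UrlRetriever:
    """Hypothetical Python analogue of the article's URL-listing class."""

    # Assumed pattern: captures the value of href="..." or href='...'
    # attributes, case-insensitively. The article's own regex is not shown.
    HREF_PATTERN = re.compile(r"""href\s*=\s*["']([^"']+)["']""", re.IGNORECASE)

    def __init__(self, log_path="urls.log"):
        # Hypothetical log file location; matches are appended here.
        self.log_path = log_path

    def retrieve_urls(self, page_url):
        """Public entry point: fetch the page, then extract and log its URLs."""
        contents = self._retrieve_contents(page_url)
        return self._get_all_urls(contents)

    def _retrieve_contents(self, page_url):
        """Issue an HTTP request for the page and return its body as text."""
        with urllib.request.urlopen(page_url, timeout=10) as response:
            # Use the charset from the Content-Type header, else fall back to UTF-8.
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")

    def _get_all_urls(self, html):
        """Find href values with the regex, print them, and append them to the log."""
        urls = self.HREF_PATTERN.findall(html)
        with open(self.log_path, "a", encoding="utf-8") as log:
            for url in urls:
                print(url)
                log.write(url + "\n")
        return urls
```

For example, `UrlRetriever().retrieve_urls("https://example.com/")` would print each `href` value found on that page and append it to `urls.log`. A regular expression is used here to stay close to the article's approach; for messier real-world markup, Python's `html.parser` module is a more robust way to read attributes.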

Using the code

The class is listed below. Have fun!
