java.lang.Object
   |
   +----Acme.Spider
This is an Enumeration class that traverses the web starting at a given URL. It fetches HTML files and parses them for new URLs to look at. All files it encounters, HTML or otherwise, are returned by the nextElement() method as a URLConnection.
The traversal is breadth-first, and by default it is limited to files at or below the starting point: same protocol, hostname, and initial directory.
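The "at or below the starting point" rule can be sketched with the standard library alone. This is a minimal illustration, not the Acme.Spider source: the class name SpiderScope and the methods isAtOrBelow and directoryOf are hypothetical, and the sketch assumes that "initial directory" means the starting URL's path up to and including its last slash.

```java
import java.net.URI;

// Hypothetical helper illustrating the default scope rule; not part of Acme.Spider.
final class SpiderScope {
    // True when candidate has the same protocol and hostname as start,
    // and its path begins with the directory part of start's path.
    static boolean isAtOrBelow(String start, String candidate) {
        URI s = URI.create(start);
        URI c = URI.create(candidate);
        if (s.getScheme() == null || !s.getScheme().equalsIgnoreCase(c.getScheme())) {
            return false;
        }
        if (s.getHost() == null || !s.getHost().equalsIgnoreCase(c.getHost())) {
            return false;
        }
        String dir = directoryOf(s.getPath());
        String path = (c.getPath() == null || c.getPath().isEmpty()) ? "/" : c.getPath();
        return path.startsWith(dir);
    }

    // Assumption: the initial directory is everything up to the last '/'.
    static String directoryOf(String path) {
        if (path == null || path.isEmpty()) {
            return "/";
        }
        return path.substring(0, path.lastIndexOf('/') + 1);
    }
}
```

Under these assumptions a start of http://some.site.com/whatever/ accepts http://some.site.com/whatever/sub/page.html and rejects http://some.site.com/other/. A starting URL written without a trailing slash is treated as a file, so its parent directory becomes the scope.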
Because of the security restrictions on applets, this is currently only useful from applications.
Sample code:
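 import java.io.InputStream;
 import java.net.URL;
 import java.net.URLConnection;
 import java.util.Enumeration;

 // The constructor can throw MalformedURLException, and getInputStream()
 // can throw IOException; the enclosing method must handle or declare both.
 Enumeration spider = new Acme.Spider( "http://some.site.com/whatever/" );
 while ( spider.hasMoreElements() )
     {
     URLConnection conn = (URLConnection) spider.nextElement();
     // Then do whatever you like with conn:
     URL thisUrl = conn.getURL();
     String thisUrlStr = thisUrl.toExternalForm();
     String mimeType = conn.getContentType();
     long changed = conn.getLastModified();
     InputStream s = conn.getInputStream();
     // Etc. etc. etc., your code here.
     }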
 Sample applications that use Acme.Spider:
 
 
 done
 err
 todo
   
 Spider()
 Spider(int, int)
 Spider(int, int, PrintStream)
 Spider(PrintStream)
 Spider(String)
 Spider(String, PrintStream)
   
 addObserver(HtmlObserver)
 addUrl(String)
 brokenLink(String, String, String)
 doThisUrl(String, int, String)
 gotAHREF(String, URL, Object)
 gotAREAHREF(String, URL, Object)
 gotBASEHREF(String, URL, Object)
 gotBODYBACKGROUND(String, URL, Object)
 gotFRAMESRC(String, URL, Object)
 gotIMGSRC(String, URL, Object)
 gotLINKHREF(String, URL, Object)
 hasMoreElements()
 main(String[])
 nextElement()
 reportError(String, String, String)
 setAuth(String)
   
err
protected PrintStream err
todo
protected Queue todo
done
protected Hashtable done
 
Spider
public Spider(PrintStream err)
Spider
public Spider()
Spider
 public Spider(String urlStr,
               PrintStream err) throws MalformedURLException
Spider
public Spider(String urlStr) throws MalformedURLException
Spider
 public Spider(int todoLimit,
               int doneLimit,
               PrintStream err)
Guesses at good values for an unlimited traversal: a todoLimit of 200000 and a doneLimit of 20000. Keep doneLimit fairly small: the done hash table is checked for every URL, so it stays mostly in memory. The todo queue, by contrast, is only accessed at the front and back, so it will be mostly paged out.
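The interplay of a todo queue and a done table can be sketched as a small breadth-first Enumeration over an in-memory link graph. This is only an illustration under stated assumptions, not the Acme.Spider implementation: the class BoundedCrawl, its demo method, and the policy of silently dropping new URLs once either limit is reached are all hypothetical, and the map of links stands in for fetching and parsing pages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of a bounded breadth-first traversal; not the Acme.Spider source.
final class BoundedCrawl implements Enumeration<String> {
    private final Map<String, List<String>> links; // stand-in for fetching and parsing
    private final Queue<String> todo = new ArrayDeque<>(); // touched only at front and back
    private final Set<String> done = new HashSet<>();      // consulted for every URL
    private final int todoLimit;
    private final int doneLimit;

    BoundedCrawl(Map<String, List<String>> links, String start, int todoLimit, int doneLimit) {
        this.links = links;
        this.todoLimit = todoLimit;
        this.doneLimit = doneLimit;
        add(start);
    }

    // Queue a URL unless it was already seen or a limit has been reached.
    // Assumption: URLs over the limit are dropped; a URL is recorded in
    // done as soon as it is queued.
    private void add(String url) {
        if (done.contains(url)) {
            return;
        }
        if (todo.size() >= todoLimit || done.size() >= doneLimit) {
            return;
        }
        done.add(url);
        todo.add(url);
    }

    @Override
    public boolean hasMoreElements() {
        return !todo.isEmpty();
    }

    @Override
    public String nextElement() {
        String url = todo.remove(); // throws NoSuchElementException when empty
        for (String next : links.getOrDefault(url, List.<String>of())) {
            add(next);
        }
        return url;
    }

    // Runs the traversal over a fixed four-page graph and returns the visit order.
    static List<String> demo(int todoLimit, int doneLimit) {
        Map<String, List<String>> graph = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d"),
            "c", List.of("d", "a"),
            "d", List.<String>of());
        List<String> order = new ArrayList<>();
        BoundedCrawl crawl = new BoundedCrawl(graph, "a", todoLimit, doneLimit);
        while (crawl.hasMoreElements()) {
            order.add(crawl.nextElement());
        }
        return order;
    }
}
```

With generous limits the demo graph is visited in the order a, b, c, d. The set lookup in add runs once per discovered link, which is why the done table is the structure that needs to stay small enough to remain in memory.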
Spider
 public Spider(int todoLimit,
               int doneLimit)
 
addUrl
public synchronized void addUrl(String urlStr) throws MalformedURLException
setAuth
public synchronized void setAuth(String auth_cookie)
Syntax is userid:password.
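The userid:password syntax matches the credentials format of HTTP Basic authentication. This page does not say how Spider transmits the value, so the following is a sketch under that assumption, showing how such a string is conventionally turned into an Authorization header value. The class BasicAuth and the method headerValue are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; assumes the auth string is sent as HTTP Basic credentials.
final class BasicAuth {
    // Builds the value of an Authorization header from "userid:password".
    static String headerValue(String authCookie) {
        if (authCookie.indexOf(':') < 0) {
            throw new IllegalArgumentException("expected userid:password");
        }
        byte[] raw = authCookie.getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }
}
```

On a plain URLConnection the result would be applied before connecting with conn.setRequestProperty( "Authorization", BasicAuth.headerValue( "userid:password" ) ).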
addObserver
public synchronized void addObserver(HtmlObserver observer)
Alternatively, if you want to add a different observer to each scanner, you can cast the input stream to a scanner and call its add routine, like so:
 InputStream s = conn.getInputStream();
 Acme.HtmlScanner scanner = (Acme.HtmlScanner) s;
 scanner.addObserver( this );
 
doThisUrl
 protected boolean doThisUrl(String thisUrlStr,
                             int depth,
                             String baseUrlStr)
brokenLink
 protected void brokenLink(String fromUrlStr,
                           String toUrlStr,
                           String errmsg)
reportError
 protected void reportError(String fromUrlStr,
                            String toUrlStr,
                            String errmsg)
hasMoreElements
public synchronized boolean hasMoreElements()
nextElement
public synchronized Object nextElement()
gotAHREF
 public void gotAHREF(String urlStr,
                      URL contextUrl,
                      Object clientData)
gotIMGSRC
 public void gotIMGSRC(String urlStr,
                       URL contextUrl,
                       Object clientData)
gotFRAMESRC
 public void gotFRAMESRC(String urlStr,
                         URL contextUrl,
                         Object clientData)
gotBASEHREF
 public void gotBASEHREF(String urlStr,
                         URL contextUrl,
                         Object clientData)
gotAREAHREF
 public void gotAREAHREF(String urlStr,
                         URL contextUrl,
                         Object clientData)
gotLINKHREF
 public void gotLINKHREF(String urlStr,
                         URL contextUrl,
                         Object clientData)
gotBODYBACKGROUND
 public void gotBODYBACKGROUND(String urlStr,
                               URL contextUrl,
                               Object clientData)
main
public static void main(String args[])