4teachers.org Tech-along
Web archiving
for the Macintosh

by Jennifer Holvoet

 
What is Web Whacking?
 
Have you ever found the perfect Web site to share with your students, only to get the dreaded "File not found" message when you send them to it? Is your Internet connection so slow that all you and your students experience when you try to use the Web is the World Wide Wait? Or maybe your school doesn't have an Internet connection at all, and you have assumed that your students cannot have an Internet experience until the school is wired. Do you have students who get confused when presented with too many options? If you answered yes to any of these questions, knowing how to archive sites for later use may be useful to you.
 
     Archiving a Web site means visiting it and downloading its pages to your own computer, your school server, a removable hard drive, or a floppy disk for use at a later time. Some people call this "Web whacking." Once a site has been archived, you can be more confident that it will be available when you want it. Pages will load much faster because the computer will not have to fetch them from another computer halfway across the country or the world. You control which content you archive, so you can limit the amount of material available to students. Best of all, you don't need a live Internet connection to share archived materials with your class.
 
What do I need?
 
Web sites, when archived, can take up a lot of storage space. Generally, you can store only one or two Web pages on a floppy disk. Though this is a good way to start, many sites you want to share are larger than this. You can use your classroom computer to store archived sites if it has a live Internet connection and if you are willing to delete archived sites regularly to free up disk space for others. You can download the archived sites to your school server if you have a live Internet connection and your server administrator agrees. Although a server will probably have more space than your classroom computer, its storage is still finite, and you may be asked to delete archived sites regularly to free up space. We have found that the best solution is to invest in a removable hard drive (such as a Zip®, SyQuest®, or Jaz® drive). Because these drives use disks that store large amounts of data, you can store several sites on each disk. When a disk gets full, you simply buy another one. This way you never run out of space, and you can go back to old disks when you want to find something you archived earlier.
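If you have Python 3 available on a computer (an assumption; nothing else in this Tech-along requires it), the sketch below adds up the size of an archive folder and estimates how many archives of that size would fit on a disk. The capacities are nominal figures assumed for illustration, 1,474,560 bytes for a high-density floppy and 100,000,000 bytes for a Zip disk; the usable space on a formatted disk is somewhat less. The names `folder_size` and `copies_that_fit` are hypothetical.

```python
import os

# Nominal capacities in bytes (assumptions for illustration only).
FLOPPY_BYTES = 1_474_560      # a "1.44 MB" high-density floppy
ZIP_BYTES = 100_000_000       # a 100 MB Zip disk

def folder_size(path):
    """Return the total size in bytes of every file in path and its subfolders."""
    total = 0
    for root, _subfolders, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

def copies_that_fit(archive_bytes, disk_bytes):
    """How many archives of this size fit on one disk, ignoring formatting overhead."""
    if archive_bytes <= 0:
        raise ValueError("archive size must be positive")
    return disk_bytes // archive_bytes
```

For example, a hypothetical page whose text and graphics total 700,000 bytes gives `copies_that_fit(700_000, FLOPPY_BYTES) == 2`, in line with the one-or-two-pages rule of thumb for floppies.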
 
      If you do not have a live Internet connection on your school machine or one you own personally, you will need to locate a machine that is connected. You will also want to be sure that the machine you select is the same platform (Windows or Mac) as the one you use at school. Places that often have connected machines you can use are: the administrative offices of your district, a local college or university, a public library, a local business, or friends. You will probably need a minimum of two hours to search for relevant sites and to archive them.
 
Step One: Deciding What to Look For
 
The most important aspect of using the Internet in the classroom is to clearly decide what type of material you want and how you want the students to use it. For example, you might be teaching a unit on the rain forest and you would like to give the students the opportunity to find out more about the animals and insects of the rain forest so they can transform their classroom into a model of a rain forest. You decide you primarily want to find sites that provide pictures of animals and insects and fairly simple descriptions that tell where they live in the rain forest.
 
     Once you have decided what you want, you can more easily decide which keywords to use to find relevant sites and which search engines to use. In this example, you would probably want to use a kid-safe search engine such as Yahooligans or Ask Jeeves for Kids to search for sites that provide both pictures and descriptions. You could use keywords such as "rain forest," "rain forest animals," "insects of the rain forest," "rain forest lessons," or "rain forest graphics." You might also want to use the keywords "tropical insects" or "rain forest" at the Amazing Picture Machine to find additional graphics.
 
Step Two: Searching and Bookmarking
 
Once you know what you are looking for and are in front of an Internet-connected machine, open a browser such as Netscape or Microsoft Internet Explorer. Then locate appropriate sites. If you don't want to spend a lot of time searching, a good strategy is to go to TrackStar and type in your keyword there. For example, when I typed in the keyword "rainforest," six tracks were returned. I could look at each track (by clicking on the link and then choosing either frames or text) and find sites about the rainforest that other teachers use with their students. The Web address for each site in a track is clearly marked, so you can type it into your browser to see the site outside of the track. In the example, the teacher chose the track entitled "The Rainforest" and then wanted to archive the site entitled "The Rainforest, PBS Style." The track indicates that the site is located at http://www.pbs.org/tal/costa_rica/index.html. When that Web address is typed into the browser's location line, the page is shown at full size, outside the track. This is a necessary step before bookmarking the site; otherwise the bookmark would point to the track rather than to the site itself.
 
      If you don't find anything in TrackStar that meets your needs, use a search engine such as Yahoo, Alta Vista, Metacrawler, or Ask Jeeves. If you want to ensure the sites you find are kid-safe without having to examine them, use Yahooligans or Ask Jeeves for Kids.
 
     Once you find a promising site, look through it and determine how much of it you want to archive. In our example, the teacher decided to archive the pages about monkeys and birds from the Rainforest PBS Style site. To be able to locate these pages again, the teacher went to each of these pages and bookmarked them. In Netscape, to bookmark a site, you click on the menu entitled Bookmark (or the one with the bookmark icon) and choose Add a Bookmark. Your site will appear at the bottom of the bookmark list (also under the Bookmark menu) and you can see that page again at any time by clicking on that item.
 
Step Three: Methods of Archiving
 
Once you have selected and bookmarked pages you want to archive, you are ready to begin.
 
     There are several strategies for archiving sites. One requires software made specifically for this purpose, such as WebWhacker or Web Buddy, both available for Mac and Windows. This is probably one of the fastest and easiest methods if you own a machine with Internet access, but it isn't as useful if you have to borrow a machine, since most people don't want you to install software on their machines. Another fast and easy method is to use the Internet Explorer 4.0 browser and save the sites as a Web archive. This method will be discussed in a later Tech-along.
 
     A third method, and the one discussed here, is the manual method, which requires no special software but takes a little longer.
 
Step Four: Archiving the Text
 
Make a folder where you want to save your Web pages. I usually put mine on the desktop to make it easy to find. Use your bookmarks to locate a Web page that you want to archive. Click on the File menu and choose SAVE AS. A dialog box will appear in which you can name your file. Click on the FORMAT pop-up menu near the bottom of the dialog box and hold down the mouse button until you see "Source." Slide the mouse down and select Source instead of Text. Name your page, select the folder (or directory) in which you wish to save the file, and click the Save button. This saves all the writing on the page, along with the HTML code that formats it. To see what it looks like, go to the File menu in your browser and choose OPEN PAGE (or OPEN FILE). Choose the file you just saved. You should see all the text and the background, but no pictures. Then go back to the original Web site.
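For readers comfortable with a little programming, the Save As Source step can be sketched in Python. This is a minimal sketch and not the manual method this Tech-along describes: it assumes Python 3 is installed, and `save_page_source` and the folder name used below are hypothetical. Like saving as Source, it keeps the page's text and markup but does not fetch the pictures.

```python
import os
import urllib.request

def save_page_source(url, folder, filename):
    """Download the HTML source of url and save it, unchanged, as folder/filename.

    Like choosing Source in the Save As dialog, this keeps the page's text
    and markup but does not download the graphics the page refers to.
    """
    os.makedirs(folder, exist_ok=True)           # create the archive folder if needed
    with urllib.request.urlopen(url) as response:
        data = response.read()                   # raw bytes, so the page's encoding is untouched
    path = os.path.join(folder, filename)
    with open(path, "wb") as out:
        out.write(data)
    return path
```

A call such as `save_page_source("http://www.pbs.org/tal/costa_rica/index.html", "Rainforest Archive", "index.html")` would save the PBS page from the example, assuming that address is still online.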
 
Step Five: Archiving the Graphics
 
To archive a graphic, click and hold on the graphic until a pop-up menu appears. Select SAVE THIS IMAGE AS, and a Save dialog box will appear. Do not change the name of the image, but be sure it goes in the folder where you saved the text file. Click on the Save button. Do this with all the graphics on the page (including the advertisements). Test the page again by going to the File menu, choosing Open Page, and choosing the text file. You should now see the archived site with all the text and all the graphics.
 
     If the graphics do not show up, there is no need to panic. You simply need to make a few changes in your archived text file. First, write down the names of all the graphics (e.g., spider.gif). Next, open a word processing program; SimpleText will work fine. Then, using the File menu, open the archived text document and examine the text. You are looking for tags that contain the words "img src=" within angle brackets like these: < >. The words may be in all caps, and they may be separated from each other by other words such as Height= and Width=. Once you find one of these tags, check whether it matches the name of one of your archived graphics. It usually will contain extra information. For example, one of my graphics was named "spider.gif," but in the archived text the reference read <IMG SRC="images/rainforest/spider.gif">. You need to make the archived reference match your graphic's name. In the example, I simply deleted the part that didn't match (i.e., images/rainforest/). When you have made corrections, go to the File menu and choose SAVE AS. Don't change the name of the document, but be sure that the format (usually found under the area where you type the name of the document) is TEXT ONLY. In some programs the equivalent choices are DOS text, Plaintext, or ASCII text. This is mega-important!
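The same find-and-fix job on the "img src=" references can be sketched in Python. This is a hedged sketch that assumes every graphic was saved, under its original name, in the same folder as the page. It uses a regular expression that handles the simple tags described above (upper or lower case, with Height= or Width= in between), but it is not a full HTML parser. The names `image_names` and `strip_image_paths` are hypothetical.

```python
import posixpath
import re

# Matches the SRC attribute inside an IMG tag, in upper or lower case,
# even when other attributes such as HEIGHT= or WIDTH= come first.
# Group 1: "<img ... src=", group 2: the quote (or nothing), group 3: the address.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\']?)([^"\'\s>]+)\2',
    re.IGNORECASE,
)

def image_names(html):
    """List the file names of the graphics a page refers to, e.g. spider.gif."""
    return [posixpath.basename(m.group(3)) for m in IMG_SRC.finditer(html)]

def strip_image_paths(html):
    """Rewrite each IMG SRC so it names only the file, for example
    images/rainforest/spider.gif becomes spider.gif."""
    def fix(m):
        quote = m.group(2)
        return m.group(1) + quote + posixpath.basename(m.group(3)) + quote
    return IMG_SRC.sub(fix, html)
```

`image_names` gives you the list of graphics to write down, and `strip_image_paths` turns `<IMG SRC="images/rainforest/spider.gif">` into `<IMG SRC="spider.gif">`. The result still has to be saved as plain text, just as with the word processor.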
 
      Test the page again after making the corrections. It should look better now.
 
Step Six: Archiving Links
 
The hyperlinks (the text or pictures that you click on to go to another page) on this page will not be active unless you also archive the pages they link to. Choose a link, then archive the text and graphics on that page. Go back to the original page, choose another link, and archive its text and graphics. Do this until all the links that you want active have been archived. Then go to the first linked page and decide whether you want any links on that page archived. If so, choose the link, then archive the text and graphics. As with the graphics, you may need to edit the archived text file so that each link (the "a href=" tags) names the file you saved rather than its original Web address or folder path. You may want to make a list of which links work and which don't on each page to help you keep track. It is easy to get confused. This is why it is easier to use archiving software if you want to archive sites with lots of pages and lots of links.
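Keeping the list of which links work can also be automated. The Python sketch below uses the standard library's `html.parser` to collect the HREF of every link on an archived page and reports whether each one would work offline. It assumes a link works only if it is a jump within the same page or a relative reference to a file that exists in the archive folder. `LinkCollector` and `link_report` are hypothetical names, and reading the page as Latin-1 is an assumption chosen because that encoding can read any file without errors.

```python
import os
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

class LinkCollector(HTMLParser):
    """Collect the HREF value of every <A> tag on a page, in order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lower case
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def link_report(page_path):
    """Return a list of (href, works_offline) pairs for an archived page."""
    collector = LinkCollector()
    # Latin-1 is an assumption: it decodes any byte, so old pages never fail to load.
    with open(page_path, encoding="latin-1") as f:
        collector.feed(f.read())
    collector.close()
    folder = os.path.dirname(page_path)
    report = []
    for href in collector.links:
        parsed = urlparse(href)
        if parsed.scheme or parsed.netloc:
            works = False  # still points at the live Web (or at mail, etc.)
        elif not parsed.path:
            works = True   # a jump within the same page, such as #top
        else:
            target = os.path.join(folder, unquote(parsed.path))
            works = os.path.isfile(target)
        report.append((href, works))
    return report
```

Running `link_report` on each archived page shows which links still point at the live Web or at pages you have not archived yet.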
 
Step Seven: Is This Legal?
 
The question of what you can legitimately archive is somewhat unclear. Most sites have a copyright statement. If that statement says that the material on the site cannot be copied or transmitted in digital form without the written consent of the copyright holder, then you should not archive that site without permission. Usually there is an e-mail link to the Webmaster of that site. This is a good place to send your permission request. Our experience has been that most people will allow you to archive the site if:
  • you have a clear educational purpose
  • you define what portions of the site you wish to archive
  • you specify how long you want to keep the site archived
  • you have a means of crediting the authors appropriately
     It also helps if you tell why you need to archive the content instead of using it over the Internet (time, Internet access, speed, etc.). Most permissions took from one to two weeks to receive. Once you get a permission, print it, date it, and keep it in a permission file.
 
     Some copyright statements say that others can borrow the material freely. Be sure to print out this copyright statement and put the date on the copy. This can serve as protection if there is ever any question about the legality of your archived material. It is important to note that this permission to borrow applies only to their site, not to other sites to which they may have links.
 
     It appears that most authors view their sites as a whole. For example, current interpretations of the law hold that any advertisements on a page must be archived along with the site. If you wish to archive only part of a site, you might want to e-mail the author to be sure this is OK.
 
     You can find out more about Internet copyright at the Copyright Resources on the Internet.
 
     Although this paper covers only archiving Web sites for the Mac using the Netscape browser, the concept is the same for other platforms and other browsers. Save the text and then save each graphic, keeping related files together in a folder. Those who wish to use commercial software to archive Web sites for off-line browsing can easily find reviews and places to buy such software by doing a Web search using the keywords "browsing offline."
 

Created by Jennifer Holvoet, University of Kansas, Lawrence


Jennifer Holvoet is the Team Leader of the Teachers' Team at SCR*TEC, located at the University of Kansas in Lawrence, Kansas.
 
SCR*TEC takes nominations and ideas for Tech-along. Send comments to the editor.
Tech-along © 1998 SCR*TEC.