Most of us know the disappointment of navigating to a favorite website or an important online resource only to find that it has evaporated from the live web. Websites disappear or move to new URLs, and embedded content changes continually, producing an impermanence that can threaten art historical scholarship. If publishers are willing to invest the time, expense, and substantial effort required to create web-based digital art history resources, why let those resources go defunct or disappear from the web without preserving them, along with the full experience of them as websites, for future researchers?
The practice of web archiving offers a solution to the ephemerality of web-based materials. Web archiving uses a web crawler to harvest all of the files and related metadata of a website and save them in the WARC (Web ARChive) file format, an ISO standard (ISO 28500). WARC files can then be rendered and viewed with a playback mechanism, such as the Internet Archive’s Wayback Machine, which recreates the experience of the website as it existed online at a specific point in time. Web archiving can be as simple as saving a page to the Wayback Machine so that a persistent link remains available as a citation for locating the page in the future, or it can be done at scale as a programmatic effort that encompasses curation and collection development, harvesting and quality assurance, metadata and description, and long-term storage and preservation.
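To make the WARC format less abstract, here is a minimal Python sketch (standard library only) that serializes a single `response` record around an HTTP response built by hand. It is illustrative only and is not how Archive-It, Webrecorder, or Conifer write their files: production crawlers typically also write `warcinfo` and `request` records, compute payload digests, and gzip each record. The function name `build_warc_record` and the example.org URL are hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_response: bytes) -> bytes:
    """Serialize one WARC/1.1 'response' record wrapping a captured HTTP response.

    Hypothetical helper for illustration: it omits the warcinfo/request records,
    payload digests, and compression that real crawlers add.
    """
    # WARC-Date is a UTC timestamp in W3C/ISO 8601 form; seconds precision is used here.
    warc_date = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.1",
        "WARC-Type: response",
        # Every record gets a globally unique identifier, here a random UUID URN.
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {warc_date}",
        f"WARC-Target-URI: {target_uri}",
        # The content block of a response record is the full HTTP message.
        "Content-Type: application/http;msgtype=response",
        f"Content-Length: {len(http_response)}",
    ]
    # Header lines end with CRLF; a blank line separates headers from the block.
    header_block = ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8")
    # Each record ends with two CRLFs after its content block.
    return header_block + http_response + b"\r\n\r\n"


if __name__ == "__main__":
    # A hand-built HTTP response standing in for what a crawler would capture.
    captured = (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html; charset=utf-8\r\n"
        b"\r\n"
        b"<html><body>Catalogue entry</body></html>"
    )
    record = build_warc_record("https://example.org/catalogue/entry-1", captured)
    print(record.decode("utf-8"))
```

Concatenating such records into one file yields a (very small) WARC file; a playback tool reads the `WARC-Target-URI` and `WARC-Date` fields to decide which capture to show for a given URL and time.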
Over the past seven years, we at the New York Art Resources Consortium (NYARC) have established a web archiving program for collecting, preserving, and making publicly accessible web-based resources specific to art and art history. Our web archive collections are primarily built and managed with the Internet Archive’s Archive-It subscription service. We also use the web archiving tools offered by Webrecorder and the Conifer service at Rhizome. The sites we select for our web archives are relevant to scholars of art history and align with our institutions’ own collecting missions. Our subject-based collections include art resources, artists’ websites, auction house websites, born-digital catalogues raisonnés, New York City gallery and art dealer websites, and websites for scholarship related to the restitution of lost or looted art. To date we have archived nearly 8 terabytes of content, including NYARC’s own institutional websites.
Designing your website with accessibility standards in mind will not only make it more usable and discoverable but will also make it more readily preservable by a web crawler. Including a site map is an important step toward ensuring that a web crawler can identify all of the URLs within your website. For content that site users would generally search for or sort, providing a “view all entries” option will allow a web crawler to archive every content item within a collection, since crawlers generally cannot submit search queries or operate sorting controls. Robots exclusion rules in a site’s robots.txt file will block web crawlers from the portions of the website they cover, so it is best to remove such rules from the sections of a site that are most significant to preserve.
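Two of these recommendations can be checked before a crawl. The Python sketch below (standard library only) builds a minimal XML sitemap following the sitemaps.org protocol and uses `urllib.robotparser` to report which URLs a given robots.txt would block. The function names, the example.org URLs, the `?view=all` query standing in for a “view all entries” page, and the `archive.org_bot` user-agent string (commonly associated with Internet Archive crawlers) are illustrative assumptions; confirm the agent names your archiving service actually uses.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls: list[str]) -> str:
    """Return a minimal sitemaps.org-style XML sitemap listing every URL (hypothetical helper)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in URLs automatically.
        ET.SubElement(url_el, "loc").text = url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str) -> list[str]:
    """List the URLs that the given robots.txt rules would stop this user agent from fetching."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    # Hypothetical site, including a "view all entries" page for a collection.
    pages = [
        "https://example.org/",
        "https://example.org/catalogue/entry-1",
        "https://example.org/collection?view=all",
    ]
    robots = "User-agent: *\nDisallow: /catalogue/\n"
    print(build_sitemap(pages))
    # Any URL printed here would be skipped by a crawler that honors robots.txt.
    print(blocked_urls(robots, pages, "archive.org_bot"))
```

Running a check like this against a draft robots.txt makes it easy to see whether a rule written to keep crawlers out of, say, search result pages also hides catalogue entries that should be preserved.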
For more information on web archiving and steps that can be taken to develop a more easily preservable website for your digital art history initiative, the following resources from experienced web archiving practitioners provide helpful guidance:
- ARLIS/NA Web Archiving Special Interest Group (SIG)
- Princeton University Library Special Collections: Guidelines for Designing Preservation-Friendly Websites
- New York Art Resources Consortium (NYARC) Wiki: Web Archiving
- Library of Congress Recommended Formats Statement on Web Archives
- Columbia University Libraries Web Resources Collection Program: Guidelines for Preservable Websites
- Smithsonian Institution Archives: Five Tips for Designing Preservable Websites
- International Internet Preservation Consortium (IIPC): Tools and Software
- Stanford Libraries: Archivability
- Library of Congress: Creating Preservable Websites
- UK National Archives: Web Archiving Guidance
Sumitra Duncan | Head, Web Archiving at the New York Art Resources Consortium