Description (en)
HathiTrust is a collaboration of universities working
together to establish a repository that archives and shares
their digitized collections. Initially, the Submission
Information Packages (SIPs) deposited into HathiTrust
were extremely uniform, consisting primarily of
books digitized by Google. HathiTrust’s ingest
validation processes were correspondingly highly
regular, designed to ensure that these SIPs met agreed-upon
qualities and specifications. As HathiTrust has
expanded to include materials digitized from other
sources, SIPs have become more varied in their content
and specifications, requiring adjustments to
ingest and validation routines. One of the
primary sources of new SIPs is the Internet Archive,
which has digitized a large number of public domain
materials owned by HathiTrust partners.
Many of the technical, structural, and
descriptive characteristics of materials digitized by the
Internet Archive did not match previously developed
standards for materials in HathiTrust. A variety of
solutions were developed to transform these materials
into HathiTrust-compatible Archival Information Packages (AIPs) and ingest them into
the repository. The process of developing these solutions
offers an example for other organizations that would
like to add new types of materials to their repositories but
are uncertain what issues may arise or how those
issues can be addressed.