4.1 Acquisition of Content

4.1.1 - The repository shall identify the Content Information and the Information Properties that the repository will preserve. #

Response #

As stated in the Preservation Strategic Plan, SP is committed to preserving the intellectual content of every object held in the repository. SP has identified three properties which will be prioritized in all preservation activities:

  • The metadata included with the object at the time of ingest, especially metadata describing its relationships to other objects within the repository.
  • The content object. SP will accept a wide range of well-known, commonly used formats, the details of which are negotiated in the Provider Agreement. The object includes all supplemental materials and the relationships among these objects, as can be determined from metadata or other context at the time of ingest.
  • The intellectual rights to the object held by SP and members of its Designated Community. These rights are used to control access to the content and to determine its preservation level.

Secondary considerations in preservation include the following properties. While not strictly a part of the intellectual content of the preservation object, these properties are necessary to ensure its preservation and as such must be tracked as well.

  • The object’s chain of custody, starting as early as possible but at the very least from the time it entered the repository. This information is necessary in order to understand the history of the object, and to record any transformations or changes that have occurred to the content.
  • Information on the object’s representation. For every digital object, some level of interpretation is necessary in order to transform the object from binary data into a human-interpretable item.
  • Fixity information. The repository will keep sufficient metadata on the object to ensure at any point in the future that the object remains in a complete and uncorrupted state.
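
As an illustration of the fixity property only, the following minimal sketch shows how a checksum recorded at ingest could later be compared against the stored file. The choice of SHA-256 and the function names `compute_digest` and `fixity_ok` are assumptions made for this example; they are not taken from SP's Metadata Specifications.

```python
import hashlib
from pathlib import Path


def compute_digest(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, reading in chunks to bound memory use."""
    digest = hashlib.new(algorithm)  # algorithm choice is an assumption
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_ok(path: Path, recorded_digest: str, algorithm: str = "sha256") -> bool:
    """Compare the file's current digest with the digest recorded at ingest."""
    return compute_digest(path, algorithm) == recorded_digest.lower()
```

A mismatch would indicate that the stored object is no longer identical to the object that was ingested.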

SP maintains the content objects in the format supplied by the Provider until these objects need to be transformed to delay or prevent file obsolescence. In the course of these transformations, priority is given to maintaining the information contained in an individual content object, as opposed to preserving its appearance or a specific presentation.

Responsibility #

  • Digital Preservation Librarian

Documents #

  1. Mission Statement
  2. Provider Agreement
  3. Workflow Charts
  4. Preservation Strategic Plan
  5. Preservation Implementation Plan
  6. Preservation Action Plan - Journals

4.1.1.1 - The repository shall have a procedure(s) for identifying those Information Properties that it will preserve. #

Response #

SP commits to preserving materials for which it has accepted responsibility and to maintaining access to the material for the Designated Community. SP maintains an ongoing discussion with the Scholars Portal Operations & Development Committee (SPOD) to ensure that the preserved Information Properties adequately serve the Designated Community. The repository will also be guided by digital preservation best practices and standards.

Responsibility #

  • Digital Preservation Librarian
  • SPOD

Documents #

  1. Preservation Strategic Plan
  2. Preservation Implementation Plan
  3. Preservation Action Plan - Journals
  4. Provider Agreement
  5. Workflow Charts
  6. Definition of AIP

4.1.1.2 - The repository shall have a record of the Content Information and the Information Properties that it will preserve. #

Response #

All Content Information and Information Properties are recorded in the Metadata Specifications.

Content Note - Journals #

As stated in the Preservation Implementation Plan and further described in the Preservation Action Plan - Journals, the currently acceptable formats for journal articles at the Full Preservation level are PDF and XML. XML articles may have diagrams in GIF, JPG, TIFF, or PNG format. Articles not in these formats (or those that cannot be converted to these formats) may still be preserved at the Bit-level Preservation level. Supplementary materials will be accepted in any format and preserved at the Bit-level Preservation level.
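
The format rules in this note can be sketched as a small lookup. This is a simplified illustration, not SP's implementation: it matches on file extension rather than on identified format, it does not model conversion to an accepted format, the role labels ("article", "diagram", "supplementary") are hypothetical, and treating diagrams of XML articles as Full Preservation is an assumption drawn from the wording above.

```python
from pathlib import PurePath

FULL_PRESERVATION = "Full Preservation"
BIT_LEVEL = "Bit-level Preservation"

# Extensions standing in for the formats named in the Content Note.
ARTICLE_FORMATS = {".pdf", ".xml"}
DIAGRAM_FORMATS = {".gif", ".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def preservation_level(filename: str, role: str) -> str:
    """Return the preservation level for a file given its (hypothetical) role."""
    suffix = PurePath(filename).suffix.lower()
    if role == "supplementary":
        # Supplementary materials are accepted in any format at Bit-level.
        return BIT_LEVEL
    if role == "article" and suffix in ARTICLE_FORMATS:
        return FULL_PRESERVATION
    if role == "diagram" and suffix in DIAGRAM_FORMATS:
        return FULL_PRESERVATION  # assumption: diagrams follow their article
    # Anything else falls back to Bit-level Preservation.
    return BIT_LEVEL
```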

Responsibility #

  • Digital Preservation Librarian
  • Metadata Librarian

Documents #

  1. Preservation Strategic Plan
  2. Preservation Implementation Plan
  3. Preservation Action Plan - Journals
  4. Collection Development Policy
  5. Metadata Specifications

4.1.2 - The repository shall clearly specify the information that needs to be associated with specific Content Information at the time of its deposit. #

Response #

Although SP is not dependent on, or restricted to, any particular file formats upon Provider deposit, it nevertheless aims to use well-known, widely accepted formats that support long-term preservation. If a Provider wants to use a specific format that does not meet these criteria, an agreement must be reached between the Provider and SP. These conditions are specified in SP’s Provider Agreement or License. Please see the additional documents below for more detail.

Content Note - Journals #

The repository requires Provider-supplied XML or SGML, ideally in the NLM Journal Archiving and Interchange Tag Set, which contains descriptive (e.g. bibliographic) as well as structural (e.g. file relationships) metadata.

Responsibility #

  • Digital Preservation Librarian
  • OCUL Projects Officer

Documents #

  1. Provider Agreement
  2. Definition of SIP
  3. Workflow Charts

4.1.3 - The repository shall have adequate specifications enabling recognition and parsing of the SIPs. #

Response #

Upon ingest, every file in the repository is subject to identification of its file format using DROID and validation of that format using JHOVE. During DROID identification, a file format is associated with each file and, where possible, the file is linked to the format’s entry in PRONOM, the format registry of the UK National Archives. The outputs of these processes are recorded in the preservation metadata for each file.

When necessary, Scholars Portal crosswalks metadata from the publisher’s XML/SGML to a Scholars Portal version of NLM XML. The repository creates preservation metadata for each file. The preservation level, explained in the Preservation Implementation Plan, is applied to each file upon ingest and recorded in the preservation metadata for each file.
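
To make the recorded information concrete, the sketch below models one possible per-file record holding the identification result, the validation outcome, and the preservation level. The class name, field names, and the PRONOM URL pattern are assumptions for illustration; the actual structure is defined in SP's Metadata Specifications, and no DROID or JHOVE invocation is shown.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed base for resolving a PRONOM identifier (PUID) to its registry entry.
PRONOM_BASE = "https://www.nationalarchives.gov.uk/PRONOM/"


@dataclass
class FileFormatRecord:
    """Hypothetical preservation-metadata fields for a single ingested file."""

    filename: str
    puid: Optional[str]      # identifier from format identification, if any
    valid: Optional[bool]    # validation outcome; None if not validated
    preservation_level: str  # level applied at ingest

    @property
    def pronom_url(self) -> Optional[str]:
        """Link to the format's registry entry, where a PUID is available."""
        return f"{PRONOM_BASE}{self.puid}" if self.puid else None
```

A file whose format could not be identified would simply carry `puid=None`, which matches the "where possible" qualification above.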

Please see the documents below for additional details.

Responsibility #

  • Software Developer
  • Digital Preservation Librarian
  • Metadata Librarian
  • System and Web Development Analyst

Documents #

  1. Registry of File Formats
  2. Definition of SIP
  3. Preservation Action Plan - Journals

4.1.4 - The repository shall have mechanisms to appropriately verify the identity of the Producer of all materials. #

Response #

SP allows only Providers known to the repository to deposit digital objects. SP manages Providers through the configuration of loader scripts specific to each Provider and a contract agreed upon by both parties.

Please see an example of a Provider Agreement for more information. In addition, the Pull Script Diagram, available below, contains information about the relationship between SP and Providers during the ingest process.

Responsibility #

  • Software Developer
  • Digital Preservation Librarian

Documents #

  1. Provider Agreement
  2. Pull Script Diagram

4.1.5 - The repository shall have an ingest process which verifies each SIP for completeness and correctness. #

Response #

SP has mechanisms in place to verify the completeness and correctness of each SIP upon ingest.

Completeness

  • Each Provider signs an agreement that defines the required components of a SIP. The unique loader script created for each Provider accepts only content conforming to this agreement. Content that differs from these requirements causes errors in the loading process; these errors are recorded and reported to the appropriate SP staff.
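
A completeness check of this kind might look like the following sketch. The function name and the use of filename patterns to express "required components" are assumptions for illustration; the real requirements are set per Provider in the agreement and enforced by that Provider's loader script.

```python
from fnmatch import fnmatch
from typing import Iterable, List


def missing_components(sip_files: Iterable[str],
                       required_patterns: Iterable[str]) -> List[str]:
    """Return the required patterns that no file in the SIP satisfies."""
    names = list(sip_files)
    return [
        pattern
        for pattern in required_patterns
        if not any(fnmatch(name, pattern) for name in names)
    ]
```

An empty result would mean the SIP contains every required component; a non-empty result lists what would be logged and reported to staff.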

Correctness

  • The SIP is tested during the Pull Script process to ensure that the size of the downloaded file matches that of the original file on the publisher's FTP server. Any error is immediately reported to loader@scholarsportal.info and the SIP is reloaded.
  • Each file in the SIP is passed through format identification and validation tools, currently JHOVE and DROID.

If an error is found during the Pull Script test, an email is sent to the loader address and either the ingest is retried or the loader script is corrected. If the error persists, the Provider is notified.
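
The size comparison and retry described above can be sketched as follows. This is not the Pull Script itself: `fetch` and `report` are hypothetical callables standing in for the download step and the email notification, and the default of two attempts is an assumption.

```python
from typing import Callable


def pull_with_size_check(fetch: Callable[[], int],
                         expected_size: int,
                         report: Callable[[str], None],
                         max_attempts: int = 2) -> bool:
    """Download a file and confirm its size, retrying on mismatch.

    fetch:  downloads the file and returns the local size in bytes.
    report: receives a message for each failed attempt.
    Returns True once the sizes match, False if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        actual_size = fetch()
        if actual_size == expected_size:
            return True
        report(f"attempt {attempt}: expected {expected_size} bytes, "
               f"got {actual_size}")
    # A persistent mismatch is the point at which the Provider is contacted.
    return False
```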

Documents explaining the mechanisms in greater detail are available below. Please also see SP’s definition of “completeness” and “correctness” in the Glossary.

Responsibility #

  • Software Developer
  • Digital Preservation Librarian
  • System and Web Development Analyst

Documents #

  1. Pull Script Diagram
  2. Registry of File Formats
  3. Preservation Strategic Plan
  4. Preservation Implementation Plan
  5. Preservation Action Plan - Journals

4.1.6 - The repository shall obtain sufficient control over the Digital Objects to preserve them. #

Response #

SP obtains rights from individual Providers that give the repository control over the information deposited by the Provider. The nature and scope of these rights vary by Provider and collection. In cases where the repository takes responsibility for the preservation of information, the rights include provisions for SP to receive a local copy of the information and host it in perpetuity. These local loading and perpetual access clauses allow SP to have a local copy and to make that copy available to its Designated Community regardless of changes in the relationship between SP and the Provider. Where possible, the repository obtains the right to modify information in order to ensure long-term preservation and accessibility.

SP Workflow Charts document the process by which SP gains physical control of the files so licensed. This process includes the creation of preservation metadata for each object sufficient to validate the fixity of the object.

Responsibility #

  • Software Developer
  • Digital Preservation Librarian
  • OCUL Projects Officer

Documents #

  1. Provider Agreements
  2. Workflow Charts
  3. Metadata Specifications

4.1.7 - The repository shall provide the producer/depositor with appropriate responses at agreed points during the ingest processes. #

Response #

SP maintains logs that list relevant information at each point in the ingest process listed below. These are available to the Provider upon request.

  • Pull Script - after the SIP is successfully pulled into the loader, the script makes a record of the FTPed files with the file name, size and current date and adds the file name to the FTP downloaded log file. If there is an error, the Provider is notified.
  • Preparation of Datasets - if decompression of datasets is unsuccessful, the script adds the file name to the publisher error log and to the publisher problem directory. In the case where the file is found to be corrupt, the Provider is notified.
  • Ejournals Loader - datasets are converted from the Provider XML/SGML into NLM XML, given a URI and inserted into the ejournals database. Any errors are recorded in a log file.
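
As a rough illustration of the Pull Script record described in the first item above, the sketch below appends one line per downloaded file containing the file name, size, and date. The CSV layout and the function name are assumptions; the format of SP's actual log files is not specified here.

```python
import csv
from datetime import date
from typing import Optional, TextIO


def log_downloaded_file(log: TextIO, filename: str, size_bytes: int,
                        when: Optional[date] = None) -> None:
    """Append a record of one downloaded file: name, size, and date."""
    when = when or date.today()
    csv.writer(log).writerow([filename, size_bytes, when.isoformat()])
```

In use, `log` would be a file opened in append mode (with `newline=""`, as the `csv` module recommends), so that records accumulate across runs and can be supplied to the Provider on request.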

Responsibility #

  • Software Developer
  • Digital Preservation Librarian
  • System and Web Development Analyst

Documents #

  1. Quality Control
  2. Provider Agreement
  3. Workflow Charts

4.1.8 - The repository shall have contemporaneous records of actions and administration processes that are relevant to content acquisition. #

Response #

SP creates and retains records of ingest actions for all content. These records include ingest process logs and the ingest event generated as part of the preservation metadata for each AIP.

Responsibility #

  • Software Developer
  • Digital Preservation Librarian

Documents #

  1. Provider Agreement
  2. Metadata Specifications