URI and File Naming Policy #

1. Policy Statement #

URIs created by Scholars Portal:

Scholars Portal uses a systematic convention to generate unambiguous, unique identifiers for digital objects within its repository. This convention creates a stable name or reference that can be permanently associated with an object, regardless of future changes to organizational structure or to digital access protocols.
This is in conformance with section 4.2.4 of Audit and Certification of Trustworthy Digital Repositories (CCSDS, 2011), which states that a compliant repository “shall have and use a convention that generates persistent, unique identifiers for all AIPs” and “its components.”
This convention will ensure that “each AIP can be unambiguously found in the future” and that “each AIP can be distinguished from all other AIPs in the repository.”

2. Implementation #

2.1. Journal articles #

2.1.1. Scholars Portal URIs are consistently constructed in the following manner: #

/<ISSN>/v<volume number>i<issue number padded to four digits>/<article hash>
The article hash is generated by concatenating the starting page number of the article, an underscore character, the first letter of each of the first six words in the article title, and the first letter of each of the last six words in the article title. If the title does not contain enough words to meet this specification, the first letter of each word in the title is used instead.

Examples: #

Article: DNA-Directed Self-Assembly of Gold Nanoparticles onto Nanopatterned Surfaces: Controlled Placement of Individual Nanoparticles into Regular Arrays - ACS Nano (October 2010), 4 (10), pg. 6153-6161
URI: /19360851/v04i0010/6153_dsognooinira
Article: A possible chemical burn to the scalp following hair highlights - Burns (June 2005), 31 (4), pg. 530-531
URI: /03054179/v31i0004/530_apcbttsfhh
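
The construction rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Scholars Portal implementation: the function names are hypothetical, words are assumed to be split on whitespace, a word's first letter is taken to be its first alphanumeric character in lower case, and the two-digit minimum padding of the volume number is inferred from the "v04" example rather than stated in the policy.

```python
import re


def article_hash(start_page: str, title: str) -> str:
    """Return '<start page>_<title initials>' for an article."""
    # Assumption: split on whitespace; take the first alphanumeric
    # character of each word, lowercased. Words with none are skipped.
    initials = []
    for word in title.split():
        match = re.search(r"[^\W_]", word)
        if match:
            initials.append(match.group().lower())
    # Titles with at least twelve words contribute the first six and the
    # last six initials; shorter titles contribute every initial.
    if len(initials) >= 12:
        initials = initials[:6] + initials[-6:]
    return f"{start_page}_{''.join(initials)}"


def article_uri(issn: str, volume: int, issue: int,
                start_page: str, title: str) -> str:
    """Return '/<ISSN>/v<volume>i<issue>/<article hash>'."""
    # Assumption: volume is padded to at least two digits (inferred from
    # the examples); issue is padded to four digits (stated in the policy).
    return f"/{issn}/v{volume:02d}i{issue:04d}/{article_hash(start_page, title)}"


print(article_uri(
    "03054179", 31, 4, "530",
    "A possible chemical burn to the scalp following hair highlights",
))
```

Called with the two example articles above, this sketch reproduces the URIs listed for them.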

In the case of a collision, an underscore (_) and a sequential number, beginning with one, are appended to the generated URI. This number is incremented for each further duplicate URI.

Example: #

Article: Book Review - Journal of the Franklin Institute (September 1944), 238 (3), pg. 224-224
URI: /00160032/v238i0003/224_br
Article: Book Review - Journal of the Franklin Institute (September 1944), 238 (3), pg. 224-224
URI: /00160032/v238i0003/224_br_1
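
A minimal sketch of the collision rule, assuming the set of URIs already in use is available in memory; the function name and the `existing` parameter are hypothetical, and a production system would more likely query the repository.

```python
from typing import Set


def resolve_collision(candidate: str, existing: Set[str]) -> str:
    """Return the candidate URI, or the first free '<candidate>_<n>'."""
    if candidate not in existing:
        return candidate
    # Suffix numbering begins with one and increments per duplicate.
    n = 1
    while f"{candidate}_{n}" in existing:
        n += 1
    return f"{candidate}_{n}"


in_use = {"/00160032/v238i0003/224_br"}
print(resolve_collision("/00160032/v238i0003/224_br", in_use))
```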

In the case of journals that have no ISSN, a Scholars Portal identifier is defined and inserted in place of the ISSN at the beginning of the URI. The SP identifier is generated by concatenating a four-character provider-level identifier with a four-digit string corresponding to the journal.

Example: #

Article: Kŭlloja - April 2019, Volume 924 (Issue 4)
URI: /sp010001/v924i0004/nfp_k
Article: Kŭlloja - August 2014, Volume 868 (Issue 8)
URI: /sp010001/v868i0008/nfp_k
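
The SP identifier can be sketched as below. The split of the example value "sp010001" into a provider-level part "sp01" and a journal part "0001" is an assumption inferred from the example, and the function name is hypothetical.

```python
def sp_identifier(provider_code: str, journal_number: int) -> str:
    """Join a four-character provider code and a four-digit journal string."""
    if len(provider_code) != 4:
        raise ValueError("provider code must be exactly four characters")
    if not 0 <= journal_number <= 9999:
        raise ValueError("journal number must fit in four digits")
    # Assumption: the journal string is a zero-padded number.
    return f"{provider_code}{journal_number:04d}"


print(sp_identifier("sp01", 1))
```

The result takes the place of the ISSN as the first segment of the URI.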

In the case of a replacement article, the new copy of the article supersedes the old copy and takes over the original identifier. The old copy keeps the original identifier with “_old1” appended to the end. For subsequent replacements, “_old<X>” is appended to the identifier of the replaced copy, where <X> is the next available integer.

Example: #

Article: The Myth of the Unicorn - Diogenes (September 1982), 30 (119), pg. 1-23
URI: /03921921/v30i0119/1_tmotu
Old Article: The Myth of the Unicorn - Diogenes (September 1982), 30 (119), pg. 1-23
URI: /03921921/v30i0119/1_tmotu_old1
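
The renaming step for a replaced copy can be sketched as follows, assuming the set of identifiers already in use is known; the function name and the `existing` parameter are hypothetical.

```python
from typing import Set


def uri_for_replaced_copy(uri: str, existing: Set[str]) -> str:
    """Return '<uri>_old<X>' using the next available integer X."""
    n = 1
    while f"{uri}_old{n}" in existing:
        n += 1
    return f"{uri}_old{n}"


in_use = {"/03921921/v30i0119/1_tmotu"}
print(uri_for_replaced_copy("/03921921/v30i0119/1_tmotu", in_use))
```

The new copy then takes the unsuffixed URI, so links to the original identifier resolve to the current version.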

The URI is used not only as the unique identifier for the item, but also as the path to the item’s file in MarkLogic, in both the search index and the preservation database. It provides a link between the preservation engine and the Scholars Portal search engine. The MarkLogic path takes the following form:

https://journals.scholarsportal.info/details.xqy?uri=/03054179/v31i0004/530_apcbttsfhh.xml

2.1.2. URIs for individual files and events created by Scholars Portal #

Scholars Portal URIs for individual files and events are consistently constructed in the following manner:

The URI of the parent object and a hash generated from the current date/time are concatenated.
PDF files are identified by adding “pdf_fulltext” after the parent object URI, followed by the current date/time hash.
XML files are identified by adding “xml_fulltext” after the parent object URI, followed by the current date/time hash.
Because every identifier includes the date/time hash, collisions are not expected.

Examples: #

Parent Object: A C. elegans LSD1 Demethylase Contributes to Germline Immortality by Reprogramming Epigenetic Memory - Cell (April 2009), 137 (2), pg. 308-320
Parent Object URI: /00928674/v137i0002/308_aceldcgibrem
PDF Fulltext: /00928674/v137i0002/308_aceldcgibrem/pdf_fulltext/1303399300907
XML Fulltext: /00928674/v137i0002/308_aceldcgibrem/xml_fulltext/130339930294
Other Files: /00928674/v137i0002/308_aceldcgibrem/1303399302709
Events: /00928674/v137i0002/308_aceldcgibrem/1303399314628
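
A minimal sketch of how such child URIs could be assembled. The policy does not define the date/time hash; treating it as the Unix time in milliseconds is an assumption based on the shape of the example values, and the function names are hypothetical.

```python
import time
from typing import Optional


def timestamp_token(now_ms: Optional[int] = None) -> str:
    """Return the date/time component of a child URI."""
    # Assumption: the hash is the current Unix time in milliseconds.
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return str(now_ms)


def child_uri(parent_uri: str, kind: Optional[str] = None,
              now_ms: Optional[int] = None) -> str:
    """Build a URI for a file or event belonging to a parent object.

    kind is 'pdf_fulltext' or 'xml_fulltext' for full-text files and
    None for other files and for events.
    """
    parts = [parent_uri]
    if kind is not None:
        parts.append(kind)
    parts.append(timestamp_token(now_ms))
    return "/".join(parts)


parent = "/00928674/v137i0002/308_aceldcgibrem"
print(child_uri(parent, "pdf_fulltext", now_ms=1303399300907))
print(child_uri(parent, now_ms=1303399314628))
```

Passing `now_ms` explicitly keeps the sketch deterministic for testing; omitting it uses the system clock.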

2.1.3. File system path structure for content files #

Content objects are stored in the same directory structure in which they arrived. That structure is transferred to a filesystem specific to the collection (e.g., ejournals1), into a directory designated for the publisher and named at the time a loader script is written. Both the filesystem name and the publisher directory name are followed by a sequential identifier, starting with 1. Publisher directories are limited to 200 GB, so when a directory reaches that size, a new one is created and the sequential identifier is incremented (e.g., kluwer1, kluwer2, kluwer3).
When the current filesystem reaches 2 TB in size, it is unmounted and remounted as a read-only volume. A new filesystem is then created, and the filesystem’s sequential identifier is incremented (e.g., ejournals1, ejournals2). Any publisher directories on the old filesystem are considered closed, and the sequential identifier is incremented for the next directory created for that publisher.

Please see the Move Event Diagram.

Example Paths: #

/mnt/pillar/ejournals2/wiley5/
/mnt/pillar/ejournals3/wiley6/
/mnt/pillar/ejournals3/ieee2/
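
The rollover rules for filesystems and publisher directories can be sketched as below. This is an illustration only: the `Location` type and `next_location` function are hypothetical, the "/mnt/pillar" root is taken from the example paths, and decimal units for 200 GB and 2 TB are an assumption.

```python
from dataclasses import dataclass, replace

PUBLISHER_DIR_LIMIT = 200 * 10**9  # 200 GB (decimal units assumed)
FILESYSTEM_LIMIT = 2 * 10**12      # 2 TB (decimal units assumed)


@dataclass(frozen=True)
class Location:
    collection: str  # filesystem base name, e.g. "ejournals"
    fs_seq: int      # filesystem sequential identifier
    publisher: str   # publisher directory base name, e.g. "wiley"
    dir_seq: int     # publisher directory sequential identifier

    def path(self, root: str = "/mnt/pillar") -> str:
        return f"{root}/{self.collection}{self.fs_seq}/{self.publisher}{self.dir_seq}/"


def next_location(current: Location, fs_bytes: int, dir_bytes: int) -> Location:
    """Return where the next content for this publisher should be written."""
    if fs_bytes >= FILESYSTEM_LIMIT:
        # The full filesystem becomes read-only; its publisher directories
        # are closed, so both sequential identifiers advance.
        return replace(current, fs_seq=current.fs_seq + 1,
                       dir_seq=current.dir_seq + 1)
    if dir_bytes >= PUBLISHER_DIR_LIMIT:
        # Only the publisher directory is full: open the next one.
        return replace(current, dir_seq=current.dir_seq + 1)
    return current


here = Location("ejournals", 2, "wiley", 5)
print(next_location(here, fs_bytes=FILESYSTEM_LIMIT, dir_bytes=0).path())
```

Under these assumptions, a full ejournals2 filesystem moves the wiley5 directory on to ejournals3/wiley6, which matches the first two example paths.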

References #

Consultative Committee for Space Data Systems (CCSDS). (2011). Audit and certification of trustworthy digital repositories. CCSDS 652.0-M-1.

Related Documents #

Move Event Diagram

Review Cycle #

Ongoing
