The CMC Dives into Database Cleanup
The year? 2021. The directive: database cleanup.
Staff of the Consortium of Academic and Research Libraries in Illinois (CARLI) contacted Illinois Heartland Library System and the Cataloging Maintenance Center (CMC) about a possible database cleanup project. After several meetings, the CMC staff agreed to take it on, with the CMC responsible for interviewing and hiring three full-time temporary catalogers to work virtually on the project. All three were hired and started working for the CMC in September 2021. The CMC staff currently working on the CARLI project full-time are Blake Walter, Katie Roberts, and Andrea Giosta. Two permanent full-time CMC catalogers, Mary Cornell and Eric McKinney, are also helping with the cleanup.
CARLI’s shared database is I-Share, and they use Ex Libris’ Alma as their library management system. Alma has three zones of bibliographic records: a separate institution zone (IZ) for each of the 88 I-Share member libraries, the shared network zone (NZ) for the I-Share consortium, and the community zone (CZ) shared by all Ex Libris customers and database vendors.
In early 2021, Ex Libris determined that I-Share libraries had exceeded the title count metric for their license. At the outset of project work in June 2021, CARLI’s title count (all bibliographic records minus electronic journal bib records) exceeded 66 million records. This count included the bibliographic records found in 88 institution zones and the I-Share network zone; the NZ (network zone) alone contributed over 16 million titles. By removing duplicate bibliographic records, the database will be cleaner, and online borrowing requests will be accurately tracked and filled. The reduction of bibliographic records will also bring the I-Share libraries back under the title count limit for their license.
Duplication of consortial e-resource titles in all institution zones (IZs) accounted for over 7 million total bib records. Shifting records for shared electronic resources, such as consortial-licensed Alexander Street Press collections, as well as open access resources, to the NZ and sharing elsewhere cut that number in half. Additional deletions of orphaned records in IZs and the NZ removed another two-plus million records through December 2021.
On the topic of duplicate titles: when the CZ (community zone) records are included, there are over 1 million duplicate records when matching on OCLC number. Excluding the CZ, the number of direct matches on OCLC number was closer to 10,000 records.
CARLI staff have provided spreadsheets of duplicate records in the NZ (network zone) based on the value of the 035 $a OCLC number. Cataloging Maintenance Center staff can use two different processes to resolve these lists of duplicate records in the NZ:
One option is to compare the duplicate records in Alma’s metadata editor, manually select one NZ record to retain, and then merge the other NZ records into the retained record. As a final step, WorldCat is searched for an updated version of the retained record which is then manually imported and merged into the retained record to ensure that it reflects updated cataloging from WorldCat.
Alternatively, Cataloging Maintenance Center staff can bring up the duplicate records in the metadata editor and compare them side by side with the most recent version of the record in Connexion. If needed, the Connexion record is edited and replaced to bring it up to full-level cataloging, including the most recent 3XX tags in RDA. Any appropriate tags found only in Alma, like added subject headings, contents notes, or summary notes, can also be added to the WorldCat record in Connexion when it is updated and replaced. The updated Connexion record is then exported to a MARC data file. As a final step, the MARC data file of updated WorldCat records is then imported as a batch into Alma. The Alma import profile handles any instances of matching duplicate records in the NZ, and all duplicates are automatically merged into a single record which then gets overlaid with the updated record from Connexion. One advantage of this batch importing process is that it will also identify and resolve additional NZ duplicates found in 019/035 $z tags.
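The duplicate lists described above are keyed on the OCLC number in 035 $a, and the batch import also catches matches in 035 $z. As a rough illustration only, the following Python sketch groups records by a normalized OCLC number. The record layout (a dict with `mms_id`, `035a`, and `035z` keys), the sample IDs, and the function names are all assumptions made for this example; this is not CARLI's or Alma's actual matching logic.

```python
import re
from collections import defaultdict

# Assumption: OCLC numbers appear as "(OCoLC)" plus an optional ocm/ocn/on
# prefix and optional leading zeros, e.g. "(OCoLC)ocm00012345".
_OCLC_PATTERN = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)$", re.IGNORECASE)


def normalize_oclc(raw):
    """Return the bare OCLC number as a string, or None for non-OCLC values."""
    match = _OCLC_PATTERN.match(raw.strip())
    return match.group(1) if match else None


def group_duplicates(records, include_cancelled=False):
    """Map each OCLC number to the record IDs that carry it, keeping only
    numbers shared by more than one record.

    records: iterable of dicts with hypothetical keys "mms_id", "035a", "035z".
    include_cancelled: also match on 035 $z (cancelled/invalid numbers).
    """
    groups = defaultdict(list)
    for record in records:
        values = list(record.get("035a", []))
        if include_cancelled:
            values += record.get("035z", [])
        # A set avoids listing the same record twice under one number.
        numbers = {n for n in map(normalize_oclc, values) if n}
        for number in numbers:
            groups[number].append(record["mms_id"])
    return {num: ids for num, ids in groups.items() if len(ids) > 1}


# Hypothetical sample data for illustration.
sample_records = [
    {"mms_id": "991", "035a": ["(OCoLC)ocm00012345"], "035z": []},
    {"mms_id": "992", "035a": ["(OCoLC)12345"], "035z": []},
    {"mms_id": "993", "035a": ["(OCoLC)67890"], "035z": ["(OCoLC)12345"]},
    {"mms_id": "994", "035a": ["(DLC)12345"]},
]
```

With this sample, matching on $a alone pairs records 991 and 992, while turning on `include_cancelled` also pulls in record 993 through its $z value, which mirrors the advantage noted for the batch import.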
During the first year of the Alma cleanup project, September 2021-September 2022, CMC staff deduped 174,269, deleted 2,951,376, and edited 25,862 bibliographic records for a total of 3,137,535 records (see graphic on right).
At the beginning of the project, deleting duplicate electronic collections was the focus. Then, in January 2022, the Cataloging Maintenance Center (CMC) staff began working on the Spertus Institute Project, which involved editing records rather than deleting bibliographic records or electronic collections. The current phase of the Alma CMC Cleanup Project involves the merging of duplicate records in the Consortium of Academic and Research Libraries in Illinois (CARLI) I-Share database. Records with identical OCLC IDs are merged, ensuring that the merged record that is retained includes the complete set of MARC tags. I-Share records without any network holdings are deleted either via a batch process or else individually after comparison with the retained I-Share record. CMC staff have refined their workflow processes through each different phase of the project.
- Ex Libris acknowledged the effort being made to reduce the record counts and, in conjunction with ongoing negotiations, reached a new agreement with CARLI on the title counts.
- Title counts now emphasize the shared nature of the network environment: The count begins with the NZ (network zone) records, then adds any IZ (institution zone) records that are not linked to the NZ or CZ (community zone).
- Any IZ records that are linked to the NZ are not counted again.
- Any IZ records that are linked to the CZ are not counted again.
- Ex Libris management hopes this agreement will open opportunities for I-Share members to implement new processes and explore efficiencies with new integrations.
- While the pressure is officially off CARLI and its members to reduce the title count, the project work reinforced the need for ongoing attention to duplicate data in the NZ, as duplicates routinely interfere with patron search results and automated fulfillment of I-Share requests.
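The counting rule in the list above can be written out as a small calculation. This is a minimal sketch under stated assumptions: the function name and the `linked_to_nz` / `linked_to_cz` flags are hypothetical, and any other exclusions that may apply under the license are not modeled.

```python
def title_count(nz_record_count, iz_records):
    """Title count as described above: all NZ records, plus only those IZ
    records that are linked to neither the NZ nor the CZ.

    iz_records: iterable of dicts with hypothetical boolean keys
    "linked_to_nz" and "linked_to_cz".
    """
    unlinked = sum(
        1 for r in iz_records
        if not (r["linked_to_nz"] or r["linked_to_cz"])
    )
    return nz_record_count + unlinked


# Hypothetical example: 5 NZ records and 4 IZ records, of which one is
# linked to the NZ, one to the CZ, and two are purely local.
example_iz = [
    {"linked_to_nz": True, "linked_to_cz": False},
    {"linked_to_nz": False, "linked_to_cz": True},
    {"linked_to_nz": False, "linked_to_cz": False},
    {"linked_to_nz": False, "linked_to_cz": False},
]
example_total = title_count(5, example_iz)  # 5 NZ + 2 unlinked IZ = 7
```

Under this rule, linking a local IZ record to the NZ or CZ lowers the count by one, which is why the Spertus records without OCLC numbers stood out.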
Spertus Institute was identified as a special case because so many of its records were not linked to the NZ (network zone). A large percentage of their records lacked OCLC numbers as a result of having been an RLIN library before joining I-Share. OCLC could not provide a mapping of RLIN record numbers to OCLC numbers. Cataloging Maintenance Center catalogers prepared Spertus records for linking to the NZ by verifying titles and adding OCLC numbers to their records. The process was automated as much as possible, but it still required review and mapping by staff.
Their records were organized by item type, then searched in OCLC. If an OCLC number was located, it was added to the record, and the number was then searched in their IZ (institution zone) and the NZ to identify any duplicates. Current OCLC records were not imported into Spertus, so the records themselves were not brought up to date.
The Spertus project was managed through multiple spreadsheets. Comparisons were made with the 100 (author), 245 (title), 250 (edition), 26X (publisher, publication date), and 300 (pages) MARC fields. The 010 MARC field was very helpful when present.
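To illustrate the kind of field-by-field comparison described above, here is a small Python sketch that reports which of those fields agree between a local record and a candidate OCLC record. The flat dict layout, the sample values, and the normalization rule are assumptions for this example rather than the CMC's actual spreadsheet workflow, and the output is meant to support staff review, not replace it.

```python
import re

# Fields named in the text; "26X" stands in for 260/264 in this sketch.
COMPARE_TAGS = ("100", "245", "250", "26X", "300")


def _norm(value):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", value or "").lower().split())


def compare_records(local, candidate):
    """Return (lccn_match, agreeing_tags, differing_tags).

    local, candidate: dicts mapping a tag to one display string
    (a hypothetical simplification of a MARC record).
    """
    lccn_local = _norm(local.get("010"))
    lccn_match = bool(lccn_local) and lccn_local == _norm(candidate.get("010"))
    agree, differ = [], []
    for tag in COMPARE_TAGS:
        a, b = _norm(local.get(tag)), _norm(candidate.get(tag))
        if not a and not b:
            continue  # field absent on both sides: nothing to compare
        (agree if a == b else differ).append(tag)
    return lccn_match, agree, differ


# Invented sample records for illustration.
local_record = {
    "010": "58005678",
    "100": "Example, Author,",
    "245": "A sample title /",
    "26X": "Chicago : Sample Press, 1958.",
    "300": "137 p. ;",
}
oclc_candidate = {
    "010": " 58005678 ",
    "100": "Example, Author",
    "245": "A Sample Title",
    "250": "2nd ed.",
    "26X": "Chicago : Sample Press, 1958",
    "300": "137 pages",
}
result = compare_records(local_record, oclc_candidate)
```

In this example the 010, 100, 245, and 26X values agree, while the 250 and 300 differ; a reviewer would still need to decide whether "137 p." versus "137 pages" is a cataloging-style difference and whether the edition statement rules out the match.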
General guidelines for I-Share libraries
- As a result of the new understanding between the Consortium of Academic and Research Libraries in Illinois (CARLI) and Ex Libris, libraries need not be concerned about the overall title count when adding collections, records, or services.
- Duplication of records remains a concern, in that the presence of multiple records with the same OCLC number can prevent seamless importing and updating of records. Catalogers should remember to check the NZ (network zone) for matching records before adding new records. The Technical Services Committee is redrafting the consortial cataloging guidelines for the NZ and will be emphasizing this more in the next few months.
In summary, the work of this project has reminded us of the value of cooperative cataloging efforts and collaborative work on database maintenance. The project has also contributed to a better understanding of the network zone and how it should be used to improve efficiencies. Some of what was learned reinforces the importance of maintaining accurate and up-to-date metadata. The continuing database cleanup project will also transform how the cooperative I-Share database can be used to support CARLI’s current values and goals.
Funding for the Cataloging Maintenance Center is provided through the Illinois State Library and the Secretary of State and administered by Illinois Heartland Library System. The CMC provides statewide cataloging support for Illinois libraries, including free original and copy cataloging of eligible special collections, consultation on metadata projects, database cleanup for LLSAPs, cataloging training, and more. Learn more at www.illinoisheartland.org/cmc.