Sunday, March 29, 2026

Mastering Metadata Cleanup in MarcEdit

Cleaning metadata in MarcEdit is a systematic process. It typically begins by "breaking" a .mrc (binary) file into the mnemonic .mrk (text) format, performing batch edits, and then compiling it back into .mrc.

1. Batch Deleting Unwanted Fields

To remove entire tags (like local 9XX fields or vendor-specific 655 tags) across your entire file:

  • Path: Tools > Add/Delete Field (Shortcut: F7)
  • Example: Removing all 949 local call number fields.
  • Action: Enter 949 in the Field box and click Delete Field.

Pro Tip: Use the Preview button first. It’s the best way to ensure you aren't accidentally deleting essential data.
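
For readers who want to double-check a deletion outside the MarcEdit dialog, the same idea can be sketched in Python against the mnemonic (.mrk) text. This is only an illustration, not how MarcEdit implements the feature; the function name `delete_fields` and the sample record are assumptions made for the example.

```python
def delete_fields(mrk_text: str, tag: str) -> tuple[str, int]:
    """Drop every line for `tag` from mnemonic (.mrk) text.

    Returns the cleaned text and the number of lines removed, which
    plays the role of a "preview" count before you commit to the change.
    """
    kept, removed = [], 0
    for line in mrk_text.splitlines():
        if line.startswith(f"={tag}"):
            removed += 1
        else:
            kept.append(line)
    return "\n".join(kept), removed


# Hypothetical three-field record used only for demonstration.
sample = "\n".join([
    "=LDR  00000nam  2200000ia 4500",
    "=245  10$aSirat-un-Nabi /$cShibli Nomani.",
    "=949  \\\\$aLocal call number",
])
cleaned, count = delete_fields(sample, "949")
print(f"{count} field(s) removed")
```

Checking the returned count against what you expected is a cheap safeguard before compiling the file back to .mrc.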

2. Targeted Subfield Editing

Use this when you need to change data within a field, such as stripping proxy prefixes from URLs or fixing punctuation.

Path: Tools > Edit Subfield Data (Shortcut: F9)
Goal | Field / Subfield | Field Data | Replace With
Remove "Electronic book" from 655 | 655 / a | Electronic book. | (Leave Empty)
Update Proxy Prefix in 856 | 856 / u | oldproxy.com/ | newproxy.com/
Add trailing period to 245 | 245 / a | ([^.])\s*$ | $1. (Check Regex)
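
To see what a subfield-scoped edit does to a single line, here is a minimal Python sketch. It assumes .mrk lines where `$` only ever marks a subfield (MarcEdit writes a literal dollar sign as `{dollar}`), and the helper name `edit_subfield` is invented for this example. Note that MarcEdit uses `$1` for a captured group, while Python's `re` module uses `\1`.

```python
import re


def edit_subfield(line, tag, code, pattern, repl):
    """Apply a regex replacement to one subfield of one tag in a .mrk line."""
    if not line.startswith(f"={tag}"):
        return line
    # head is "=TAG  II"; every later piece starts with its subfield code.
    head, *subfields = line.split("$")
    edited = []
    for sf in subfields:
        if sf[:1] == code:
            sf = code + re.sub(pattern, repl, sf[1:])
        edited.append(sf)
    return "$".join([head, *edited])


# Update the proxy prefix in 856 $u only; $z is left alone.
url = "=856  40$uhttps://oldproxy.com/login?url=https://example.org$zFull Text"
print(edit_subfield(url, "856", "u", r"oldproxy\.com/", "newproxy.com/"))

# Add a trailing period to 245 $a when it is missing.
title = "=245  10$aThe Life and Work of the Prophet of Islam"
print(edit_subfield(title, "245", "a", r"([^.])\s*$", r"\1."))
```

Because the replacement is confined to the named subfield, a pattern such as `oldproxy.com/` cannot accidentally rewrite a public note in `$z`.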

3. Updating Indicators

Indicators control how data is indexed. A common task is fixing the second indicator in the 245 field to account for "The" or "A".

  • Path: Tools > Edit Indicators (Shortcut: F8)
  • Example: Changing 050 \4 (call number assigned by an agency other than LC) to 050 00 (item is in LC, call number assigned by LC).
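
The 245 case mentioned above comes down to counting the characters to skip when filing. The sketch below is a rough Python illustration that handles English articles only; the article list and both function names are assumptions, and titles such as "A la carte" would need a manual check.

```python
import re

# English articles only; extend per language as needed (assumption).
ARTICLES = ("the ", "an ", "a ")


def nonfiling_count(title):
    """Return how many leading characters to skip (article plus space)."""
    lowered = title.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            return len(article)
    return 0


def fix_245_indicator(line):
    """Rewrite the second indicator of a .mrk 245 line from its $a text."""
    match = re.match(r"^(=245\s+.)(.)(\$a)(.*)$", line)
    if not match:
        return line
    prefix, _old, sub_a, rest = match.groups()
    return f"{prefix}{nonfiling_count(rest)}{sub_a}{rest}"


print(fix_245_indicator("=245  10$aThe Life and Work of the Prophet of Islam."))
```

For "The Life and Work..." the second indicator becomes 4 ("The" plus one space), so the title files under "Life".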

4. Modernizing with the RDA Helper

The RDA Helper automates the transition from AACR2 to modern RDA standards.

  • Path: Tools > RDA Helper
  • What it does:
    • Adds 336 (Content), 337 (Media), and 338 (Carrier) fields.
    • Converts abbreviations (e.g., "p." to "pages").
    • Removes the 245 $h [electronic resource] GMD.
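
To make those three changes concrete, here is a small Python approximation of what they look like on .mrk lines. It is not the RDA Helper's actual logic, which covers many more cases; the function names are invented, and the 336/337/338 values shown are the standard RDA terms for an ordinary printed book only.

```python
import re


def rda_like_cleanup(line):
    """Apply two RDA-style text changes to a single .mrk line."""
    if line.startswith("=245"):
        # Drop the GMD subfield, e.g. $h[electronic resource]
        return re.sub(r"\$h\[[^\]]*\]", "", line)
    if line.startswith("=300"):
        # Expand "p." after a page count: "320 p." becomes "320 pages"
        return re.sub(r"(\d+)\s*p\.", r"\1 pages", line)
    return line


def content_media_carrier_for_print():
    """336/337/338 lines for a printed volume (other formats differ)."""
    return [
        "=336  \\\\$atext$btxt$2rdacontent",
        "=337  \\\\$aunmediated$bn$2rdamedia",
        "=338  \\\\$avolume$bnc$2rdacarrier",
    ]


print(rda_like_cleanup("=245  10$aSirat-un-Nabi$h[electronic resource] /$cShibli Nomani."))
print(rda_like_cleanup("=300  \\\\$a320 p. ;$c24 cm."))
```

Electronic resources, maps, and audio need different 336/337/338 terms, which is one reason to let the RDA Helper choose them from the record itself.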

5. Global Find/Replace & Regex

For general text cleaning (like fixing typos or removing specific phrases), use Edit > Replace (Ctrl+H). For complex patterns, enable Use Regular Expressions.

Regex Example: To find cases where a subfield $b is missing a leading space, search for ([^\s])\$b and replace with $1 $b.
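
If you want to count how many places a pattern would touch before running it in MarcEdit, the same expression can be tried in Python. This is a sketch under the assumption that the whole .mrk file fits in memory; `add_space_before_b` is a made-up helper name. The search pattern carries over unchanged, but the replacement group is written `\1` rather than `$1`.

```python
import re

# Same pattern as the MarcEdit example: a non-space character
# immediately followed by the $b subfield delimiter.
PATTERN = re.compile(r"([^\s])\$b")


def add_space_before_b(mrk_text):
    """Return (new_text, number_of_replacements)."""
    return PATTERN.subn(r"\1 $b", mrk_text)


text = "=245  10$aProphetic Diplomacy$ba bibliometric review."
fixed, changes = add_space_before_b(text)
print(changes, "replacement(s)")
print(fixed)
```

A count far above what you expected usually means the pattern is too broad and should be narrowed before it is applied to the real file.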

Once your edits are complete, navigate to File > Compile File to save your work back into the .mrc format for your ILS.

Saturday, March 28, 2026

Displaying Different Data Types in the OPAC

⚠️ Diagnosis: Your Koha is defaulting to "text" because the Item Type Codes in your MARC (952$y) do not match the codes in your Koha Administration.

Solution 1: Define Item Types in Koha Admin

Go to Koha Administration → Item types and create these entries exactly as they appear in your MARC data:

Code (Must match 952$y) | Description | Suggested Icon
BK | Book | bridge/book.gif
ARTICLE | Journal Article | bridge/periodical.gif
THESIS | Research Thesis | bridge/thesis.gif
BIBLIO | Bibliographic Entry | bridge/reference.gif

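
Before importing, it can help to list every 952$y value in the file that Koha will not recognise. The following Python sketch does that against .mrk text; the set `KOHA_ITEM_TYPES` simply mirrors the four codes above and is an assumption about your configuration, as is the helper name `unknown_item_types`.

```python
import re
from collections import Counter

# Codes assumed to be defined under Koha Administration > Item types.
KOHA_ITEM_TYPES = {"BK", "ARTICLE", "THESIS", "BIBLIO"}


def unknown_item_types(mrk_text, defined=KOHA_ITEM_TYPES):
    """Count 952 lines whose $y is missing or not a defined item type."""
    counts = Counter()
    for line in mrk_text.splitlines():
        if not line.startswith("=952"):
            continue
        match = re.search(r"\$y([^$]*)", line)
        code = match.group(1).strip() if match else ""
        if code not in defined:
            counts[code or "(missing $y)"] += 1
    return dict(counts)


sample = "\n".join([
    "=952  \\\\$aIRI$bIRI$pML100445-IRI$yBK$t1",
    "=952  \\\\$aSSMK$bSSMK$p120684-SSMK$ybook$t1",
])
print(unknown_item_types(sample))
```

The comparison is deliberately case-sensitive, since a lowercase `book` does not match the `BK` code in the item type list.
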
Solution 2: Adjust MARC Leader (LDR) for Theses

If an item type is missing, Koha falls back on Position 06 of the Leader (Type of Record). To differentiate theses from standard books:

  • Books/Articles: Set Position 06 to a (Language material).
  • Theses/Manuscripts: Change Position 06 to t (Manuscript language material).
MarcEdit Fix: Click the LDR field → Type of Record → Select "t-Manuscript language material".
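
For a batch of records, the same change is a one-character substitution in the leader string. Here is a minimal Python sketch, assuming you already have the 24-character leader as text; `set_record_type` is a hypothetical helper, not a MarcEdit or Koha function.

```python
def set_record_type(leader: str, record_type: str) -> str:
    """Return the leader with position 06 (Type of Record) replaced."""
    if len(record_type) != 1:
        raise ValueError("record type must be a single character")
    if len(leader) < 7:
        raise ValueError("leader is too short to have a position 06")
    # Positions are zero-based, so position 06 is index 6.
    return leader[:6] + record_type + leader[7:]


print(set_record_type("00000nam  2200000ia 4500", "t"))
```

Only index 6 changes; the record status in position 05 and the bibliographic level in position 07 are left as they were.
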

Solution 3: Enable XSLT System Preferences

Ensure your system is configured to show icons in the OPAC:

  1. Search for DisplayOPACiconsXSLT → Set to Show.
  2. Search for OPACNoItemTypeImages → Set to Show (this enables images).

Solution 4: Collection Codes (CCODE)

To see "Sirah Hub" or "SNK Bibliography" clearly next to the item type, add Collection Codes:

  1. Go to Admin → Authorized values → CCODE.
  2. Add values: SIRAH (Sirah Research Hub) and SNK (Sher Nowrooz Khan).
  3. Add the tag to your MARC: =952 \\$8SNK

Standardizing these administrative settings ensures your scholarly Hub is visually organized and professional for researchers.

MARC Tags for Different Data Types

Koha 25.11 "Sirah Hub" Test Suite

MARC21 Record Samples for Multilingual & Scholarly Validation

1. Physical Book (Multi-Holding) Consolidation Test

=LDR 00000nam 2200000ia 4500
=001 hub-book-001
=020 \\$a9789694080123
=100 1\$aNomani, Shibli.
=245 10$aSirat-un-Nabi /$cShibli Nomani.
=260 \\$aLahore :$bReligious Publications,$c2010.
=952 \\$aIRI$bIRI$pML100445-IRI$yBK$t1$o297.63 NOM
=952 \\$aSSMK$bSSMK$p120684-SSMK$yBK$t1$o297.63 NOM

2. Journal Article (Analytical Entry) 773 Linking Test

=LDR 00000nab 2200000ia 4500
=001 hub-art-001
=100 1\$aAhmad, Zohaib.
=245 10$aProphetic Diplomacy in the Medinan Period :$ba bibliometric review.
=773 0\$tJournal of Islamic Thought and Civilization$gVol. 10, No. 2 (2025)$x2070-0326
=856 40$uhttps://define.pk/articles/prophetic-diplomacy.pdf$zFull Text PDF

3. Thesis/Dissertation 502 Academic Test

=LDR 00000nam 2200000ia 4500
=001 hub-thesis-001
=100 1\$aAhmed, Rauf.
=245 10$aMapping Sirah Literature in Pakistan :$ba comparative study of digital repositories.
=502 \\$bPh.D.$cInternational Islamic University, Islamabad$d2026.

4. Virtual Bibliographic Citation (SNK) 510 Citation Test

=LDR 00000nam 2200000ia 4500
=001 hub-snk-045
=100 1\$aHamidullah, Muhammad.
=245 10$aThe Life and Work of the Prophet of Islam.
=510 4\$aSher Nowrooz Khan, Bibliography of Sirah Literature$cEntry No. 45.
=952 \\$aSNK_BIB$bSNK_BIB$pSNK-45$yBIBLIO$t1$oREF SNK-45

🔍 Validation Checklist

  • The "Split" Display: Does "Nomani" show both IRI and SSMK availability in the OPAC results?
  • Link Integrity: Is the "Full Text PDF" link in the Article record clickable?
  • Note Visibility: Does the 510 tag appear clearly in the "Description" or "Notes" tab of the Hamidullah record?
  • Virtual Branch: Verify that the SNK_BIB item is listed as "Not for Loan."
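
Part of this checklist can be pre-checked on the .mrk file before anything is loaded into Koha. The Python sketch below is one possible approach, assuming records are separated by a blank line; the helper names are invented, and it only confirms that the data is present, not that the OPAC renders it.

```python
def parse_records(mrk_text):
    """Split .mrk text into records, each a list of (tag, value) pairs."""
    records = []
    for block in mrk_text.strip().split("\n\n"):
        fields = []
        for line in block.splitlines():
            if line.startswith("="):
                tag, _, rest = line[1:].partition(" ")
                fields.append((tag, rest.strip()))
        if fields:
            records.append(fields)
    return records


def subfield(value, code):
    """Return the first subfield with this code, or None."""
    for part in value.split("$")[1:]:
        if part.startswith(code):
            return part[1:]
    return None


def checklist(fields):
    """Summarise the data behind three of the checklist items."""
    items = [v for t, v in fields if t == "952"]
    return {
        "branches": sorted({subfield(v, "b") for v in items if subfield(v, "b")}),
        "has_full_text_link": any(t == "856" and bool(subfield(v, "u")) for t, v in fields),
        "has_510_note": any(t == "510" for t, _ in fields),
    }


sample = "\n".join([
    "=001  hub-book-001",
    "=952  \\\\$aIRI$bIRI$pML100445-IRI$yBK",
    "=952  \\\\$aSSMK$bSSMK$p120684-SSMK$yBK",
])
print(checklist(parse_records(sample)[0]))
```

The "Not for Loan" status is not covered here because none of the sample 952 fields carries a status subfield; that check still has to be done in Koha.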

The SNK Bibliography Strategy

Integrating Historical Citations into the Modern Hub

Managing the Sher Nowrooz Khan (SNK) bibliography requires a shift in perspective. We are not just cataloging books; we are performing a Bibliometric Mapping of Sirah literature, acknowledging the history of its documentation.

1. The "Virtual Branch" Concept

In Koha, create a library code specifically for this bibliography: SNK_BIB. This allows you to track items that exist in the "world index" even if you don't physically own them yet.

  • The Barcode: Use the Entry Number from the printed bibliography (e.g., SNK-124).
  • The Status: Set these to a custom "Not for Loan" status: Bibliographic Citation Only.

2. Scholarly Acknowledgement (Tag 510)

To give Sher Nowrooz Khan proper "Academic Credit," use MARC Tag 510. This formally links the record to his scholarly work.

510 $a: Sher Nowrooz Khan, Bibliography of Sirah Literature
510 $c: Entry No. 124

Result: Users see a note stating: "This book is acknowledged/cited in Sher Nowrooz Khan's Bibliography."

Handling the 4 SNK Entry Types

Entry Type | Hub Action | Deduplication Strategy
Books | Create "Master Record" | Match by ISBN/Title. Merge physical holdings into one record that carries the 510 tag.
Journal Articles | Create "Analytical Record" | Use 773 tag to link to the Journal title.
Theses | Create "Thesis Record" | Match by Author+Title. Use 502 tag for University info.
Library Holdings | Location Metadata | Add an item with the NLP branch code if SNK cites it there.

The "Ghost Duplicate" Solution

To prevent having a "Physical Record" (IRI) and a "Cited Record" (SNK) as two separate entries, use the MarcEdit Merge Tool:

  1. Source: Your existing Koha records.
  2. Merge File: The SNK Bibliography records.
  3. Action: Tell MarcEdit to only add the 510 tag to the existing record instead of creating a new one.
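
The logic of that third step, "add only the 510, never a new record", can be sketched in Python. This is an illustration of the idea rather than MarcEdit's merge algorithm: records are plain lists of .mrk lines, matching uses a simplified Latin-script title key, and all function names are assumptions.

```python
import re


def title_key(record_lines):
    """Lowercased 245 $a with punctuation collapsed; '' if not usable."""
    for line in record_lines:
        if line.startswith("=245"):
            match = re.search(r"\$a([^$]*)", line)
            if match:
                return re.sub(r"[^a-z0-9]+", " ", match.group(1).lower()).strip()
    return ""


def insert_in_tag_order(record_lines, new_line):
    """Insert new_line before the first numeric tag greater than its own."""
    new_tag = new_line[1:4]
    for i, line in enumerate(record_lines):
        tag = line[1:4]
        if tag.isdigit() and tag > new_tag:
            record_lines.insert(i, new_line)
            return
    record_lines.append(new_line)


def add_citations(existing, cited):
    """Copy 510 lines from cited records into matching existing records.

    Modifies `existing` in place and returns the cited records that
    found no match, so they can be reviewed by hand.
    """
    index = {title_key(rec): rec for rec in existing if title_key(rec)}
    unmatched = []
    for rec in cited:
        key = title_key(rec)
        target = index.get(key) if key else None
        if target is None:
            unmatched.append(rec)
            continue
        for note in (line for line in rec if line.startswith("=510")):
            if note not in target:
                insert_in_tag_order(target, note)
    return unmatched
```

Titles in Urdu or Arabic script produce an empty key in this simplified version and therefore land in the unmatched list, which is the safer failure mode for a first pass.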

This approach acknowledges that Sher Nowrooz Khan "found" these works before they were digitally available, adding immense historical value to your PhD research.

The Surgical Merge: MarcEdit Edition

Preserving Metadata Integrity for the Sirah Research Hub

Merging records in MarcEdit is like performing a surgical transplant: you take the "healthy" metadata from one file and move the "vital" local data (barcodes and URLs) from another into it.

The Scenario: A Practical Example

Imagine you have two separate files for the same 100 Sirah books:

File A (Source)

High-quality bibliography. Perfect titles and Islamic subject headings, but no barcodes.

File B (Merge)

SSMK Library records. Messy titles, but contains unique barcodes and shelf locations.

Goal: The Perfect Master Record

The Step-by-Step Process

  1. Open the Merge Tool: Launch MarcEdit and navigate to Tools → Merge Records.
  2. Select Your Files: Set File A as your Source and File B as your Merge file. Name your output Sirah_Hub_Final.mrc.
  3. Define the "Match Key": Use the ISBN (020 $a) or Title (245 $a) so MarcEdit recognizes which books are identical.
  4. Select Specific Data: Only check the boxes for fields you need from File B (e.g., 952 for barcodes, 856 for URLs).

💡 Pro-Tip: Do NOT select fields like 245 (Title) or 100 (Author) in Step 4. You want to keep the "clean" metadata from your Source file, not overwrite it with the "messy" data from the Merge file.
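
The four steps amount to "match on a key, then copy only the chosen tags". A minimal Python sketch of that rule follows, assuming each record is a list of .mrk lines and the match key is the ISBN in 020 $a; the function names are hypothetical and MarcEdit's own matching is more sophisticated.

```python
import re


def isbn_key(record_lines):
    """Return the first 020 $a as digits only (dashes removed), or None."""
    for line in record_lines:
        if line.startswith("=020"):
            match = re.search(r"\$a([0-9Xx-]+)", line)
            if match:
                return match.group(1).replace("-", "").upper()
    return None


def merge_selected(source, merge, copy_tags=("952", "856")):
    """Copy only `copy_tags` fields from matching merge records.

    Everything else (245, 100, ...) is kept from the source record.
    """
    donors = {isbn_key(rec): rec for rec in merge if isbn_key(rec)}
    merged = []
    for rec in source:
        donor = donors.get(isbn_key(rec), [])
        extra = [l for l in donor if l[1:4] in copy_tags and l not in rec]
        merged.append(rec + extra)
    return merged


file_a = [["=020  \\\\$a9780123456789", "=245  14$aThe Life of the Prophet"]]
file_b = [["=020  \\\\$a978-0-12-345678-9", "=245  10$aProphet Life (SSMK)", "=952  \\\\$pML-100444"]]
for line in merge_selected(file_a, file_b)[0]:
    print(line)
```

Because 245 is not in `copy_tags`, the clean title from File A survives and only the barcode line is carried over from File B.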

The Final Result (Behind the Scenes)

Feature | File A (Source) | File B (Merge) | Final Merged Result
Title (245) | The Life of the Prophet | Prophet Life (SSMK) | The Life of the Prophet
ISBN (020) | 9780123456789 | 9780123456789 | 9780123456789
Barcode (952$p) | (Empty) | ML-100444 | ML-100444

Common Troubleshooting

  • "No Matches Found": Standardize your ISBNs first. One file might have dashes while the other does not. Use the MarcEdit ISBN Tool to fix this.
  • Duplicate 952s: Use the "Replace Existing" option in merge settings if your Source file already contains empty placeholder tags.
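
A related cause of "No Matches Found" is one file using 10-digit ISBNs and the other 13-digit ones. The sketch below shows one way to bring both to a common 13-digit form in Python. It is an assumption-laden illustration, not the MarcEdit ISBN Tool: it strips dashes and spaces, converts ISBN-10 by prefixing 978 and recomputing the check digit, and does not validate the incoming check digit.

```python
def to_isbn13(raw):
    """Normalise an ISBN string to 13 digits, or return None if unusable."""
    digits = raw.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit():
        core = "978" + digits[:9]
        # ISBN-13 check digit: weights alternate 1, 3 across the 12 digits.
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    return None


print(to_isbn13("978-969-408-012-3"))
print(to_isbn13("0-306-40615-2"))
```

Running both files through the same normaliser before merging gives the match key a consistent shape on each side.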

Merging in MarcEdit is faster and safer than merging in Koha because you can verify 5,000 records in seconds before they ever touch your live catalog.

From Fragmented Lists to an Authoritative Master Record

The greatest challenge in the Sirah Research Hub is duplication. When aggregating data from IRI, SSMK, and printed bibliographies, we must move from importing to merging.

The Goal: "One Record, Many Homes"

IRI Copy + SSMK Copy + Citation #402 → Unified Sirah Master Record

1. Handling Physical Holdings (Merging)

If you have three repeating entries in Koha, use the Merge selected tool in the staff client. Pick the record with the "best" metadata as your reference. Koha will move the barcodes from the duplicates to this master record automatically.

Result: One search result shows availability at IRI, SSMK, and NLP simultaneously.

2. Handling Bibliographies (Citation Tags)

A bibliography entry is a citation, not a physical book. Instead of a new record, use MARC Tag 510 (Citation/References Note) inside the Master Record.

  • 510 $a: Name of Source (e.g., Sher Nowrooz Khan Bibliography)
  • 510 $c: Entry or Page Number (e.g., Item #142)

3. Preserving Academic Credit

Don't lose track of who cataloged the book first. Use MARC Tag 040 (Cataloging Source) to create a breadcrumb trail of contributions.

Example: 040 $a NLP $d IRI $d SSMK shows the original record came from NLP and was enriched by IRI and SSMK.

💡 MarcEdit Pro-Tip: Before uploading, use the Deduplication Tool in MarcEdit to check your incoming file against your existing Koha export. This stops duplicates before they ever touch your database.

The AI-Driven Sirah Library

Practical Workflows for 2026 Library Standards

Using AI for your Sirah Research Hub is a practical necessity for handling the massive volume of data across multiple Pakistani libraries. By 2026, these tools have become the backbone of professional metadata standardization.

Method 1

AI-Powered Metadata Extraction

Don't type manually from scans or rare manuscripts. Use AI to act as your "Digital Typist."

Practical Example: Feed an image of a rare title page to an AI assistant (like Gemini 1.5 Pro). It extracts Title, Author, and Year into a CSV format ready for Koha import.

Result: Saves 15-20 minutes per title.
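
Once the AI has produced a CSV, it still has to become MARC. One possible bridge is sketched below in Python: it turns rows into skeleton .mrk records that can be opened and reviewed in MarcEdit. The column names `title`, `author`, and `year` are assumptions about what the AI was asked to output, and the leader and indicators are generic defaults for a printed book.

```python
import csv
import io


def csv_to_mrk(csv_text):
    """Convert CSV rows (title, author, year) into skeleton .mrk records."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append("\n".join([
            "=LDR  00000nam  2200000ia 4500",
            f"=100  1\\$a{row['author'].strip()}.",
            f"=245  10$a{row['title'].strip()}.",
            f"=260  \\\\$c{row['year'].strip()}.",
        ]))
    # MarcEdit separates records in a .mrk file with a blank line.
    return "\n\n".join(records) + "\n"


sample_csv = 'title,author,year\nSirat-un-Nabi,"Nomani, Shibli",2010\n'
print(csv_to_mrk(sample_csv))
```

The output is deliberately minimal; indicators, publisher details, and subject headings still need a cataloger's review before the records are compiled.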

Method 2

Intelligent Subject Tagging

Determining if a book belongs under "Ghazwat" (Battles) or "Shama'il" (Character) is complex. AI can analyze deep content instantly.

Practical Example: Paste a Table of Contents into an LLM and ask for MARC 600/650 tags based on Library of Congress standards.

Method 3

Automated ALA-LC Romanization

Normalize author names like "Siddiqui" vs "Siddiqee" into a single, authorized form.

AI Action: Convert "سیرت النبی" and "Sirat-un-Nabi" into the standard ALA-LC authorized title string for consistent indexing.

Efficiency Comparison

Feature | Manual Cataloging | AI-Assisted Cataloging
Processing Speed | 30-45 mins per record | 5-10 mins per record
Subject Accuracy | Limited to staff expertise | Scholarly & Suggestive
Linguistic Support | Manual translation/lookup | Instant (100+ languages)

Start Today: Use the "Master Prompt"

Give your team this specific prompt to use with any AI tool to ensure high-quality Sirah metadata:

Copy & Paste This Prompt: "I am cataloging a Sirah book for the 'Sirah Research Hub'. Here is the description: [Paste Intro/Table of Contents]. Please generate a MARC21 record in text format (.mrk) including tags 100, 245, 260, 520, and 600. Ensure the subject heading follows LCSH standards and names follow ALA-LC romanization."

Implementation Tip: Use AI as a "Co-Pilot"—always have a librarian verify the AI-generated MARC tags before finalizing the record in Koha.
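
A small automated pre-check can support that verification step by confirming the AI actually returned every tag the prompt asked for. The Python sketch below assumes the response is .mrk text; `missing_tags` is a hypothetical helper and it checks presence only, not whether the content of each field is correct.

```python
# Tags requested in the Master Prompt above.
REQUIRED_TAGS = ("100", "245", "260", "520", "600")


def missing_tags(mrk_record, required=REQUIRED_TAGS):
    """Return the required tags that do not appear in a .mrk record."""
    present = {line[1:4] for line in mrk_record.splitlines() if line.startswith("=")}
    return [tag for tag in required if tag not in present]


ai_output = "\n".join([
    "=100  1\\$aNomani, Shibli.",
    "=245  10$aSirat-un-Nabi /$cShibli Nomani.",
    "=260  \\\\$aLahore :$bReligious Publications,$c2010.",
])
print(missing_tags(ai_output))
```

An empty list means the structure is complete; the librarian's review of headings and romanization is still required.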
