Dr. Muhammad Hamidullah Library: Fixing Urdu and Arabic Search in Koha’s Elasticsearch

Saturday, March 28, 2026

The Problem: By default, Elasticsearch analyzes Urdu and Arabic text with the same generic rules it applies to English. It applies no script-specific handling for these right-to-left (RTL) languages: no character normalization and no folding of diacritics (harakat).

For a specialized library like the Sirah Research Hub, Elasticsearch is far superior to the legacy Zebra engine—but only if you install the "Missing Piece."

1. The Missing Piece: The ICU Analysis Plugin

The ICU (International Components for Unicode) analysis plugin is essential. Its folding filter lets the search engine ignore vowel marks such as zer, zabar, and pesh, so a user finds the Prophet's name whether or not the diacritics are typed.

# Run this on your Koha server terminal:
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-icu
sudo systemctl restart elasticsearch
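
After the restart, it may be worth confirming that the plugin actually loaded. The following is a minimal sketch, not an official Koha procedure: the helper name has_icu_plugin is hypothetical, and the usage lines assume the default package install path and a node listening on localhost:9200 without authentication.

```shell
# Hypothetical helper: succeeds if a plugin listing on stdin mentions analysis-icu.
has_icu_plugin() {
  grep -q 'analysis-icu'
}

# Usage on the Koha server (assumes the default package install path):
#   sudo /usr/share/elasticsearch/bin/elasticsearch-plugin list | has_icu_plugin \
#     && echo "ICU plugin installed" || echo "ICU plugin missing"
#
# Or ask the running node (assumes Elasticsearch on localhost:9200, no auth):
#   curl -s 'http://localhost:9200/_cat/plugins' | has_icu_plugin \
#     && echo "ICU plugin loaded" || echo "ICU plugin not loaded"
```

Checking the running node is the stricter test, since a plugin installed on disk is only picked up after Elasticsearch restarts.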

2. Configure the Arabic/Urdu Analyzer

Once the plugin is active, tell Koha to use it. Navigate to:

Koha Admin → Search Engine Configuration (Elasticsearch)

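Ensure your multilingual fields are analyzed with the icu_analyzer, which handles tokenization, normalization, and diacritic folding, or with the dedicated arabic analyzer, which also applies Arabic stemming. The arabic analyzer's stemming is designed for Arabic, so Urdu fields are probably better served by the ICU analyzer. In a typical Koha installation the analyzer definitions live in the Elasticsearch index configuration file (index_config.yaml) rather than on this screen, so confirm where your version exposes them.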
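
To see what an ICU analysis chain does with vocalized text, Elasticsearch's _analyze API can be called directly. This is a sketch under stated assumptions: the node is assumed to be at localhost:9200 without authentication, the helper names analyze_body and analyze_text are hypothetical, and the chain shown (icu_tokenizer plus icu_folding) is one plausible configuration, not necessarily the one your Koha index uses.

```shell
# Hypothetical helper: print an _analyze request body for the given text.
# Assumes the text contains no double quotes or backslashes (no JSON escaping is done).
analyze_body() {
  printf '{"tokenizer":"icu_tokenizer","filter":["icu_folding"],"text":"%s"}\n' "$1"
}

# Hypothetical helper: send the body to a local node (assumed at localhost:9200, no auth).
analyze_text() {
  analyze_body "$1" | curl -s -X POST 'http://localhost:9200/_analyze' \
    -H 'Content-Type: application/json' --data-binary @-
}

# Usage: compare the tokens produced with and without harakat.
#   analyze_text 'مُحَمَّد'
#   analyze_text 'محمد'
```

If folding is working as intended, both calls should return the same token, which is what lets a search without diacritics find a record catalogued with them.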

Zebra vs. Elasticsearch: The Sirah Hub Verdict

Feature	Zebra (Legacy)	Elasticsearch (Recommended)
Urdu/Arabic Support	Requires manual .chr file hacking	Supported through the ICU plugin
Performance	Slow with 14,000+ records	Fast at the same catalog size
Fuzzy Searching	Basic/Limited	Advanced (matches "Sirah" to "Seerah")
Scalability	Fixed/Rigid	Scales out across multiple nodes

💡 Academic Impact: Elasticsearch’s fuzzy searching is a real help for researchers. It tolerates small spelling differences, so varying transliterations of a name (e.g., Shibli vs. Shebli) can still lead to the correct scholarly records.
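
A fuzzy lookup of this kind could be sketched as a match query with the fuzziness option. Everything specific here is an assumption for illustration: the helper names, the field name title, and the index name koha_library_biblios are placeholders, and Koha normally builds its own queries, so this only shows the underlying Elasticsearch mechanism.

```shell
# Hypothetical helper: print a fuzzy match query body for one search term.
# The field name "title" is an assumption; substitute a field from your own mappings.
fuzzy_query_body() {
  printf '{"query":{"match":{"title":{"query":"%s","fuzziness":"AUTO"}}}}\n' "$1"
}

# Hypothetical helper: run the query against a local node (assumed at localhost:9200, no auth).
# The index name koha_library_biblios is a placeholder for your instance's biblio index.
fuzzy_search() {
  fuzzy_query_body "$1" | curl -s -X POST \
    'http://localhost:9200/koha_library_biblios/_search' \
    -H 'Content-Type: application/json' --data-binary @-
}

# Usage:
#   fuzzy_search 'Shebli'
```

With fuzziness set to AUTO, Elasticsearch allows one edit for query terms of three to five characters and two edits for longer terms. "Shebli" is one substitution away from "Shibli", so it falls within that limit. "Seerah" and "Sirah" are two edits apart, so a search for the six-letter "Seerah" can reach "Sirah", while a search for the five-letter "Sirah" may not reach "Seerah".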

Action Step: Ask your technical team to confirm that analysis-icu is installed. Without it, Urdu and Arabic searches depend on exact character forms and diacritics; with it, the Hub can match the variants researchers actually type.
