Computer scientists and humanities scholars have been trying for more than a decade to produce software that can accurately read Arabic text and transform it into a digital format, a task that has thus far eluded them. But artificial intelligence is changing that, opening up the possibility of making archives of newspapers, magazines, and books available to all on the Internet.
“For a long time, accurate and reliable Arabic optical character recognition has remained sort of a mirage for academics (especially in the humanities) and librarians. Advances in the field in recent years have however gradually been transforming it into a reality,” writes Dominique Akhoun-Schwarb, curator of rare books and manuscripts at SOAS, University of London, in an e-mail.
Arabic text is harder for computers to read than text in the Latin alphabet. Arabic and the other languages written in the Arabic script, such as Persian, Ottoman Turkish and Urdu, use a connected, cursive script; consonant letters take a variety of shapes depending on their place in a word, and there are markings below and above letters that are essential to a word’s meaning but can be hard to see.
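Both difficulties can be illustrated with a few lines of Python and the standard `unicodedata` module. This is only a rough sketch of the character-level problem, not of how any particular OCR system works; the helper names `base_letter` and `strip_marks` are made up for this illustration. It shows the four positional shapes Unicode records for a single letter (beh), and how two different words collapse into the same string once their vowel markings are lost.

```python
import unicodedata

# Unicode's "Arabic Presentation Forms-B" block encodes the four positional
# shapes of the letter BEH (U+0628): isolated, final, initial, medial.
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def base_letter(ch: str) -> str:
    """Map a positional presentation form back to its base letter (NFKC)."""
    return unicodedata.normalize("NFKC", ch)


def strip_marks(text: str) -> str:
    """Drop combining marks such as the short-vowel signs above and below letters."""
    return "".join(c for c in text if not unicodedata.combining(c))


# Four visually distinct shapes, one underlying letter.
for form in BEH_FORMS:
    print(unicodedata.name(form), "->", unicodedata.name(base_letter(form)))

# "kataba" (he wrote) and "kutub" (books) differ only in their vowel marks.
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
kutub = "\u0643\u064F\u062A\u064F\u0628"
print(strip_marks(kataba) == strip_marks(kutub))
```

A recognizer that misses the small marks therefore cannot tell such words apart from the letters alone, and one that treats each positional shape as a separate character has roughly four times as many shapes to learn.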
Software to digitize Arabic text already exists, but Khater says it is “limited and frustrating” to use.
Be sure to read our post from 2017 about The Kitab Project led by Dr. Sarah B. Savant, associate professor at Aga Khan University’s Institute for the Study of Muslim Civilisations (AKU-ISMC).