Artificial Intelligence (AI) Helps Computers Leap Forward in Reading Arabic

Computer scientists and humanities scholars have been trying for more than a decade to produce software that can accurately read Arabic text and convert it into a digital format, a task that has so far eluded them. But artificial intelligence is changing that, opening up the possibility of making archives of newspapers, magazines, and books available to all on the Internet.

“For a long time, accurate and reliable Arabic optical character recognition has remained sort of a mirage for academics (especially in the humanities) and librarians. Advances in the field in recent years have however gradually been transforming it into a reality,” writes Dominique Akhoun-Schwarb, curator of rare books and manuscripts at SOAS, University of London, in an e-mail.

Arabic text is harder for computers to read than text in the Latin alphabet. Arabic, like other languages written in the Arabic script such as Persian, Ottoman Turkish and Urdu, is written as a continuous script: letters take a variety of shapes depending on their place in a word, and the markings below and above letters are essential to a word’s meaning but can be hard to see.
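To make those two difficulties concrete, here is a minimal Python sketch that uses only the standard `unicodedata` module. It is not taken from the article or from any OCR system. The helper names (`positional_forms`, `strip_marks`) and the sample words are illustrative assumptions. The sketch lists the four positional shapes Unicode records for a single letter (beh), then shows two words that differ only in their vowel marks. It covers only marks that Unicode encodes as separate combining characters; the dots that distinguish many letters are part of the letter itself.

```python
import unicodedata

# The letter beh has one abstract code point (U+0628). Unicode also keeps
# four legacy "presentation form" code points, one per positional shape.
BEH = "\u0628"
BEH_FORMS = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]


def positional_forms(forms=BEH_FORMS):
    """Return (code point, Unicode name) pairs for each positional shape."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in forms]


def strip_marks(text):
    """Drop nonspacing combining marks (category Mn), e.g. Arabic vowel signs."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# Two differently vowelled words that share the same consonant skeleton.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"  # "books"

if __name__ == "__main__":
    for code, name in positional_forms():
        print(code, name)
    # All four shapes fold back to the same letter under NFKC normalization.
    print(all(unicodedata.normalize("NFKC", ch) == BEH for ch in BEH_FORMS))
    # Without the marks, the two words become indistinguishable.
    print(strip_marks(KATABA) == strip_marks(KUTUB))
```

A recognizer therefore has to map several visual shapes to one letter, and it cannot afford to miss small marks, because losing them can turn one word into another.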

Software to digitize Arabic text already exists, but Khater says it is “limited and frustrating” to use.

Read more at the source

Be sure to read our post from 2017 about The Kitab Project led by Dr. Sarah B. Savant, associate professor at Aga Khan University’s Institute for the Study of Muslim Civilisations (AKU-ISMC).

Interview with Dr Sarah Bowen Savant on KITAB Project

Author: ismailimail

Independent, civil society media featuring Ismaili Muslim community, inter and intra faith endeavors, achievements and humanitarian works.
