"Digital Dust of the Arabic Past": Corpus-Based Research in Arabic & Islamic Studies
Keynote for Digital Humanities Institute - Beirut 2019
Maxim Romanov (Vienna)
6pm - Orient-Institut Beirut
Over the past two decades, a great number of printed Arabic books have been digitized in the Middle East. Scholars anywhere in the world, not only at universities privileged with rich Middle East collections, now have thousands of fully searchable volumes of classical Arabic texts at their fingertips. As a result, research tasks that used to take years of hard work can now be completed within mere hours. However, the field of Arabic & Islamic studies has yet to realize how profound this change is. Almost a century and a half ago, with the appearance of printed editions, scholars began to find more and more texts they could work with. At the same time, the shift in form from idiosyncratic manuscripts to normalized prints introduced "distance" (a condition of knowledge, as Franco Moretti puts it) that allowed scholars to focus their attention on the deep analysis of multiple texts (close reading). The change in the field went hand in hand with the change in technology. Now we are living through yet another technological shift. Unlike libraries, machine-readable corpora fuse texts into qualitatively new entities and thereby promise a new form of "distance," in which we will be able to focus our attention on the deep analysis of all available texts (close and distant reading). The digital age also brings new computational methods that allow us to engage with these machine-readable corpora efficiently. Text reuse identification methods offer a novel view of how any text in a corpus is connected to all other texts, and through that a penetrating perspective on the complex, interwoven texture of the Arabic written tradition itself. By making it possible to extract meaningful data from unstructured texts, text mining methods offer ways of modeling large-scale and long-term historical processes from the myriad bits of information scattered across a corpus.
The lecture will highlight major developments and current results in these areas and will conclude with a discussion of the resources and infrastructure required to make such new research possible.
Maxim Romanov is a Universitätsassistent für Digital Humanities at the Department of History, University of Vienna. His dissertation (Near Eastern Studies, University of Michigan, 2013) studied how modern computational techniques of text analysis can be applied to the study of premodern Arabic historical sources. He is currently working on a study of "The History of Islam" (Taʾrīkh al-islām) by the Damascene scholar al-Dhahabī (d. 1348 CE), which will serve as the methodological and infrastructural foundation for the study of the entire extant corpus of the Arabic biographical and historical tradition. He is also working on a series of foundational projects for the field of digital Islamicate humanities, which include 1) a machine-readable corpus of classical Arabic texts (openiti.github.io), 2) a large-scale text-reuse project (kitab-project.org), and 3) a gazetteer and geographical model of the classical Islamic world (althurayya.github.io).