Artificial intelligence application recognizes and digitizes historical books written in polytonic Greek


An artificial intelligence platform for the digital display and management of the texts of historical Greek books written in the polytonic system has been developed by the Digital Document Processing team of the Computational Intelligence Laboratory of the National Centre for Scientific Research “Demokritos”. The application is being presented at the “Athens Science Festival”, which runs until tomorrow, Sunday, April 21, at the Technopolis of the Municipality of Athens.

The platform was developed as part of the “reBook” project, which is being implemented in collaboration with the Association for the Distribution of Useful Books (SOB) and the company Innews, within the framework of the NSRF 2014-2020. The project aims to develop new techniques and methodologies for recognizing texts, mainly those written in the polytonic system, for the scientific documentation of cultural heritage.

With the help of the application, Demokritos researchers are digitizing and digitally republishing approximately 100 books from the SOB archives, dating from the beginning of the 20th century onwards. Among them are Adamantios Korais’s collection “Epistolai ton Protopsaltin”, published in 1911 and republished in 1959, and the book “The Hellenicity of the Prefectures of Prousa and Smyrna” by Pantelis Kontogiannis, first published in 1919.

Images of scanned books are uploaded to the application, and the text of each image is then recognized with the help of artificial intelligence, even if it is written in the polytonic system. In addition, in the pre-processing stage, the application corrects any problems the page image may have, from skewed scans to faded pages.

“There is a huge number of books that are not available digitally and we want to make them available, to bring to the surface documents that are sitting in cupboards and on library shelves. So our goal is for historical books to reach the general public and researchers”, Katerina Christopoulou, PhD candidate in Landscape Ecology and scientific collaborator of “Demokritos”, explains to APE-MPE.

Explaining the value of the application, Ms. Christopoulou points out that “we don’t just see a PDF with the image of the page; the image has optical character recognition (OCR) behind it, so the reader can use parts of the file or search within it”. But the big difference in the application lies “in the reading of the polytonic system”.
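
To illustrate why recognized text matters for search, the following is a minimal sketch, not the reBook implementation, of how a reader's query typed in modern monotonic Greek could be matched against polytonic text. It assumes the OCR output is ordinary Unicode text; the function names `fold_greek` and `search_words` are hypothetical and chosen only for this example. It relies on Python's standard `unicodedata` module to strip accents and breathing marks before comparing.

```python
import unicodedata


def fold_greek(text: str) -> str:
    """Remove accents and breathing marks, then casefold (for matching only)."""
    # NFD splits each precomposed polytonic letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only characters that are not combining marks.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base_only.casefold()


def search_words(query: str, text: str) -> list[str]:
    """Return the original words whose folded form contains the folded query."""
    needle = fold_greek(query)
    return [word for word in text.split() if needle in fold_greek(word)]


if __name__ == "__main__":
    # Hypothetical sample line standing in for OCR output.
    sample = "Ἡ Ἑλληνικὴ γλῶσσα"
    # The query is written in the monotonic system, the text in the polytonic one.
    print(search_words("γλώσσα", sample))
```

The folding is applied only when comparing, so the words returned keep their original polytonic spelling.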

It is not the first time that the Computational Intelligence Laboratory of the Institute of Informatics and Telecommunications of Demokritos has dealt with the processing and recognition of historical documents. In a similar project, implemented in collaboration with the University of Cyprus, it used the same method to digitize polytonic texts from approximately 150 editions of Shakespeare’s works in Greek. These are translations by great writers, such as Konstantinos Cavafis, Konstantinos Theotokis and Dimitrios Vikelas, and come from, among others, the collections of the Parliament Library, the National Library and the Hellenic Literary and Historical Archive.

One of the challenges the Laboratory has set itself is the digital display of manuscripts. “Handwriting recognition in modern texts has come a long way. What has not existed until now is a tool that can recognize old manuscripts, especially Greek polytonic ones”, explains the head of the Laboratory, Vassilis Gatos, to APE-MPE.

Currently, a project is underway in collaboration with the Bank of Greece for the recognition and processing, again with the help of artificial intelligence, of the handwritten minutes of the Bank’s Board of Directors from the period 1928-1988. The archive numbers about 30,000 pages. “It’s a very difficult manuscript problem, but something that helps us in this case is that the scribes over the years were a small, known group, so for each scribe we have thousands of pages and this helps us in training the system,” emphasizes Mr. Gatos.

In an earlier project, the team collaborated with the Mount Sinai Monastery Foundation to develop technologies for searching for information directly in the images of the Monastery’s manuscripts. As part of the project, more than 100,000 pages of historical manuscripts were analyzed and recognized.

The work of another research group from the same Demokritos Laboratory is also being presented at the “Athens Science Festival”. The “AI4GEO” team explains how Earth observation, through artificial intelligence applications, is being “transformed” into a giant watchful eye that helps us discover rocks and deposits, record natural disasters or monitor evolving humanitarian crises.

