The Gaia Explorer, a Powerful Search Platform
Abstract
After two years of development, the GAIA Explorer is now ready to assist geoscientists at Total. This knowledge platform works like a small, domain-specific Google, focused for the time being solely on geosciences. Its main goal is to save time in finding the right information. It is therefore particularly useful in data rooms or after business acquisitions, to quickly digest the available knowledge, but also for feeding databases, exploration syntheses, reservoir studies, and even staff onboarding, especially when working remotely. With the time saved, geoscientists can focus on higher-value tasks such as synthesizing, finding analogies or proposing alternative scenarios. This new companion uses Machine Learning (ML) to automatically organize and extract knowledge from large numbers of unstructured technical documents. All the models rely on Google Cloud Platform (GCP) and have been trained on our own datasets, which cover the main petroleum domains such as geosciences and operations. First, the layouts of more than 75,000 document pages were analyzed to train a segmentation model that extracts three types of content: text, images and tables. Second, the text extracted from about 6,500 documents, labelled among 30 classes, was used to train a document classification model. Third, more than 55,000 images were categorized among 45 classes to customize an image classification model covering a wide range of figures such as maps, logs, seismic sections and core pictures. Finally, all the terms (n-grams) extracted from these objects are compared with an in-house thesaurus to automatically tag related topics such as basin, field, geological formation, acquisition and measure. All these building blocks are connected and feed a knowledge database that can be searched quickly and exhaustively.
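The abstract does not detail how extracted n-grams are compared with the thesaurus. A minimal sketch, assuming a simple exact match of lower-cased n-grams against a flat term-to-category dictionary, could look like the following. The `THESAURUS` entries and the `tag_topics` helper are hypothetical illustrations, not the in-house thesaurus or its actual matching logic.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus: lower-cased term -> topic category.
# The real in-house thesaurus and its category scheme are not shown here.
THESAURUS = {
    "north sea": "basin",
    "brent group": "geological formation",
    "seismic survey": "acquisition",
    "porosity": "measure",
}


def ngrams(tokens, max_n):
    """Yield every contiguous n-gram (space-joined) for n = 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tag_topics(text, thesaurus=THESAURUS):
    """Return {category: sorted matched terms} for n-grams found in the thesaurus."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # Only generate n-grams as long as the longest thesaurus entry.
    max_n = max((len(term.split()) for term in thesaurus), default=1)
    tags = defaultdict(set)
    for gram in ngrams(tokens, max_n):
        category = thesaurus.get(gram)
        if category is not None:
            tags[category].add(gram)
    return {cat: sorted(terms) for cat, terms in tags.items()}


if __name__ == "__main__":
    sample = "Porosity of the Brent Group sandstones in the North Sea was measured on core plugs."
    print(tag_topics(sample))
```

Exact matching keeps the sketch short; a production tagger would more likely handle synonyms, plurals and spelling variants, which this toy version ignores.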
Today, the GAIA Explorer searches within the texts, images and tables of a corpus (document collection), which can include technical and operational reports, meeting presentations and academic publications. By combining queries (keywords or natural language) with a large array of filters (by class and topic), users can easily refine the results and put them to use. Since a production version was released at Total in February 2021, about 180 users across 30 projects have been using the tool regularly for exploration and development purposes. This first version follows a continuous training cycle that includes active learning. Preliminary user feedback is positive, and users acknowledge that some information would have been difficult to locate without the GAIA Explorer. In the future, the GAIA Explorer could be significantly improved by implementing a knowledge graph based on an ontology specific to the petroleum domains. With the help of specialists in related activities such as drilling, projects or contracts, the tool could cover the complete range of upstream topics and, in time, become useful for other business lines.
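To illustrate how a keyword query can be combined with class and topic filters, here is a minimal sketch over an in-memory list of extracted objects. The `IndexedObject` record, the `search` function and its scoring are hypothetical; a naive token-count score stands in for whatever ranking and natural-language query handling the production tool actually uses.

```python
import re
from dataclasses import dataclass, field


@dataclass
class IndexedObject:
    """One extracted object (text block, image or table) with its ML-derived labels."""
    doc_id: str
    kind: str                                   # "text", "image" or "table"
    label: str                                  # predicted class, e.g. "seismic section"
    topics: dict = field(default_factory=dict)  # {category: [tagged terms]}
    content: str = ""                           # text, caption or flattened table text


def search(objects, query, kinds=None, labels=None, topics=None):
    """Filter objects by kind, class and topics, then rank by query-term counts.

    `topics` maps a category to a required term, e.g. {"basin": "north sea"}.
    Objects with no query-term occurrence are dropped.
    """
    terms = query.lower().split()
    scored = []
    for obj in objects:
        if kinds and obj.kind not in kinds:
            continue
        if labels and obj.label not in labels:
            continue
        if topics and not all(term in obj.topics.get(cat, []) for cat, term in topics.items()):
            continue
        tokens = re.findall(r"[a-z0-9]+", obj.content.lower())
        score = sum(tokens.count(t) for t in terms)
        if score > 0:
            scored.append((score, obj))
    # Stable sort: objects with equal scores keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [obj for _, obj in scored]


corpus = [
    IndexedObject("rep-001", "image", "seismic section", {"basin": ["north sea"]},
                  "Seismic section across the North Sea graben"),
    IndexedObject("rep-002", "text", "well report", {"basin": ["north sea"]},
                  "Core porosity measured in the North Sea well"),
    IndexedObject("rep-003", "image", "map", {"basin": ["gulf of mexico"]},
                  "Seismic coverage map"),
]

if __name__ == "__main__":
    hits = search(corpus, "seismic", kinds={"image"}, topics={"basin": "north sea"})
    print([h.doc_id for h in hits])
```

Applying the filters before scoring mirrors the refine-then-rank workflow described above. The filters prune the candidate set cheaply, so only the remaining objects need to be scored against the query.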