This work presents a pipeline that retrieves biomedical papers on a chosen topic (Covid-19) from a public repository (PubMed) through PyMed, a Python library for the PubMed API, and processes them with Natural Language Processing (NLP) techniques to extract semantic triplets of the form “subject, predicate, object”. The triplets are extracted with SemRep, a UMLS-based program that extracts three-part propositions, called semantic predications, from sentences in biomedical text. Further data mining steps generate and maintain additional information about the origin and quality of each triplet. The results are stored in a graph database (Neo4j), which can be explored with a query language (Cypher) and presented in a form that is easy for anyone to understand and use. The pipeline is intended to aid researchers investigating Covid-19. The most important findings of the research were the correlations among critical aspects of Covid-19, each linked to its relevant and accurate source for further investigation. Given that analysing how language is used, both figuratively and literally, has proven to be a challenging task, the outcome of this project can serve clinicians' needs and can easily be adjusted to provide insightful data on any other medical topic.

Keywords: Covid-19, graph databases, data mining, semantic predications, PubMed, SemRep, Neo4j, Cypher
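As an illustration of the storage step described above, the following is a minimal sketch of how one extracted triplet and its source article could be turned into a parameterized Cypher statement. It is not the project's actual schema: the `Predication` class, the `Concept` label, the `PREDICATION` relationship type, the `pmids` property and the `to_cypher` helper are all hypothetical names chosen for this example, and the sample triplet is a placeholder rather than a result of the study.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Predication:
    """One semantic triplet plus its provenance (hypothetical structure)."""
    subject: str
    predicate: str
    obj: str
    pmid: str  # PubMed ID of the article the sentence came from


# Assumed schema: Concept nodes joined by PREDICATION relationships.
# Cypher cannot parameterize a relationship type, so the predicate is
# stored as a property. Source PMIDs accumulate in a list on the edge.
MERGE_QUERY = """
MERGE (s:Concept {name: $subject})
MERGE (o:Concept {name: $object})
MERGE (s)-[r:PREDICATION {predicate: $predicate}]->(o)
ON CREATE SET r.pmids = [$pmid]
ON MATCH SET r.pmids = CASE WHEN $pmid IN r.pmids
                            THEN r.pmids ELSE r.pmids + $pmid END
""".strip()


def to_cypher(p: Predication) -> Tuple[str, Dict[str, str]]:
    """Return the Cypher statement and the parameters for one triplet."""
    params = {
        "subject": p.subject,
        "predicate": p.predicate,
        "object": p.obj,
        "pmid": p.pmid,
    }
    return MERGE_QUERY, params


# Placeholder triplet and PMID, for illustration only.
query, params = to_cypher(
    Predication("COVID-19", "CAUSES", "Pneumonia", "00000000")
)
```

With the official Neo4j Python driver, the returned pair could then be passed to `session.run(query, params)`. Keeping the values in parameters rather than formatting them into the query string avoids quoting problems with concept names taken from free text.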