The HBP Knowledge Graph provides many tools and APIs that hide the complexity of semantic, multi-datasource meta-data management. Several technical components work together to make this possible, and we would like to give you a short tour of them.
If you have any questions / suggestions / comments, please get in touch with us!
The HBP Knowledge Graph is based on BlueBrain Nexus, which provides a multi-modal solution for an eventually consistent data store.
As part of BlueBrain Nexus, Apache Cassandra stores an event log of JSON-LD messages and is the primary storage component. The built-in indexing mechanism then indexes the JSON-LD into multiple index databases: Blazegraph serves as a triple store, and Elasticsearch is used for full-text queries. Because the indexing mechanism works asynchronously, these databases are eventually consistent.
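The relationship between the event log and its asynchronously updated indexes can be illustrated with a small in-memory sketch. It does not use Cassandra, Blazegraph, or Elasticsearch; the classes EventLog and Index and the max_events parameter are hypothetical names chosen only to show why a lagging index returns stale data for a while and then converges.

```python
from dataclasses import dataclass, field


@dataclass
class EventLog:
    """Append-only log standing in for the primary event store (hypothetical)."""
    events: list = field(default_factory=list)

    def append(self, doc_id, payload):
        self.events.append((doc_id, payload))


@dataclass
class Index:
    """A derived index that replays the log at its own pace (hypothetical)."""
    name: str
    offset: int = 0  # position of the next unprocessed event
    docs: dict = field(default_factory=dict)

    def catch_up(self, log, max_events=None):
        # Apply pending events in log order; max_events simulates indexing lag.
        pending = log.events[self.offset:]
        if max_events is not None:
            pending = pending[:max_events]
        for doc_id, payload in pending:
            self.docs[doc_id] = payload
        self.offset += len(pending)


log = EventLog()
graph_index = Index("graph")
text_index = Index("fulltext")

log.append("person/1", {"name": "Ada"})
log.append("person/1", {"name": "Ada Lovelace"})

graph_index.catch_up(log)               # fully caught up
text_index.catch_up(log, max_events=1)  # lagging behind by one event

# The two indexes disagree until the lagging one has replayed the whole log.
stale = text_index.docs["person/1"] != graph_index.docs["person/1"]
text_index.catch_up(log)
converged = text_index.docs == graph_index.docs
```

Because every index is derived from the same ordered log, a lagging index never diverges permanently; it only answers with older data until it has replayed the remaining events.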
To achieve the described solution, additional services were built around the standard Nexus infrastructure:
An additional indexing client normalizes the incoming payload (full qualification), executes inference logic, indexes the data in ArangoDB, and interprets semantics (e.g. it recognizes spatial anchoring payloads, rasterizes them, and indexes the result in the additional Apache Solr index).
The additional index was introduced because of the need for a simpler way to traverse the graph and to recombine query results in a client-specific way. A direct consequence of this need is the KG Query API, which allows clients to execute semantically unambiguous queries on the data, transparently combines the spatial search with the standard meta-data query, and supports reflection, automatic client-code generation, and abstraction.
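As an illustration of what "full qualification" of a payload can mean, the following sketch expands compact JSON-LD keys into full IRIs. It is a deliberate simplification and not the indexing client's actual logic: it handles only a flat @context with prefix mappings and an optional @vocab, and the function name qualify_keys and the example.org IRIs are assumptions made for this example.

```python
def qualify_keys(payload):
    """Expand compact keys to full IRIs using the payload's own @context.

    Handles only a flat @context with prefix mappings and an optional @vocab;
    this is a simplification, not the full JSON-LD expansion algorithm.
    """
    context = payload.get("@context", {})
    vocab = context.get("@vocab", "")
    result = {}
    for key, value in payload.items():
        if key == "@context":
            continue  # the context is consumed, not copied
        if key.startswith("@") or "://" in key:
            full_key = key  # JSON-LD keyword or already an absolute IRI
        elif ":" in key and key.split(":", 1)[0] in context:
            prefix, local = key.split(":", 1)
            full_key = context[prefix] + local  # prefix:local -> IRI
        else:
            full_key = vocab + key  # bare term -> @vocab + term
        result[full_key] = value
    return result


doc = {
    "@context": {
        "@vocab": "https://example.org/vocab/",
        "schema": "http://schema.org/",
    },
    "@id": "https://example.org/person/1",
    "schema:name": "Ada",
    "affiliation": "Example Lab",
}
qualified = qualify_keys(doc)
```

After this step, two payloads that spell the same property differently (for instance with different prefixes) end up with identical keys, which is what makes the later indexing and querying unambiguous.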
Although good for scalability, eventual consistency causes problems for applications such as the KG Editor, where delayed updates can lead to confusing states in a reactive UI with data manipulation (for example, changes that a user has just applied may not yet be reflected in the database). The KG Sync API therefore provides a synchronous alternative, created primarily for this use case: creations, modifications, and deletions are applied to the Arango index directly after they have been submitted to the Nexus API. They are therefore immediately reflected in queries of the KG Query API, which allows us to provide a responsive UI. The standard indexing process overwrites this "temporary indexing" after a while.
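The synchronous write path described above can be sketched as follows. This is an in-memory model under stated assumptions, not the KG Sync API itself: PrimaryStore, QueryIndex, sync_write, and run_standard_indexer are hypothetical names, and the "source" field exists only to make the later overwrite visible.

```python
class PrimaryStore:
    """Stand-in for the primary store; keeps an ordered list of events."""

    def __init__(self):
        self.events = []

    def write(self, doc_id, payload):
        self.events.append((doc_id, payload))


class QueryIndex:
    """Stand-in for the query index; records where each entry came from."""

    def __init__(self):
        self.docs = {}

    def put(self, doc_id, payload, source):
        self.docs[doc_id] = {"payload": payload, "source": source}


def sync_write(store, index, doc_id, payload):
    # 1. Persist in the primary store first so it remains the source of truth.
    store.write(doc_id, payload)
    # 2. Index immediately so the next query already sees the change.
    index.put(doc_id, payload, source="temporary")


def run_standard_indexer(store, index):
    # The asynchronous indexer replays all events and overwrites temporary entries.
    for doc_id, payload in store.events:
        index.put(doc_id, payload, source="standard")


store, index = PrimaryStore(), QueryIndex()
sync_write(store, index, "dataset/42", {"title": "Draft"})
visible_immediately = index.docs["dataset/42"]["payload"]["title"]
source_before = index.docs["dataset/42"]["source"]

run_standard_indexer(store, index)  # happens later, asynchronously
source_after = index.docs["dataset/42"]["source"]
```

Writing to the primary store before touching the index keeps the temporary entry disposable: if it is lost or outdated, the standard indexing run restores the correct state from the stored events.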
Automated import scripts (typically written in Python) load data from a specific source, transform it into the required JSON-LD structures, and use the Nexus API to upload the data to the Knowledge Graph. These scripts can be triggered externally. At HBP, we use a job scheduler that manages these kinds of recurring jobs.
The KG Search is built as a standalone application that uses the HBP Knowledge Graph as its original data source. This not only reduces the dependency between the systems and allows us to scale the Search component independently of the HBP Knowledge Graph, but is also a perfect showcase of how other (external) clients can integrate with the KG.
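The load, transform, and upload steps of such an import script might look roughly like the sketch below. The CSV source, the field names, and the example.org IRIs are assumptions for illustration. The upload step is passed in as a callable, so no particular Nexus endpoint or client signature is implied; a real script would supply a function that calls the Nexus API.

```python
import csv
import io


def load_rows(csv_text):
    """Load records from the source system (here: CSV text)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_json_ld(row, base_iri, type_iri):
    """Transform one source record into a JSON-LD document."""
    return {
        "@context": {"schema": "http://schema.org/"},
        "@id": f"{base_iri}{row['id']}",
        "@type": type_iri,
        "schema:name": row["name"],
    }


def run_import(csv_text, upload, base_iri, type_iri):
    """Load, transform, and hand every document to the upload callable."""
    count = 0
    for row in load_rows(csv_text):
        upload(to_json_ld(row, base_iri, type_iri))
        count += 1
    return count


source = "id,name\n1,Ada\n2,Grace\n"
uploaded = []  # placeholder target; a real script would call the Nexus API here
n = run_import(
    source,
    uploaded.append,
    base_iri="https://example.org/person/",
    type_iri="http://schema.org/Person",
)
```

Keeping the upload step injectable makes the transformation testable without a running Knowledge Graph, and a scheduler only needs to invoke run_import with the production upload function.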
Find more information depending on:
Or contact us by e-mail: kg-team@humanbrainproject.eu