Integrity System


Integrity is a data science solution that provides insights into scholarly publishing, funding, and research activity using industry-standard datasets. Colourful, visual, interactive search provides multiple points of entry for further discovery.


Using machine learning and artificial intelligence, Integrity provides transparent, real-time results, helping individuals and institutions discover patterns and relationships within scholarly data.


The Integrity system draws on comprehensive, industry-standard datasets and APIs from:

  • CrossRef


  • GRID

  • DOAJ

  • Client-specific taxonomies and datasets


We intend to incorporate ROR and Transpose.

Ultimately, we want to integrate as many industry-standard datasets as feasible, to provide a transparent, comprehensive, rich, expressive and accessible interface for exploring the scholarly world.


  • Integrity modernises search and relationship finding by using graph databases (specifically, Neo4j).

  • Neo4j Bloom is used for internal hypothesis generation.

  • Python is used for parsing PDFs and XML, and ingesting JSON files.

  • Custom-built portals are constructed with vis.js or R. 
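As a rough illustration of the JSON ingestion step listed above, the sketch below turns a handful of article records into de-duplicated graph nodes and relationships of the kind that could later be loaded into Neo4j. The record shape, the node labels (Article, Author, Organisation) and the relationship names (AUTHORED, AFFILIATED_WITH) are assumptions made for this example, not Integrity's actual schema, and loading into the database is deliberately left out.

```python
import json

# Hypothetical record shape for illustration only; real Crossref or DOAJ
# payloads are richer and structured differently.
SAMPLE = """
[
  {"doi": "10.1000/example.1", "title": "Paper A",
   "authors": [{"name": "A. Author", "affiliation": "Univ X"},
               {"name": "B. Writer", "affiliation": "Univ Y"}]},
  {"doi": "10.1000/example.2", "title": "Paper B",
   "authors": [{"name": "A. Author", "affiliation": "Univ X"}]}
]
"""


def to_graph(records):
    """Turn article records into de-duplicated nodes and relationships."""
    nodes = set()  # (label, key)
    edges = set()  # (start_key, relationship_type, end_key)
    for rec in records:
        nodes.add(("Article", rec["doi"]))
        for author in rec.get("authors", []):
            nodes.add(("Author", author["name"]))
            edges.add((author["name"], "AUTHORED", rec["doi"]))
            org = author.get("affiliation")
            if org:
                nodes.add(("Organisation", org))
                edges.add((author["name"], "AFFILIATED_WITH", org))
    return nodes, edges


records = json.loads(SAMPLE)
nodes, edges = to_graph(records)
```

Using sets means an author or organisation that appears in several records is stored once, which mirrors how a graph database would merge repeated entities into a single node.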

Machine Learning & Artificial Intelligence

We will use ML and AI techniques, including centrality and similarity algorithms, to discover patterns and relationships within scholarly metadata, institutional affiliations, funding, subject categories, publishers, individual authors and researchers, and more.
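To make the terms concrete, here is a minimal sketch of one centrality measure (degree centrality) and one similarity measure (Jaccard similarity over shared neighbours) on a tiny co-authorship graph. The author names and the choice of these two particular measures are assumptions for illustration; the text above does not say which algorithms Integrity uses, and in a Neo4j deployment such computations would more likely run inside the database than in plain Python.

```python
from collections import defaultdict

# Hypothetical co-authorship pairs; the names are placeholders.
COAUTHORS = [("Ada", "Ben"), ("Ada", "Cy"), ("Ben", "Cy"), ("Cy", "Dee")]


def neighbours(pairs):
    """Build an undirected adjacency map from co-authorship pairs."""
    adj = defaultdict(set)
    for a, b in pairs:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {node: 0.0 for node in adj}
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def jaccard(adj, a, b):
    """Shared neighbours of a and b divided by all neighbours of either."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0


adj = neighbours(COAUTHORS)
centrality = degree_centrality(adj)
most_connected = max(centrality, key=centrality.get)
```

In this toy graph the author connected to everyone else scores highest on centrality, while the Jaccard score indicates how much two authors' collaborator sets overlap, which is one simple way to surface related researchers.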


Integrity’s transparent search and discovery is rooted in industry-standard datasets. Basic searches will be free (with user registration); subscription customers can add custom data layers -- including their own datasets and taxonomies -- to the Integrity search.

Integrity clients and partners use a custom-built or white-label portal interface (built using vis.js or R) that presents the Neo4j data graphically and offers options for data download (e.g., CSV). The paid version of Integrity lets clients focus on the data segments most relevant to them and add custom taxonomies and datasets.
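The CSV download option could look something like the sketch below, which serialises a list of result rows to CSV text using Python's standard library. The row contents and column names are hypothetical stand-ins for whatever a portal query would actually return; this is not the portal's real export code.

```python
import csv
import io

# Hypothetical query results; in practice these rows would come from the graph.
ROWS = [
    {"author": "A. Author", "organisation": "Univ X", "articles": 2},
    {"author": "B. Writer", "organisation": "Univ Y", "articles": 1},
]


def rows_to_csv(rows, fieldnames):
    """Serialise a list of dict rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(ROWS, ["author", "organisation", "articles"])
```

Writing to an in-memory buffer keeps the function independent of where the file ends up, so the same text could be saved to disk or returned as a download response.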