About

HyperGraphDB is a general-purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. It is a graph database designed specifically for artificial intelligence and semantic web projects, but it can also be used as an embedded object-oriented database for projects of all sizes.

The system is reliable and in production use in several projects, including a search engine and our own Seco scripting IDE, where most of the runtime environment is automatically saved as a hypergraph.

HyperGraphDB is primarily what its carefully chosen name implies: a database for storing hypergraphs. While it falls into the general family of graph databases, it is hard to categorize HyperGraphDB as yet another database because much of its design revolves around providing the means to manage structure-rich information with arbitrary layers of complexity. For instance, a relational as well as an object-oriented style of data management can be emulated. As a graph database, HyperGraphDB doesn't impose any constraints and offers much more generality than the other graph databases we've come across. The design is minimalistic at its core, and the end goal is to evolve a set of concepts and practices, combining structure and interpretation in such a way as to allow future software to meet the complexities of the real world better than it does now.

Contributors

  • Borislav Iordanov
  • Konstantin Vandev
  • Ciprian Costa
  • Mihail Marinov
  • Murilo Saraiva de Queiroz
  • Ian Holsman
  • Alain Picard
  • Ingvar Bogdahn

Key Facts

  • The mathematical definition of a hypergraph is an extension of the standard graph concept that allows an edge to point to more than two nodes. HyperGraphDB extends this even further by allowing edges to point to other edges as well and by letting every node or edge carry an arbitrary value as payload.
  • The original requirements that triggered the development of the system came from the OpenCog project, which is an attempt at building an AGI (Artificial General Intelligence) system based on self-modifying probabilistic hypergraphs.
  • The basic unit of storage in HyperGraphDB is called an atom. Each atom is typed, has an arbitrary value and can point to zero or more other atoms.
  • Data types are managed by a general, extensible type system that is itself embedded as a hypergraph structure. Types are themselves atoms like everything else, but with a particular role (well, like everything else too).
  • The storage scheme is platform independent and can thus be accessed by any programming language from any platform. Low-level storage is currently based on BerkeleyDB from Sleepycat Software.
  • Size limitations are virtually non-existent. There is no software limit on the size of the graph managed by a HyperGraphDB instance. Each individual value's size is limited by the underlying storage, i.e. by BerkeleyDB's 2GB limit. However, the architecture allows bypassing BerkeleyDB for particular types of atoms if one so desires.
  • The current implementation is solely Java-based. It offers an automatic mapping of idiomatic Java types to a HyperGraphDB data schema, which makes HyperGraphDB an object-oriented database suitable for regular business applications. A C++ implementation has been frequently contemplated, but never initiated due to lack of manpower. Note that because the storage scheme is open and precisely specified, all languages and platforms are able to share the same data.
  • Embedded in-process: the database comes in the form of a software library to be used directly through its API.
  • A P2P framework for distributed processing has been implemented for replication/data partitioning algorithms as well as client-server style computing.
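The atom model described in the list above can be pictured with a small, self-contained Java sketch. This is not the HyperGraphDB API: the `Atom` and `AtomModelDemo` classes and all of their members are hypothetical names invented for illustration, and the type is reduced to a plain string even though in HyperGraphDB types are themselves atoms. The sketch only shows the shape of the idea: every atom has a type, an arbitrary value, and zero or more targets, and a target may itself be a link.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration class; NOT part of the HyperGraphDB API.
final class Atom {
    final String type;         // simplification: real HyperGraphDB types are atoms too
    final Object value;        // arbitrary payload carried by the atom
    final List<Atom> targets;  // zero or more atoms this atom points to

    Atom(String type, Object value, Atom... targets) {
        this.type = type;
        this.value = value;
        this.targets = Collections.unmodifiableList(
                new ArrayList<>(Arrays.asList(targets)));
    }

    // Number of atoms this atom points to.
    int arity() {
        return targets.size();
    }

    // An atom with at least one target plays the role of an edge (link).
    boolean isLink() {
        return !targets.isEmpty();
    }
}

final class AtomModelDemo {
    // Builds three nodes, one hyperedge over them, and one edge over that edge.
    static Atom buildExample() {
        Atom alice = new Atom("Person", "Alice");
        Atom bob = new Atom("Person", "Bob");
        Atom carol = new Atom("Person", "Carol");

        // A single edge connecting three nodes, which an ordinary graph edge cannot do.
        Atom meeting = new Atom("Meeting", "project kickoff", alice, bob, carol);

        // An edge whose only target is another edge.
        return new Atom("Annotation", "rescheduled once", meeting);
    }

    public static void main(String[] args) {
        Atom note = buildExample();
        Atom meeting = note.targets.get(0);
        System.out.println(meeting.type + " links " + meeting.arity() + " atoms");
        System.out.println(note.type + " points to a link: " + meeting.isLink());
    }
}
```

Representing nodes and edges with one class mirrors the statement that a single unit of storage, the atom, covers both roles; a node is simply an atom with no targets.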

Possible Usage Scenarios

In a server-side Java application, the standard setup relies on an RDBMS together with a set of business components and a presentation tier. If you've kept up with the latest industry advances, you have a good O/R mapping tool such as Hibernate to transparently and non-intrusively convert your object structure to and from database tables. Recently, there has been a noticeable trend to replace RDBMSs, especially for smaller applications, with embedded in-memory databases that have less sophisticated, but typically much faster, querying capabilities.

In a desktop Java application, programmers frequently rely on a large set of configuration files to store user preferences and other persistent application state. A large amount of time is devoted to the management of configuration data, and end users are frequently not allowed to configure even simple application behavior because programmers don't have the time to make "everything" configurable and must instead predict which parameters are most likely to interest users. With HyperGraphDB, all beans that have to do with configuration can simply be added as atoms, and they will be managed from there on.

Bioinformatics projects form a category of fairly complex software that not only can benefit from a data management piece like HyperGraphDB, but also constitutes a very natural fit for it. Frequently, such projects need to manage highly complex descriptive information based on structured taxonomies (or ontologies), together with large sets of experimental data. In addition, sophisticated algorithms operate on both experimental and ontological data in order to infer interaction networks at various levels of biological organization. HyperGraphDB is designed to facilitate all those activities.

Semantic Web projects are an obvious domain of application of HyperGraphDB. The so-called "conceptual graphs" or RDF graphs, and even the more advanced modeling practices utilizing higher-order relationships, have a straightforward and natural expression within the HyperGraphDB framework.
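To make the idea of higher-order relationships concrete, here is a minimal Java sketch, assuming a subject/predicate/object statement is modeled as a link over three terms and that a statement may itself appear as a term of another statement. The `Term`, `Resource`, `Triple`, and `TripleDemo` names are hypothetical and belong neither to HyperGraphDB nor to any RDF library.

```java
// Hypothetical illustration types; not HyperGraphDB or any RDF library API.
interface Term {
    String label();
}

// A plain named thing, such as a person or a relation name.
final class Resource implements Term {
    private final String name;

    Resource(String name) {
        this.name = name;
    }

    public String label() {
        return name;
    }
}

// A statement linking three terms. Because Triple is itself a Term,
// a statement can be the subject or object of another statement.
final class Triple implements Term {
    final Term subject;
    final Term predicate;
    final Term object;

    Triple(Term subject, Term predicate, Term object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    public String label() {
        return "(" + subject.label() + " " + predicate.label() + " " + object.label() + ")";
    }
}

final class TripleDemo {
    // Builds a statement about a statement.
    static Triple buildExample() {
        Triple fact = new Triple(
                new Resource("alice"), new Resource("knows"), new Resource("bob"));
        // Higher-order relationship: the object of this statement is another statement.
        return new Triple(new Resource("carol"), new Resource("believes"), fact);
    }

    public static void main(String[] args) {
        System.out.println(buildExample().label());
        // prints "(carol believes (alice knows bob))"
    }
}
```

In a plain graph, the inner statement would have to be encoded indirectly with extra nodes; when edges can point to edges, it can be referenced directly, which is the property the paragraph above relies on.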

Network research can benefit from the capacity of HyperGraphDB to store very large, distributed graphs and to run computationally intensive pattern-mining algorithms on them.