Graph-Based Sports Data Discovery

A graph database is a database designed to treat the relationships between data as equally important to the data itself. It is intended to hold data without constraining it to a rigid, pre-defined model. Instead, the data is stored the way we would first sketch it out on a whiteboard, showing how each individual entity connects with or is related to others. Real-world information is rarely isolated; it forms rich, connected domains. A database that natively embraces relationships is well suited to storing, processing, and querying those connections efficiently. Whereas relational databases typically compute relationships at query time through expensive JOIN operations, a graph database stores connections alongside the data in the model. One of the most popular graph database platforms is Neo4j.
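Here is a minimal Python sketch that makes the contrast concrete. It is an illustration only and does not show how Neo4j stores data internally. It answers the same question, "whom does Alice know?", in two ways. The first matches keys across two relational-style tables, as a JOIN does. The second follows references stored directly on each node. The people, table layouts, and function names are hypothetical.

```python
# Hypothetical example data: two ways of answering "whom does Alice know?"

# Relational style: entities and relationships live in separate tables.
people = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
knows = [
    {"from_id": 1, "to_id": 2},
    {"from_id": 1, "to_id": 3},
    {"from_id": 2, "to_id": 3},
]

def friends_via_join(name):
    """Recompute the relationship at query time by matching keys across tables."""
    person_id = next(p["id"] for p in people if p["name"] == name)
    friend_ids = {row["to_id"] for row in knows if row["from_id"] == person_id}
    return [p["name"] for p in people if p["id"] in friend_ids]

# Graph style: each node carries direct references to the nodes it connects to.
class Node:
    def __init__(self, name):
        self.name = name
        self.knows = []  # outgoing connections stored alongside the node's data

alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.knows.extend([bob, carol])
bob.knows.append(carol)

def friends_via_adjacency(node):
    """Follow the stored connections directly; no key matching is needed."""
    return [friend.name for friend in node.knows]

print(friends_via_join("Alice"))     # ['Bob', 'Carol']
print(friends_via_adjacency(alice))  # ['Bob', 'Carol']
```

Both functions return the same answer. The join version must scan and match the `knows` table every time it is asked, while the adjacency version simply reads the list already attached to the node.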

What is Neo4j?

Neo4j is an open-source, NoSQL, native graph database that provides an ACID-compliant transactional backend for your applications. Initial development began in 2003, and it has been publicly available since 2007. The source code, written in Java and Scala, is freely available on GitHub, and the database can also be downloaded as a user-friendly desktop application. Neo4j is offered as both a Community Edition and an Enterprise Edition. The Enterprise Edition includes everything the Community Edition offers, plus features that enterprise deployments require, such as backups, clustering, and failover.

Neo4j is referred to as a native graph database because it implements the property graph model all the way down to the storage level. This means that the data is stored the way you would whiteboard it, and the database uses pointers to navigate and traverse the graph. In contrast to graph-processing or in-memory libraries, Neo4j also provides full database characteristics, including ACID transaction compliance, cluster support, and runtime failover. This makes it suitable for running graph data in production scenarios.
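The property graph model can be sketched in a few lines of Python under a simplified in-memory layout. In this sketch:

- Nodes carry a label and properties.
- Relationships carry a type and their own properties.
- Every node keeps references to its relationships, so a traversal simply follows them.

This is only an analogy for pointer-based traversal. Neo4j's real storage engine is more involved, and in Neo4j itself such a query would be written in Cypher. The Player/Team labels, the PLAYS_FOR type, and all names and values are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified property graph; not Neo4j's API or storage format.
@dataclass(eq=False)  # identity-based equality, since nodes reference each other
class Node:
    label: str
    properties: dict
    outgoing: list = field(default_factory=list)  # references to Relationship objects
    incoming: list = field(default_factory=list)

@dataclass(eq=False)
class Relationship:
    rel_type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)

def connect(start, rel_type, end, **props):
    """Create a typed relationship and register it on both endpoint nodes."""
    rel = Relationship(rel_type, start, end, props)
    start.outgoing.append(rel)
    end.incoming.append(rel)
    return rel

def teammates(player):
    """Two-hop traversal: player -PLAYS_FOR-> team <-PLAYS_FOR- other players."""
    names = []
    for rel in player.outgoing:
        if rel.rel_type != "PLAYS_FOR":
            continue
        team = rel.end  # follow the stored reference; no lookup required
        for back in team.incoming:
            if back.rel_type == "PLAYS_FOR" and back.start is not player:
                names.append(back.start.properties["name"])
    return names

# Usage with placeholder data.
player_a = Node("Player", {"name": "Player A"})
player_b = Node("Player", {"name": "Player B"})
player_c = Node("Player", {"name": "Player C"})
team_x = Node("Team", {"name": "Team X"})
connect(player_a, "PLAYS_FOR", team_x, since=2014)
connect(player_b, "PLAYS_FOR", team_x, since=2015)
connect(player_c, "PLAYS_FOR", team_x, since=2013)

print(teammates(player_a))  # ['Player B', 'Player C']
```

Because each relationship is registered on both of its endpoints, the traversal can walk "backwards" from the team to its players as cheaply as it walks forwards.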

How is it Useful in the Sports Community?

Sports and recreation is a wide-ranging field that includes elements of human interaction with the environment (e.g., boating, mountaineering), team play (e.g., baseball, football and basketball leagues), commerce (e.g., brand sponsorship by major sports stars), education (e.g., high school and collegiate sporting events and scholarships), and other discoverable features. When the goal is to uncover the relationships inherent in a data set, a graph-based representation is often the most effective approach. Such a representation is built from nodes, the relationships between them, and their properties.

The Neo4j community actively contributes open-source ‘Gists’, implementations that model various sporting and recreation activities to illustrate how a graph database supports querying and discovery. To view some of these contributions, visit the Neo4j Gist / Sports and Recreation collection.

For example, Shantaram Waingankar’s graph on mountaineering in the Himalayan Mountains helps trekkers plot routes between villages and towns for Mt. Everest and K2 climbs.
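Route planning of this kind reduces to a shortest-path search over a graph. Places are the nodes and trails are the relationships. We have not reproduced Waingankar's actual queries. Instead, here is a generic Dijkstra shortest-path search in Python over a small, made-up trail network. Every village name and distance in it is a placeholder.

```python
import heapq

# Hypothetical trail network: village -> {neighbouring village: trek distance in km}.
# Names and distances are placeholders, not real Himalayan data.
TRAILS = {
    "Village A": {"Village B": 10, "Village C": 25},
    "Village B": {"Village A": 10, "Village C": 8, "Base Camp": 30},
    "Village C": {"Village A": 25, "Village B": 8, "Base Camp": 15},
    "Base Camp": {"Village B": 30, "Village C": 15},
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_km, route), or (None, []) if unreachable."""
    queue = [(0, start, [start])]  # (distance so far, current village, path taken)
    visited = set()
    while queue:
        dist, village, path = heapq.heappop(queue)
        if village == goal:
            return dist, path
        if village in visited:
            continue  # a shorter path to this village was already expanded
        visited.add(village)
        for neighbour, km in graph.get(village, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (dist + km, neighbour, path + [neighbour]))
    return None, []

print(shortest_route(TRAILS, "Village A", "Base Camp"))
# (33, ['Village A', 'Village B', 'Village C', 'Base Camp'])
```

Weighting relationships by distance (rather than counting hops) lets the search prefer a longer chain of short trails over a single long one. Here that means the route through both villages beats either direct trail to base camp.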

Another example with practical outcomes is Di Lu’s graph database showing National Basketball Association (NBA) play-off winning rates. Bettors can use tools like this to inform their decisions before placing bets.
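Winning rates fall out of a graph almost directly. Suppose each play-off game is stored as a WON relationship from the winner to the loser. A team's wins are then its outgoing WON edges, and its games played are all WON edges touching it. This Python sketch assumes that modelling choice, which is not necessarily how Di Lu's graph is structured. It uses invented teams and results rather than real NBA data.

```python
from collections import Counter

# Hypothetical play-off results stored as WON edges: (season, winner, loser).
# Team names and results are placeholders, not real NBA data.
GAMES = [
    (2013, "Team A", "Team B"),
    (2013, "Team A", "Team B"),
    (2013, "Team B", "Team A"),
    (2014, "Team C", "Team A"),
    (2014, "Team A", "Team C"),
    (2014, "Team A", "Team C"),
]

def wins_by_team_and_season(games):
    """Count outgoing WON edges per (team, season)."""
    return Counter((winner, season) for season, winner, _loser in games)

def win_rate(games, team):
    """Share of a team's games (WON edges in either direction) that it won."""
    played = sum(1 for _season, winner, loser in games if team in (winner, loser))
    won = sum(1 for _season, winner, _loser in games if winner == team)
    return won / played if played else 0.0

wins = wins_by_team_and_season(GAMES)
print(wins[("Team A", 2013)])              # 2
print(round(win_rate(GAMES, "Team A"), 3))  # 0.667
```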

The following visualization of the graph data model shows the number of wins by team for the 2013, 2014 and 2015 play-offs.

Application to Cybersecurity for Sports

As the Sports-ISAO continues to train cadres of cyber threat hunters and cyber threat intelligence analysts, we are using tools such as graph databases to help us “discover” the patterns that threat actors use to attack games, leagues, athletes, and sporting events.

Join us and help us protect the integrity of the sports community for all of us!
