This document summarizes the key concepts and components of Gremlin's graph traversal machinery:
- Gremlin uses a traversal language to express graph queries via step composition, with steps mapping traversers between domains.
- Traversals are compiled to bytecode and optimized by traversal strategies before being executed by the Gremlin machine.
- The Gremlin machine consists of steps implementing functions that process traverser streams. Their composition forms the traversal.
- Gremlin is language-agnostic, with language variants translating to a shared bytecode that interacts with the Java-based implementation.
Gremlin's Graph Traversal Machinery
1. Gremlin’s Graph Traversal Machinery
Dr. Marko A. Rodriguez
Director of Engineering at DataStax, Inc.
Project Management Committee, Apache TinkerPop
http://tinkerpop.apache.org
2. f : X → X
The function f is a process that maps a structure of type X to a structure of type X.
3. f(x1) = x2
The function f maps the object x1 (from the set of X) to the object x2 (from the set of X).
x1 ∈ X, x2 ∈ X
7. 90°
A traverser wraps a value of type V.
class Traverser<V> {
V value;
}
8. 90°
The step maps an integer traverser to an integer traverser.
class Traverser<V> {
V value;
}
Traverser<Integer> → Traverser<Integer>
9. 90°
A traverser with a rotation of 0° becomes a traverser with a rotation of 90°.
Traverser(0) → Traverser(90)
class Traverser<V> {
V value;
}
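The idea of a step mapping one traverser to another can be sketched in a few lines of Python. This is only an illustration, not TinkerPop's implementation: `map_step` and `rotate_90` are hypothetical names, and wrapping the rotation at 360° is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Iterator, TypeVar

V = TypeVar("V")
W = TypeVar("W")


@dataclass(frozen=True)
class Traverser(Generic[V]):
    """A traverser wraps a single value of type V."""
    value: V


def map_step(f: Callable[[V], W]) -> Callable[[Iterable[Traverser[V]]], Iterator[Traverser[W]]]:
    """Lift a plain function on values into a step on traverser streams."""
    def step(traversers: Iterable[Traverser[V]]) -> Iterator[Traverser[W]]:
        for t in traversers:
            yield Traverser(f(t.value))
    return step


# Hypothetical step: rotate by 90 degrees (wrapping at 360 is an assumption).
rotate_90 = map_step(lambda degrees: (degrees + 90) % 360)

result = list(rotate_90([Traverser(0)]))
print(result)  # [Traverser(value=90)]
```

The step never sees bare integers in its input or output stream; it consumes and produces traversers, which is what lets later slides attach extra state such as a bulk.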
14. A traverser can have a bulk which denotes how many V values it represents.
90°
class Traverser<V> {
V value;
long bulk;
}
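15. Bulking groups identical traversers to reduce the number of evaluations of a step.
[Figure: a traverser with a bulk of 4 passes through the 90° step and yields a traverser with a bulk of 4.]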
class Traverser<V> {
V value;
long bulk;
}
16. A variegated stream of input traversers yields a variegated stream of output traversers.
90°
17. Bulking can reduce the size of the stream.
[Figure: traversers passing through the 90° step, labeled with bulks of 1 and 2.]
class Traverser<V> {
V value;
long bulk;
}
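A minimal Python sketch of bulking, under the assumption that two traversers are "identical" when their values are equal. The names `bulk_merge`, `rotate_90`, and the `evaluations` counter are hypothetical and exist only to show that the step runs once per merged traverser.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Traverser:
    value: int
    bulk: int = 1  # how many equal values this traverser represents


def bulk_merge(traversers):
    """Merge traversers with equal values, summing their bulks."""
    counts = Counter()
    for t in traversers:
        counts[t.value] += t.bulk
    return [Traverser(value, bulk) for value, bulk in counts.items()]


evaluations = 0  # counts how often the step body runs


def rotate_90(traversers):
    """Hypothetical step: rotate each value by 90 degrees, keeping the bulk."""
    global evaluations
    for t in traversers:
        evaluations += 1
        yield Traverser((t.value + 90) % 360, t.bulk)


stream = [Traverser(0), Traverser(0), Traverser(90), Traverser(0)]
merged = bulk_merge(stream)       # two traversers instead of four
output = list(rotate_90(merged))  # the step body runs twice, not four times
```

The total bulk of the stream is unchanged by merging, so counting-style results downstream stay the same while the step does less work.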
55. g.V().has("name","gremlin").
out("knows").values("age").
groupCount()
- g.V(): one graph to many vertices (flatMap)
- has("name","gremlin"): one vertex to that vertex or no vertex (filter)
- out("knows"): one vertex to many friend vertices (flatMap)
- values("age"): one vertex to one age value (map)
- groupCount(): many age values to an age distribution (map, as a reducer)
[Figure: the vertex with name=gremlin leads to the age values 37, 37, 24, 35, 35, 35, 35, 41, which groupCount() reduces to [37:2, 41:1, 24:1, 35:4].]
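The five step kinds in this traversal can be imitated with plain Python generators over a toy in-memory graph. Everything here is an assumption for illustration: the `vertices` and `knows` dictionaries, the friend names, and the helper functions are hypothetical, with ages chosen to reproduce the distribution shown on the slide.

```python
from collections import Counter

# Hypothetical toy graph: vertex id -> properties, plus outgoing "knows" edges.
ages = [37, 37, 24, 35, 35, 35, 35, 41]
vertices = {0: {"name": "gremlin"}, 9: {"name": "other", "age": 50}}
vertices.update({i + 1: {"name": f"friend{i + 1}", "age": age}
                 for i, age in enumerate(ages)})
knows = {0: list(range(1, 9)), 9: [1]}


def V():
    """flatMap: one graph to many vertices."""
    yield from vertices


def has(key, value, stream):
    """filter: one vertex to that vertex or no vertex."""
    return (v for v in stream if vertices[v].get(key) == value)


def out_knows(stream):
    """flatMap: one vertex to many friend vertices."""
    for v in stream:
        yield from knows.get(v, [])


def values(key, stream):
    """map: one vertex to one property value."""
    return (vertices[v][key] for v in stream)


def group_count(stream):
    """map (reducer): many values to a distribution."""
    return dict(Counter(stream))


result = group_count(values("age", out_knows(has("name", "gremlin", V()))))
print(result)
```

Each function consumes the stream produced by the previous one, so the nesting order here mirrors the left-to-right step order of the Gremlin traversal.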
57. Gremlin Traversal Language
- Traversal creation via step composition (function composition, written as fluent methods): a().b().c()
- Step parameterization via traversal and constant nesting (function nesting, written as method arguments): a(b().c()).d(x)
Any language that supports function composition and function nesting can host Gremlin.
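To make the two requirements concrete, here is a small Python sketch of a host-language embedding that records fluent method calls and nested traversals. It is not Gremlin-Python's actual code; the `Traversal` class, `anon()`, and `as_nested_list()` are hypothetical names used only for this illustration.

```python
class Traversal:
    """Records each fluent method call as a (step name, arguments) pair."""

    def __init__(self):
        self.steps = []

    def __getattr__(self, name):
        # Only called for attributes that do not exist, i.e. step names.
        if name.startswith("_"):
            raise AttributeError(name)

        def add_step(*args):
            self.steps.append((name, args))
            return self  # returning self is what enables a().b().c()

        return add_step

    def as_nested_list(self):
        """Render the steps, expanding nested traversals recursively."""
        return [
            [name] + [a.as_nested_list() if isinstance(a, Traversal) else a
                      for a in args]
            for name, args in self.steps
        ]


def anon():
    """Start a new, empty traversal (hypothetical helper)."""
    return Traversal()


composed = anon().a().b().c()                # function composition
nested = anon().a(anon().b().c()).d("x")     # function nesting
print(composed.as_nested_list())  # [['a'], ['b'], ['c']]
print(nested.as_nested_list())    # [['a', [['b'], ['c']]], ['d', 'x']]
```

Nothing is evaluated while the traversal is being written; the host language only needs method chaining and the ability to pass one traversal as an argument to another.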
58. class Traverser<V> {
V value;
long bulk;
}
class Step<S,E> {
Traverser<E> processNextStart();
}
f(x)
class Traversal<S,E> implements Iterator<E> {
E next();
Traverser<E> nextTraverser();
}
The fundamental constructs of Gremlin’s machinery.
Gremlin Traversal Machine
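The relationship between these constructs can be sketched as a pull-based pipeline: asking the traversal for its next result asks the last step, which asks the step before it, and so on. The Python below is an assumption-laden sketch rather than a port of the Java classes; `StartStep`, `MapStep`, and `FilterStep` are illustrative subclasses, and bulk handling is left out to keep it short.

```python
class Traverser:
    def __init__(self, value):
        self.value = value


class Step:
    """A step pulls traversers from the previous step and emits new ones."""

    def __init__(self, previous=None):
        self.previous = previous

    def process_next_start(self):
        raise NotImplementedError


class StartStep(Step):
    """Injects the initial values into the pipeline."""

    def __init__(self, values):
        super().__init__()
        self._starts = iter([Traverser(v) for v in values])

    def process_next_start(self):
        return next(self._starts)  # StopIteration signals exhaustion


class MapStep(Step):
    def __init__(self, previous, f):
        super().__init__(previous)
        self.f = f

    def process_next_start(self):
        t = self.previous.process_next_start()
        return Traverser(self.f(t.value))


class FilterStep(Step):
    def __init__(self, previous, predicate):
        super().__init__(previous)
        self.predicate = predicate

    def process_next_start(self):
        while True:  # keep pulling until a traverser passes the predicate
            t = self.previous.process_next_start()
            if self.predicate(t.value):
                return t


class Traversal:
    """Iterating a traversal pulls traversers through its last step."""

    def __init__(self, end_step):
        self.end_step = end_step

    def __iter__(self):
        return self

    def next_traverser(self):
        return self.end_step.process_next_start()

    def __next__(self):
        return self.next_traverser().value


start = StartStep([0, 90, 180, 270])
traversal = Traversal(MapStep(FilterStep(start, lambda d: d < 180),
                              lambda d: d + 90))
results = list(traversal)
print(results)  # [90, 180]
```

Because evaluation is lazy, no step does any work until the traversal is iterated, and each result is produced by a single chain of pulls.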
interface TraversalStrategy {
void apply(Traversal traversal);
Set<TraversalStrategy> applyPrior();
Set<TraversalStrategy> applyPost();
}
[Figure: a traversal strategy rewrites a sequence of steps into an equivalent (≣) sequence.]
63. Python 2.7.2 (default, Oct 11 2012, 20:14:37)
[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gremlin_python.structure.graph import Graph
>>> from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
Gremlin-Python
CPython
64.
>>> graph = Graph()
>>> g = graph.traversal().withRemote(DriverRemoteConnection('ws://localhost:8182','g'))
Gremlin-Python
DriverRemoteConnection
CPython
65.
# nested traversal with Python slicing and attribute interception extensions
>>> g.V().hasLabel("person").repeat(both()).times(2).name[0:2].toList()
[u'marko', u'marko']
[Diagram: Gremlin-Python (CPython) sends Bytecode through DriverRemoteConnection to the Gremlin Traversal Machine (JVM).]
66.
# a complex, nested multi-line traversal
>>> g.V().match(
... as_("a").out("created").as_("b"),
... as_("b").in_("created").as_("c"),
... as_("a").out("knows").as_("c")).
... select("c").
... union(in_("knows"),out("created")).
... name.toList()
[u'ripple', u'marko', u'lop']
>>>
67. Cypher
Bytecode
Distinct query languages (not only Gremlin language variants) can generate bytecode for
evaluation by any OLTP/OLAP TinkerPop-enabled graph system.
Gremlin Traversal Machine
71. Most OLTP graph systems have a traversal strategy that combines
[V,has*]-sequences into a single global index-based flatMap-step.
g.V().has("name","gremlin").
out("knows").values("age").
groupCount()
- compiles to: one graph to many vertices (flatMap), then one vertex to that vertex or no vertex (filter)
- GraphStepStrategy optimizes to: one graph to many vertices using index lookup on name=gremlin (flatMap)
DataStax Enterprise Graph
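A rewrite of this kind can be sketched as a function over a list of steps. This is a simplified illustration and not the code of any provider's strategy: the tuple representation of steps, the function name `fold_has_into_graph_step`, and the `V_indexed` step name are all hypothetical.

```python
def fold_has_into_graph_step(steps):
    """Fold a leading V step and the has steps directly after it into one
    hypothetical 'V_indexed' step that carries all the has conditions."""
    if not steps or steps[0][0] != "V":
        return list(steps)
    conditions = []
    i = 1
    while i < len(steps) and steps[i][0] == "has":
        conditions.append(steps[i][1:])  # keep (key, value), drop "has"
        i += 1
    if not conditions:
        return list(steps)  # nothing to fold
    return [("V_indexed", tuple(conditions))] + list(steps[i:])


traversal = [
    ("V",),
    ("has", "name", "gremlin"),
    ("out", "knows"),
    ("values", "age"),
    ("groupCount",),
]
optimized = fold_has_into_graph_step(traversal)
print(optimized)
```

After the rewrite, the filter no longer has to inspect every vertex; the single start step can ask the underlying system's index for the matching vertices, while the rest of the traversal is untouched.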
72. Most OLAP graph systems have a traversal strategy that bypasses Traversal semantics
and implements reducers using the native API of the system.
g.V().count()
- compiles to: one graph to many vertices (flatMap), then many vertices to long (map, as a reducer), yielding 12,146,934
- SparkInterceptorStrategy optimizes to: one graph to long (map, as a reducer) via rdd.count(), yielding 12,146,934
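The interception idea can be illustrated with a toy evaluator that recognizes the whole `V().count()` pattern and answers it with the system's own counting facility instead of streaming vertices. This is a sketch under stated assumptions: `ToyGraph`, `native_count`, and `evaluate` are hypothetical stand-ins, not Spark or TinkerPop APIs.

```python
class ToyGraph:
    """Hypothetical graph system with a cheap native count."""

    def __init__(self, vertex_ids):
        self._vertex_ids = list(vertex_ids)
        self.vertices_scanned = 0  # how many vertices were streamed

    def vertices(self):
        for v in self._vertex_ids:
            self.vertices_scanned += 1
            yield v

    def native_count(self):
        return len(self._vertex_ids)  # no per-vertex streaming


def evaluate(graph, steps, intercept=True):
    """Evaluate a tiny traversal made only of ("V",) and ("count",) steps."""
    if intercept and steps == [("V",), ("count",)]:
        return graph.native_count()  # bypass step-by-step evaluation
    stream = None
    for step in steps:
        if step == ("V",):
            stream = graph.vertices()
        elif step == ("count",):
            return sum(1 for _ in stream)
        else:
            raise ValueError(f"unsupported step in this sketch: {step}")
    return list(stream)


traversal = [("V",), ("count",)]

plain = ToyGraph(range(1000))
plain_result = evaluate(plain, traversal, intercept=False)

fast = ToyGraph(range(1000))
fast_result = evaluate(fast, traversal, intercept=True)
```

Both paths return the same number; the difference is that the intercepted path never produces a traverser per vertex.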
73. From the physical computing machine to the Gremlin traversal machine.
[Figure: three layered machines, each with a program and its data. Labels recovered from the figure: Physical Machine (program: instructions; data: Memory/Disk), Java Virtual Machine (program: bytecode; data: Heap/Disk), Gremlin Traversal Machine (program: traversal, made of steps; data: Memory/Graph System).]
75. Stakeholders
Application Developers
One query language for
all OLTP/OLAP systems.
Gremlin
G = (V, E)
Real-time and analytic queries are represented in Gremlin.
Graph Database (OLTP)
Graph Processor (OLAP)
77. Stakeholders
Application Developers
One query language for
all OLTP/OLAP systems.
No vendor lock-in.
Gremlin is embedded in
the developer’s language.
Gremlin-Java:
Iterator<String> result =
  g.V().hasLabel("person").
    order().by("age").
    limit(10).values("name");
vs.
SQL in Java:
ResultSet result = statement.executeQuery(
  "SELECT name FROM People \n" +
  " ORDER BY age \n" +
  " LIMIT 10");
No “fat strings.” The developer writes their graph database/processor
queries in their native programming language.
78. Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Easy to generate bytecode.
GraphTraversal.getMethods()
  .findAll { GraphTraversal.class == it.returnType }
  .collect { it.name }
  .unique()
  .each {
    pythonClass.append(
"""  def ${it}(self, *args):
    self.bytecode.add_step("${it}", *args)
    return self
""")}
Gremlin-Python’s source code is
programmatically generated using Java reflection.
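The same template-driven generation can be imitated entirely in Python to show what the emitted class looks like and how it behaves. This sketch does not use Java reflection: the `step_names` list is a hypothetical hand-picked subset, and the bytecode is a plain list of tuples rather than Gremlin-Python's bytecode object.

```python
# Hypothetical subset of step names; the real list comes from reflection.
step_names = ["V", "out", "values", "groupCount"]

# Build the class source from one template per step name.
source = "class GeneratedTraversal:\n"
source += "    def __init__(self):\n"
source += "        self.bytecode = []\n"
for name in step_names:
    source += (
        f"    def {name}(self, *args):\n"
        f"        self.bytecode.append(('{name}',) + args)\n"
        f"        return self\n"
    )

# Load the generated source so the class can be used directly.
namespace = {}
exec(source, namespace)
GeneratedTraversal = namespace["GeneratedTraversal"]

t = GeneratedTraversal().V().out("knows").values("age").groupCount()
print(t.bytecode)
```

Every generated method does the same thing: it records its own name and arguments and returns the traversal, which is why a language variant can be produced mechanically from the list of step names.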
79. Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Easy to generate bytecode.
Bytecode executes against
TinkerPop-enabled systems.
Language providers write a translator for the Gremlin traversal machine,
not a particular graph database/processor.
DataStax
Enterprise Graph
80. Graph Database (OLTP)
Graph Processor (OLAP)
Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Easy to generate bytecode.
Bytecode executes against
TinkerPop-enabled systems.
Provider can focus on design,
not evaluation.
Gremlin Traversal Machine
The language designer does not have to concern themselves with
OLTP or OLAP execution. They simply generate bytecode and the
Gremlin traversal machine handles the rest.
82. Provider supports all
provided languages.
Easy to implement core
interfaces.
Graph System Providers
Stakeholders
OLAP Provider
OLTP Provider
The provider automatically supports all query languages
that have compilers that generate Gremlin bytecode.
83. OLTP providers can leverage
existing OLAP systems.
Provider supports all
provided languages.
Easy to implement core
interfaces.
Graph System Providers
Stakeholders
OLAP Provider
OLTP Provider
DSE Graph leverages SparkGraphComputer for OLAP processing.
DataStax
Enterprise Graph
84. Stakeholders
Language Providers
Gremlin Language Variant
Distinct Query Language
Application Developers Graph System Providers
OLAP Provider
OLTP Provider
One query language for
all OLTP/OLAP systems.
No vendor lock-in.
Gremlin is embedded in
the developer’s language.
Easy to generate bytecode.
Bytecode executes against
TinkerPop-enabled systems.
Provider can focus on design,
not evaluation.
Easy to implement core
interfaces.
Provider supports all
provided languages.
OLTP providers can leverage
existing OLAP systems.