Apache Pekko Persistence Query
Dependency
To use Persistence Query, you must add the following dependency in your project:
- sbt
    val PekkoVersion = "1.0.3"
    libraryDependencies += "org.apache.pekko" %% "pekko-persistence-query" % PekkoVersion
- Maven
    <properties>
      <scala.binary.version>2.13</scala.binary.version>
    </properties>
    <dependencyManagement>
      <dependencies>
        <dependency>
          <groupId>org.apache.pekko</groupId>
          <artifactId>pekko-bom_${scala.binary.version}</artifactId>
          <version>1.0.3</version>
          <type>pom</type>
          <scope>import</scope>
        </dependency>
      </dependencies>
    </dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>org.apache.pekko</groupId>
        <artifactId>pekko-persistence-query_${scala.binary.version}</artifactId>
      </dependency>
    </dependencies>
- Gradle
    def versions = [
      ScalaBinary: "2.13"
    ]
    dependencies {
      implementation platform("org.apache.pekko:pekko-bom_${versions.ScalaBinary}:1.0.3")

      implementation "org.apache.pekko:pekko-persistence-query_${versions.ScalaBinary}"
    }
This also adds a dependency on the Pekko Persistence module.
Introduction
Pekko Persistence Query complements Event Sourcing by providing a universal, asynchronous, stream-based query interface that various journal plugins can implement in order to expose their query capabilities.
The most typical use case of Persistence Query is implementing the so-called query side (also known as “read side”) in the popular CQRS architecture pattern, in which the write side of the application (e.g. implemented using Pekko Persistence) is completely separated from the query side. Pekko Persistence Query itself is not directly the query side of an application; however, it can help to migrate data from the write side to the query-side database. In very simple scenarios Persistence Query may be powerful enough to fulfill the query needs of your app, but we highly recommend (in the spirit of CQRS) splitting the write and read sides into separate datastores as the need arises.
For a similar query interface for Durable State Behaviors, please refer to Persistence Query using Durable State.
Design overview
Pekko Persistence Query is purposely designed to be a very loosely specified API. This keeps the provided APIs general enough for each journal implementation to expose its best features: for example, a SQL journal can use complex SQL queries, and a journal that is able to subscribe to a live event stream can expose that through the same API, a typed stream of events.
Each read journal must explicitly document which types of queries it supports. Refer to your journal plugin’s documentation for details on which queries and semantics it supports.
While Pekko Persistence Query does not provide actual implementations of ReadJournals, it defines a number of predefined query types for the most common query scenarios, which most journals are likely to implement (although they are not required to).
Read Journals
In order to issue queries, one first has to obtain an instance of a ReadJournal. For example, given a library that provides a pekko.persistence.query.my-read-journal, obtaining the related journal is as simple as:
- Scala
    // obtain read journal by plugin id
    val readJournal =
      PersistenceQuery(system).readJournalFor[MyScaladslReadJournal]("pekko.persistence.query.my-read-journal")

    // issue query to journal
    val source: Source[EventEnvelope, NotUsed] =
      readJournal.eventsByPersistenceId("user-1337", 0, Long.MaxValue)

    // materialize stream, consuming events
    source.runForeach { event =>
      println("Event: " + event)
    }
- Java
    // obtain read journal by plugin id
    final MyJavadslReadJournal readJournal =
        PersistenceQuery.get(system)
            .getReadJournalFor(
                MyJavadslReadJournal.class, "pekko.persistence.query.my-read-journal");

    // issue query to journal
    Source<EventEnvelope, NotUsed> source =
        readJournal.eventsByPersistenceId("user-1337", 0, Long.MAX_VALUE);

    // materialize stream, consuming events
    source.runForeach(event -> System.out.println("Event: " + event), system);
Journal implementers are encouraged to put this identifier in a variable known to the user, so that one can access it via readJournalFor[NoopJournal](NoopJournal.identifier) in Scala or getReadJournalFor(NoopJournal.class, NoopJournal.identifier) in Java; however, this is not enforced.
Predefined queries
Pekko Persistence Query comes with a number of query interfaces built in and suggests that journal implementors implement them according to the semantics described below. Note that while these query types are very common, a journal is not obliged to implement all of them, for example because a given query would be significantly inefficient in that journal.
Refer to the documentation of the ReadJournal plugin you are using for a specific list of supported query types. For example, journal plugins should document their stream completion strategies.
The predefined queries are:
PersistenceIdsQuery and CurrentPersistenceIdsQuery
persistenceIds is designed to allow users to subscribe to a stream of all persistent ids in the system. By default this stream should be assumed to be a “live” stream, which means that the journal should keep emitting new persistence ids as they come into the system.
If your usage does not require a live stream, you can use the currentPersistenceIds query.
EventsByPersistenceIdQuery and CurrentEventsByPersistenceIdQuery
eventsByPersistenceId is a query equivalent to replaying an event sourced actor. However, since it is a stream, it is possible to keep it alive and watch for additional incoming events persisted by the persistent actor identified by the given persistenceId.
- Scala
    readJournal.eventsByPersistenceId("user-us-1337", fromSequenceNr = 0L, toSequenceNr = Long.MaxValue)
- Java
    readJournal.eventsByPersistenceId("user-us-1337", 0L, Long.MAX_VALUE);
Most journals will have to resort to polling in order to achieve this, which can typically be configured with a refresh-interval configuration property.
If your usage does not require a live stream, you can use the currentEventsByPersistenceId query.
EventsByTag and CurrentEventsByTag
eventsByTag allows querying events regardless of which persistenceId they are associated with. This query is hard to implement in some journals, or may need some additional preparation of the data store to be executed efficiently. The goal of this query is to allow querying for all events which are “tagged” with a specific tag. That includes the use case of querying all domain events of an Aggregate Root type. Please refer to your read journal plugin’s documentation to find out if and how it is supported.
Some journals may support tagging of events or Event Adapters that wrap the events in a persistence.journal.Tagged with the given tags. The journal may support other ways of tagging; again, how exactly this is implemented depends on the journal used. Here is an example of such tagging with an EventSourcedBehavior:
- Scala
    val NumberOfEntityGroups = 10

    def tagEvent(entityId: String, event: Event): Set[String] = {
      val entityGroup = s"group-${math.abs(entityId.hashCode % NumberOfEntityGroups)}"
      event match {
        case _: OrderCompleted => Set(entityGroup, "order-completed")
        case _                 => Set(entityGroup)
      }
    }

    def apply(entityId: String): Behavior[Command] = {
      EventSourcedBehavior[Command, Event, State](
        persistenceId = PersistenceId("ShoppingCart", entityId),
        emptyState = State(),
        commandHandler = (state, cmd) => throw new NotImplementedError("TODO: process the command & return an Effect"),
        eventHandler = (state, evt) => throw new NotImplementedError("TODO: process the event return the next state"))
        .withTagger(event => tagEvent(entityId, event))
    }
- Java
    private final String entityId;

    public static final int NUMBER_OF_ENTITY_GROUPS = 10;

    @Override
    public Set<String> tagsFor(Event event) {
      String entityGroup = "group-" + Math.abs(entityId.hashCode() % NUMBER_OF_ENTITY_GROUPS);
      Set<String> tags = new HashSet<>();
      tags.add(entityGroup);
      if (event instanceof OrderCompleted) tags.add("order-completed");
      return tags;
    }
A very important thing to keep in mind when using queries spanning multiple persistenceIds, such as EventsByTag, is that the order in which events appear in the stream is rarely guaranteed (or stable between materializations).
Journals may choose to opt for strict ordering of the events, and should then document explicitly what kind of ordering guarantee they provide. For example, “ordered by timestamp ascending, independently of persistenceId” is easy to achieve on relational databases, yet may be hard to implement efficiently on plain key-value datastores.
In the example below we query all events which have been tagged (we assume this was performed by the write side using tagging of events or Event Adapters, or that the journal is smart enough to figure out what we mean by this tag; for example, if the journal stored the events as JSON it may try to find those with the field tag set to this value).
- Scala
    // assuming journal is able to work with numeric offsets we can:

    val completedOrders: Source[EventEnvelope, NotUsed] =
      readJournal.eventsByTag("order-completed", Offset.noOffset)

    // find first 10 completed orders:
    val firstCompleted: Future[Vector[OrderCompleted]] =
      completedOrders
        .map(_.event)
        .collectType[OrderCompleted]
        .take(10) // cancels the query stream after pulling 10 elements
        .runFold(Vector.empty[OrderCompleted])(_ :+ _)

    // start another query, from the known offset
    val furtherOrders = readJournal.eventsByTag("order-completed", offset = Sequence(10))
- Java
    // assuming journal is able to work with numeric offsets we can:
    final Source<EventEnvelope, NotUsed> completedOrders =
        readJournal.eventsByTag("order-completed", new Sequence(0L));

    // find first 10 completed orders:
    final CompletionStage<List<OrderCompleted>> firstCompleted =
        completedOrders
            .map(EventEnvelope::event)
            .collectType(OrderCompleted.class)
            .take(10) // cancels the query stream after pulling 10 elements
            .runFold(
                new ArrayList<>(10),
                (acc, e) -> {
                  acc.add(e);
                  return acc;
                },
                system);

    // start another query, from the known offset
    Source<EventEnvelope, NotUsed> furtherOrders =
        readJournal.eventsByTag("order-completed", new Sequence(10));
As you can see, we can use all the usual stream operators available from Streams on the resulting query stream, including for example taking the first 10 and cancelling the stream. It is worth pointing out that the built-in EventsByTag query has an optionally supported offset parameter (of type Offset) which journals can use to implement resumable streams. For example, a journal may be able to use a WHERE clause to begin the read starting from a specific row, or, in a datastore that is able to order events by insertion time, it could treat the offset as a timestamp and select only events stored after it.
If your usage does not require a live stream, you can use the currentEventsByTag query.
EventsBySlice and CurrentEventsBySlice
Query events for given entity type and slices. A slice is deterministically defined based on the persistence id. The purpose is to evenly distribute all persistence ids over the slices.
See EventsBySliceQuery and CurrentEventsBySliceQuery.
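To make the idea of slices concrete, here is a minimal plain-Java sketch. It assumes that a slice is derived from the hash code of the persistence id modulo a fixed slice count of 1024; both the formula and the count are assumptions made for illustration, and the names SliceSketch, sliceFor and sliceRanges are hypothetical. Check the Persistence extension or your plugin for the actual mapping.

```java
// Sketch only: the slice count and the hash scheme are illustrative assumptions.
final class SliceSketch {
    // Assumed number of slices; the real value is defined by the persistence implementation.
    static final int NUMBER_OF_SLICES = 1024;

    // Deterministically maps a persistence id to a slice in [0, NUMBER_OF_SLICES).
    static int sliceFor(String persistenceId) {
        return Math.abs(persistenceId.hashCode() % NUMBER_OF_SLICES);
    }

    // Splits the slice space into equally sized inclusive [min, max] ranges,
    // for example one range per consumer of a by-slice query.
    static int[][] sliceRanges(int numberOfRanges) {
        if (numberOfRanges <= 0 || NUMBER_OF_SLICES % numberOfRanges != 0) {
            throw new IllegalArgumentException("numberOfRanges must evenly divide " + NUMBER_OF_SLICES);
        }
        int size = NUMBER_OF_SLICES / numberOfRanges;
        int[][] ranges = new int[numberOfRanges][2];
        for (int i = 0; i < numberOfRanges; i++) {
            ranges[i][0] = i * size;
            ranges[i][1] = (i + 1) * size - 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        String pid = "ShoppingCart|cart-1";
        System.out.println(pid + " -> slice " + sliceFor(pid));
        for (int[] r : sliceRanges(4)) {
            System.out.println("range " + r[0] + ".." + r[1]);
        }
    }
}
```

Because the mapping depends only on the persistence id, every event of one entity lands in the same slice, so a consumer that owns a slice range sees all events of the entities in that range.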
Materialized values of queries
Journals are able to provide additional information related to a query by exposing Materialized values, a feature of Streams that allows exposing additional values at stream materialization time.
More advanced query journals may use this technique to expose information about the character of the materialized stream, for example whether it is finite or infinite, strictly ordered or not ordered at all. The materialized value type is defined as the second type parameter of the returned Source, which allows journals to provide users with their specialised query object, as demonstrated in the sample below:
- Scala
    final case class RichEvent(tags: Set[String], payload: Any)

    // a plugin can provide:
    case class QueryMetadata(deterministicOrder: Boolean, infinite: Boolean)
- Java
    static class RichEvent {
      public final Set<String> tags;
      public final Object payload;

      public RichEvent(Set<String> tags, Object payload) {
        this.tags = tags;
        this.payload = payload;
      }
    }

    // a plugin can provide:
    static final class QueryMetadata {
      public final boolean deterministicOrder;
      public final boolean infinite;

      public QueryMetadata(boolean deterministicOrder, boolean infinite) {
        this.deterministicOrder = deterministicOrder;
        this.infinite = infinite;
      }
    }
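- Scala
    def byTagsWithMeta(tags: Set[String]): Source[RichEvent, QueryMetadata] = {
      // implement in a similar way as eventsByTag
      ???
    }
- Java
    public Source<RichEvent, QueryMetadata> byTagsWithMeta(Set<String> tags) {
      // implement in a similar way as eventsByTag
      throw new UnsupportedOperationException("Not implemented yet");
    }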
- Scala
    val query: Source[RichEvent, QueryMetadata] =
      readJournal.byTagsWithMeta(Set("red", "blue"))

    query
      .mapMaterializedValue { meta =>
        println(
          s"The query is: " +
          s"ordered deterministically: ${meta.deterministicOrder}, " +
          s"infinite: ${meta.infinite}")
      }
      .map { event =>
        println(s"Event payload: ${event.payload}")
      }
      .runWith(Sink.ignore)
- Java
    Set<String> tags = new HashSet<String>();
    tags.add("red");
    tags.add("blue");
    final Source<RichEvent, QueryMetadata> events =
        readJournal
            .byTagsWithMeta(tags)
            .mapMaterializedValue(
                meta -> {
                  System.out.println(
                      "The query is: "
                          + "ordered deterministically: "
                          + meta.deterministicOrder
                          + " "
                          + "infinite: "
                          + meta.infinite);
                  return meta;
                });

    events
        .map(
            event -> {
              System.out.println("Event payload: " + event.payload);
              return event.payload;
            })
        .runWith(Sink.ignore(), system);
Performance and denormalization
When building systems using Event Sourcing and CQRS (Command & Query Responsibility Segregation) techniques it is tremendously important to realise that the write-side has completely different needs from the read-side, and separating those concerns into datastores that are optimised for either side makes it possible to offer the best experience for the write and read sides independently.
For example, in a bidding system it is important to “take the write” and respond to the bidder that we have accepted the bid as soon as possible, which means that write-throughput is of highest importance for the write-side – often this means that data stores which are able to scale to accommodate these requirements have a less expressive query side.
On the other hand the same application may have some complex statistics view or we may have analysts working with the data to figure out best bidding strategies and trends – this often requires some kind of expressive query capabilities like for example SQL or writing Spark jobs to analyse the data. Therefore the data stored in the write-side needs to be projected into the other read-optimised datastore.
When referring to Materialized Views in Pekko Persistence think of it as “some persistent storage of the result of a Query”. In other words, it means that the view is created once, in order to be afterwards queried multiple times, as in this format it may be more efficient or interesting to query it (instead of the source events directly).
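As an illustration of such a view for the bidding example, the following plain-Java sketch folds bid events into a "highest bid per auction" lookup table. The BidPlaced event, the HighestBidView class and the in-memory map are hypothetical stand-ins invented for this sketch; a real projection would consume a query stream and write to a read-side datastore.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical write-side event; not part of any Pekko API.
final class BidPlaced {
    final String auctionId;
    final String bidder;
    final long amount;

    BidPlaced(String auctionId, String bidder, long amount) {
        this.auctionId = auctionId;
        this.bidder = bidder;
        this.amount = amount;
    }
}

// Read-optimised view: the highest bid per auction, built by applying events in order.
final class HighestBidView {
    // In-memory stand-in for the read-side datastore.
    private final Map<String, BidPlaced> highestByAuction = new HashMap<>();

    // Applies one event, keeping only the highest bid seen so far for its auction.
    void apply(BidPlaced event) {
        highestByAuction.merge(
            event.auctionId,
            event,
            (current, candidate) -> candidate.amount > current.amount ? candidate : current);
    }

    // Answers the query from the view, without replaying the event log.
    long highestAmount(String auctionId) {
        BidPlaced bid = highestByAuction.get(auctionId);
        return bid == null ? 0L : bid.amount;
    }

    public static void main(String[] args) {
        HighestBidView view = new HighestBidView();
        List<BidPlaced> events = List.of(
            new BidPlaced("auction-1", "alice", 10),
            new BidPlaced("auction-1", "bob", 25),
            new BidPlaced("auction-2", "carol", 7));
        events.forEach(view::apply);
        System.out.println("auction-1 highest bid: " + view.highestAmount("auction-1"));
    }
}
```

The view is computed once as events arrive and can then be queried many times, which is the trade-off described above: slightly more work on the projection path in exchange for cheap reads.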
Materialize view to Reactive Streams compatible datastore
If the read datastore exposes a Reactive Streams interface, then implementing a simple projection is as simple as taking the read journal and feeding it into the database driver’s interface, for example like so:
- Scala
    implicit val system: ActorSystem = ActorSystem()

    val readJournal =
      PersistenceQuery(system).readJournalFor[MyScaladslReadJournal](JournalId)
    val dbBatchWriter: Subscriber[immutable.Seq[Any]] =
      ReactiveStreamsCompatibleDBDriver.batchWriter

    // Using an example (Reactive Streams) Database driver
    readJournal
      .eventsByPersistenceId("user-1337", fromSequenceNr = 0L, toSequenceNr = Long.MaxValue)
      .map(envelope => envelope.event)
      .map(convertToReadSideTypes) // convert to datatype
      .grouped(20) // batch inserts into groups of 20
      .runWith(Sink.fromSubscriber(dbBatchWriter)) // write batches to read-side database
- Java
    final ReactiveStreamsCompatibleDBDriver driver = new ReactiveStreamsCompatibleDBDriver();
    final Subscriber<List<Object>> dbBatchWriter = driver.batchWriter();

    // Using an example (Reactive Streams) Database driver
    readJournal
        .eventsByPersistenceId("user-1337", 0L, Long.MAX_VALUE)
        .map(envelope -> envelope.event())
        .grouped(20) // batch inserts into groups of 20
        .runWith(Sink.fromSubscriber(dbBatchWriter), system); // write batches to read-side database
Materialize view using mapAsync
If the target database does not provide a Reactive Streams Subscriber that can perform writes, you may have to implement the write logic using plain functions or Actors instead.
If your write logic is stateless and you need to convert the events from one data type to another before writing into the alternative datastore, then the projection will look like this:
- Scala
    trait ExampleStore {
      def save(event: Any): Future[Unit]
    }
- Java
    static class ExampleStore {
      CompletionStage<Void> save(Object any) {
        // ...
      }
    }
- Scala
    val store: ExampleStore = ???

    readJournal
      .eventsByTag("bid", NoOffset)
      .mapAsync(1) { e =>
        store.save(e)
      }
      .runWith(Sink.ignore)
- Java
    final ExampleStore store = new ExampleStore();

    readJournal
        .eventsByTag("bid", new Sequence(0L))
        .mapAsync(1, store::save)
        .runWith(Sink.ignore(), system);
Resumable projections
Sometimes you may need “resumable” projections, which do not start from the beginning of time each time they are run. In that case, the sequence number (or offset) of the last processed event is stored and used the next time the projection is started. This pattern is implemented in the Pekko Projections module.
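The bookkeeping behind a resumable projection can be sketched in plain Java as follows. This is not how the Pekko Projections module is implemented; StoredEvent, ResumableProjectionSketch and the in-memory offset field are hypothetical names used only to show the "store the offset, resume after it" idea under the assumption of numeric, strictly increasing offsets.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical envelope: an event together with the numeric offset assigned by the journal.
final class StoredEvent {
    final long offset;
    final String payload;

    StoredEvent(long offset, String payload) {
        this.offset = offset;
        this.payload = payload;
    }
}

final class ResumableProjectionSketch {
    // Stand-in for a durable offset store; a real projection would persist this value.
    private long lastProcessedOffset = 0L;

    // Stand-in for the read-side datastore.
    final List<String> readSide = new ArrayList<>();

    // Processes only events after the stored offset (the offset is exclusive),
    // then records how far it got so the next run can resume from there.
    void run(List<StoredEvent> journal) {
        for (StoredEvent e : journal) {
            if (e.offset > lastProcessedOffset) {
                readSide.add(e.payload);        // update the read side first
                lastProcessedOffset = e.offset; // then remember the offset
            }
        }
    }

    long lastProcessedOffset() {
        return lastProcessedOffset;
    }

    public static void main(String[] args) {
        List<StoredEvent> journal = new ArrayList<>();
        journal.add(new StoredEvent(1, "bid-1"));
        journal.add(new StoredEvent(2, "bid-2"));

        ResumableProjectionSketch projection = new ResumableProjectionSketch();
        projection.run(journal);                // first run handles offsets 1 and 2

        journal.add(new StoredEvent(3, "bid-3"));
        projection.run(journal);                // second run resumes after offset 2

        System.out.println(projection.readSide + " up to offset " + projection.lastProcessedOffset());
    }
}
```

Saving the offset after updating the read side means a crash between the two steps causes the event to be processed again on restart, so the read-side update should tolerate duplicates in a design like this one.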
Query plugins
Query plugins are various (mostly community-driven) ReadJournal implementations for all kinds of available datastores.
This section aims to provide tips and guide plugin developers through implementing a custom query plugin. Most users will not need to implement journals themselves, unless they are targeting a datastore that is not yet supported.
Since different data stores provide different query capabilities, journal plugins must extensively document their exposed semantics as well as the query scenarios they handle.
ReadJournal plugin API
A read journal plugin must implement pekko.persistence.query.ReadJournalProvider, which creates instances of pekko.persistence.query.scaladsl.ReadJournal and pekko.persistence.query.javadsl.ReadJournal. The plugin must implement both the scaladsl traits and the javadsl interfaces because pekko.stream.scaladsl.Source and pekko.stream.javadsl.Source are different types, and even though those types can be converted to each other, it is most convenient for the end user to get access to the Java or Scala Source directly. As illustrated below, one of the implementations can delegate to the other.
Below is a simple journal implementation:
- Scala
    import org.apache.pekko

    class MyReadJournalProvider(system: ExtendedActorSystem, config: Config) extends ReadJournalProvider {

      private val readJournal: MyScaladslReadJournal =
        new MyScaladslReadJournal(system, config)

      override def scaladslReadJournal(): MyScaladslReadJournal =
        readJournal

      override def javadslReadJournal(): MyJavadslReadJournal =
        new MyJavadslReadJournal(readJournal)
    }

    class MyScaladslReadJournal(system: ExtendedActorSystem, config: Config)
        extends pekko.persistence.query.scaladsl.ReadJournal
        with pekko.persistence.query.scaladsl.EventsByTagQuery
        with pekko.persistence.query.scaladsl.EventsByPersistenceIdQuery
        with pekko.persistence.query.scaladsl.PersistenceIdsQuery
        with pekko.persistence.query.scaladsl.CurrentPersistenceIdsQuery {

      private val refreshInterval: FiniteDuration =
        config.getDuration("refresh-interval", MILLISECONDS).millis

      /**
       * You can use `NoOffset` to retrieve all events with a given tag or retrieve a subset of all
       * events by specifying a `Sequence` `offset`. The `offset` corresponds to an ordered sequence number for
       * the specific tag. Note that the corresponding offset of each event is provided in the
       * [[pekko.persistence.query.EventEnvelope]], which makes it possible to resume the
       * stream at a later point from a given offset.
       *
       * The `offset` is exclusive, i.e. the event with the exact same sequence number will not be included
       * in the returned stream. This means that you can use the offset that is returned in `EventEnvelope`
       * as the `offset` parameter in a subsequent query.
       */
      override def eventsByTag(tag: String, offset: Offset): Source[EventEnvelope, NotUsed] = offset match {
        case Sequence(offsetValue) =>
          Source.fromGraph(new MyEventsByTagSource(tag, offsetValue, refreshInterval))
        case NoOffset => eventsByTag(tag, Sequence(0L)) // recursive
        case _ =>
          throw new IllegalArgumentException("MyJournal does not support " + offset.getClass.getName + " offsets")
      }

      override def eventsByPersistenceId(
          persistenceId: String,
          fromSequenceNr: Long,
          toSequenceNr: Long): Source[EventEnvelope, NotUsed] = {
        // implement in a similar way as eventsByTag
        ???
      }

      override def persistenceIds(): Source[String, NotUsed] = {
        // implement in a similar way as eventsByTag
        ???
      }

      override def currentPersistenceIds(): Source[String, NotUsed] = {
        // implement in a similar way as eventsByTag
        ???
      }

      // possibility to add more plugin specific queries

      def byTagsWithMeta(tags: Set[String]): Source[RichEvent, QueryMetadata] = {
        // implement in a similar way as eventsByTag
        ???
      }

    }

    class MyJavadslReadJournal(scaladslReadJournal: MyScaladslReadJournal)
        extends pekko.persistence.query.javadsl.ReadJournal
        with pekko.persistence.query.javadsl.EventsByTagQuery
        with pekko.persistence.query.javadsl.EventsByPersistenceIdQuery
        with pekko.persistence.query.javadsl.PersistenceIdsQuery
        with pekko.persistence.query.javadsl.CurrentPersistenceIdsQuery {

      override def eventsByTag(tag: String, offset: Offset = Sequence(0L)): javadsl.Source[EventEnvelope, NotUsed] =
        scaladslReadJournal.eventsByTag(tag, offset).asJava

      override def eventsByPersistenceId(
          persistenceId: String,
          fromSequenceNr: Long = 0L,
          toSequenceNr: Long = Long.MaxValue): javadsl.Source[EventEnvelope, NotUsed] =
        scaladslReadJournal.eventsByPersistenceId(persistenceId, fromSequenceNr, toSequenceNr).asJava

      override def persistenceIds(): javadsl.Source[String, NotUsed] =
        scaladslReadJournal.persistenceIds().asJava

      override def currentPersistenceIds(): javadsl.Source[String, NotUsed] =
        scaladslReadJournal.currentPersistenceIds().asJava

      // possibility to add more plugin specific queries

      def byTagsWithMeta(tags: java.util.Set[String]): javadsl.Source[RichEvent, QueryMetadata] = {
        import pekko.util.ccompat.JavaConverters._
        scaladslReadJournal.byTagsWithMeta(tags.asScala.toSet).asJava
      }
    }
- Java
    static class MyReadJournalProvider implements ReadJournalProvider {
      private final MyJavadslReadJournal javadslReadJournal;

      public MyReadJournalProvider(ExtendedActorSystem system, Config config) {
        this.javadslReadJournal = new MyJavadslReadJournal(system, config);
      }

      @Override
      public MyScaladslReadJournal scaladslReadJournal() {
        return new MyScaladslReadJournal(javadslReadJournal);
      }

      @Override
      public MyJavadslReadJournal javadslReadJournal() {
        return this.javadslReadJournal;
      }
    }

    static class MyJavadslReadJournal
        implements org.apache.pekko.persistence.query.javadsl.ReadJournal,
            org.apache.pekko.persistence.query.javadsl.EventsByTagQuery,
            org.apache.pekko.persistence.query.javadsl.EventsByPersistenceIdQuery,
            org.apache.pekko.persistence.query.javadsl.PersistenceIdsQuery,
            org.apache.pekko.persistence.query.javadsl.CurrentPersistenceIdsQuery {

      private final Duration refreshInterval;
      private Connection conn;

      public MyJavadslReadJournal(ExtendedActorSystem system, Config config) {
        refreshInterval = config.getDuration("refresh-interval");
      }

      /**
       * You can use `NoOffset` to retrieve all events with a given tag or retrieve a subset of all
       * events by specifying a `Sequence` `offset`. The `offset` corresponds to an ordered sequence
       * number for the specific tag. Note that the corresponding offset of each event is provided in
       * the [[pekko.persistence.query.EventEnvelope]], which makes it possible to resume the stream
       * at a later point from a given offset.
       *
       * <p>The `offset` is exclusive, i.e. the event with the exact same sequence number will not be
       * included in the returned stream. This means that you can use the offset that is returned in
       * `EventEnvelope` as the `offset` parameter in a subsequent query.
       */
      @Override
      public Source<EventEnvelope, NotUsed> eventsByTag(String tag, Offset offset) {
        if (offset instanceof Sequence) {
          Sequence sequenceOffset = (Sequence) offset;
          return Source.fromGraph(
              new MyEventsByTagSource(conn, tag, sequenceOffset.value(), refreshInterval));
        } else if (offset == NoOffset.getInstance())
          return eventsByTag(tag, Offset.sequence(0L)); // recursive
        else
          throw new IllegalArgumentException(
              "MyJavadslReadJournal does not support " + offset.getClass().getName() + " offsets");
      }

      @Override
      public Source<EventEnvelope, NotUsed> eventsByPersistenceId(
          String persistenceId, long fromSequenceNr, long toSequenceNr) {
        // implement in a similar way as eventsByTag
        throw new UnsupportedOperationException("Not implemented yet");
      }

      @Override
      public Source<String, NotUsed> persistenceIds() {
        // implement in a similar way as eventsByTag
        throw new UnsupportedOperationException("Not implemented yet");
      }

      @Override
      public Source<String, NotUsed> currentPersistenceIds() {
        // implement in a similar way as eventsByTag
        throw new UnsupportedOperationException("Not implemented yet");
      }

      // possibility to add more plugin specific queries

      public Source<RichEvent, QueryMetadata> byTagsWithMeta(Set<String> tags) {
        // implement in a similar way as eventsByTag
        throw new UnsupportedOperationException("Not implemented yet");
      }
    }

    static class MyScaladslReadJournal
        implements org.apache.pekko.persistence.query.scaladsl.ReadJournal,
            org.apache.pekko.persistence.query.scaladsl.EventsByTagQuery,
            org.apache.pekko.persistence.query.scaladsl.EventsByPersistenceIdQuery,
            org.apache.pekko.persistence.query.scaladsl.PersistenceIdsQuery,
            org.apache.pekko.persistence.query.scaladsl.CurrentPersistenceIdsQuery {

      private final MyJavadslReadJournal javadslReadJournal;

      public MyScaladslReadJournal(MyJavadslReadJournal javadslReadJournal) {
        this.javadslReadJournal = javadslReadJournal;
      }

      @Override
      public org.apache.pekko.stream.scaladsl.Source<EventEnvelope, NotUsed> eventsByTag(
          String tag, org.apache.pekko.persistence.query.Offset offset) {
        return javadslReadJournal.eventsByTag(tag, offset).asScala();
      }

      @Override
      public org.apache.pekko.stream.scaladsl.Source<EventEnvelope, NotUsed> eventsByPersistenceId(
          String persistenceId, long fromSequenceNr, long toSequenceNr) {
        return javadslReadJournal
            .eventsByPersistenceId(persistenceId, fromSequenceNr, toSequenceNr)
            .asScala();
      }

      @Override
      public org.apache.pekko.stream.scaladsl.Source<String, NotUsed> persistenceIds() {
        return javadslReadJournal.persistenceIds().asScala();
      }

      @Override
      public org.apache.pekko.stream.scaladsl.Source<String, NotUsed> currentPersistenceIds() {
        return javadslReadJournal.currentPersistenceIds().asScala();
      }

      // possibility to add more plugin specific queries

      public org.apache.pekko.stream.scaladsl.Source<RichEvent, QueryMetadata> byTagsWithMeta(
          scala.collection.Set<String> tags) {
        Set<String> jTags = scala.collection.JavaConverters.setAsJavaSetConverter(tags).asJava();
        return javadslReadJournal.byTagsWithMeta(jTags).asScala();
      }
    }
The eventsByTag query could be backed by a GraphStage, for example:
- Scala
    class MyEventsByTagSource(tag: String, offset: Long, refreshInterval: FiniteDuration)
        extends GraphStage[SourceShape[EventEnvelope]] {

      private case object Continue
      val out: Outlet[EventEnvelope] = Outlet("MyEventByTagSource.out")
      override def shape: SourceShape[EventEnvelope] = SourceShape(out)

      override protected def initialAttributes: Attributes = Attributes(ActorAttributes.IODispatcher)

      override def createLogic(inheritedAttributes: Attributes): GraphStageLogic =
        new TimerGraphStageLogic(shape) with OutHandler {
          lazy val system = materializer.system
          private val Limit = 1000
          private val connection: java.sql.Connection = ???
          private var currentOffset = offset
          private var buf = Vector.empty[EventEnvelope]
          private val serialization = SerializationExtension(system)

          // register this logic as the handler for the outlet
          setHandler(out, this)

          override def preStart(): Unit = {
            scheduleWithFixedDelay(Continue, refreshInterval, refreshInterval)
          }

          override def onPull(): Unit = {
            query()
            tryPush()
          }

          override def onDownstreamFinish(cause: Throwable): Unit = {
            // close connection if responsible for doing so
            // then let the default handling stop the stage
            super.onDownstreamFinish(cause)
          }

          private def query(): Unit = {
            if (buf.isEmpty) {
              try {
                buf = Select.run(tag, currentOffset, Limit)
              } catch {
                case NonFatal(e) =>
                  failStage(e)
              }
            }
          }

          private def tryPush(): Unit = {
            if (buf.nonEmpty && isAvailable(out)) {
              push(out, buf.head)
              buf = buf.tail
            }
          }

          override protected def onTimer(timerKey: Any): Unit = timerKey match {
            case Continue =>
              query()
              tryPush()
          }

          object Select {
            private def statement() =
              connection.prepareStatement("""
                SELECT id, persistence_id, seq_nr, serializer_id, serializer_manifest, payload
                FROM journal WHERE tag = ? AND id > ?
                ORDER BY id LIMIT ?
              """)

            def run(tag: String, from: Long, limit: Int): Vector[EventEnvelope] = {
              val s = statement()
              try {
                s.setString(1, tag)
                s.setLong(2, from)
                s.setLong(3, limit)
                val rs = s.executeQuery()

                val b = Vector.newBuilder[EventEnvelope]
                while (rs.next()) {
                  val deserialized = serialization
                    .deserialize(rs.getBytes("payload"), rs.getInt("serializer_id"), rs.getString("serializer_manifest"))
                    .get
                  currentOffset = rs.getLong("id")
                  b += EventEnvelope(
                    Offset.sequence(currentOffset),
                    rs.getString("persistence_id"),
                    rs.getLong("seq_nr"),
                    deserialized,
                    System.currentTimeMillis())
                }
                b.result()
              } finally s.close()
            }
          }
        }
    }
- Java
    public class MyEventsByTagSource extends GraphStage<SourceShape<EventEnvelope>> {
      public Outlet<EventEnvelope> out = Outlet.create("MyEventByTagSource.out");
      private static final String QUERY =
          "SELECT id, persistence_id, seq_nr, serializer_id, serializer_manifest, payload "
              + "FROM journal WHERE tag = ? AND id > ? "
              + "ORDER BY id LIMIT ?";

      enum Continue {
        INSTANCE;
      }

      private static final int LIMIT = 1000;
      private final Connection connection;
      private final String tag;
      private final long initialOffset;
      private final Duration refreshInterval;

      // assumes a shared connection, could also be a factory for creating connections/pool
      public MyEventsByTagSource(
          Connection connection, String tag, long initialOffset, Duration refreshInterval) {
        this.connection = connection;
        this.tag = tag;
        this.initialOffset = initialOffset;
        this.refreshInterval = refreshInterval;
      }

      @Override
      public Attributes initialAttributes() {
        return Attributes.apply(ActorAttributes.IODispatcher());
      }

      @Override
      public SourceShape<EventEnvelope> shape() {
        return SourceShape.of(out);
      }

      @Override
      public GraphStageLogic createLogic(Attributes inheritedAttributes) {
        return new TimerGraphStageLogic(shape()) {
          private ActorSystem system = materializer().system();
          private long currentOffset = initialOffset;
          private List<EventEnvelope> buf = new LinkedList<>();
          private final Serialization serialization = SerializationExtension.get(system);

          @Override
          public void preStart() {
            scheduleWithFixedDelay(Continue.INSTANCE, refreshInterval, refreshInterval);
          }

          @Override
          public void onTimer(Object timerKey) {
            query();
            deliver();
          }

          private void deliver() {
            if (isAvailable(out) && !buf.isEmpty()) {
              push(out, buf.remove(0));
            }
          }

          private void query() {
            if (buf.isEmpty()) {
              try (PreparedStatement s = connection.prepareStatement(QUERY)) {
                s.setString(1, tag);
                s.setLong(2, currentOffset);
                s.setLong(3, LIMIT);
                try (ResultSet rs = s.executeQuery()) {
                  final List<EventEnvelope> res = new ArrayList<>(LIMIT);
                  while (rs.next()) {
                    Object deserialized =
                        serialization
                            .deserialize(
                                rs.getBytes("payload"),
                                rs.getInt("serializer_id"),
                                rs.getString("serializer_manifest"))
                            .get();
                    currentOffset = rs.getLong("id");
                    res.add(
                        new EventEnvelope(
                            Offset.sequence(currentOffset),
                            rs.getString("persistence_id"),
                            rs.getLong("seq_nr"),
                            deserialized,
                            System.currentTimeMillis()));
                  }
                  buf = res;
                }
              } catch (Exception e) {
                failStage(e);
              }
            }
          }

          {
            setHandler(
                out,
                new AbstractOutHandler() {
                  @Override
                  public void onPull() {
                    query();
                    deliver();
                  }
                });
          }
        };
      }
    }
The ReadJournalProvider class must have a constructor with one of these signatures:
- constructor with an ExtendedActorSystem parameter, a com.typesafe.config.Config parameter, and a String parameter for the config path
- constructor with an ExtendedActorSystem parameter, and a com.typesafe.config.Config parameter
- constructor with one ExtendedActorSystem parameter
- constructor without parameters
The plugin section of the actor system’s config will be passed in the config constructor parameter. The config path of the plugin is passed in the String parameter.
If the underlying datastore only supports queries that are completed when they reach the end of the “result set”, the journal has to submit new queries after a while in order to support “infinite” event streams that include events stored after the initial query has completed. It is recommended that the plugin use a configuration property named refresh-interval for defining such a refresh interval.
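The re-querying described above can be sketched without any streaming machinery. In the following plain-Java sketch, PollingQuerySketch, queryAfter and pollOnce are hypothetical names, and an in-memory list stands in for a datastore whose queries complete at the end of the result set; the 100 ms value is an arbitrary placeholder for whatever refresh-interval the plugin would read from its configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class PollingQuerySketch {
    // Stand-in for a table ordered by insertion; a row's id is its index + 1.
    private final List<String> rows = new ArrayList<>();

    // Id of the last row that has already been emitted downstream.
    private long currentOffset = 0L;

    void append(String event) {
        rows.add(event);
    }

    // Finite query: returns at most `limit` rows with id > offset, then completes.
    List<String> queryAfter(long offset, int limit) {
        int from = (int) Math.min(offset, rows.size());
        int to = Math.min(from + limit, rows.size());
        return new ArrayList<>(rows.subList(from, to));
    }

    // One refresh: re-run the finite query from the current offset and emit what is new.
    int pollOnce(int limit, Consumer<String> emit) {
        List<String> batch = queryAfter(currentOffset, limit);
        batch.forEach(emit);
        currentOffset += batch.size();
        return batch.size();
    }

    public static void main(String[] args) throws InterruptedException {
        long refreshIntervalMillis = 100L; // placeholder for the configured refresh-interval
        PollingQuerySketch journal = new PollingQuerySketch();
        journal.append("e1");
        for (int i = 2; i <= 4; i++) {
            journal.pollOnce(10, e -> System.out.println("emitted " + e));
            journal.append("e" + i); // stored after the previous query completed
            Thread.sleep(refreshIntervalMillis);
        }
        journal.pollOnce(10, e -> System.out.println("emitted " + e));
    }
}
```

Each poll picks up exactly the rows stored since the previous one, which is how a sequence of finite queries can appear to the consumer as a single unbounded stream.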
Scaling out
In a use case where the number of events is very high, where the work needed for each event is high, or where resilience is important (so that if a node crashes the persistent queries are quickly started on a new node and can resume operations), Cluster Sharding together with event tagging is an excellent fit to shard events over a cluster.