- It offers different APIs (for batch and stream processing).
- It can run in different modes, including a cluster mode on top of YARN.
- Although it runs in a JVM, it implements its own memory management to mitigate OutOfMemory failures by serializing objects to disk when heap space gets scarce.
- It uses the Chandy-Lamport algorithm for its fault tolerance: checkpoints based on a time criterion are inserted into the stream to provide a well-defined recovery point. In contrast, Apache Spark uses a size criterion to create its mini batches. More control over the segmentation criteria is planned for the future.
- At least the Scala examples look quite close to the ones I know from Apache Spark.
- It still lacks high availability for its job manager (but it's on the roadmap).
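
To make the checkpointing point a bit more concrete, here is a minimal Python sketch of the idea as I understood it: markers ("barriers") are inserted into the stream at fixed time intervals, and an operator snapshots its state whenever a barrier passes through. This is not Flink's API. The names `Event`, `Barrier`, `with_barriers` and `SummingOperator` are made up for illustration, the interval is measured on the events' own timestamps for simplicity, and the sketch only covers a single input stream, so the coordination across several inputs that a real distributed snapshot needs is left out.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple, Union


@dataclass(frozen=True)
class Event:
    timestamp: int  # milliseconds (hypothetical unit)
    value: int


@dataclass(frozen=True)
class Barrier:
    checkpoint_id: int


def with_barriers(events: Iterable[Event], interval_ms: int) -> Iterator[Union[Event, Barrier]]:
    """Yield the events, inserting a Barrier each time `interval_ms` has elapsed."""
    next_barrier_at = None
    checkpoint_id = 0
    for event in events:
        if next_barrier_at is None:
            # The first interval starts at the first event's timestamp.
            next_barrier_at = event.timestamp + interval_ms
        while event.timestamp >= next_barrier_at:
            checkpoint_id += 1
            yield Barrier(checkpoint_id)
            next_barrier_at += interval_ms
        yield event


class SummingOperator:
    """Toy stateful operator: keeps a running sum and snapshots it on barriers."""

    def __init__(self) -> None:
        self.total = 0
        self.consumed = 0  # number of events processed so far
        self.snapshots: Dict[int, Tuple[int, int]] = {}

    def process(self, item: Union[Event, Barrier]) -> None:
        if isinstance(item, Barrier):
            # Well-defined recovery point: state plus position in the input.
            self.snapshots[item.checkpoint_id] = (self.total, self.consumed)
        else:
            self.total += item.value
            self.consumed += 1

    def restore(self, checkpoint_id: int) -> int:
        """Reset state to a snapshot and return the offset to replay from."""
        self.total, self.consumed = self.snapshots[checkpoint_id]
        return self.consumed
```

After a simulated failure, calling `restore` with the latest checkpoint id returns the position in the input from which events have to be replayed, so the operator ends up with the same result as an uninterrupted run.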
The meetup was hosted by Smaato, which also offered food and drinks ... plus, their office offers a nice view over Hamburg: