2015-07-16

First Meetup in Hamburg

Today was a meetup of the Big Data & NoSQL Hamburg group. The two talks were about Apache Flink, a quite new stream processing system. We've learned about the key features of Flink and got a live demo of its failover and recovery abilities. My takeaways:

  • It offers different APIs (for batch and stream processing).
  • It can run in different modes incl. a cluster mode on top of YARN.
  • Although running in a JVM it implements a memory management on its own to mitigate OutOfMemory failures by serializing objects to disk when heap space gets terse.
  • It uses the Chandy-Lamport algorithm for its fault tolerance. Therefore, checkpoints based on time criteria are inserted into the stream to have a well-defined recovery point - in contrast, Apache Spark uses a size criteria to create its mini batches. It is planned to have more control over the segmentation criteria in the future.
  • At least the Scala examples look quite close to the ones I know for Apache Spark.
  • It still lacks high availability for its job manager (but it's on the roadmap).
The meetup was hosted by Smaato which also offered food and drinks ... plus, their office offers a nice view over Hamburg:

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.