2012-05-30

NoSQL Matters 2012

A new conference about NoSQL took place in Cologne this May. It is called NoSQL Matters, ran three parallel tracks and lasted two days. I attended the following talks.
On Tuesday:
  • Scalable NoSQL - Past, Present and Future by Doug Judd: This keynote started by showing the changes in hardware, software development, IT applications, research and corporate leadership over the last decades, comparing the pre-internet era with the current internet era. Moving on to NoSQL databases, three categories for the different database products were introduced: auto-sharding, distributed hashing and data consistency. The talk ended with an outlook on present and future developments, focussing on disk drive technology, networking and application trends.
  • NoSQL - A Technology for Real Time Enterprise Applications? by Dirk Bartels: This talk was about big data processing in general - not going into much depth, but showing lots of market analysts' comparisons and mentioning basic theory like the CAP theorem. It also showed some differences between the NoSQL database requirements of old-style enterprises and those of web enterprises. He finished with application examples from their customers.
  • Designing for Concurrency with Riak by Mathias Meyer: This talk, given by the author of the Riak Handbook, dealt with data consistency and concurrent writes. Starting with per-document changelogs and vector clocks, it moved on to some more Riak-specific features like secondary indexes, as well as general distributed data structures like g-counters.
  • Hypertable - The Storage Infrastructure behind one of the World's Largest Email Services by Doug Judd: Hypertable was built following Google's BigTable architecture and focuses on horizontal scalability. It uses sparse tables and column families.
  • From Tables to Graph. Recommendation Systems, a Graph Database Use Case Analysis by Pere Urbón-Bayes: This talk gave a short introduction to recommendation systems (detecting similarities, measuring the "distance" between items based on their properties etc.), how they relate to graph processing and why graph databases may help. The following graph databases, graph processing frameworks and APIs were briefly introduced: Neo4j, OrientDB, Apache Giraph, Signal/Collect, Blueprints API.
  • Welcome to Redis 2.6 by Salvatore Sanfilippo: The main author of Redis gave an overview of the new features introduced in release 2.6: scripting (uses Lua, scripts are atomic and are run within the server), more bit operations, millisecond key expiration, increments by floating-point numbers, serialization of values (dump and restore), AppendOnlyFile improvements, improvements for small sets, hashes etc.
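The g-counters mentioned in the Riak talk above can be illustrated with a few lines of Python. This is only a minimal sketch of the general idea (a grow-only counter that keeps one slot per node and merges replicas by taking the per-node maximum); the class and method names are my own and are not taken from the talk or from Riak.

```python
class GCounter:
    """Grow-only counter: one slot per node, merged by per-node maximum."""

    def __init__(self, node_id):
        self.node_id = node_id   # hypothetical identifier of the local replica
        self.counts = {}         # node id -> increments seen from that node

    def increment(self, amount=1):
        # A g-counter can only grow, so negative amounts are rejected.
        if amount < 0:
            raise ValueError("a g-counter cannot be decremented")
        self.counts[self.node_id] = self.counts.get(self.node_id, 0) + amount

    def value(self):
        # The counter value is the sum over all per-node slots.
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the maximum per slot makes merging commutative,
        # associative and idempotent, so replicas converge.
        for node, count in other.counts.items():
            self.counts[node] = max(self.counts.get(node, 0), count)


a = GCounter("a")
b = GCounter("b")
a.increment(2)   # concurrent writes on two replicas
b.increment(3)
a.merge(b)
b.merge(a)
```

After both merges the two replicas report the same value, and merging the same state again changes nothing, which is what makes this structure safe under concurrent writes.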
On Wednesday:
  • NoSQL Adoption - What's the Next Step? by Luca Garulli: This was an entertaining keynote about database history (starting with stone tablets and papyrus ...) and the three rules of NoSQL - one is: "If you only have a hammer, everything looks like a nail." ;-). He also mentioned a catalogue of criteria for choosing the right database product and showed some future developments and risks for NoSQL databases.
  • NoSQL - Not Only a Fairy Tale by Timo Derstappen and Sebastian Cohnen: This talk showed the evolution of an ad server's persistence layer (from Amazon S3 to CouchDB back to Amazon S3 with Redis as caching layer) and the lessons learned. In their scenario CouchDB didn't scale as needed due to replication and compaction overhead - CouchDB's strengths like multi-master replication, MVCC and append-only storage weren't needed. Redis' performance was impressive to them.
  • The No-Marketing Bullshit Introduction to Couchbase Server 2.0 by Jan Lehnardt: This talk was about Couchbase, which offers auto-sharding by introducing so-called vBuckets (which reminds me of consistent hashing), automatic failover, a Memcached-compatible API and SDKs for lots of programming languages. Release 2.0 also offers incremental MapReduce, replication across datacenters etc. He also presented a live demo.
  • Apache Cassandra: Real-World Scalability, Today by Jonathan Ellis: This talk started with an overview of Cassandra's high availability features (no single point of failure, multi-master and multi-datacenter awareness etc.). Then he went into more detail about partitioning and replication based on consistent hashing (taking just the primary key into account) and performance features like the log-structured storage engine, row-level isolation and built-in compression. After that he presented lots of examples of Cassandra use cases.
  • NoNoSQL@Google by Olaf Bachmann: This talk showed how Google's Ads Traffic Quality Team makes heavy use of big data analysis. Basically, data is stored as Protocol Buffers. There is heavy use of MapReduce, but it's hard to write, maintain and debug. You can counter this by using Sawzall, but that makes MapReduce inflexible. So, the next try was Dremel, but that has limitations concerning intermediate and output data size. After that, SqlMR (SQL on top of MapReduce) was given a try, but it lacks interactivity and is hard to debug. So, at Google SQL is still the data analysis language of choice, although there are different dialects of it.
  • Theoretical Aspects of Distributed Systems, Playfully Illustrated by Pavlo Baron: This entertaining talk showed several problems and possible solutions in distributed systems with audience interaction: time synchronization, vector clocks, rehashing, consistent hashing, gossip architecture, hinted handoff, quorum, master election, failure detection, partition tolerance etc.
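Consistent hashing came up in several of the talks above, so here is a minimal Python sketch of the general technique. It is not how Cassandra or Couchbase implement it; the class name, the number of virtual points per node and the choice of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


def _position(key):
    # Map a string to a position on the ring (hash choice is an assumption).
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with several virtual points per node."""

    def __init__(self, nodes=(), points_per_node=64):
        self.points_per_node = points_per_node
        self._ring = []  # sorted list of (position, node) tuples
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed on the ring at several pseudo-random positions.
        for i in range(self.points_per_node):
            bisect.insort(self._ring, (_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for pos, n in self._ring if n != node]

    def lookup(self, key):
        # A key belongs to the first point at or after its own position,
        # wrapping around to the start of the ring if necessary.
        if not self._ring:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._ring, (_position(key),))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["n1", "n2", "n3"])
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("n4")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the exercise is that every key in `moved` now lives on the new node `n4`; no key is shuffled between the old nodes, which is the difference from plain modulo-based rehashing.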
The baristas have a break ...
