2012-01-15

Some MongoDB "hackin'"

I recently wanted to know which databases and collections exist in a running MongoDB instance. This is what I came up with:
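// Get the connection of the current shell session and the admin database.
var mongodb_conn = db.getMongo();
var admin_db = mongodb_conn.getDB("admin");

// Label the output with the node's role; allow reads if it is not the master.
var role = "(MASTER)";
if (!admin_db.isMaster().ismaster) {
  mongodb_conn.setSlaveOk(true);
  role = "(SLAVE)";
}

// Print one line per database: role, database name, collection names.
var db_list = admin_db.runCommand("listDatabases").databases;
for (var db_index = 0; db_index < db_list.length; db_index++) {
  var curr_db_name = db_list[db_index].name;
  var curr_coll_list = admin_db.getSiblingDB(curr_db_name).getCollectionNames();
  // Concatenating the array joins the collection names with commas.
  print(role + " " + curr_db_name + " " + curr_coll_list);
}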
Save the above lines of JavaScript to a file, e.g. getCollections.js, and run it against the MongoDB instance. The output may look similar to the following:

$ mongo --quiet getCollections.js
(MASTER) map_reduce_example myresults,system.indexes,things
(MASTER) test_database posts,system.indexes
(MASTER) graylog2 alerted_streams,blacklists,filtered_terms,historic_server_values,hosts,jobs,message_counts,messagecomments,messages,server_values,settings,streamcategories,streams,system.indexes,system.users,users
(MASTER) admin
(MASTER) local
(MASTER) test
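
The comma-separated collection names, and the lines that show a database name only, both follow from how JavaScript converts an array to a string. Here is a minimal sketch in plain JavaScript (meant for Node.js, not the mongo shell). The objects fake_reply and fake_collections and the function formatLines are hypothetical stand-ins for the server replies, not part of the script above:

```javascript
// Hypothetical stand-in for the reply of admin_db.runCommand("listDatabases");
// only the "name" field of each entry is used.
var fake_reply = {
  databases: [
    { name: "test_database" },
    { name: "admin" }
  ]
};

// Hypothetical stand-in for getCollectionNames(), keyed by database name.
var fake_collections = {
  test_database: ["posts", "system.indexes"],
  admin: []
};

// Build the same kind of lines the script prints.
function formatLines(role, reply, collections) {
  var lines = [];
  for (var i = 0; i < reply.databases.length; i++) {
    var name = reply.databases[i].name;
    // String + array joins the array elements with commas;
    // an empty array contributes an empty string.
    lines.push(role + " " + name + " " + collections[name]);
  }
  return lines;
}

var lines = formatLines("(MASTER)", fake_reply, fake_collections);
console.log(lines.join("\n"));
```

A database without collections therefore yields a line that ends right after the database name (plus a trailing space).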

Having got that far, I wanted to know the top-level properties of all documents stored in the MongoDB instance. As MongoDB documents are schema-less, I ended up using MapReduce:
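var mongodb_conn = db.getMongo();
var admin_db = mongodb_conn.getDB("admin");

var role = "(MASTER)";
if (!admin_db.isMaster().ismaster) {
  mongodb_conn.setSlaveOk(true);
  role = "(SLAVE)";
}

// Map: emit every top-level property name of the current document as a key.
var map_fn = function() { for (var key in this) { emit(key, null); } };
// Reduce: nothing to aggregate, only the keys matter.
var reduce_fn = function(key, values) { return null; };

var db_list = admin_db.runCommand("listDatabases").databases;
for (var db_index = 0; db_index < db_list.length; db_index++) {
  var curr_db_name = db_list[db_index].name;
  var curr_db = admin_db.getSiblingDB(curr_db_name);
  var curr_coll_list = curr_db.getCollectionNames();
  for (var coll_idx = 0; coll_idx < curr_coll_list.length; coll_idx++) {
    // Caution: "myoutput" is a temporary collection; an existing collection of that name is overwritten and dropped.
    var mapred = curr_db.runCommand({
      "mapreduce" : curr_coll_list[coll_idx],
      "map" : map_fn,
      "reduce" : reduce_fn,
      "out" : "myoutput"
    });
    // The _id values of the output collection are the distinct property names.
    var key_list = curr_db.myoutput.distinct("_id");
    for (var key_idx = 0; key_idx < key_list.length; key_idx++) {
      print(role + " " + curr_db_name + " " + curr_coll_list[coll_idx] + " " + key_list[key_idx]);
    }
    curr_db.myoutput.drop();
  }
}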
So, in the map phase, every document of every collection in every database is visited, and each of its top-level properties is emitted as a key. The reduce phase performs no aggregation, because the result collection already holds what we want: one entry per distinct top-level property.
Again, save that code to a file, e.g. getAllKeys.js, and run it against the MongoDB instance. The output may look similar to the following:
$ mongo --quiet getAllKeys.js
...
(MASTER) test_database posts date
(MASTER) test_database posts tags
(MASTER) test_database posts text
(MASTER) graylog2 historic_server_values _id
(MASTER) graylog2 historic_server_values created_at
(MASTER) graylog2 historic_server_values type
(MASTER) graylog2 historic_server_values value
(MASTER) graylog2 hosts _id
(MASTER) graylog2 hosts host
(MASTER) graylog2 hosts message_count
(MASTER) graylog2 message_counts _id
(MASTER) graylog2 message_counts hosts
(MASTER) graylog2 message_counts streams
(MASTER) graylog2 message_counts timestamp
(MASTER) graylog2 message_counts total
(MASTER) graylog2 messages _id
(MASTER) graylog2 messages _original_level
(MASTER) graylog2 messages _thread_name
(MASTER) graylog2 messages created_at
...
It's not optimized yet, but it does what I needed. The result roughly simulates what you get on relational databases when running select ... from all_tab_columns ... (Oracle) or select ... from information_schema.columns ... (MySQL).
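
To see why emitting each key with a null value is enough, the map and reduce phases can be tried without a server. The following is only a rough sketch in plain JavaScript (meant for Node.js, not the mongo shell). The sample documents and the helper collectKeys are made up for illustration, and emit is passed in as a parameter here, whereas MongoDB provides it as a global inside the map function:

```javascript
// Hypothetical sample documents; field names borrowed from the output above.
var docs = [
  { _id: 1, date: "2012-01-15", text: "hello" },
  { _id: 2, text: "world", tags: ["mongodb"] }
];

// Same logic as the map function of the script: emit every top-level key.
// Assumption: emit is a parameter here; on the server it is a global.
function map(emit) {
  for (var key in this) {
    emit(key, null);
  }
}

// Same logic as the reduce function: the values carry no information.
function reduce(key, values) {
  return null;
}

// Minimal in-memory stand-in for the mapreduce command (not MongoDB's
// implementation): group emitted values by key, reduce each group,
// and return the distinct keys in sorted order.
function collectKeys(documents) {
  var groups = {};
  documents.forEach(function (doc) {
    map.call(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  return Object.keys(groups).sort().filter(function (key) {
    // Reduce is only needed for keys that were emitted more than once.
    if (groups[key].length > 1) {
      reduce(key, groups[key]);
    }
    return true;
  });
}

var keys = collectKeys(docs);
console.log(keys.join("\n"));
```

Each distinct key shows up once no matter how many documents contain it, which mirrors the role of distinct("_id") on the output collection in the script.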
