How to set up your own MongoDB sharded cluster for development on a single host


In my current job I am using MongoDB, as I need to deal with high volumes of generated data and this NoSQL database can scale in a straightforward, automatic way.

In my next posts I will probably cover different issues related to this database engine that I encountered while working with it.

In this post I will start with a basic thing: how to set up a MongoDB sharded cluster on your local machine.

If you don’t know what a MongoDB sharded cluster is or what its purpose is, I highly recommend reading the sharding section of the MongoDB documentation.

Components of a sharded cluster

Every sharded cluster has three main components:

  • Shards: These are the places where the data is actually stored. Each shard can be a single mongod instance or a replica set.
  • Config Servers: The config servers hold the metadata about the cluster. They keep track of which shard holds each piece of data.
  • Query Routers: The query routers (mongos instances) are the point of interaction between the clients and the shards. They use the metadata from the config servers to route queries to the right shards and retrieve the data.

For development purposes I am going to use three mongod instances as shards, a single mongod instance as the config server and one mongos instance as the query router.

For now I am not going to set up replica sets for the shards; I will leave that for a future post.

It is important to remember that, with the mirrored config servers used in this setup, MongoDB requires either one or three config servers. In a production environment you need three to guarantee redundancy, but for a development environment one is enough. (Since MongoDB 3.4, config servers must be deployed as a replica set instead.)

Setting up the sharded cluster

Folder structure

We are going to store the whole cluster inside a single folder; this way it is easier to manage the cluster when needed.

To do that, we create the following folder structure somewhere on disk; in my case I am going to use the root (/) directory:

  • /mongocluster/
  • /mongocluster/mongod1/
  • /mongocluster/mongod1/logs/
  • /mongocluster/mongod1/data/
  • /mongocluster/mongod2/
  • /mongocluster/mongod2/logs/
  • /mongocluster/mongod2/data/
  • /mongocluster/mongod3/
  • /mongocluster/mongod3/logs/
  • /mongocluster/mongod3/data/
  • /mongocluster/mongoc/
  • /mongocluster/mongoc/logs/
  • /mongocluster/mongoc/data/
  • /mongocluster/mongos/
  • /mongocluster/mongos/logs/
  • /mongocluster/mongos/data/
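If you prefer to script this step, the following is a minimal Python sketch that creates the same layout. The helper name create_cluster_dirs and its base_dir parameter are hypothetical; passing a different base directory lets you try it somewhere other than /.

```python
import os

# Hypothetical helper that mirrors the folder layout listed above.
BASE_DIR = "/mongocluster"
COMPONENTS = ["mongod1", "mongod2", "mongod3", "mongoc", "mongos"]


def create_cluster_dirs(base_dir=BASE_DIR):
    """Create <base_dir>/<component>/{logs,data} for every component and return the paths."""
    created = []
    for component in COMPONENTS:
        for sub in ("logs", "data"):
            path = os.path.join(base_dir, component, sub)
            os.makedirs(path, exist_ok=True)  # also creates the missing parent folders
            created.append(path)
    return created
```

Calling create_cluster_dirs() with no argument builds the tree under /mongocluster, which usually needs root permissions; thanks to exist_ok=True it is safe to run more than once.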

The mongodN folders will be used for the shards, mongoc for the config server and mongos for the query router.

Configuration Files

Once the folder structure has been created, we proceed to create the configuration files for each of the processes.

We are going to use YAML configuration files, the format introduced in MongoDB 2.6.

If you intend to use a version of MongoDB older than 2.6, check MongoDB’s documentation to see how to translate these files to the old configuration file format.

The configuration files below are the most basic ones needed to get the cluster up and running. If you need authentication or SSL, you can add those settings to the configuration.

Shards configuration

For each shard we are going to use the following configuration template:

  systemLog:
    destination: file
    path: "/mongocluster/mongodN/logs/mongodN.log"
    logAppend: true
  processManagement:
    pidFilePath: "/mongocluster/mongodN/mongodN.pid"
    fork: true
  storage:
    dbPath: "/mongocluster/mongodN/data"
    directoryPerDB: true
  sharding:
    clusterRole: shardsvr
  operationProfiling:
    mode: all

We are going to create a mongodN.conf inside each of the mongodN folders, replacing N with the corresponding shard number. It is also important to assign a different port to each shard (the net.port setting); of course, these ports have to be free on the host.
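Writing the three files by hand is error-prone, so here is a hedged Python sketch that fills in the template for each shard. It assumes the ports used in this post (47018, 48018 and 49018) and adds the net.port setting to every file; render_conf and write_shard_confs are hypothetical names.

```python
import os

# Same structure as the template above, plus the per-process net.port setting.
TEMPLATE = """systemLog:
  destination: file
  path: "{base}/{name}/logs/{name}.log"
  logAppend: true
processManagement:
  pidFilePath: "{base}/{name}/{name}.pid"
  fork: true
net:
  port: {port}
storage:
  dbPath: "{base}/{name}/data"
  directoryPerDB: true
sharding:
  clusterRole: {role}
operationProfiling:
  mode: all
"""

# One distinct port per shard, as used in this post.
SHARD_PORTS = {"mongod1": 47018, "mongod2": 48018, "mongod3": 49018}


def render_conf(name, port, role="shardsvr", base="/mongocluster"):
    """Return the YAML configuration text for one mongod process."""
    return TEMPLATE.format(base=base, name=name, port=port, role=role)


def write_shard_confs(base="/mongocluster"):
    """Write <base>/mongodN/mongodN.conf for every shard and return the file paths."""
    paths = []
    for name, port in SHARD_PORTS.items():
        path = os.path.join(base, name, name + ".conf")
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write(render_conf(name, port, base=base))
        paths.append(path)
    return paths
```

The same render_conf function can also produce the config server file, for example render_conf("mongoc", 47019, role="configsvr").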

For example, for the /mongocluster/mongod1/mongod1.conf we can have this:

  systemLog:
    destination: file
    path: "/mongocluster/mongod1/logs/mongod1.log"
    logAppend: true
  processManagement:
    pidFilePath: "/mongocluster/mongod1/mongod1.pid"
    fork: true
  net:
    port: 47018
  storage:
    dbPath: "/mongocluster/mongod1/data"
    directoryPerDB: true
  sharding:
    clusterRole: shardsvr
  operationProfiling:
    mode: all

The important things to notice here are:

  • The dbPath under the storage section must point to the correct place; otherwise, if two shards point to the same data directory, you might run into problems with the files mongod creates for normal operation.
  • sharding.clusterRole is the essential part of this configuration: it indicates that the mongod instance is part of a sharded cluster and that its role is to be a data shard.
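Both points are easy to get wrong when copying files around. A small, hypothetical check like the following can spot two processes sharing a dbPath or a port before anything is started. It is only a sketch: it finds the keys with a regular expression instead of a real YAML parser.

```python
import re
from collections import defaultdict


def find_conflicts(confs):
    """Return {(setting, value): [process names]} for every dbPath or port used more than once.

    `confs` maps a process name (e.g. "mongod1") to the text of its YAML configuration file.
    """
    seen = defaultdict(list)
    for name, text in confs.items():
        for key in ("dbPath", "port"):
            # Simple line-based lookup of "key: value" or 'key: "value"'.
            match = re.search(r'^\s*' + key + r':\s*"?([^"\n]+?)"?\s*$', text, re.MULTILINE)
            if match:
                seen[(key, match.group(1))].append(name)
    return {setting: names for setting, names in seen.items() if len(names) > 1}
```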

Config server

The configuration file for the config server has the same structure as the shard configuration (with its own paths and port), except for one key difference: sharding.clusterRole must be set to configsvr.

Here is my configuration file for the config server, the /mongocluster/mongoc/mongoc.conf file:

  systemLog:
    destination: file
    path: "/mongocluster/mongoc/logs/mongoc.log"
    logAppend: true
  processManagement:
    pidFilePath: "/mongocluster/mongoc/mongoc.pid"
    fork: true
  net:
    port: 47019
  storage:
    dbPath: "/mongocluster/mongoc/data"
    directoryPerDB: true
  sharding:
    clusterRole: configsvr
  operationProfiling:
    mode: "all"

Query router (Mongos)

The configuration of the query router is pretty simple. Its important part is the sharding.configDB value, which must be a string containing the config server’s location in the form <host>:<port>. If you have a cluster with three config servers, you need to put the locations of all three, separated by commas, in the string.

Important: if you have more than one query router, make sure you use exactly the same string for the sharding.configDB in every query router.

This is the configuration file for the query router, which we’ll locate at /mongocluster/mongos/mongos.conf:

  systemLog:
    destination: file
    path: "/mongocluster/mongos/logs/mongos.log"
    logAppend: true
  processManagement:
    pidFilePath: "/mongocluster/mongos/mongos.pid"
    fork: true
  net:
    port: 47017
  sharding:
    configDB: "localhost:47019"
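Because every query router must use exactly the same configDB string, one option is to generate it from a single list of config servers. The following Python sketch does that; config_db_string and render_mongos_conf are hypothetical helpers, and the one-or-three check reflects the mirrored config servers used in this post.

```python
def config_db_string(config_servers):
    """Join (host, port) pairs into the "host:port[,host:port...]" value of sharding.configDB."""
    if len(config_servers) not in (1, 3):
        # Mirrored config servers, as used in this post, must number one or three.
        raise ValueError("expected one or three config servers")
    return ",".join("{0}:{1}".format(host, port) for host, port in config_servers)


MONGOS_TEMPLATE = """systemLog:
  destination: file
  path: "{base}/mongos/logs/mongos.log"
  logAppend: true
processManagement:
  pidFilePath: "{base}/mongos/mongos.pid"
  fork: true
net:
  port: {port}
sharding:
  configDB: "{config_db}"
"""


def render_mongos_conf(config_servers, port=47017, base="/mongocluster"):
    """Return the mongos YAML; routers built from the same list get the same configDB."""
    return MONGOS_TEMPLATE.format(base=base, port=port, config_db=config_db_string(config_servers))
```

Keeping one shared list of config servers and rendering every router's file from it guarantees that the strings (including their order) are identical.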

Running the sharded cluster

Once the folder structure and the files have been created, we are ready to start all of the cluster’s components.

Starting the components

The order in which the components should be started is the following:

  1. Shards
  2. Config servers
  3. Query routers

Launching each of the components is trivial. For each shard and config server we launch a mongod process with the corresponding configuration file, like this:

mongod --config <path_to_config>

For the query router, we launch a mongos instance with the query router’s configuration file (-f is the short form of --config):

mongos -f <path_to_config>

We can create a simple bash script that will launch all the required instances. I call it start-mongo-cluster.sh and it has the following content:


#!/bin/bash

#Start the mongod shard instances
mongod --config /mongocluster/mongod1/mongod1.conf
mongod --config /mongocluster/mongod2/mongod2.conf
mongod --config /mongocluster/mongod3/mongod3.conf

#Start the mongod config server instance
mongod --config /mongocluster/mongoc/mongoc.conf

#Start the mongos
mongos -f /mongocluster/mongos/mongos.conf
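The same start sequence can be sketched in Python. This is a hypothetical alternative to the shell script, assuming mongod and mongos are on the PATH; the run parameter exists only so the sequence can be tried without launching anything.

```python
import subprocess

BASE = "/mongocluster"


def start_commands(base=BASE):
    """Return the argv lists in the start order used above: shards, config server, mongos."""
    commands = [["mongod", "--config", "{0}/mongod{1}/mongod{1}.conf".format(base, i)] for i in (1, 2, 3)]
    commands.append(["mongod", "--config", base + "/mongoc/mongoc.conf"])
    commands.append(["mongos", "-f", base + "/mongos/mongos.conf"])
    return commands


def start_cluster(base=BASE, run=subprocess.check_call):
    """Run each command in turn; check_call raises CalledProcessError if a process fails."""
    for argv in start_commands(base):
        run(argv)
```

Because every configuration sets processManagement.fork to true, each command returns once the forked daemon has started, so the processes can be launched one after the other.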

Stopping the components

To stop the cluster, we just need to stop the running instances.

For that we are going to use the kill command. To use it, we need the PID of each process. That is why we added processManagement.pidFilePath to the configuration files of the components: the instances store their PIDs in those files, making it easy to get the PID of each process to kill when we want to shut down the cluster.

The following script shuts down each of the processes if its PID file exists:


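#!/bin/bash

# PID files written by each process (processManagement.pidFilePath)
PID_MONGOS_FILE=/mongocluster/mongos/mongos.pid
PID_MONGOC_FILE=/mongocluster/mongoc/mongoc.pid
PID_MONGOD1_FILE=/mongocluster/mongod1/mongod1.pid
PID_MONGOD2_FILE=/mongocluster/mongod2/mongod2.pid
PID_MONGOD3_FILE=/mongocluster/mongod3/mongod3.pid

#Stop mongos
if [ -e "$PID_MONGOS_FILE" ]; then
    kill "$(cat "$PID_MONGOS_FILE")"
fi

#Stop mongo config
if [ -e "$PID_MONGOC_FILE" ]; then
    kill "$(cat "$PID_MONGOC_FILE")"
fi

#Stop mongod shard instances
if [ -e "$PID_MONGOD1_FILE" ]; then
    kill "$(cat "$PID_MONGOD1_FILE")"
fi

if [ -e "$PID_MONGOD2_FILE" ]; then
    kill "$(cat "$PID_MONGOD2_FILE")"
fi

if [ -e "$PID_MONGOD3_FILE" ]; then
    kill "$(cat "$PID_MONGOD3_FILE")"
fi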
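Here is a Python sketch of the same shutdown logic, assuming the PID file paths used in this post. Like kill, it sends SIGTERM, which mongod and mongos handle as a clean shutdown; the helper names are hypothetical.

```python
import os
import signal

# Same stop order as the shell script above: query router, config server, shards.
PID_FILES = [
    "/mongocluster/mongos/mongos.pid",
    "/mongocluster/mongoc/mongoc.pid",
    "/mongocluster/mongod1/mongod1.pid",
    "/mongocluster/mongod2/mongod2.pid",
    "/mongocluster/mongod3/mongod3.pid",
]


def stop_from_pidfile(pid_file, sig=signal.SIGTERM):
    """Send `sig` to the PID stored in `pid_file`; return True if a signal was delivered."""
    if not os.path.exists(pid_file):
        return False  # nothing to stop
    with open(pid_file) as f:
        pid = int(f.read().strip())
    try:
        os.kill(pid, sig)  # SIGTERM, the same signal `kill` sends by default
    except ProcessLookupError:
        return False  # stale PID file: the process is already gone
    return True


def stop_cluster(pid_files=PID_FILES):
    """Stop every component that has a PID file and return the PID files that were used."""
    return [path for path in pid_files if stop_from_pidfile(path)]
```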

Before using the sharded cluster

So, now we have the sharded cluster almost ready to be used. We can start it and stop it, but the configuration server has no idea of the existing shards.

What we need to do is register the shards we created in the cluster’s configuration. To do that, we connect to the query router (mongos) using the mongo client, like this:

$ mongo localhost:47017

Once we are connected we need to issue the following commands to add the shards to the cluster:

mongos> sh.addShard("localhost:47018")
mongos> sh.addShard("localhost:48018")
mongos> sh.addShard("localhost:49018")
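If you rebuild the cluster often, these commands can also be kept in a script file for the mongo shell. This hypothetical Python helper generates it from the shard ports used in this post; it also appends sh.status(), which prints the cluster state so you can confirm the shards were added.

```python
# Ports used by the three shards in this post.
SHARD_PORTS = (47018, 48018, 49018)


def add_shards_script(host="localhost", ports=SHARD_PORTS):
    """Return mongo shell JavaScript that adds every shard and then prints the cluster status."""
    lines = ['sh.addShard("{0}:{1}");'.format(host, port) for port in ports]
    lines.append("sh.status();")
    return "\n".join(lines) + "\n"
```

Save the returned text as, for example, add-shards.js and run it with mongo localhost:47017 add-shards.js.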

And we’re ready to go!


Validating a SSL certificate in Python


I’m working on porting rabbit-vs to Python 3 while documenting it properly and doing quite a lot of code refactoring. Right now I’m at the stage of porting the plugins, and I decided to take another look at the techniques used in them.

In the previous version of the SSL certificate validation plugin I used the M2Crypto library, but it had no Py3k port. So I had to look for another approach, and after some reading I finally decided to use PyOpenSSL.

What are the advantages of using PyOpenSSL?

  • Works in Python 3
  • Works in Linux and Windows
  • Based on OpenSSL, which is present on almost every system.

What is going to be checked?

Basically, the checks are: whether the certificate’s signature is valid, whether its format is correct, whether it is currently within its validity period, whether it was issued for the server we are accessing and, optionally, whether the certificate is trusted. We also check the size of its public key and whether the signature algorithm used is strong enough.


Creating the SSL Context

First of all, we need to create an SSL Context: the object that lets us create the SSL layer on top of a socket in order to get an SSL connection. The context indicates which SSL/TLS version we want the connection to use, the verification mode, and where to look for the root certificates in case we want to check the certificate’s trustworthiness.

The code to create an SSL.Context object is:

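from OpenSSL import SSL

ca_file = "/etc/ssl/certs/ca-certificates.crt"  # PEM bundle of trusted roots (Debian/Ubuntu path, adjust)
ca_path = "/etc/ssl/certs"  # directory of trusted root certificates (adjust for your system)

def callback(conn, cert, errno, depth, result):
    return bool(result)  # placeholder: keep OpenSSL's own verdict

context = SSL.Context(SSL.TLSv1_METHOD)  # Use the TLSv1 method
context.set_options(SSL.OP_NO_SSLv2)  # Don't accept SSLv2
context.set_verify(SSL.VERIFY_NONE, callback)
context.load_verify_locations(ca_file, ca_path)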

SSL.Context() creates the object, and at that moment we have to indicate which SSL/TLS version the Context will handle; in this case I want to use TLSv1 (TLS 1.0 has since been deprecated by RFC 8996). set_options(SSL.OP_NO_SSLv2) prevents SSLv2 connections, which are really insecure, from being established. set_verify() sets the verification mode and the callback function to call when verifying; I’ll go deeper into this later. Finally, load_verify_locations() sets two things that are fundamental if we want to validate whether a certificate is trustworthy: the first parameter is the location of a file containing a list of trusted (root) certificates encoded in PEM, and the second is the path to a directory containing trusted (root) certificates, named by subject hash as OpenSSL expects (c_rehash can create those names). The certificates loaded from these locations are the ones used when checking the certificate’s trustworthiness.

Creating an SSL Connection

This basically consists of creating a socket and wrapping it in an SSL.Connection that uses the context. The resulting connection can connect to SSL services and perform the corresponding handshake. The following is the Python code to do that:

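from socket import socket
from OpenSSL import SSL

host_name, port = "www.example.com", 443  # example target; replace with the server to check
sock = socket()
ssl_sock = SSL.Connection(context, sock)  # `context` is the SSL.Context created above
ssl_sock.set_tlsext_host_name(host_name.encode())  # send SNI so the server picks the right certificate
ssl_sock.connect((host_name, port))
ssl_sock.do_handshake()  # performs the handshake and runs the verification callback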

Verification routine

When the do_handshake() method is called, the SSL handshake takes place and, if a verification mode has been set with set_verify(), the certificate verification is performed as part of it. The callback function is called for each certificate in the chain being validated and receives five arguments:

  1. The SSL.Connection object that triggered the verification.
  2. The OpenSSL.crypto.X509 certificate being validated.
  3. An integer containing the number of the error detected (0 if there was no error). You can find their meaning in the OpenSSL documentation.
  4. An integer indicating the depth of the certificate being validated. If it is 0, the certificate being validated is the peer’s own certificate; otherwise it is one of the certificates further up the chain.
  5. An integer indicating whether the certificate currently being validated (the one in the second argument) passed the validation: 1 means it passed and 0 means it failed.

The callback function must return a boolean value indicating the result of the verification: True for a successful verification and False otherwise.

Inside this callback function you can do whatever you need. In the rabbit-vs plugin I decided to take only some of the errors into account: trust errors can be ignored when they are not needed, and an exception is raised when a certificate is not valid.

For example, if you are only interested in checking whether the certificate at depth 0 is within its validity period, with no other error considered, a possible callback function would be:

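# OpenSSL verification error codes (values from x509_vfy.h)
X509_V_ERR_CERT_NOT_YET_VALID = 9
X509_V_ERR_CERT_HAS_EXPIRED = 10

def callback_function(conn, cert, errno, depth, result):
    # Only reject the peer's own certificate (depth 0) when it is outside its validity period
    if depth == 0 and errno in (X509_V_ERR_CERT_NOT_YET_VALID, X509_V_ERR_CERT_HAS_EXPIRED):
        return False  # or raise Exception("Certificate not yet valid or expired")
    return True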

What happens when the callback function returns False depends on the verification mode that was set: with SSL.VERIFY_NONE the rest of the chain is not verified but the handshake still goes on, whereas with SSL.VERIFY_PEER a callback returning False aborts the handshake with an OpenSSL.SSL.Error exception.
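The idea of ignoring trust errors when they are not needed can be sketched as a callback factory. This is not the plugin’s actual code, only a hypothetical example: the callback records every (depth, errno) pair it sees and always returns True, so the whole chain is examined and the decision is made after the handshake. The error codes are OpenSSL’s X509_V_ERR_* values.

```python
# OpenSSL certificate verification error codes (X509_V_ERR_* values from x509_vfy.h)
X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT = 2
X509_V_ERR_CERT_HAS_EXPIRED = 10
X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT = 18
X509_V_ERR_SELF_SIGNED_CERT_IN_CHAIN = 19
X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY = 20
X509_V_ERR_UNABLE_TO_VERIFY_LEAF_SIGNATURE = 21

# Errors that only say the chain does not lead to a trusted root
TRUST_ERRORS = frozenset({
    X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT,
    X509_V_ERR_DEPTH_ZERO_SELF_SIGNED_CERT,
    X509_V_ERR_SELF_SIGNED_CERT_IN_CHAIN,
    X509_V_ERR_UNABLE_TO_GET_ISSUER_CERT_LOCALLY,
    X509_V_ERR_UNABLE_TO_VERIFY_LEAF_SIGNATURE,
})


def make_callback(check_trust=True):
    """Return (callback, errors): the callback appends (depth, errno) pairs to `errors`."""
    errors = []

    def callback(conn, cert, errno, depth, result):
        if errno != 0 and (check_trust or errno not in TRUST_ERRORS):
            errors.append((depth, errno))
        return True  # keep going so the whole chain is checked; decide after the handshake

    return callback, errors
```

For example: callback, errors = make_callback(check_trust=False), then context.set_verify(SSL.VERIFY_PEER, callback), and after ssl_sock.do_handshake() raise an exception if errors is not empty.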

Hashing algorithm used to sign the certificate and public key size

To access the certificate’s information, we first need to get it. In PyOpenSSL certificates are modeled as OpenSSL.crypto.X509 objects. To grab the certificate from a connection, all we have to do is call the get_peer_certificate() method of the SSL.Connection object.

Once we have the certificate object, we can retrieve its public key (an OpenSSL.crypto.PKey object) using the get_pubkey() method, and its size in bits by calling the bits() method on the returned object.

To retrieve the signature algorithm, whose name includes the hashing algorithm used (for example sha256WithRSAEncryption), call get_signature_algorithm() on the certificate object; it returns the name as bytes.
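Once those two values are available, deciding whether they are strong enough is plain Python. The sketch below uses hypothetical thresholds (a 2048-bit minimum and a short list of weak hashes) and assumes an RSA key; an elliptic-curve key would need a different minimum size.

```python
# Hypothetical policy thresholds: adjust them to your own requirements.
MIN_RSA_KEY_BITS = 2048
WEAK_HASHES = ("md2", "md4", "md5", "sha1")


def key_and_signature_problems(key_bits, signature_algorithm):
    """Return a list of problems found in the key size and signature algorithm.

    signature_algorithm is a bytes name such as b"sha256WithRSAEncryption".
    """
    problems = []
    if key_bits < MIN_RSA_KEY_BITS:
        problems.append("public key too small: {0} bits".format(key_bits))
    name = signature_algorithm.decode("ascii").lower()
    for weak in WEAK_HASHES:
        if weak in name:
            problems.append("weak signature hash: " + weak)
            break
    return problems
```

With PyOpenSSL, the inputs would come from cert.get_pubkey().bits() and cert.get_signature_algorithm().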

Verifying the host matches the common name on the certificate

The first thing to do is get the common name from the certificate. This information is located inside an X509Name object corresponding to the subject of the certificate, which is obtained with the get_subject() method of the certificate we are analyzing. Once we have the X509Name object, its commonName attribute gives us the certificate’s common name.

The next step is to convert that common name into a regex. Why is this necessary? Because a certificate can be issued for a whole domain or subdomain using a wildcard: for example, a certificate issued for *.xxx.com is valid for www.xxx.com or mail.xxx.com. To build the regex we replace the dots with escaped dots, and then the wildcard with its regex equivalent, the combination of a dot and an asterisk (.*).

Once the regex is prepared, what has to be checked is whether the host name being tested matches it. In code:

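import re

cert = ssl_sock.get_peer_certificate()
common_name = cert.get_subject().commonName  # PyOpenSSL already returns a str
regex = common_name.replace('.', r'\.').replace('*', r'.*') + '$'
if re.match(regex, host_name, re.IGNORECASE):  # host names are case-insensitive
    print("The certificate matches", host_name)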
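One caveat with the regex: .* also matches dots, so *.xxx.com would accept a.b.xxx.com as well, while RFC 6125 limits a wildcard to a single left-most label. The following hypothetical helper is a stricter sketch that compares labels instead of using a regex; it only supports a wildcard that forms the whole left-most label. The same check can be applied to each DNS name in the certificate’s subjectAltName extension, which modern clients use in preference to the common name.

```python
def hostname_matches(cert_name, host_name):
    """Return True if host_name matches cert_name, allowing "*" only as the whole left-most label."""
    cert_labels = cert_name.lower().rstrip(".").split(".")
    host_labels = host_name.lower().rstrip(".").split(".")
    if len(cert_labels) != len(host_labels):
        return False  # a wildcard never spans (or drops) a label
    if cert_labels[1:] != host_labels[1:]:
        return False  # everything after the first label must match exactly
    first = cert_labels[0]
    return first == host_labels[0] or (first == "*" and host_labels[0] != "")
```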

Project Euler problem 182 - Solved


The statement of the problem can be found on the Project Euler website.

In this problem we are given two primes p and q that are used to generate the modulus n = pq of an RSA key pair.

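As the statement says, to complete a key pair one must choose an exponent e in the range $1 < e < \phi$, where $\phi = (p-1)(q-1)$ and $\gcd(e, \phi) = 1$. However, for each e there will be a number of unconcealed messages, that is, messages m for which $m^e \bmod n = m$.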

The number of unconcealed messages for an exponent e modulo n = pq, where p and q are distinct primes, is equal to:

$$(1 + \gcd(e-1,\, p-1)) \cdot (1 + \gcd(e-1,\, q-1))$$
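The formula comes from counting per prime: modulo p, m = 0 always satisfies $m^e \equiv m$, and the nonzero solutions are the m with $m^{e-1} \equiv 1$ in the cyclic group of order p - 1, of which there are $\gcd(e-1, p-1)$; the Chinese remainder theorem then multiplies the counts for p and q. As a sanity check, the following sketch compares the formula with a brute-force count on small primes (p = 19 and q = 37, chosen only for illustration); the function names are hypothetical.

```python
from math import gcd


def unconcealed_brute_force(p, q, e):
    """Count messages m in [0, n) with m**e mod n == m by direct computation."""
    n = p * q
    return sum(1 for m in range(n) if pow(m, e, n) == m)


def unconcealed_formula(p, q, e):
    """Closed form: (1 + gcd(e - 1, p - 1)) * (1 + gcd(e - 1, q - 1))."""
    return (1 + gcd(e - 1, p - 1)) * (1 + gcd(e - 1, q - 1))


def formula_holds(p, q):
    """Compare both counts for every valid exponent 1 < e < phi with gcd(e, phi) = 1."""
    phi = (p - 1) * (q - 1)
    return all(
        unconcealed_brute_force(p, q, e) == unconcealed_formula(p, q, e)
        for e in range(2, phi)
        if gcd(e, phi) == 1
    )
```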

Knowing this, it is pretty easy to write code that finds the exponents producing the fewest unconcealed messages and adds them up. Here is the Python source code (problem182.py):

from math import gcd

if __name__ == '__main__':
    p = 1009
    q = 3643
    n = p * q
    phi_n = n - p - q + 1  # equals (p - 1) * (q - 1)
    result = 0
    min_res = float('inf')
    for e in range(2, phi_n):  # e must satisfy 1 < e < phi(n)
        if gcd(e, phi_n) != 1:
            continue  # not a valid RSA exponent
        num_unconcealed = (gcd(e - 1, p - 1) + 1) * (gcd(e - 1, q - 1) + 1)
        if num_unconcealed < min_res:
            min_res = num_unconcealed  # new minimum: restart the sum
            result = e
        elif num_unconcealed == min_res:
            result += e  # another exponent with the same minimum
    print("The result is: {0}".format(result))