2020-04-11

Setting up an OpenVPN gateway using docker containers

Motivation

This is basically an evolution of my previous blog post “Setting up a VPN gateway in Ubuntu using LXC containers and OpenVPN”.

I eventually had to upgrade my home server, and with that came re-installing everything I had, including my VPN node setup.

This prompted me to re-think my approach. Docker has become the ubiquitous containerization engine and LXC has become something pretty obscure. I decided to look into how I could replicate the results I had obtained with LXC, but using docker containers.

Also, since then I’ve been learning Ansible in order to keep my personal infrastructure tidy and reproducible. I had been delaying upgrading my home server primarily because of all the things I had installed and configured manually and didn’t want to have to redo all over again.

Objective

I set my objective to basically get the same functionality I had in my previous node. That is, a node connected to an OpenVPN server, with services attached to it that are guaranteed to exit through the VPN and that are accessible only from my LAN. But this time using docker as my containerization technology.

A second objective, just as important as the first, is to be able to set all this up using Ansible in order to replicate and update it in an easy fashion.

Shaping the Solution

The approach I took was to use a primary container for the OpenVPN client node. The rest of the services then run in separate containers but share the network stack of the OpenVPN node.

There are two main reasons behind this approach: (1) it makes it easy for the other service containers to access the internet through the VPN, and (2) if something goes wrong with the OpenVPN container, the rest of the containers lose all connectivity, thus reducing the risk of exposure.

It will also allow us to control the firewall in a single place to make sure we don’t allow any traffic but VPN and LAN traffic to the host.

All the setup will be driven by an Ansible role; this allows parametrization and makes it easy to reproduce the steps until we get them right.

Implementation

OpenVPN Node

The first challenge was to find an image for the OpenVPN client container. Most OpenVPN images are meant to work as servers, not clients.

Fortunately, I stumbled upon dperson’s OpenVPN Client docker image.

It had almost all the things I required. Apart from being an OpenVPN client, it comes with a way of setting up a restrictive firewall and also allows easy configuration for opening up both networks and ports. This is exactly what I needed to simplify everything.

The only thing I don’t like about it is that I cannot easily give it an OpenVPN configuration file along with the authentication file to use. Based on what I saw in the ENTRYPOINT script it might be possible by mounting those files in particular locations, but it seemed easier to bypass this by adding my own launcher script.

The way I decided to approach this was to create a folder on the host where I would place the openvpn.conf I get from my VPN provider, along with a login.info file that contains the credentials for the VPN (see the auth-user-pass option in the OpenVPN reference manual). In that directory I also add a bash script that executes openvpn with the command line arguments I want, bypassing the entrypoint’s launch method.

Then we’ll mount that directory inside our container and instruct the container to initialize and delegate the launch of openvpn to our script.

Doing all of the above using ansible is fairly straightforward. The playbook to do this looks something like this:


- name: Create base directory for OpenVPN node configs
  file:
    path: "{{ vpn_node__base_dir }}"
    owner: root
    group: root
    mode: '755'
    state: directory

- name: Create itemized directories
  file:
    path: "{{ vpn_node__base_dir }}/{{ item }}"
    owner: root
    group: root
    mode: '755'
    state: directory
  loop:
    - openvpn

- name: Copy OpenVPN configuration file
  copy:
    src: "{{ vpn_node__openvpn_config_file }}"
    dest: "{{ vpn_node__base_dir }}/openvpn/config.ovpn"
    owner: 'root'
    group: 'root'
    mode: '644'

- name: Generate login file
  template:
    src: files/login.info.j2
    dest: '{{ vpn_node__base_dir }}/openvpn/login.info'
    owner: 'root'
    group: 'root'
    mode: '600'

- name: Generate launch script
  template:
    src: files/openvpn-launch.j2
    dest: '{{ vpn_node__base_dir }}/openvpn/launch.sh'
    owner: 'root'
    group: 'root'
    mode: '700'

# Setup OpenVPN docker container
- name: Setup OpenVPN docker container
  docker_container:
    image: dperson/openvpn-client:{{ vpn_node__openvpn_container_version }}
    name: '{{ vpn_node__containers_base_name }}-openvpn'
    state: started
    restart_policy: unless-stopped
    networks_cli_compatible: yes
    networks: '{{ vpn_node__docker_network | default(omit) }}'
    capabilities:
      - NET_ADMIN
    devices:
      - /dev/net/tun
    mounts:
      - source: "{{ vpn_node__base_dir }}/openvpn"
        target: /vpn/
        type: bind
        read_only: no
    command:
      - '-f'
      - '{{ vpn_node__openvpn_port }}'
      - '-r'
      - '{{ vpn_node__lan_cidr }}'
      - /vpn/launch.sh

login.info.j2 template:

{{ vpn_node__openvpn_username }}
{{ vpn_node__openvpn_password }}

openvpn-launch.j2 template:

#!/bin/bash

iptables -A OUTPUT -p tcp -m tcp --dport {{ vpn_node__openvpn_port }} -j ACCEPT
iptables -A OUTPUT -p udp -m udp --dport {{ vpn_node__openvpn_port }} -j ACCEPT
ip6tables -A OUTPUT -p tcp -m tcp --dport {{ vpn_node__openvpn_port }} -j ACCEPT
ip6tables -A OUTPUT -p udp -m udp --dport {{ vpn_node__openvpn_port }} -j ACCEPT

exec openvpn --config /vpn/config.ovpn --auth-user-pass /vpn/login.info

The key things to note here are:

  1. we need to give the container the NET_ADMIN capability and attach the /dev/net/tun device to it
  2. we are mounting the directory that was created with the files on the /vpn directory
  3. by passing those arguments in the command we are telling the entrypoint script to set up a restrictive firewall, to not route the vpn_node__lan_cidr subnetwork through the VPN, and to use the /vpn/launch.sh script rather than launching openvpn on its own

Now let’s talk about the variables we are using:

  • vpn_node__base_dir: this is the directory where all the configuration and files related to our VPN gateway are going to be located. There’s going to be a subdirectory per service, openvpn being one of them
  • vpn_node__openvpn_config_file: location of the VPN’s configuration file from ansible’s perspective
  • vpn_node__openvpn_username: username to use to authenticate against the VPN server
  • vpn_node__openvpn_password: password to use to authenticate against the VPN server
  • vpn_node__openvpn_port: port the VPN server is listening on. The default value is 1194
  • vpn_node__openvpn_container_version: the version of the image to use. Defaults to latest
  • vpn_node__containers_base_name: prefix to use for all the containers that will be launched by this playbook. It is useful to identify all related containers
  • vpn_node__docker_network: change the network the container will be attached to. It is recommended to override this and not have it attached to the default bridge network. To override it, the network to be used needs to be created before calling this role
  • vpn_node__lan_cidr: this variable is used to configure the container so that packets destined to that network are not sent through the VPN. This should be setup to the LAN subnetwork you are using. E.g. 10.0.10.0/24

Once this is applied we’ll have a container that is connected to the VPN, but on its own there is not much use for it: unless something connects through it, it provides little value. That takes us to the next step: setting up relevant services to make this VPN gateway useful.

SOCKS5 Proxy

The first and most important service to install for a VPN gateway is a SOCKS5 proxy that can be used from the LAN and provides access to the internet through the VPN.

I wanted to keep using dante as it has proven to work well in the time I’ve been using it. The search for the image was even harder than for the OpenVPN client one, but I managed to find a decent one: vimagick/dante. It has the simplicity I was looking for: the only thing it needs is to have the configuration file overridden by an appropriate mount.

A simple-yet-effective configuration file for this occasion would be something like this:

debug: 0
logoutput: stderr
internal: eth0 port = 1080
external: tun0
clientmethod: none
socksmethod: none
user.privileged: root
user.unprivileged: nobody

client pass {
    from: 0.0.0.0/0 port 1-65535 to: 0.0.0.0/0
    log: error
}

socks pass {
    from: 0.0.0.0/0 to: 0.0.0.0/0
    log: error
}

The key parts are the internal and external configurations. They need to match the interfaces you want it to use for listening for new connections and for establishing the proxied connections, respectively. tun0 is going to be the interface for the VPN and eth0 should be the bridge interface in the container. The rest of the configuration is pretty self-explanatory; in my case I’m not applying any restrictions, as this will only be reachable within my LAN and I don’t want to bother configuring it in a more restrictive manner.

In order to integrate this into our VPN gateway, what we need to do is launch our container using the OpenVPN container’s network stack, but we need to remember to (1) publish the port for the SOCKS5 proxy in the OpenVPN container and (2) configure the port-forwarding in the OpenVPN container. The reason why we need to do the port publishing in the OpenVPN container and not in the danted container is that publishing is a network stack operation, and the danted container doesn’t have its own network stack: it is using the OpenVPN container’s.

What needs to be done in the ansible playbook is something like:


- name: Create itemized directories
  ...
  loop:
    - openvpn
    - socks5-proxy

...

- name: Copy SOCKS5 proxy configuration file
  copy:
    src: "{{ vpn_node__socks5_proxy_config_file }}"
    dest: "{{ vpn_node__base_dir }}/socks5-proxy/sockd.conf"
    owner: 'root'
    group: 'root'
    mode: '644'

# Setup OpenVPN docker container
- name: Setup OpenVPN docker container
  docker_container:
    published_ports:
      - '1080:1080/tcp'
    command:
      - '-f'
      - '{{ vpn_node__openvpn_port }}'
      - '-r'
      - '{{ vpn_node__lan_cidr }}'
      - '-p'
      - '1080'
      - /vpn/launch.sh

- name: Setup SOCKS5 Proxy docker container
  docker_container:
    user: root
    image: vimagick/dante:latest
    name: '{{ vpn_node__containers_base_name }}-socks5'
    state: started
    restart_policy: unless-stopped
    networks_cli_compatible: yes
    network_mode: 'container:{{ vpn_node__containers_base_name }}-openvpn'
    mounts:
      - source: "{{ vpn_node__base_dir }}/socks5-proxy/sockd.conf"
        target: /etc/sockd.conf
        type: bind
        read_only: yes

Once these changes have been applied to the host, voila! There’s a VPN gateway that can be used by any application through the SOCKS proxy.

In order to use it, just point your SOCKS5 configuration to the address of the host running the docker containers, port 1080. Remember that you need to be in the LAN specified by vpn_node__lan_cidr in order to have access.
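
Any SOCKS5-capable client works. As a quick sanity check from a machine in the LAN, a minimal libcurl snippet like the following can be used to confirm that the reported public IP is the VPN exit address rather than your own (192.168.1.50 is a placeholder for the docker host’s LAN address; adjust it to your network):

#include <curl/curl.h>

#include <iostream>

int main()
{
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL* curl = curl_easy_init();
    if (curl == nullptr) {
        return 1;
    }
    // Route the request through the gateway's SOCKS5 proxy; the "h" variant
    // makes DNS resolution happen on the proxy side as well.
    curl_easy_setopt(curl, CURLOPT_PROXY, "socks5h://192.168.1.50:1080");
    curl_easy_setopt(curl, CURLOPT_URL, "https://ifconfig.me");
    CURLcode res = curl_easy_perform(curl); // prints the returned public IP to stdout
    if (res != CURLE_OK) {
        std::cerr << "request failed: " << curl_easy_strerror(res) << "\n";
    }
    curl_easy_cleanup(curl);
    curl_global_cleanup();
    return res == CURLE_OK ? 0 : 1;
}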

The same methodology I used for adding the SOCKS5 proxy can be used to add any other service. The only thing needed is to launch a container with the desired service using the OpenVPN container’s network stack. If the service provides a frontend or API that needs to be accessible from the LAN, the corresponding port needs to be both published and forwarded in the OpenVPN container.

Conclusion

Using all these bits and pieces I built an ansible role that sets up a VPN node: the OpenVPN client container along with all the services I wanted to set up together. This allows me to regenerate everything from scratch in a really easy and convenient manner.

I haven’t made that ansible role public yet because I don’t know how useful people will find it, and I would need to put in some more effort to make it more flexible than it currently is, as I embedded my specific requirements in it.

2020-03-20

IDEX Staking Node Ansible Role

I decided to start participating in some crypto projects. I found IDEX and found it interesting enough. I liked the idea that external participants are able to collaborate with the market itself in some capacity. This, for the time being, is restricted to being a Tier 3 Staker, which basically keeps track of the trading history and provides it to the IDEX user client.

The first thing to do in order to set it up is to have the minimum number of IDEX tokens (at the time of writing that is 10k IDEX tokens) and to have held them for a period of 7 days.

The pains of following the instructions

My initial attempt to setup the host was to follow the instructions in the IDEX Github.

I had some issues installing the @idexio/idexd-cli that, after googling a bit, I was able to overcome. But once I got it working is when the biggest issues started.

The parity node that is bootstrapped through the docker-compose file ended up hanging at random times, making the staking node hang. They recommend using Infura in order to avoid this, which I’ve found really helps.

The other thing I found is that, for some reason, the IDEX staking node would consume 100% CPU even though the logs were basically saying “waiting for new blocks”. I wanted to see if it was possible to limit the CPU so as not to burn my resources, but couldn’t find a way.

In any case, the most painful part of all this is that everything is really manual, and when I tried to automate it I found a lot more issues. The CLI is simply not designed to be used non-interactively.

Dig in, keep the important stuff and discard the rest

Up until that moment I hadn’t looked into how the actual staker was set up, so I decided to see how it worked and whether it was possible to make it more automation friendly.

The first thing I found out was that the whole CLI could be completely removed without losing any of the core functionality. The most important thing the CLI does is generate the settings.json that is then mounted in the staking node docker container.

I basically looked into the CLI to see how it launched the containers and found that the key thing was this docker-compose.yml file.

...
services:
    parity:
        image: parity/parity:stable
        env_file: aurad_config.env
        volumes:
            - parity:/eth
        ...
    mysql:
        image: mysql:5.7
        env_file: aurad_config.env
        ...
    idexd:
        image: idexio/idexd:0.2.0
        depends_on:
          - "mysql"
        volumes:
          - type: bind
            source: ${HOME}/.idexd/downloads
            target: /usr/idexd/downloads
          - type: bind
            source: ${HOME}/.idexd/ipc
            target: /usr/idexd/ipc
        stop_signal: SIGINT
        stop_grace_period: 20s
        command: ["start", "pm2.config.js", "--no-daemon", "--only", "worker", "--kill-timeout", "5000"]
        ...
        ports:
            - "8080:8080"
            - "8443:8443"
        env_file: aurad_config.env
        environment:
          - RPC_HOST
          - RPC_PROTOCOL
          - RPC_PORT
          - STAKING_HOST
          - SSL_PRIVATE_KEY_PATH
          - SSL_CERT_PATH
        ...

I used this as my foundation to launch a docker container using the idexio/idexd image, passing the corresponding environment variables, mounting the appropriate volumes and using a separate instance of MySQL.

That worked well and would be fairly straightforward to automate, and I started doing so in Ansible, until I found that the staking node would hang and not resume. This would happen when my box lost connectivity for some time; I was running my staking node on my home server and my router had not been working properly at the time. It would basically print a message saying that there had been a timeout or a health-check was missed, and then stay in that state forever.

Because of that, I started looking into how the staking node server was being run. The more I looked into it, the more I realized I could remove layers. Something my experience in software engineering has taught me is that the more things you add to a project, the more places there are for errors to happen.

By looking at the Dockerfile and the command used in the docker-compose, I realized they were using pm2 to launch the staking node. I don’t have any experience with node.js, but based on the pm2 site it is basically a process manager for node applications. In the case of this project it seems to be used as a watchdog to restart the staking server in case of failure. I would like to know why this was used when you can trust docker to restart the container automatically in case of failure. But this seemed to be the cause of my server getting stuck after losing connectivity for some time, so I decided to see how to remove it.

It didn’t take much effort to understand how pm2 was set up to launch the staking node server. Looking at the pm2.config.js, it is obvious that it is just launching (using node, of course) the lib/index.js file relative to the container’s WORKDIR (/usr/idexd/). I then just replaced the entrypoint of the container to run node lib/index.js directly. That solved my stability issues for good.

Now that I had the most basic and simple way of launching a staking node, I went for the setup automation.

Writing the Ansible role

I have recently started to manage my “personal infrastructure and services” using Ansible, a decision taken after I had to ditch a server and it took me forever to set everything up again. So I decided that this was a great opportunity to write a role for it.

I already knew what I had to do and what I needed:

  • Setup a set of folders to mount on the container
  • Copy a given configuration file to one of those folders
  • Launch a docker container mounting the appropriate folders, setting the correct environment variables and publishing the right port.

Writing the role didn’t take long as it didn’t have to do any weird things.

Most of the things are assumed to have been set up beforehand (MySQL and access to an Ethereum API) and the role just takes variables to pass to the container. One extra detail I added was a variable that allows controlling the CPU allocation; I added it fearing I would run into high CPU usage again, but it seems that was also caused by pm2.

You can find the salessandri.idex_staking_node role in Galaxy.

I published it, but there was still something bothering me. In order to set up the staking node, it still required generating a settings.json, and the only way of doing so was by downloading and installing the idex-cli, which was the most annoying part of the process. On top of that, it pollutes your home directory in the process. I said to myself: “Challenge accepted, I’m gonna simplify this”.

Simplifying the settings generation

I dug into the config command provided by the idex-cli. The source code is located in the aurad-cli/src/commands/config.js file of the repo.

Even without being a javascript or nodejs ninja, I was able to figure out what it does, which basically boils down to:

  1. Send a GET request to https://sc.idex.market/wallet/<cold wallet address>/challenge and get what is contained in the message field of the JSON response.
  2. Ask the user to sign the message from step 1 using the cold wallet account.
  3. Locally verify that the signature actually was generated by the cold wallet.
  4. Create a new Ethereum account.
  5. Send a POST request to the same URL as step 1, passing the public address of the account generated in step 4 as well as the signature received from the user in step 2.
  6. Encrypt the newly created Ethereum account with a randomly generated 16-byte token.
  7. Generate the settings file containing:
    1. The cold wallet public address.
    2. The token used to encrypt the new account.
    3. The encrypted account.

A python script that can easily be run in a virtual environment didn’t take long to emerge, thanks to the requests and web3 libraries. I added it to the Ansible role to make it easier for people to deploy. It still requires manual intervention, as the signing needs to be done manually; I assumed no one would want to paste their private key into someone else’s code (I wouldn’t). But at least it doesn’t require installing the whole idex-cli and it is isolated.

I have to admit I haven’t actually tried the generated settings in a production environment, as I already have the settings from before and don’t have any other wallets that could be used for staking.

There are a couple of questions that I would love to have answered by someone with the knowledge.

  1. Does IDEX keep track of the challenge given per cold-wallet? Or can a random challenge be generated and signed rather than using step 1?
  2. What happens if step 5 is executed multiple times with different hot wallet addresses? It might be relevant based on the answer for 1.

2016-11-06

Database transaction handling in C++ systems

Concurrency (either via multithreading or asynchronicity) and database usage are present in almost every software system. Transactions are the mechanism relational databases provide to give ACID properties to the execution of multiple statements.

In order to use them, most database systems expect transactions to take place over a single database connection using explicit boundaries. This means that in order to perform several database actions as a single logical operation the system needs to: get a connection to the database, start a transaction on the connection, execute the actions over that connection and finish the connection’s on-going transaction. The application can finish the transaction either by committing the changes or by rolling back all the performed changes, leaving the DB in the exact same state it was in before the transaction started. The DBMS might also forcefully abort the transaction due to conflicts generated by other transactions.

In most C++ systems I have worked on, I have found that transactions are not handled in a nice way from the architectural point of view. I can basically classify the transaction management and handling into one of two options:

  • Database layer abstracted but not allowing transactions. This situation is represented by the architecture where there are multiple classes acting as repositories which interface with the database. The abstraction handles the acquisition and release of the DB connection, thus not allowing a transaction that covers multiple methods across one or more repositories. To give an example, there is a repository for the Account object with two methods: save and getById. Each of the methods acquires a connection, performs the DB operation and then releases the connection. As each method acquires its own connection, there is no way to have a transaction spanning the two methods.
  • Database implementation polluting business logic. In this case, the business logic acquires a connection, starts a transaction and passes the DB connection to other methods so that they can pass it down to the DB layer. Now you are able to use transactions in the system, but the business logic needs to know about the underlying layer in order to start the transaction, and all the functions that might be called need to take the DB connection object as an argument.

Neither of the aforementioned solutions is a good one. We would like to have the abstraction of the first scenario, but with the possibility for the business logic to declare that a certain set of actions should be done within a transaction.

Solution Approach

Architecture

Let’s see how we can tackle this issue in a nice way. The following UML diagram shows an architectural approach to the transaction handling program that decouples all the components as much as possible. The decoupling is important because we want to be able to mock most of the things in order to successfully test our code.

Architecture of transaction management approach

Business logic

If the business logic requires actions to be run under the context of a single transaction, then it will need to know about the transaction manager.

As the management of transactions is very specific to each implementation, what the business logic actually knows and interacts with is an object that implements the TransactionManager interface. This interface will provide an easy way of executing actions within a single transaction context.

In order to interact with the database layer, the business logic will still need to have knowledge and interact with the model’s repositories. Of course, this is done through an interface and not a concrete type.

Entity repositories

The repositories don’t change at all. They use the same interface to retrieve the connections they need to execute their queries. The difference is in the setup: instead of using the DbConnectionManager directly, they will use the ConcreteTransactionManager, which also implements the DbConnectionManager interface.

This means that the repositories don’t actually care whether a transaction is happening or not.
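
To make the wiring concrete, here is a minimal sketch of the interfaces involved. TransactionManager, DbConnectionManager and ScopedDbConnection come from the design above; the entity and repository shapes are illustrative, based on the usage examples further down.

#include <functional>
#include <memory>

struct sqlite3; // opaque handle from the SQLite C API, used later in the PoC

// A connection whose deleter decides what releasing it actually means.
using ScopedDbConnection = std::unique_ptr<sqlite3, std::function<void(sqlite3*)>>;

// What the business logic depends on to get transactional behavior.
class TransactionManager {
public:
    virtual ~TransactionManager() = default;
    // Every repository call made inside f shares a single transaction.
    virtual void performInTransaction(const std::function<void()>& f) = 0;
};

// What the repositories depend on to obtain connections.
class DbConnectionManager {
public:
    virtual ~DbConnectionManager() = default;
    virtual ScopedDbConnection getConnection() = 0;
};

// Illustrative entity and repository interface.
struct Entity {
    int id;
    int x;
};

class EntityRepository {
public:
    virtual ~EntityRepository() = default;
    virtual Entity getById(int id) = 0;
    virtual void save(const Entity& entity) = 0;
};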

Transaction Manager

The transaction manager is the key piece in this design. Its purpose is to provide a clear way for the business logic to perform a sequence of actions in the same transaction, while hiding the details of how that transaction is handled.

By using the transaction manager, the business logic doesn’t need to know the underlying implementation of how the persistence layer initiates or closes the transaction.

It also removes the responsibility from the business layer of carrying the transaction state from call to call, thus allowing a component of the business logic that requires transactional behavior to call another that doesn’t, while still getting transactional results if the latter fails.

In the diagram above, I assumed the scenario where the transaction is handled through the DB connection, which might not be the case. The way in which the transaction manager and repositories interact will depend entirely on the technicalities of how the transaction mechanism needs to be implemented.

Behavior and usage

From the business logic’s point of view, what we want to achieve is a usage somewhat like this:

void BusinessLogic::foo()
{
    transactionManager.performInTransaction([&]() {
        Entity a = entityRepo.getById(123);
        if (a.x > 20) {
            a.x -= 20;
        }
        entityRepo.save(a);
    });
}

Everything that is executed inside the lambda given to the transaction manager, should be done in the context of a single transaction. When the lambda returns, the transaction gets committed.

That would be fine if transactions could not fail, but that is not the case. The easiest way for the TransactionManager to communicate the failure of a transaction is through an exception.

In that case we should enclose the call to the transaction manager in a try-catch clause like this:

void BusinessLogic::foo()
{
    try {
        transactionManager.performInTransaction([&]() {
            Entity a = entityRepo.getById(123);
            if (a.x > 20) {
                a.x -= 20;
            }
            entityRepo.save(a);
        });
    }
    catch (TransactionAborted&) {}
}

There is one more thing we need to add to make it complete at a basic level: the business logic needs to be able to trigger a transaction rollback on its own. For that we can also use the C++ exception mechanism: the lambda can throw an AbortTransaction, which will make the transaction be rolled back silently. As it was the user who asked for the rollback, the performInTransaction call should finish normally and not through an exception, as was the case for the failed transaction.
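
Reusing the illustrative BusinessLogic, entityRepo and Entity names from the earlier examples, a business-logic-triggered rollback looks like this:

void BusinessLogic::bar()
{
    transactionManager.performInTransaction([&]() {
        Entity a = entityRepo.getById(123);
        if (a.x < 20) {
            // Business rule not met: undo anything done so far in this transaction.
            throw AbortTransaction();
        }
        a.x -= 20;
        entityRepo.save(a);
    });
    // We get here both after a commit and after a rollback we requested ourselves;
    // only a transaction aborted by the DBMS surfaces as a TransactionAborted exception.
}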

Nested transactions

The purpose of this blog-post is not to write a full-fledged transaction manager, but to show an architectural solution that is decoupled, versatile and easily expandable.

For the sake of simplicity I am going to assume that if transaction nesting occurs the nested transaction is irrelevant. That is, everything will be done in the context of the outer transaction.

PoC Implementation: a transaction manager for SQLite

As a proof of concept of this architecture, I am going to write a simple transaction manager for SQLite and a couple of test cases to show the behavior.

The code is in the following gist: TransactionManagerArchitecturePoC.cpp.

Let’s disassemble the code into the different components.

ScopedDbConnection and ConcreteConnectionManager

In order to keep things simple, the ScopedDbConnection is just a std::unique_ptr that takes a std::function<void(sqlite3*)> as its deleter. Doing this allows us to return a ScopedDbConnection from the TransactionManager that, when destructed, either actually closes the connection or does nothing if the connection belongs to an on-going transaction.

The ConcreteConnectionManager::getConnection() method simply creates and returns a ScopedDbConnection to the DB we are using. When this scoped connection gets destructed, the underlying connection gets closed.
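
As a sketch (the gist is the reference implementation), the connection manager can be as small as this, with ScopedDbConnection being the std::unique_ptr alias described above:

#include <sqlite3.h>

#include <string>
#include <utility>

// ScopedDbConnection and DbConnectionManager as in the earlier interface sketch.
class ConcreteConnectionManager : public DbConnectionManager {
public:
    explicit ConcreteConnectionManager(std::string dbPath) : _dbPath(std::move(dbPath)) {}

    ScopedDbConnection getConnection() override
    {
        sqlite3* connection = nullptr;
        sqlite3_open(_dbPath.c_str(), &connection); // error handling omitted for brevity
        // Outside a transaction the caller owns the connection: close it on scope exit.
        return ScopedDbConnection(connection, [](sqlite3* c) { sqlite3_close(c); });
    }

private:
    std::string _dbPath;
};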

Transaction manager

There are two important components in the transaction manager: the protocol for transactions and its internal state to give support to the transaction protocol.

The state is composed of one internal data type (TransactionInfo) and a std::map from thread ids to TransactionInfo instances. A std::mutex also forms part of the transaction manager’s state and is used to synchronize access to the map.

TransactionInfo is a structure that holds a ScopedDbConnection and a counter. The counter is used to keep track of transaction nesting. A sketch of the resulting class layout is shown below.
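
Putting the pieces together, the transaction manager’s layout looks roughly like this (a sketch: the member names follow the ones used in the code excerpts below, everything else is an assumption of mine):

#include <functional>
#include <map>
#include <mutex>
#include <thread>

// TransactionManager, DbConnectionManager, ScopedDbConnection and
// ConcreteConnectionManager as sketched in the previous snippets.
class ConcreteTransactionManager : public TransactionManager, public DbConnectionManager {
public:
    explicit ConcreteTransactionManager(ConcreteConnectionManager& connectionManager)
        : _connectionManager(connectionManager) {}

    void performInTransaction(const std::function<void()>& f) override; // shown below
    ScopedDbConnection getConnection() override;                        // shown below

private:
    struct TransactionInfo {
        ScopedDbConnection dbConnection; // connection the whole transaction runs on
        unsigned int count = 0;          // nesting depth of performInTransaction calls
    };

    TransactionInfo& setupTransaction(std::thread::id threadId);
    void commitTransaction(TransactionInfo& transactionInfo);
    void abortTransaction(TransactionInfo& transactionInfo);

    ConcreteConnectionManager& _connectionManager;
    std::map<std::thread::id, TransactionInfo> _currentTransactions; // one on-going transaction per thread
    std::mutex _currentTransactionsMutex;                            // guards _currentTransactions
};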

The transaction handling protocol is the meat of the class; this is what actually decides when a transaction gets initialized and when it gets committed or aborted.

Let’s take a look at that code:

void ConcreteTransactionManager::performInTransaction(const std::function<void()>& f)
{
    auto threadId = std::this_thread::get_id();

    TransactionInfo& transactionInfo = setupTransaction(threadId);

    try {
        f();
    }
    catch (AbortTransaction&) {
        if (--transactionInfo.count > 0) {
            throw;
        }
        abortTransaction(transactionInfo);
        return;
    }
    catch (...) {
        if (--transactionInfo.count > 0) {
            throw;
        }
        abortTransaction(transactionInfo);
        throw;
    }

    if (--transactionInfo.count > 0) {
        return;
    }
    commitTransaction(transactionInfo);
}

The protocol is really simple: when entering a performInTransaction block, we set up the transaction, which results in a reference to a TransactionInfo object. Then we execute the given function and, when that function exits, we decrement the count on the transaction information object. When the count reaches 0, depending on how we got there, the transaction gets committed or aborted. In case the transaction finished through an exception other than AbortTransaction, the exception is re-thrown.

commitTransaction and abortTransaction are really simple functions that just execute a statement using the connection from the transaction. They also remove the transaction from the on-going transactions map.

setupTransaction is also really simple: it checks whether there is an on-going transaction for the given threadId; if there is, it just increments its count and returns it. If there isn’t, it initializes a new TransactionInfo, places it in the map and executes the statement to start a transaction. A sketch of what these helpers can look like is shown below.
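
A possible shape for these three helpers, following the class sketch above (sqlite3 error handling omitted; the gist remains the authoritative version):

ConcreteTransactionManager::TransactionInfo&
ConcreteTransactionManager::setupTransaction(std::thread::id threadId)
{
    std::lock_guard<std::mutex> _(_currentTransactionsMutex);
    auto it = _currentTransactions.find(threadId);
    if (it != _currentTransactions.end()) {
        // Nested call: reuse the on-going transaction and just track the depth.
        ++it->second.count;
        return it->second;
    }
    // First call on this thread: take a real connection and open the transaction on it.
    TransactionInfo& info = _currentTransactions[threadId];
    info.dbConnection = _connectionManager.getConnection();
    info.count = 1;
    sqlite3_exec(info.dbConnection.get(), "BEGIN TRANSACTION;", nullptr, nullptr, nullptr);
    return info;
}

void ConcreteTransactionManager::commitTransaction(TransactionInfo& transactionInfo)
{
    sqlite3_exec(transactionInfo.dbConnection.get(), "COMMIT;", nullptr, nullptr, nullptr);
    std::lock_guard<std::mutex> _(_currentTransactionsMutex);
    _currentTransactions.erase(std::this_thread::get_id());
}

void ConcreteTransactionManager::abortTransaction(TransactionInfo& transactionInfo)
{
    sqlite3_exec(transactionInfo.dbConnection.get(), "ROLLBACK;", nullptr, nullptr, nullptr);
    std::lock_guard<std::mutex> _(_currentTransactionsMutex);
    _currentTransactions.erase(std::this_thread::get_id());
}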

The other important function in this class is getConnection. It needs to handle two cases: (a) if something is executed outside the context of a transaction, the returned connection needs to destroy itself when going out of scope; and (b) if a connection is requested within the context of a transaction, the ScopedDbConnection returned must not close the inner connection when going out of scope.

The way I decided to handle the latter is to return a new ScopedDbConnection containing the sqlite3 connection of the transaction but with an innocuous deleter. This way, it is totally transparent to the client whether the connection comes from a transaction context or not.

ScopedDbConnection ConcreteTransactionManager::getConnection()
{
    auto threadId = std::this_thread::get_id();
    std::lock_guard<std::mutex> _(_currentTransactionsMutex);
    auto it = _currentTransactions.find(threadId);
    if (it == _currentTransactions.end()) {
        return _connectionManager.getConnection();
    }
    return ScopedDbConnection(it->second.dbConnection.get(), [](sqlite3* c) {});
}

The developed transaction manager has the following properties:

  • Transactions are per thread. This means that the transaction is associated with the thread that started it. Only the actions performed by that thread are included in the transaction.
  • Transactions cannot be shared. This derives from the former item and means that one thread cannot give its current transaction to other threads. So, if thread A opens a transaction and launches thread B, the actions that thread B performs are not covered by the transaction initiated by thread A and there is no way to make that happen.
  • There are no nested transactions. If, while in the context of a transaction, a new call to performInTransaction is made, this second call doesn’t have any practical effect. A call to abort aborts the already on-going transaction, and a successful exit from the inner transaction doesn’t trigger a commit of the outer transaction (see the snippet after this list).
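
As an example of that last point, using the same illustrative names as before, the following nests two performInTransaction calls and still results in exactly one transaction:

transactionManager.performInTransaction([&]() {
    Entity a = entityRepo.getById(123);
    entityRepo.save(a);
    // The inner call doesn't open a second transaction; it only bumps the nesting count...
    transactionManager.performInTransaction([&]() {
        Entity b = entityRepo.getById(456);
        entityRepo.save(b);
    });
    // ...so both saves are committed together when this outer lambda returns.
});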

Conclusion

Advantages of the given design

One of the most important advantages this design has is that it is so decoupled that mocking it is really easy to do. This helps testing the code that uses it with minimal effort and no dependencies.

It is also really easy to use, and so transparent that it is non-intrusive to the code that uses it. Only the code that wants to use transaction capabilities requires knowledge of it. The rest of the code that needs to interact with it in order to provide the transaction functionality (i.e. the repos or whoever uses the DB) doesn’t know about transactions; this gets masked by the TransactionManager providing the DbConnectionManager interface.

Also, all the transactional functionality is encapsulated by the TransactionManager. This makes it easier to test implementations in isolation, without requiring other components to know any logic about transactions.

What can be improved?

The PoC is really simple and there are many edge cases that it doesn’t take into account.

When implementing a production solution, one has to be aware that the repositories can fail due to transaction problems and should probably mask those as a TransactionAborted.

For my PoC, I decided to use abstract classes and virtual functions to provide polymorphism, but this can be implemented using templates.

It is also arguable that the ScopedDbConnection returned carries no guarantees that it is not going to be retained by the callee. A different approach needs to be taken to guarantee that: maybe the connection manager can have a method executeOnConnection(std::function<void(const ScopedDbConnection&)>).