Wakatta!

Like Eureka!, only cooler

Seven Databases in Seven Weeks Redis Day 2

Performance tuning with Redis can be approached in different ways, as we will see today. First there are basic changes on the client side (such as pipelines), then configuration options (frequency of saves, …), and finally distribution of the load.

Pipeline

The Redis low-level protocol supports the notion of pipelines: sending commands in batches and collecting all the results at the end, instead of waiting for the result of each command. This saves a round-trip delay per command, so there can be huge performance boosts for specific usages, as the informal benchmarks below show.
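
For instance, with the Java client Jedis (used in the exercises below), a pipeline looks roughly like this. This is a minimal sketch with a made-up key; queued commands return Response handles whose values only become available after sync().

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;
import redis.clients.jedis.Response;

public class PipelineSketch {
    public static void main(String... args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);

        Pipeline pipe = jedis.pipelined();
        pipe.set("isbn:9781934356920", "Seven Databases in Seven Weeks");
        Response<String> title = pipe.get("isbn:9781934356920");
        // Nothing has been sent or read back yet; sync() ships the whole batch
        // in one round trip and collects all the replies.
        pipe.sync();

        System.out.println(title.get());
        jedis.disconnect();
    }
}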

Distributed Redis

Redis servers can be distributed for performance or memory concerns, but much of the work falls on the client side.

Slaves

Slaves in Redis are just the opposite of MongoDB’s. Whereas MongoDB’s slaves are meant to be written to, so that updates are automatically pushed to the master, Redis slaves are, or should be, read-only. Updates are only propagated from master to slaves.

There is no integrated support for failover; it has to be implemented in client code.

So slaves are mainly a mechanism to distribute reads; combined with monitoring client code, they can also be used for data replication and failover.

Note that each slave needs as much memory as the master, as it contains the same data.
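
As a quick illustration, here is a sketch assuming two local instances on ports 6379 and 6380; the SLAVEOF command could equally be set in the slave's configuration file.

import redis.clients.jedis.Jedis;

public class ReplicationSketch {
    public static void main(String... args) throws Exception {
        Jedis master = new Jedis("127.0.0.1", 6379);
        Jedis slave = new Jedis("127.0.0.1", 6380);

        // Make the second instance replicate the first
        // (same effect as "slaveof 127.0.0.1 6379" in its configuration).
        slave.slaveof("127.0.0.1", 6379);

        master.set("greeting", "wakatta");          // writes go to the master only
        Thread.sleep(100);                          // give replication a moment to propagate
        System.out.println(slave.get("greeting"));  // reads can be served by the slave

        master.disconnect();
        slave.disconnect();
    }
}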

Sharding

By itself, Redis does not support sharding; it relies on the client library to spread accesses over several instances. There is ongoing development toward a real Redis Cluster, but for the time being sharding has to be simulated on the client side.

One issue not mentioned in the book is that sharding breaks transactions and pipelines: there is no guarantee that the relevant keys all live in the same instance, so the Redis Ruby client, for instance, raises an exception when MULTI is invoked.

The Java client, Jedis, has a mechanism to “tag” a key such that keys with the same tag are guaranteed to be stored on the same Redis server. This makes the distribution of keys predictable, and allows the use of transactions (provided all the involved keys have the same tag).
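
For example, here is a sketch assuming two local instances on ports 6379 and 6380; the {user:42} keys are made up, and Sharded.DEFAULT_KEY_TAG_PATTERN treats the part between braces as the tag.

import java.util.Arrays;
import java.util.List;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisShardInfo;
import redis.clients.jedis.ShardedJedis;
import redis.clients.jedis.Transaction;
import redis.clients.util.Hashing;
import redis.clients.util.Sharded;

public class KeyTagSketch {
    public static void main(String... args) {
        List<JedisShardInfo> shards = Arrays.asList(
                new JedisShardInfo("127.0.0.1", 6379),
                new JedisShardInfo("127.0.0.1", 6380));

        // With the key tag pattern, only the part between braces is hashed,
        // so both keys below end up on the same shard.
        ShardedJedis sharded = new ShardedJedis(shards, Hashing.MURMUR_HASH,
                Sharded.DEFAULT_KEY_TAG_PATTERN);
        sharded.set("{user:42}:name", "Ingvar");
        sharded.set("{user:42}:visits", "1");

        // Since the keys share a shard, a transaction can be run on that shard directly.
        Jedis shard = sharded.getShard("{user:42}:name");
        Transaction t = shard.multi();
        t.incr("{user:42}:visits");
        t.exec();

        sharded.disconnect();
    }
}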

This shows not only that sharding is a client-side feature, but also that the actual extent of the feature may vary widely. And of course, there is no reason to think that different clients will shard keys the same way.

Properly set up, sharding distributes the data across the nodes, reducing the memory load on each of them.

Exercises

Performance tests

I first rewrote the code in Java, to measure the cost of Ruby’s convenience. The Java code is clumsier than the Ruby one, but it ran a bit faster (105 seconds instead of 155 seconds for the Ruby version using hiredis).

Simple ISBN Loader (ISBNLoader.java)

package jp.wakatta;

import java.io.BufferedReader;
import java.io.FileReader;

import redis.clients.jedis.Jedis;


public class ISBNLoader {
  public static final String REDIS_HOST = "127.0.0.1";
  public static final int REDIS_PORT = 6379;
  public static final int TIMEOUT = 5000;

  public static void main(String...args) throws Exception {
      BufferedReader in = new BufferedReader(new FileReader(args[0]));

      Jedis jedis = new Jedis(REDIS_HOST, REDIS_PORT, TIMEOUT);
      jedis.flushAll();

      String line;
      int count = 0;

      while ((line = in.readLine()) != null) {
          count++;
          if (count == 1)                 // skip the header line
              continue;
          String[] tokens = line.split("\t");
          if (tokens.length < 4)          // ignore malformed lines
              continue;
          String isbn = tokens[0];        // first column: ISBN
          String title = tokens[3];       // fourth column: title
          if (isbn.isEmpty() || title.isEmpty())
              continue;
          jedis.set(isbn, title);         // one SET, and one round trip, per line
      }

      in.close();
      jedis.disconnect();
  }
}

Using pipelines, the Java version took 11 seconds against 26 seconds for the Ruby version (again using hiredis).

Pipelined ISBN Loader (ISBNLoader.java)

package jp.wakatta;

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.LinkedList;
import java.util.List;

import redis.clients.jedis.Jedis;
import redis.clients.jedis.Pipeline;


public class ISBNLoader {
  public static final String REDIS_HOST = "127.0.0.1";
  public static final int REDIS_PORT = 6379;
  public static final int TIMEOUT = 5000;
  public static final int BATCH_SIZE = 1000;      // number of SETs sent per pipeline
  private final Jedis client;

  protected ISBNLoader() {
      client = new Jedis(REDIS_HOST, REDIS_PORT, TIMEOUT);
      client.flushAll();
  }

  public static void main(String...args) throws Exception {
      new ISBNLoader().load(args[0]);
  }

  protected void load(String fileName) throws Exception {
      BufferedReader in = new BufferedReader(new FileReader(fileName));

      String line;
      int count = 0;

      List<Pair> batch = new LinkedList<Pair>();

      while ((line = in.readLine()) != null) {
          count++;
          if (count == 1)                 // skip the header line
              continue;
          String[] tokens = line.split("\t");
          if (tokens.length < 4)          // ignore malformed lines
              continue;
          String isbn = tokens[0];
          String title = tokens[3];
          if (isbn.isEmpty() || title.isEmpty())
              continue;
          batch.add(new Pair(isbn, title));

          if (batch.size() == BATCH_SIZE) // send a full batch in one round trip
              flush(batch);
      }

      flush(batch);                       // send the remaining partial batch

      in.close();
      client.disconnect();
  }

  // Send the whole batch through a single pipeline and wait once for all the replies.
  protected void flush(List<Pair> batch) throws Exception {
      Pipeline pipe = client.pipelined();

      for (Pair p : batch)
          pipe.set(p.key, p.value);
      pipe.sync();
      batch.clear();
  }

  // Simple key/value holder for the pending SET commands.
  static class Pair {
      String key;
      String value;

      Pair(String key, String value) {
          this.key = key;
          this.value = value;
      }
  }
}

Disabling snapshots and the append-only file did not improve the time significantly compared to the default configuration (snapshots, but no append-only file).

Enabling the append-only file with appendfsync set to always made the pipelined Java version almost 3 times slower (27 seconds). For the original Ruby version (with hiredis), it was far worse (1101 seconds). This shows that the overhead of writing to the file can be mitigated with pipelines.

To recap: disabling snapshots did not measurably improve performance, but enabling the append-only file with appendfsync always degrades performance significantly; using pipelines makes it somewhat better, but it is still much slower.
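
For reference, these are the settings involved. The sketch below, assuming a local instance on the default port, toggles them at runtime through Jedis’s configSet (i.e. CONFIG SET), rather than editing the save, appendonly and appendfsync directives in redis.conf and restarting.

import redis.clients.jedis.Jedis;

public class PersistenceConfigSketch {
    public static void main(String... args) {
        Jedis jedis = new Jedis("127.0.0.1", 6379);
        // Disable RDB snapshots (equivalent to an empty "save" directive).
        jedis.configSet("save", "");
        // Enable the append-only file and fsync on every write (the slowest, safest setting).
        jedis.configSet("appendonly", "yes");
        jedis.configSet("appendfsync", "always");
        jedis.disconnect();
    }
}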

URL Shortening Service

The exact setup to implement is not described, so what I did was to distribute the data over two shards, each made of one master and two slaves.

There is no direct support for such a layout in Jedis (nor, as far as I can tell, in the Ruby library), so I had to write some of it myself.

As always with Redis, the writes are restricted to the masters, and the reads are distributed over the slaves (and the masters as well, if needed).

Distribution over slaves

Jedis does not support slaves directly. What the documentation proposes is to have a dedicated client to write to the master, and a sharded pool over the slaves for reads. However, such an approach would be difficult here, as I need to shard the writes to the masters as well (I would have to use a different sharding algorithm, and manage the routing of commands through the tree of Redis instances myself).

Fortunately, Redis user Ingvar Bogdahn had posted an implementation of a Round Robin pool of slaves. This implementation manages a connection pool to a master, and another connection pool to a set of slaves. The commands are properly distributed: all the write commands are sent to the master, and the read commands are distributed over the slaves.

I had to fix the code in some places: a command implementation was missing, another was incorrect, and finally the password was never sent to the master, causing authentication errors. But the bulk of the code is Ingvar’s, and I was glad to use it.

The classes are

  • UniJedis: provides pools for both master and a set of slaves, and dispatches commands to the correct pool.
  • RoundRobinPool: implements a pool with Round Robin access.
  • ChainableTransaction: (not used in this project) provides a fluent interface for Redis transactions.
  • DBKeys: (not used in this project) abstracts database and keys.
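
The round-robin idea behind RoundRobinPool and UniJedis is simple; a stripped-down, hypothetical sketch (not Ingvar’s code: no pooling, no error handling, and made-up names) could look like this:

import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

import redis.clients.jedis.Jedis;

// Hypothetical sketch: writes go to the master, reads rotate over the slaves.
public class RoundRobinSketch {
    private final Jedis master;
    private final List<Jedis> slaves;
    private final AtomicInteger next = new AtomicInteger(0);

    public RoundRobinSketch(Jedis master, List<Jedis> slaves) {
        this.master = master;
        this.slaves = slaves;
    }

    public void set(String key, String value) {
        master.set(key, value);            // write commands always hit the master
    }

    public String get(String key) {
        int index = Math.abs(next.getAndIncrement() % slaves.size());
        return slaves.get(index).get(key); // read commands rotate over the slaves
    }
}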

Sharding

Sharding is directly supported by Jedis, but as the code is organized, sharding is restricted to a set of plain clients connected to specific instances.

There are basic, generic classes (Sharded, ShardInfo, …) that can be used to implement sharding of arbitrary clients (such as the Round Robin pool above), but it requires a lot of tedious code to map each command to a method on the right shard. Worse, such code would be the same for every kind of shard.

So I first wrote generic classes that implement sharding in terms of a generic Jedis client; the actual implementations are then much simpler (just the constructors, and the few commands that cannot be sharded, such as disconnect or flushAll).

  • BinaryShardedGJedis: first level of Jedis commands implementation (binary commands)
  • ShardedGJedis: second level of Jedis commands implementation (String based commands)
  • UniJedisShardInfo: descriptor class to use with Sharded
  • ShardedUniJedis: actual implementation of sharded UniJedis. As promised, the class has hardly any code.

Service

The code for the service itself is now fairly small. JedisClient is the class that builds the tree of sharded master/slave pools; it is loaded and initialized as a Spring bean. The web services are JSR 311 services, running over Jersey, and loaded and initialized by Spring.

Admin lets the user define a keyword for a specific URL, and Client extracts a keyword from the request URL, retrieves the URL stored for this keyword, and returns a redirect to that URL.
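
A stripped-down, hypothetical version of these two resources could look like the following; a single Jedis connection stands in for the sharded master/slave pools, and the class name is only illustrative.

import java.net.URI;

import javax.ws.rs.FormParam;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Response;

import redis.clients.jedis.Jedis;

@Path("/u")
public class UrlShortenerResource {
    private final Jedis jedis = new Jedis("127.0.0.1", 6379);

    // Admin side: map a keyword to a URL.
    @POST
    @Path("/admin")
    public String admin(@FormParam("url") String url, @FormParam("shorter") String shorter) {
        jedis.set(shorter, url);
        return "Key[" + shorter + "] mapped to URL[" + url + "]\n";
    }

    // Client side: look up the keyword and redirect (303 See Other).
    @GET
    @Path("/s/{key}")
    public Response redirect(@PathParam("key") String key) {
        String url = jedis.get(key);
        if (url == null)
            return Response.status(Response.Status.NOT_FOUND).build();
        return Response.seeOther(URI.create(url)).build();
    }
}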

Once deployed (on Apache Tomcat), it can be used in a browser or on the command line:

$ curl -X POST http://localhost:8080/url-shortener/u/admin --data "url=http://slashdot.org&shorter=slash"
Key[slash] mapped to URL[http://slashdot.org]

and for clients:

$ curl -I http://localhost:8080/url-shortener/u/s/slash
HTTP/1.1 303 See Other
Server: Apache-Coyote/1.1
Location: http://slashdot.org
Content-Length: 0
Date: Mon, 23 Jan 2012 06:13:29 GMT

The code for the whole project can be found on GitHub.

And this completes Day 2.
