So if you have been following the NoSQL movement, the migration of some types of data to non-relational datastores has recently picked up speed. For web developers (and others), this has led to some impressive engineering resources going into some amazing tools, which are then open sourced for the world to use. One that caught my eye recently is Salvatore Sanfilippo’s redis, which has been taken under the wing of VMware, solidifying and validating the great work that Salvatore has done in making redis an amazing tool in any developer’s arsenal.
A very simplified explanation of redis is that it is an in-memory key-value store like memcached, but it is persistent on disk, unlike memcached, which is volatile. Along with being disk-persistent, redis also supports some basic data structures like lists, sets, ordered sets, hashes, and of course basic key-value string storage like memcached. Redis, with disk-persistence and basic data structures, remains blazing fast, with published benchmarks of 110,000 SETs per second and about 81,000 GETs per second.
This post is the start of a series of articles on redis for Python programmers. A prerequisite for this is going to be some basic Python knowledge; if you haven’t used Python before, I highly recommend the free web book Dive Into Python. This is going to be a simple overview of the basic data types and usage for Python programmers, and we will slowly progress into more complex usages, so if you haven’t done anything with redis before this is a perfect start.
Redis is extremely easy to install on POSIX systems (Linux, OS X, Cygwin for Windows). Simply download a release version or check out the trunk version of redis from the project home page. Unpack, make, and run. That’s it!
gettingstarted $ wget http://redis.googlecode.com/files/redis-1.2.5.tar.gz
gettingstarted $ gunzip redis-1.2.5.tar.gz
gettingstarted $ tar -xvf redis-1.2.5.tar
gettingstarted $ cd redis-1.2.5
redis-1.2.5 $ make
redis-1.2.5 $ ./redis-server
21 Mar 17:37:11 * Warning: no config file specified, using the default config. In order to specify a config file use 'redis-server /path/to/redis.conf'
21 Mar 17:37:11 - Server started, Redis version 1.2.5
21 Mar 17:37:11 - The server is now ready to accept connections on port 6379
21 Mar 17:37:11 . 0 clients connected (0 slaves), 832496 bytes in use, 0 shared objects
6 commands and redis is running and ready for a connection. Obviously this isn’t best practice for running redis in production, but it works well for just getting it up and running for some experimentation. Let’s leave that window open and open another for a Python console.
In every example below we are only covering the basic operations of each type. If you want to learn about all of the possible operations for redis, read up on the redis Command Reference, which lists every command available to you.
Redis and Python
First thing we need is the redis-py module for Python. You can install this manually or just use easy_install.
Everything has a place: keys
Redis is a key-value store, just like a Python dictionary. So everything that goes into redis, no matter what it is, has a key (or location, or address, whatever makes sense to you). You can’t store something without a place for it to go.
Now let’s open up a Python console and start to play with the basic data structures.
Data structure: String
The most basic data structure in redis is the string. Simply, you have a key and it stores a string.
>>> import redis
>>> r_server = redis.Redis("localhost")
>>> r_server.set("name", "DeGizmo")
True
>>> r_server.get("name")
'DeGizmo'
So, line by line let’s walk through what we’ve done here.
1. import the redis module
2. create an object called “r_server” of redis.Redis type with the parameter “localhost” which is the network location of our running redis server
3. call the function “set” on the “r_server” object with the parameters “name” and “DeGizmo”. The first parameter “name” is the key and “DeGizmo” is the value. So in redis we created a key-value pair of “name” -> “DeGizmo”
4. call the function “get” which will retrieve the value from the key which is passed as the parameter. We are getting the key “name” and redis returns the value “DeGizmo”
That’s it. If you stop the redis server, restart it, and run the r_server.get("name") command again, it will return “DeGizmo” again thanks to redis’s persistent store (provided redis had the chance to save its snapshot to disk before it was stopped).
Data structure: Integer
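The second data type available is an integer. Redis actually stores it as a string (notice that get returns '2' below), but it lets you increment and decrement the value as a number.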
>>> r_server.set("hit_counter", 1)
True
>>> r_server.incr("hit_counter")
2
>>> r_server.get("hit_counter")
'2'
>>> r_server.decr("hit_counter")
1
>>> r_server.get("hit_counter")
'1'
Again using the “r_server” Redis object we run the following commands
1. We set the redis key “hit_counter” to a value of 1 just like in the example above
2. Then we run the command “incr” on the key “hit_counter”. This will increment (incr) the key by 1
3. We get the value for the key “hit_counter” to verify that it has been incremented
4. Then we run the command “decr” on the key “hit_counter”. This will decrement (decr) the key by 1
5. We get the value for the key “hit_counter” to verify that it has been decremented
Simple huh? Well, remember that redis is a persistent store that can be accessed by multiple machines (or programs) simultaneously. If you have a horizontally scaled web farm with many different web servers running on separate machines, your web apps can all reference the same redis instance and all operate on the “hit_counter” key at the same time, incrementing it just like you would with a MySQL database. You can think of redis as “shared” storage across different applications.
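One reason the shared counter works is that redis applies each INCR as a single atomic step, so two web servers can never both read the same old value and overwrite each other. The sketch below is not redis code: it models the store with a plain Python dict (an assumption, so that it runs without a server), and the helper name incr is hypothetical. It contrasts a hand-interleaved GET-then-SET with an increment done in one step.

```python
# Model the redis keyspace with a plain dict; values are strings, as in redis.
store = {"hit_counter": "1"}

# Two web servers each GET the counter before either one SETs it back.
seen_by_a = int(store["hit_counter"])
seen_by_b = int(store["hit_counter"])
store["hit_counter"] = str(seen_by_a + 1)
store["hit_counter"] = str(seen_by_b + 1)
lost_update = store["hit_counter"]  # two hits happened, but only one was counted


def incr(store, key):
    """Hypothetical stand-in for INCR: read, add one and write back in one step."""
    store[key] = str(int(store.get(key, "0")) + 1)
    return int(store[key])


# The same two hits, each applied as one indivisible increment.
store["hit_counter"] = "1"
incr(store, "hit_counter")
after_two_hits = incr(store, "hit_counter")
```

With the interleaved GET and SET the counter ends at "2" even though two hits arrived; with the one-step increment it ends at 3, which is the behaviour you get from incr on a real redis server.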
Data structure: Lists
Very similar to the built-in Python list type, the redis list type has a few basic methods that, combined, can be quite powerful. We are only covering a tiny portion of the commands available; you can find all the commands in the redis Command Reference.
>>> r_server.rpush("members", "Adam")
True
>>> r_server.rpush("members", "Bob")
True
>>> r_server.rpush("members", "Carol")
True
>>> r_server.lrange("members", 0, -1)
['Adam', 'Bob', 'Carol']
>>> r_server.llen("members")
3
>>> r_server.lindex("members", 1)
'Bob'
We don’t actually have to create or declare anything when using redis, we just use it and redis handles everything else for us. So we just go ahead and tell redis at the key “members” we are going to add (rpush, right push) the value “Adam”. Redis creates the list for us at key “members” and adds the value “Adam” to the list.
1. With the r_server object again we call the method “rpush” which will add the value “Adam” to the newly created list “members”
2. We add “Bob” to the same list
3. Finally we’ll add “Carol”
4. With the lrange method we are asking redis to return all the objects in “members”. lrange takes 3 arguments: key, start index in list, end index in list. We are getting the objects from the key “members” starting at index 0 and ending at -1 (a negative index counts from the end of the list, so -1 is the last element and the range covers everything)
5. The llen method asks redis to return the length of the list at the key “members” which now has 3 objects
6. lindex method tells redis that we want the object from the key “members” at the index position of 1 (remember lists start at index 0), so redis returns “Bob”
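One thing that trips up Python programmers is that the end index of lrange is inclusive, while the end of a Python slice is exclusive. The function below is a rough pure-Python model of how lrange picks elements, written against an ordinary list so it runs without a server; the function itself is my own illustration, not part of redis-py.

```python
def lrange(values, start, end):
    """Return values[start..end] with both ends included, as LRANGE does."""
    length = len(values)
    # Negative indexes count back from the end of the list.
    if start < 0:
        start = max(length + start, 0)
    if end < 0:
        end = length + end
    # An empty or out-of-range window gives an empty list, not an error.
    if start > end or start >= length:
        return []
    end = min(end, length - 1)
    return values[start:end + 1]


members = ["Adam", "Bob", "Carol"]
everything = lrange(members, 0, -1)   # the whole list
first_two = lrange(members, 0, 1)     # index 1 is included
python_slice = members[0:1]           # a Python slice stops before index 1
```

So asking redis for the range 0 to 1 gives two elements, where the Python slice [0:1] gives only one.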
We’ve got some elements in the list at the key “members”; now let’s remove some elements.
>>> r_server.rpop("members")
'Carol'
>>> r_server.lrange("members", 0, -1)
['Adam', 'Bob']
>>> r_server.lpop("members")
'Adam'
>>> r_server.lrange("members", 0, -1)
['Bob']
1. With the method rpop (right pop) we remove the element in the list on the right side (tail), which is “Carol”
2. Now when we ask for the list “members” from redis again (from the start of 0, and the end of -1 which returns everything) we see our list now doesn’t have “Carol”
3. We now lpop (left pop) an element from the list “members”. This will remove the far left element from the list, “Adam”
4. Now the entire list only contains “Bob”
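If you already know collections.deque from the Python standard library, the push and pop commands map onto it quite directly. This is only an analogy, run against a local deque rather than redis: pairing rpush with lpop gives you a first-in, first-out queue, and pairing rpush with rpop gives you a stack.

```python
from collections import deque

members = deque()
members.append("Adam")    # like rpush: add on the right (tail)
members.append("Bob")
members.append("Carol")

last_in = members.pop()        # like rpop: take from the right
first_in = members.popleft()   # like lpop: take from the left
remaining = list(members)      # only the middle element is left
```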
Data structure: Set
Again, sets behave much like the built-in Python set type. Simply put, sets are like lists, but they are unordered and can only hold unique values. In the above example, if we added the value Adam (r_server.lpush("members", "Adam")) 20 times, our list would contain 20 elements, all with the value “Adam”. In a set, all elements are unique.
>>> r_server.delete("members")
True
>>> r_server.sadd("members", "Adam")
True
>>> r_server.sadd("members", "Bob")
True
>>> r_server.sadd("members", "Carol")
True
>>> r_server.sadd("members", "Adam")
False
>>> r_server.smembers("members")
set(['Bob', 'Adam', 'Carol'])
1. First off we delete the old list in the key “members” so we can use it as a set
2. Then we sadd (set add) the value “Adam” to the key “members”
3. Do the same for the value “Bob”
4. Do the same for the value “Carol”
5. Now we try to add the value “Adam” to the key “members” again, but this time it returns “False” since we are working on a set, and a set only has unique values. There already is a value “Adam” present in this set
6. The method smembers returns all the members of the set in the Python type set
An example of a use of sets in a web application would be for “upvotes” on a reddit or Hacker News type website. We want to keep track of who upvotes a story, but each user should only be able to upvote a story once.
>>> r_server.sadd("story:5419:upvotes", "userid:9102")
True
>>> r_server.sadd("story:5419:upvotes", "userid:12981")
True
>>> r_server.sadd("story:5419:upvotes", "userid:1233")
True
>>> r_server.sadd("story:5419:upvotes", "userid:9102")
False
>>> r_server.scard("story:5419:upvotes")
3
>>> r_server.smembers("story:5419:upvotes")
set(['userid:12981', 'userid:1233', 'userid:9102'])
I added a little twist in here with the name of the key, “story:5419:upvotes”, but it’s easy to explain. Redis is “flat” with its keyspace, so if we have many different stories we use a fixed key naming convention for easy reference in redis. For this example our key is broken down like this: “object type” : “id” : “attribute”. So, we have an object of type “story” with an ID of “5419” and an attribute “upvotes”. Redis has no idea what any of this means; it just knows the key is “story:5419:upvotes”. That doesn’t matter, though: we know what it means, and we can divide up our objects into this namespace to make them easier to work with and keep from “losing” things. The value being added to the key is divided up in the same way: “userid” is the type and “9102” is the ID of the user voting on the story.
1. Just like before we are adding the value “userid:9102” to the key “story:5419:upvotes”
2. Now we are adding the value “userid:12981”
3. Finally adding the value “userid:1233”
4. Now, the user with the ID 9102 tries to upvote the story with the ID 5419 again, and redis returns False since that user has already upvoted this story and you can’t upvote a story twice!
5. The method “scard” is asking redis for the cardinality of the set at key “story:5419:upvotes”, or how many things are in this set, and redis returns 3.
6. Finally we return the set of user IDs that we have stored
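Since the naming convention only works if every part of the application spells its keys the same way, it can help to build them in one place. Here is a minimal sketch of that idea; the helper name make_key and its arguments are hypothetical, not something redis or redis-py provides.

```python
def make_key(object_type, object_id, attribute=None):
    """Build a key such as 'story:5419:upvotes' or a value such as 'userid:9102'."""
    parts = [object_type, str(object_id)]
    if attribute is not None:
        parts.append(attribute)
    return ":".join(parts)


upvotes_key = make_key("story", 5419, "upvotes")   # 'story:5419:upvotes'
voter = make_key("userid", 9102)                   # 'userid:9102'
```

You would then pass upvotes_key and voter to sadd exactly as in the session above, instead of typing the strings by hand each time.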
Data structure: Ordered sets
The last data structure we are going to talk about today is an ordered (or sorted) set. This operates just like a set, but each element has an extra attribute called a “score”, which we supply when we add it. This score determines the order of the elements in the set. We will stick with the story voting concept for this final example.
>>> r_server.zadd("stories:frontpage", "storyid:3123", 34)
True
>>> r_server.zadd("stories:frontpage", "storyid:9001", 3)
True
>>> r_server.zadd("stories:frontpage", "storyid:2134", 127)
True
>>> r_server.zadd("stories:frontpage", "storyid:2134", 127)
False
>>> r_server.zrange("stories:frontpage", 0, -1, withscores=True)
[('storyid:9001', 3.0), ('storyid:3123', 34.0), ('storyid:2134', 127.0)]
>>> frontpage = r_server.zrange("stories:frontpage", 0, -1, withscores=True)
>>> frontpage.reverse()
>>> frontpage
[('storyid:2134', 127.0), ('storyid:3123', 34.0), ('storyid:9001', 3.0)]
Quick namespace explanation like before. The key we are going to be using is “stories:frontpage”, which is going to be a set of stories slated for the front page of our website. We are storing in that key the value “storyid:3123”, which is the ID of some story on the site, and then a score, which in our case is going to be the number of votes on a story.
1. First we add the value “storyid:3123” to “stories:frontpage”, and “storyid:3123” in our example is going to have 34 votes.
2. Then add “storyid:9001” with 3 votes
3. Then add “storyid:2134” with 127 votes
4. We are going to try to add “storyid:2134” to the set again, but we can’t since it already exists.
5. Now we are going to ask redis for all the elements in “stories:frontpage” from index 0 to index -1 (the end of the list) with all associated scores (withscores=True)
6. We’ve got the scores, but they are in ascending order and we want them in descending order for our website, so we are going to store the results in the variable “frontpage”
7. Then reverse it (which is an in place operation in Python)
8. Now print out the front page!
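The reversal step can also be written without changing the list in place. The sketch below works on a hard-coded copy of the zrange result from the session above (so it runs without a server) and shows two alternatives: reversed(), and an explicit sort by score. Redis also has a ZREVRANGE command that returns the range highest score first, which would avoid the client-side reversal entirely; check whether your client version exposes it.

```python
# A copy of what zrange(..., withscores=True) returned above: lowest score first.
frontpage = [("storyid:9001", 3.0), ("storyid:3123", 34.0), ("storyid:2134", 127.0)]

# reversed() leaves the original list untouched, unlike list.reverse().
top_first = list(reversed(frontpage))

# Sorting by score explicitly gives the same order without relying on the input order.
by_votes = sorted(frontpage, key=lambda pair: pair[1], reverse=True)

# Just the story IDs, most votes first, ready to render on the page.
story_ids = [story for story, votes in by_votes]
```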
In conclusion, let’s do a quick example of a “view” in an application in which a user will vote on a story, using redis as the storage engine.
#given variables
#r_server = our redis server
#user_id = the user who voted on the story
#story_id = the story which the user voted on
if r_server.sadd("story:%s" % story_id, "userid:%s" % user_id):
    r_server.zincrby("stories:frontpage", "storyid:%s" % story_id, 1)
2 lines of code? This is mighty compact, but once we unravel it we can see how it makes sense and how powerful redis can be. Let’s start with the if statement.
if r_server.sadd("story:%s" % story_id, "userid:%s" % user_id):
We know the command “sadd” already. This will add an element to a set at a key. The key in this case is
"story:%s" % story_id
If story_id is 3211, then the resulting string will be “story:3211”. This is the key in redis which contains the set of users who have voted on the story.
The value to be inserted at this key is
"userid:%s" % user_id
Just like with story, if the user_id is 9481 then the string to be inserted into the set at “story:3211” will be “userid:9481”
Now the redis command “sadd” will return False if that element is already present in the set. So if a user has already voted on this story before we don’t execute the statement under the if. But, if it is added, then we have a brand new vote and we have to increment the votes for the front page.
r_server.zincrby("stories:frontpage", "storyid:%s" % story_id, 1)
We have an ordered set at the key “stories:frontpage” and we are going to increment the score of the element "storyid:%s" % story_id (“storyid:3211”) by 1.
And now we’re done! We’ve made sure the user hasn’t voted on this story before and then we’ve incremented the number of votes for this story on the front page!
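To try the view logic without a running server, here is a sketch that wraps the two lines in a function and exercises it against a tiny in-memory stand-in. The names record_vote and InMemoryStandIn are hypothetical, and the stand-in mimics only the two calls used here. One assumption to flag loudly: as far as I know, recent redis-py releases take zincrby arguments in the order (name, amount, value) and return a count from sadd rather than True/False, so the sketch passes amount and value by keyword and only relies on the truthiness of sadd.

```python
class InMemoryStandIn:
    """Hypothetical stand-in that mimics only the two redis calls used below."""

    def __init__(self):
        self.sets = {}
        self.scores = {}

    def sadd(self, name, value):
        # Return 1 if the member was new, 0 if it was already in the set.
        members = self.sets.setdefault(name, set())
        if value in members:
            return 0
        members.add(value)
        return 1

    def zincrby(self, name, amount, value):
        # Add `amount` to the member's score, creating it at 0 if needed.
        scores = self.scores.setdefault(name, {})
        scores[value] = scores.get(value, 0.0) + amount
        return scores[value]


def record_vote(r_server, story_id, user_id):
    """Count a vote once per user; return True if the vote was new."""
    if r_server.sadd("story:%s" % story_id, "userid:%s" % user_id):
        # Keyword arguments sidestep the argument-order difference (assumption).
        r_server.zincrby("stories:frontpage", amount=1, value="storyid:%s" % story_id)
        return True
    return False


r_server = InMemoryStandIn()
first = record_vote(r_server, 3211, 9481)    # a new vote is counted
second = record_vote(r_server, 3211, 9481)   # the repeat vote is ignored
```

Swapping InMemoryStandIn() for a real redis.Redis("localhost") connection should leave record_vote unchanged, though that depends on the client version you have installed.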