Wakatta!

Like Eureka!, only cooler

Seven Databases in Seven Weeks HBase Day 1

New week, new database. This week is about HBase, a product with a distinctly enterprisey feel. First, it is written in Java, the de facto enterprise language. Second, it is already in production at some very large big data users (Facebook among others).

Perhaps more surprising is the fact that it runs at all on a single personal computer (as the book states, five dedicated servers is the recommended minimum configuration).

Today is a fairly short day. Getting HBase to run, creating a single table and a couple of rows, and that’s it.

As with Riak, I recommend downloading the HBase package rather than trying your luck with the Homebrew version. HBase runs directly from the extraction directory, and already includes all its dependencies.

Just edit the hbase-site.xml configuration file as the book recommends, and you’re good to go.

Exercises

put_many function

This function is more an exercise in Ruby than in HBase. The code is just a variant of what is already in the book.

put_many.rb
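# recap some definitions from the book to make this listing self-contained
import 'org.apache.hadoop.hbase.client.HTable'
import 'org.apache.hadoop.hbase.client.Put'

# convert each argument to a Java byte array, as the HBase client expects
def jbytes( *args )
  args.map { |arg| arg.to_s.to_java_bytes }
end

# actual exercise: write several columns of one row in a single Put
def put_many( table_name, row, column_values)
  table = HTable.new( @hbase.configuration, table_name )

  p = Put.new( *jbytes( row ))

  column_values.each do |k, v|
    # split "family:qualifier" on the first colon only
    (kf, kn) = k.split(':', 2)
    # a bare family name has no qualifier: use the empty one
    kn ||= ""
    p.add( *jbytes( kf, kn, v ))
  end

  table.put( p )
end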
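
The only step in put_many that is not straight from the book is turning a column key such as "revision:author" into a family and a qualifier. Here is a minimal sketch of that step in plain Ruby, so it can be tried without HBase running. The helper name split_column is hypothetical (it is neither in the book nor in HBase), and splitting on the first colon only is my assumption about how such keys should be read.

```ruby
# Hypothetical helper, not part of the book or of HBase:
# split a "family:qualifier" column key into its two parts.
# Only the first colon separates the parts (an assumption),
# and a missing qualifier defaults to the empty string.
def split_column( key )
  family, qualifier = key.to_s.split(':', 2)
  [ family, qualifier || "" ]
end

# the keys used in this post
[ "text:", "revision:author", "revision:comment" ].each do |key|
  family, qualifier = split_column( key )
  puts "#{key.inspect} -> family #{family.inspect}, qualifier #{qualifier.inspect}"
end
```

With the limit of 2 passed to split, a key like "text:" yields an explicit empty qualifier, and a key without any colon falls back to the default.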

Use the put_many function

Invoking the put_many function then checking the insert:

Testing put_many
put_many 'wiki', 'Some title', {
  "text:" => "Some article text",
  "revision:author" => "jschmoe",
  "revision:comment" => "no comment" }

get 'wiki', 'Some title'

generates

COLUMN                CELL                                                      
 revision:author      timestamp=1323575657943, value=jschmoe                    
 revision:comment     timestamp=1323575657943, value=no comment                 
 text:                timestamp=1323575657943, value=Some article text          
3 row(s) in 0.5340 seconds

And that’s all for today. Tomorrow will be a bit more fun: first loading a sizeable set of Wikipedia files, then using HBase to play with the loaded data.
