Monday, October 3, 2011

My "Top Five" IT trends of the next half decade

Big data alone is not information; by itself it is not enough to get your message across to your reader. You need visualization to leverage the bandwidth of the visual system and move a huge amount of information into the brain very quickly.

Infographics versus Data Visualization

The other day I started reading about Designing Data Visualizations and I discovered some key criteria to differentiate Infographics from "real" Data Visualization:

Infographics:
  • aesthetically rich,
  • relatively data-poor,
  • specific to the data at hand,
  • manually drawn.
Data Visualization:
  • often aesthetically barren,
  • relatively data-rich,
  • easy to regenerate with different data,
  • algorithmically drawn.
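That "algorithmically drawn, easy to regenerate with different data" property can be illustrated with a trivial sketch: the chart below is produced entirely by code, so swapping in a different dataset regenerates it instantly (the labels and values here are made up for illustration).

```shell
# Regenerate the same "chart" from any data: change the input, rerun, done.
printf '%s\n' "mobile 9" "social 7" "cloud 5" "big-data 3" |
while read -r label value; do
  printf '%-10s %s\n' "$label" "$(printf '#%.0s' $(seq 1 "$value"))"
done
```

An infographic, by contrast, would be drawn once by hand for exactly this data.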

The "Big Five" IT trends of the next half decade

Today I was reading about the "Big Five" IT trends of the next half decade:

  1. mobile,
  2. social,
  3. cloud,
  4. consumerization,
  5. big data.

And yes, it was their infographic that inspired this article altogether:

The big shifts in IT: cloud, social, mobile, consumerization, and big data

It does a better job at transmitting the idea than the list preceding it, doesn't it?

From Big Data to Information

You may say I'm a dreamer, but I'd put big data first! Moreover, regardless of how big your data set is, data alone is stark. I believe that big data alone is not enough to get your message across to your reader: data alone is not information.

It's the visualization that can leverage the incredible capabilities and bandwidth of the visual system to move a huge amount of information into the brain very quickly.
So, if we are to talk about the top shifts in Information Technology, I would definitely add visualization to the list:

Big Data + Visualization = Information

Tuesday, September 27, 2011

Jeff Jonas on Big Data and Geospatial Super-Food

As large collections of data come together, some very exciting and somewhat unexpected things happen. As data grows, the quality of predictions improves (fewer false positives, fewer false negatives), poor-quality data starts to become helpful, and computation can actually get faster as the number of records grows.

Now, add to this, the "space-time-travel" data about how people move that is being created by billions of mobile devices, and what becomes computable is outright amazing. As it turns out, geospatial data is analytic super-food.

Join Jeff Jonas – Chief Scientist, IBM Entity Analytics Group and an IBM Distinguished Engineer – on September 28th, to hear his thoughts on hot topics such as Big Data, New Physics, and Geospatial Super-Food.

Who is this Jeff Jonas?

Jeff Jonas is a super-star IBMer; he designs next generation technology that helps organizations better leverage their enterprise-wide information assets. Built with a particular interest in real-time "sensemaking", these innovative systems fundamentally improve enterprise intelligence, making organizations smarter, more efficient and highly competitive.

He also leads global think tanks, privacy advocacy groups and policy research organizations. Read more about Jeff.

Why Big Data is the Next Big Thing

Interviewing Jeff Jonas, TechCrunch's Andrew Keen imagines an entrepreneur, scratching his head and thinking "the next big thing is Big Data", yet he doesn't really know what it is.

What does someone do to understand it, and not only understand it, but take advantage of it as an entrepreneur or an investor?

Take 5 minutes to find an answer to this question and a couple others:

Jeff Jonas interviewed by TechCrunch

Personally, I feel like borrowing the concept of context accumulation from the interview, but this is just one of a set of themes woven through Jeff's work, explored on his blog, and captured in a series of evocative phrases, like:

  • perpetual analytics,
  • non-obvious relationship awareness,
  • sequence neutrality,
  • "data finds data",
  • anonymous resolution and others.

Attend Jeff's talk thanks to TOHUG

In my experience so far, the TechTALKs organized by the IBM Canada Lab were offered only to its employees. On this occasion, however, an agreement was secured to allow Toronto Hadoop User Group (TOHUG) members to come to the Lab, listen, and meet with Jeff.

So if you're in Toronto and interested to hear Jeff's thoughts on hot topics such as Big Data, New Physics, Geospatial Super-Food and more, all you have to do is to join the Toronto Hadoop User Group (if you are not a member yet) and RSVP to the event.

Monday, September 12, 2011

Extended: Hadoop Programming Challenge

October 4th update:

The Hadoop Programming Challenge just got extended
till Monday, October 10th!
Lately, more and more of my interest is invested in Big Data. And the future of Big Data and Data Analytics sounds so appealing that you might be forgiven for believing it's too good to be true.
In this context, I was intrigued by Alex Popescu's reality check:
  1. Even if technology costs have decreased over time, the investment required to create data startups is still high.
  2. Financial institutions are not investing (too much) in data technology companies.
  3. Only a few companies are able to accumulate significant amounts of useful data.
  4. Even fewer companies are able to use those huge amounts of data effectively.
Reading that only a few companies accumulate significant amounts of useful data, and that even fewer make effective use of it, I strongly believe that the efforts made to tame huge amounts of meaningful data were anything but few.

The Philosopher's Stone

Philosopher's stone - turning base metals into gold?
photo by Ahmed Mater
Speaking of effort, we all know that effort does not mean anything if it does not deliver results! Quoting Leon Katsnelson, those even fewer companies are comparable to alchemists who discovered their philosopher’s stone:
I think of the internet heavyweights like Google, LinkedIn, Facebook, Yahoo! as alchemists. They figured out a way to turn massive amounts of data in to gold. Unlike alchemists of the middle ages, these modern-day wizards have found their philosopher’s stone. They call it Hadoop. Hadoop lets them crunch massive amounts of data to extract keen business insight, which, if applied properly turns data in to gold. How else can one explain these incredible valuations for such young companies?
I think there is an alchemist in everyone of us!
If the philosopher's stone is already discovered, wouldn't you like to turn base metals – Big Data – into gold – Big Insights?
What if you were able to obtain the philosopher's stone for free? You would still need to learn how to use it, right?

Choose your gold!

There are some people for whom gold means knowledge, and there are others for whom gold means money. But why pick sides when you can have both?
BigDataUniversity.com has teamed up with the IBM Big Data Team to sponsor three BigDataUniversity.com students on an all expenses paid trip to attend the Information on Demand (IOD) Conference 2011 in Las Vegas.
Vegas, baby!
Here's all you have to do for a chance to be selected; the rules are simple:
  1. Register with BigDataUniversity.com.
  2. Enroll and complete the free Hadoop Fundamentals I course by October 10th.
  3. You’ll receive a certificate of completion and an invitation to participate in the Hadoop Programming Challenge.
  4. On October 12th (extended from October 3rd), three participants in this challenge will be selected for a free, all-expenses-paid trip to IOD 2011 in Las Vegas, October 23rd—27th.

But before you get on with the challenge, one more detail: although the course is free and the trip has all expenses paid, if you choose to do the coursework in the cloud, do expect to incur some cloud-related usage charges! For the time it took me to complete the course, it amounted to approximately a toonie in Amazon charges.

Good luck!

Wednesday, August 3, 2011

How to install Cisco VPN on Ubuntu

On Ubuntu, you don't need to install the Cisco VPN Client: NetworkManager includes support for Cisco IPSec VPNs. This three-step article will walk you through a successful installation and configuration of your VPN client.

If you encounter any issues, or need more details, make good use of the comments form at the end.

Step 0: Authentication details

First of all, make sure you have your authentication details at hand!

Step 1: Install vpnc

Ubuntu ships by default with the plugin for the Point-to-Point Tunneling Protocol (PPTP), but we need the plugin for the Cisco Compatible VPN (vpnc), that provides easy access to Cisco Concentrator based VPNs.

To install the vpnc plugin, open your terminal and run:

sudo apt-get install network-manager-vpnc

Is your Ubuntu version 10.10 or older?

Installing the Cisco VPN client on a kernel older than 2.6.38 will result in compilation errors: the cisco_ipsec module crashes and leaves the system of only limited use.

The working solution is to:

  1. download the vpnc client source,
  2. apply this patch for the vpnc client,
  3. and follow the next steps for setting up your VPN.

Step 2: Setting up your VPN

Find Network Connections in your Dash, and in the VPN tab select Import to choose your .pcf file, or Add if you want to manually enter your authentication details from Step 0.
Adding a new Cisco VPN connection on Ubuntu

Step 3: Use Only as Needed

In the connection settings, go to the IPv4 Settings tab, click Routes, and activate the option to use the VPN connection only for resources on its network,
unless you want all your traffic to be significantly slowed down.
Only use the VPN connection for resources on its network

You might want to reboot your machine, and you're good to go. Give it a try!

Tuesday, May 31, 2011

Using RVM to Install Rails 3.1: Best Practices

Wayne E. Seguin describes RVM as:

a command line tool which allows us to easily install, manage and work with multiple ruby environments from interpreters to sets of gems.

This article walks through installing RVM, then using it to install Ruby 1.9.2 (MRI), create gemsets, and install the Rails 3.1 Release Candidate, all while considering current best practices.

Installing Ruby Version Manager

If your Ubuntu 11.04 is still fresh from the oven, you may want to first install git and curl. Having satisfied the prerequisites, installing the latest RVM release version from git is as easy as running:
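(The installer one-liner below is a sketch from memory of the 2011-era RVM instructions; the exact URL may well have moved since.)

```shell
# Fetch and run the RVM installer script straight from the project's git repository,
# then load RVM into the current shell session.
bash < <(curl -s https://raw.github.com/wayneeseguin/rvm/master/binscripts/rvm-installer)
source ~/.rvm/scripts/rvm
```
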

If at any point you want to start all over, run rvm implode and we'll be ready for a fresh start. For a complete removal, follow these details.

On the other hand, if RVM is already installed but we are behind on the updates,
rvm get latest will do the trick.

Setting up Ruby

  1. Feel free to run rvm list known to see all the Ruby implementations made available through RVM
  2. We are focusing on the latest stable release, so we'll choose to:
    rvm install 1.9.2
  3. And we will use it as the default version for our system:
    rvm use 1.9.2 --default

Using gemsets

Gemsets are compartmentalized, independent Ruby setups, each containing its own version of ruby, gems and irb. RubyGems — the package manager for Ruby projects — is already available for us, since RVM includes it automatically (try which gem).

For this tutorial we will install side by side the latest Rails stable release (3.0.7) and the latest release candidate (3.1), so let's prepare the terrain:

  1. Start by creating our gemset(s):
    rvm gemset create rails307 rails31
  2. The result can be verified by listing the available gemsets:
    rvm gemset list
  3. If a gemset's name still leaves room for confusion, simply delete it and create a more meaningful one (e.g., rails31rc):
    rvm gemset delete rails31
  4. As a best practice, remember to always use one gemset per project*.

Installing Rails

  1. Now that we have multiple gemsets installed, we must first select the one we want to use, and we can also set it as the default gemset by passing it the --default flag:
    rvm use 1.9.2-p180@rails307 [--default]
  2. Installing Rails is as easy as installing any other gem: we only need to specify its name, but we can always choose a specific version, or speed up the installation process by skipping the documentation:
    gem install rails [-v 3.0.7] [--no-rdoc --no-ri]

  3. Next we switch to the gemset created to hold the latest Release Candidate (e.g., 1.9.2-p180@rails31rc) and we install it by passing in the --pre flag.
    Scratch that; the --pre flag is not working as of June 1st. You might want to read this before installing Rails 3.1, but long story short, this should do the trick:
    gem install rails -v ">=3.1.0rc"

Bonus feature

Switching from one project to another, from a client project to a personal one, from testing the release candidate to developing against the latest stable version: having to manually switch from one gemset to another every time can impact productivity. Project .rvmrc files can increase development speed by setting up our project's Ruby environment the moment we switch to the project root directory.

The rule of thumb here is to use a .rvmrc file for each project, for both development and deployment.*
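Such a project .rvmrc is a one-liner; a minimal sketch, reusing the Ruby patch level and gemset from this tutorial:

```shell
# ~/code/myapp/.rvmrc — cd-ing into the project root now selects this Ruby and gemset automatically
rvm use 1.9.2-p180@rails307
```
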

* Make sure to check the RVM best practices!

How are gemsets improving your workflow? What other tips & tricks have you discovered while setting up your environment, or while setting up a new project? I would be more than happy to learn something new from you!

Saturday, May 14, 2011

Installing and Testing DB2 9.7.4 on Ubuntu 11.04

The full refresh of DB2 Express-C 9.7.4 — the enterprise scale database server from IBM that is free to develop, deploy and distribute — was released last week.

Reading what's new with DB2 Express-C 9.7.4, one of the new features that caught my eye was the Text Search component. This is not Text Search's first appearance: it was first integrated in DB2 Express-C 9.5.2, allowing fast searches on text columns. Over time, though, you had two engines to choose from and no unified solution. With DB2 Express-C 9.7.4, a standard solution is introduced, bringing improvements in performance, configuration and tuning.

Probably a first in IBM's history, the DB2 Express-C 9.7.4 template for RightScale/Amazon EC2 was made available on the cloud well before the product was published through traditional channels. How about that for commitment to the cloud? :)

Brand new DB2 on brand new Ubuntu

Seeing all this buzz around this new Fix Pack for DB2, I took on the quest of testing it first locally: a full install on a brand new copy of Ubuntu 11.04!

My scenario covers:

  1. Downloading DB2 Express-C 9.7.4, including the language pack for a proper full install
  2. Minimal CRUD testing:
    1. create a SAMPLE database
    2. start the current database manager instance background processes
    3. create a connection to the DB
    4. use the CLP to run at least a basic SELECT
  3. Check the state of installation files, instance setup, and local DB connections (db2val)
  4. Retrieve the current Version and Service Level of the installed product (db2level)
  5. Check licenses:
    1. Check the limitations of the free, unwarranted licence (2 CPUs / 2GB of memory)
    2. Apply a Fixed Term License
    3. Check the new limitations of DB2 Express Edition (4GB of memory)
  6. Having had issues in the past, test the uninstall of DB2.
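The CRUD smoke test in the scenario above can be sketched as CLP commands, run as the instance owner (db2inst1 is the default instance created by db2setup; yours may differ):

```shell
db2start                  # start the current instance's background processes
db2sampl                  # create the SAMPLE database
db2 connect to sample     # create a connection to the DB
db2 "SELECT firstnme, lastname FROM employee FETCH FIRST 3 ROWS ONLY"
db2 terminate
db2val                    # check installation files, instance setup, local DB connections
db2level                  # report the current Version and Service Level
```
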

Results of the DB2 Express-C 9.7.4 test

Installing on a brand new copy of Ubuntu 11.04, I ran right into a missing-dependency error:
Missing libaio1

But this can be solved as simply as running

$ sudo apt-get install libaio1
and then running the db2setup script again as root.
The GUI installation wizard is here to confirm:
Installing DB2 as root

My plan is to make a full install — more chances to find possible flaws — so I chose Custom as installation type:
Choosing Custom for a full install

Another reason for choosing a custom install is the chance to review the choice of components and make sure that ones you might need aren't unchecked by default.
For instance, if you plan to install Ruby on Rails and DB2 on Ubuntu 11.04, make sure to check the Application development tools component: it's needed to build the Ruby driver.
Make sure to select all features

The rest of the installation is almost self-explanatory, so moving forward to the first set of tests: the database creation, the connection and the CLP all behaved as expected:
Creating SAMPLE DB and basic querying

The license was also applied successfully, so in the end all 4GB of memory were put to good use:

Checking licence and applying the new one

Following the documentation for the uninstall process, it completed with success, but I couldn't resist and I quickly went through the installation again, eager to test the connectivity between the new Rails 3.1 beta and DB2 Express-C 9.7.4!

But that will fuel another article!

Thursday, April 21, 2011

Enabling jQuery support in RubyMine

Last week we saw how to enable jQuery support in your Rails 3.0.x app for all your UJS needs.
But after @dhh's leak, today we know for sure: Rails 3.1 will ship with jQuery as the default JavaScript library! That's great news, and I believe it will clear any doubt from a beginner's mind about which JavaScript library she should focus on. If you still plan to use the Prototype helpers/RJS, though, it will be as easy as upgrading your application to use the prototype-rails gem.

But how jQuery-ready is RubyMine?

While experimenting with Unobtrusive Javascript using RubyMine, there was one improvement brought by version 3.1 that particularly drew my attention:

Autopopup code completion — code completion suggestions appear instantly as you type and work for Ruby, ERB, JavaScript, HTML and other files.

Still, as reported in issue RUBY-5026, RubyMine's autocomplete cannot automagically talk jQuery. By default, the suggestions will be based on Prototype.js; furthermore, if you --skip-prototype, they will be even more limited:
RubyMine's lack of jQuery support

Enabling support for jQuery

  1. In RubyMine, go to Settings » JavaScript Libraries and add a new library.
    At this point we have two options for type:

    Attaching new JavaScript libraries in RubyMine
  2. Make sure to Apply the changes before continuing, then go to Usage Scope and enable the library for the current project:
    Enabling the library for the current project
  3. Finally, both jQuery code completion and navigation are working as expected.
    Enjoy!
    Enjoy jQuery support in RubyMine!

Update: Thanks to JetBrains' Easter Sale, you could get RubyMine for 30% off.

Tuesday, April 12, 2011

Enabling UJS in Rails3 with jQuery: in just 3 steps!

Here is my approach to quickly enabling Unobtrusive Javascript support for your Rails application, using jQuery:

  1. Create a Rails application without Prototype support
  2. Edit your config/application.rb to include the latest jQuery and the Rails UJS adapter for jQuery:

    Update (May 4th) for jQuery 1.6:
  3. ...oh, I said 3 steps, right!
    Just continue building your application; here's an example: YouTube-like-comments.
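For step 2, here is roughly how the wiring looked at the time. The file locations are standard Rails 3.0 conventions, but the exact library versions and download URLs are my assumptions and may have moved since:

```shell
rails new myapp -J        # -J / --skip-prototype: create the app without Prototype
cd myapp
# drop jQuery and the Rails UJS adapter for jQuery into public/javascripts
curl -o public/javascripts/jquery.js http://code.jquery.com/jquery-1.6.min.js
curl -o public/javascripts/rails.js \
     https://raw.github.com/rails/jquery-ujs/master/src/rails.js
# then point the :defaults expansion at them in config/application.rb:
#   config.action_view.javascript_expansions[:defaults] = %w(jquery rails)
```
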

Monday, April 4, 2011

Harvesting Big Insights from Big Data: Data + Visualization = Information

We know for a fact that, by the time we reach adulthood, most memories from our first 3-4 years of life are lost to infantile amnesia.

Deb Roy: The birth of a word
Imagine for a moment having a 200-terabyte dataset containing 3 years' worth of audio and video "memories", that is:
  • 90,000 hours of video
  • 140,000 hours of multi-track audio
  • a 70-million-word transcript
of almost everything that happened in your childhood.

Considering this exercise of imagination, what would you do to harvest usable information out of that huge amount of opaque data? This is where MIT cognitive scientist Deb Roy found his challenge: gathering and using such natural longitudinal data to understand how a child (his son) learns language. He described his research on this Big Data at MIT during his TED talk this month:


But it's not easy to extract all the developmental milestones — from one's first steps as a baby to the mastery of any spoken word (e.g., water) — from what is by far "the largest home video collection ever made". And how about rolling back to see what verbal and physical interactions preceded the acquisition of language during early childhood?

I think this is where the visualization of data comes into play to communicate the message. For instance, Deb Roy's team harnessed the power of data and captured every time his son ever heard the word water, along with the context he heard it in. They then used this data to drill through the video, find every activity trace that co-occurred with an instance of "water", and map it onto a blueprint of the apartment. That's how they came up with wordscapes: the landscape that data leaves in its wake.

Wordscape for the word water
Wordscape for the word water – most of the action takes place in the kitchen.

Two years ago, during a research assignment at my university, I had my first contact with Apache Hadoop — a software framework that supports data-intensive distributed applications. Today, Deb Roy's TED talk inspired me to look at what solutions are available for doing analytics on Big Data and transforming it into information, while considering all the challenges we meet when rising to the enterprise level. An interesting answer to this challenge can be found in InfoSphere BigInsights, which aims to bring Apache Hadoop MapReduce large-scale analytics to the enterprise.

More examples of how to make intelligent use of Big Data can be found in TechCrunch's interview with Anjul Bhambhri, VP responsible for Big Data at IBM. In this interview, several interesting projects are mentioned, such as:
  • detecting an onset of infection in critically premature infants;
  • solving congestion problems in big cities like Stockholm as part of the IBM Smarter Planet initiative;
  • and of course Watson: how it outsmarted humans on Jeopardy! and how this technology helps real businesses.

Anjul Bhambhri interviewed by TechCrunch

Now consider that you can garner the power of Big Data and create meaningful visualizations that bring information to life: what dataset would motivate your work, what problem would you solve first?

Photo credit: Steve Jurvetson

Thursday, March 10, 2011

Oracle and DB2 - An Architectural Comparison

Update—Apr 19, 2011:

Thanks to the guys over at ChannelDB2.com, if you couldn't make it to this webcast, you can go through the recording at your pace, at your place:

Also available are:


Personal History

Remembering the Advanced Databases classes taken at my alma mater, the university in Iaşi, I find that Prof. Victor Felea's focus on Oracle-only solutions was a rather limiting one. It was thanks to Prof. Sabin Buraga's passion for Web Technologies that many of my colleagues and I were exposed to alternatives, all the way to NoSQL.

Contemporary Fact

Still, most extensive database courses revolve around only one specific DBMS. You guessed it: that ends up being the main skill of the graduate who wants to tackle enterprise-level DBs. But when taking on a DB-centered career path, he or she needs a better understanding of the bigger picture, and this is when advice from more experienced peers is most valuable. When I asked a senior colleague for his suggestion for an aspiring professional, he recommended a toolbox of skills that spreads well beyond a single DBMS. How is this an obvious advantage, you ask?
  • most enterprises use more than one DBMS in-house;
  • therefore it may be useful to get skilled in more than one Enterprise class DBMS;
  • first obvious advantage: this improves career prospects if jobs for one database are more in demand;
  • secondly, you can have a higher salary if you know about more than one DB;
  • yes, the head of the database team will most likely earn more than a DBA in the team who is only knowledgeable about a single DBMS!

Chat with the Labs

Oracle and DB2 - An Architectural Comparison

Following the Chat with the Labs series of webinars, I could not help but notice that the previous episodes mostly catered to existing DB2 users. This is why I was surprised when I read about the Oracle and DB2 - An Architectural Comparison:

Many database professionals and DBAs often ask how DB2 and Oracle compare architecturally, that is, how they are different and similar at their core. They also ask what are the equivalent concepts, names, commands etc. in the other database system. This free webinar will answer those questions by covering the following topics in detail:
  • Server architecture comparison (e.g. instances and database model, process vs. thread)
  • Memory architecture comparison (e.g. Oracle SGA & PGA vs. DB2 instance, database and application memory)
  • Parameters, environment variables and registry variables
  • Database storage model comparison (e.g. table space types and layouts, compression approaches)
  • Basic database administration comparison (e.g. terminology, create database, start/stop, dictionary vs. system catalog, performance)
  • Compatibility mode for running Oracle applications with DB2
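As a taste of that last topic: in DB2 9.7, the Oracle compatibility mode is switched on through a registry variable. A minimal sketch (it must be set before creating the database you want it to apply to):

```shell
# Enable Oracle compatibility features, then restart the instance.
db2set DB2_COMPATIBILITY_VECTOR=ORA
db2stop
db2start
```
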

This webcast, scheduled for Thursday, March 31, 2011, at 12:30pm (EST), is intended for the database professional, fresh or experienced, who is:
  • familiar with Oracle and looking to learn more about DB2 (for Linux, Unix and Windows);
  • familiar with DB2 and looking to learn more about Oracle;
  • working in a heterogeneous environment and looking to expand their DBMS knowledge and career prospects.

Edit: Since the (limited) number of places is filling up fast, I recommend that you register now!

Edit Mar 14th: With almost 2 weeks left till the live event, it is already sold out and no additional registrations are being accepted at this time.

Wednesday, February 23, 2011

Coding Green Robots: Debriefing #1 [+video]

The other day I announced the first episode of Coding Green Robots: a series of meetings focused on Android development. So today, right after work, fighting the rush hour, I went to the YMC in Downtown Toronto to be "in the studio" with the hosts: Greg Carron and Matthew Patience from Mobicartel.

Today's Episode 1 was mainly a beginner's tutorial, but it proved pretty useful for brushing up almost-forgotten Android development skills. It was also a sneak peek into Mobicartel's workflow, with ideas ranging from how Matthew and Greg divide and share development and design, and how they use Dropbox in the development lifecycle for developing, sharing and quickly installing an app within the team, to how they decide which platform versions to target:

  • On the development side we covered from efficient IDE setup to basic Android components and Views (relative layout, fast prototyping, intents etc.).
    I found it particularly interesting to learn how experience has taught them the best practice of using sp as units of measurement.
  • As for the design, I got a better understanding of screen densities and how important it is to use vector shapes in your design in order to be able to export nice and crisp 24-bit transparent PNGs that will fit your custom design like a glove.
    I also discovered the ShootMe app: useful for getting screenshots on your Android device.
In the interview with Mark Reale I found insights about organizing AndroidTO, and that its 2011 edition will most likely be in October. His "be resourceful" philosophy was somewhat motivating for me:
  • use everything at your fingertips
  • always be around people smarter than you
  • never hesitate to ask questions
I found the night's coup de grâce in the short talk about NFC: I got to whet my appetite with a device that can write/read NFC tags, and the programmable NFC tags that accompanied it. It was the first real, physical tag that my Nexus S read: even if it was a blank one, it was still a Eureka moment!

Also tonight, the Gingerbread (Android 2.3.3) OTA update was announced, so I have to admit: I'm really looking forward to writing (not just reading) rewritable NFC tags, and even programming my Nexus S to act as an NFC tag... and imagine the grin on my face when I finally get rid of those "random" reboots! :D

As I guess you can tell by now, every other Tuesday I'll head Downtown, to the YMC in Toronto. If you aren't in the area but you're looking to learn how to develop for Android in somewhat of a classroom environment, you can always watch the sessions streamed live and/or enjoy the full videos published afterwards. Just head over to CodingGreenRobots.com.

Update: Here's the 2 hour recording of yesterday's first CGR episode:


...and wait till you read the debriefing on March 8th! ;o)

Monday, February 21, 2011

Coding Green Robots: Episode #1

Need to brush up your Android development skills? Meet fellow developers?

CodingGreenRobots.com is exactly what you're looking for! Tomorrow it starts from scratch: the meetup will begin with an overview of how to set up the Android SDK and Eclipse IDE in an efficient way.
Matthew Patience [Mobicartel] will be going through mini-tutorials on specific Android Views such as Lists, Tab Layouts, Form Widgets, and Galleries.

Greg Carron will go over basic graphics for Android Development including densities, sizes, and XML layouts.

In the Development News portion of the show we will cover NFC and its exciting future as a new feature of Android.

As well we are excited to have Mark Reale [BNotions] on the program for an interview regarding community evangelism, AndroidTO, and the Yorkville Media Centre.

For those who are unable to attend the event, you will be able to watch live online at CodingGreenRobots.com as they will be streaming the entire episode.

Meet you there! ;o)