Monday, June 18, 2012

Expand and Update your BigInsights

Roughly a year ago, you were reading about how MIT was harvesting big insights from big data, about discovering useful information by turning huge amounts of data into gists that are visually understandable. At that time, I was also in awe of IBM's Watson and how InfoSphere BigInsights is used to create a Smarter Planet.

But that was a year ago... Since then, a lot has happened.
On the Open Source side, the components integrated in the BigInsights platform made steady progress; take Hadoop, for instance, which matured enough to reach version 1.0.
Meanwhile, InfoSphere BigInsights grew just as much: BigInsights version 1.4 was released last Friday. A nice tradition I wish other companies would copy from IBM is that the BigInsights v1.4 GA was simultaneous with its availability for deployment on public clouds: anyone can create their own Hadoop cluster (based on BigInsights v1.4) on the Cloud in less than 30 minutes.

BigInsights v1.4 available for deployment on public clouds

The very same day, the 61 participants at the Big Data Developer Day hands-on labs used BigInsights v1.4 — delivered via Cloud instances.


The new version brings simple administration and management capabilities, rich developer tools, powerful analytic functions, up-to-date Apache Hadoop and associated projects, as well as many enterprise features and enhancements. The new capabilities are aimed at improving flexibility, consumability, and manageability.
Here are some updates I cherry-picked based on my interests and needs:

Update to open source component levels

Here are the new versions of the open source components shipped with BigInsights 1.4:

  • Hadoop 1.0.0
  • Flume 0.9.4
  • HBase 0.90.5
  • Hive 0.8.0
  • Oozie 2.3.1
  • Nutch 1.4
  • Pig 0.9.1
  • Zookeeper 3.3.4

Consumability and Usability

  • Text analytics development: Provide an enhanced user experience for developing text analytics applications, with improved navigation of result views, enhanced sorting and filtering, enhanced pattern discovery and progress reporting.
  • Developer tools: Better built-in support for text analytics, support for local mode map-reduce development, improved deployment of applications, and automatic creation of JDBC connections to Hive data sources.
  • BigSheets and web console: New chart customization features make it effortless to access and manipulate data the way you want. New sheets, macros, and readers make it possible to access more data and analyze it in new ways, giving you more control and improved browsing capabilities for HDFS and NFS files. The addition of new application input parameter types makes applications even easier to run.


Version 1.4 of InfoSphere BigInsights brings support for Cloudera's Distribution Including Apache Hadoop (CDH). This allows enterprises either to run BigInsights with the IBM-provided Apache Hadoop distribution or to deploy to a Cloudera CDH cluster. Conversely, Cloudera CDH users can now take advantage of the enterprise-class features available in BigInsights, such as text analytics, user-friendly data manipulation and exploration, and developer tools.


If you want to join the Big Data Developer Day participants, you can get first-hand experience either by getting the free version, InfoSphere BigInsights Basic Edition, or by whetting your appetite with the Hadoop and Big Data courses on

Thursday, June 7, 2012

Why am I envious of Big Data geeks in the Valley?

  1. Hacking your way through Big Data and taking pride with Hadoop on your tool belt?
  2. Also living in Silicon Valley? Or among the Hadoop Summit attendees?

Did you just answer yes to both questions?!?

Well then, I'm a bit envious of you because next week, Silicon Valley is the place to be for us, Hadoop and Big Data geeks. Next week, the concentration of Big Data talent will be so high in the Valley that I tend to agree with Leon on this one:

I am always envious of the people who live in Silicon Valley. It is not the California weather that I crave though it is nice. If you like technology there never seems to be a shortage of meetups, conferences and all around interesting events.

First, on June 13th and 14th, Yahoo and Hortonworks team up to bring you a Hadoop Summit with an attractive agenda and speaker list. So many tracks, so many speakers, so many talks, that I’m not even going to mention them myself; just head over and whet your appetite. :)

The very next day, June 15th, IBM is inviting people over for a free (breakfast and lunch provided) Big Data Developer Day. The day will blend hands-on labs and interactive discussions, with the opportunity to meet other technical folks and exchange knowledge.

If you are interested in

  • Hadoop scripting,
  • real time in-memory analytics,
  • Big Data for social media,
  • log analytics,
  • Big Data in general

June 15th is your full day of time well spent with senior technical leaders of our Big Data development team.

So, if you are in the Valley and have the day free to invest in your Hadoop and Big Data skills, I recommend you register, as the number of participants is limited.