jStart Pulse: tracking the latest in emerging technologies on the web, in social media sites, and in interactions between influencers...
Part of the job of jStart is to keep a pulse on what's trending and occurring in the field of emerging technology. To do this, we draw on a number of sites and data sources, and conduct our own analysis. This page gives you some insight into how we're keeping our fingers on the pulse of emerging technology.
Quantifying the value of social data
The concept of social data analytics is a well-understood one: monitoring and measuring the marketplace via social media is something that many technologies (and companies built around those technologies) have been doing for some time. When jStart started looking at the existing solutions, however, we noticed a curious gap: while much effort was being put into performing analytics on large data sets of user-generated content in the form of tweets, posts, etc., efforts to weigh the value of that content based on the influence of the user were limited. In short, the voice of every user is not equal--some users are simply more influential than others (think Barack Obama vs. a random voter).
Not only are all users not equal, but their influence varies by the context in which you're analyzing their tweets. While the US President might be highly influential in the field of politics, he may be less influential in the field of music, for example. The same holds true for any industry or field. This was something we learned from our numerous social data analytics engagements. Given that, we discovered that it was critically important to understand not just the influence of an influencer, but also the context in which that influencer is being analyzed. To do that, a system has to be able to analyze the text of a user's content and measure the impact of that text on every community in which the content is being viewed or distributed.
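The idea that one user can carry very different weight in different contexts can be illustrated with a small sketch. This is not Pulse's scoring algorithm, which the page does not describe; it assumes a hypothetical input of (author, context, reactions) records, where "reactions" stands in for whatever measure of community response is available, and it scores each author by their share of the reactions within a context.

```python
from collections import defaultdict

def influence_by_context(messages):
    """Return {context: {author: share of that context's reactions}}.

    `messages` is an iterable of (author, context, reactions) tuples.
    The tuple layout and the share-of-reactions score are assumptions
    made for illustration only.
    """
    totals = defaultdict(float)                          # reactions per context
    per_author = defaultdict(lambda: defaultdict(float)) # reactions per author, per context
    for author, context, reactions in messages:
        totals[context] += reactions
        per_author[context][author] += reactions
    return {
        ctx: {
            author: (count / totals[ctx] if totals[ctx] else 0.0)
            for author, count in authors.items()
        }
        for ctx, authors in per_author.items()
    }

# Made-up sample data: the same account scores very differently per context.
sample = [
    ("president", "politics", 900),
    ("voter", "politics", 100),
    ("president", "music", 50),
    ("musician", "music", 950),
]
scores = influence_by_context(sample)
```

With this sample, the "president" account dominates the politics context but holds only a small share of the music context, which is the effect described above.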
Technologies used in Pulse
Text Analytics: keywords + influencers
One of our key insights during the development of this project was understanding that the true influence of an individual cannot be divorced from the context and content of that user's messages, or from the impact of their words on the community being observed. In other words, you can't separate influencer from keyword. But that, in itself, posed a challenge: how do you understand the context of a tweet or post? To do that, we needed sophisticated text analytics to decompose and analyze each message. By leveraging IBM Research's English Slot Grammar technology, we were able to break down each tweet into its individual grammar components and, from there, pull out the parts of the tweet that would lend themselves to identifying technologies and trends which might be of interest to our system.
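As a rough illustration of pulling candidate terms out of a tweet, the sketch below uses a deliberately naive stand-in: a regular-expression tokenizer and a small stopword list. It does not use or imitate English Slot Grammar, which performs a full grammatical parse; the function name, stopword list, and sample tweet are all assumptions made for this example.

```python
import re

# Tiny illustrative stopword list; a real system would use something far richer.
STOPWORDS = {
    "a", "an", "and", "are", "at", "for", "in", "is", "it", "just",
    "of", "on", "our", "that", "the", "this", "to", "we", "with",
}

def candidate_terms(tweet):
    """Return lowercase candidate keywords from a tweet, in order, without duplicates."""
    # Drop URLs and @mentions so they are not mistaken for keywords.
    text = re.sub(r"https?://\S+|@\w+", " ", tweet)
    # Words may start with '#' (hashtags) and contain digits, '+', '.', or '-'.
    tokens = re.findall(r"#?[A-Za-z][A-Za-z0-9+.-]*", text)
    terms = []
    for token in tokens:
        word = token.lstrip("#").rstrip(".-").lower()
        if word and word not in STOPWORDS and word not in terms:
            terms.append(word)
    return terms

terms = candidate_terms("Just tried #Docker on the new cloud platform with @alice http://t.co/x")
```

A grammar-based parser would go further, for example by distinguishing noun phrases from verbs, so that words such as "tried" are not kept as candidate terms the way they are here.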
Building the solution
At first the team tackled the problem with a bottom-up approach: by scanning millions of web pages and social streams, we attempted to discern patterns and terms of interest from the massive amount of "chatter". Basically, Big Data analytics in a nutshell. But then we realized we could approach this problem in a more concise manner: identify key influencers for the specific areas of interest we had, then listen to what they had to say and measure its impact. By doing so, we reduced our analytics requirements by orders of magnitude.
Over time, we continued to expand the system by adding in brand new open source technologies, IBM technologies, and even jStart proprietary algorithms. We also decided to make Pulse a cloud application, and moved it to IBM's brand new next-gen cloud platform: BlueMix. Once we established our new approach, we started fine-tuning our algorithms. Because we wanted to largely automate how the system identified new influencers of interest, began monitoring their messages, and built out the relationship maps, we created custom algorithms which provided scoring not only for the influencers, but also for the context of their messages, tweets, and posts. This allowed the system to handle the vast majority of the work needed to surface, identify, and establish the key influencer relationship mapping, as well as a similar mapping for keywords and content.
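One way to picture automated influencer discovery and relationship mapping is sketched below. It is not the jStart algorithm, whose details are not given here; it assumes a hypothetical input of (author, mentioned accounts) pairs and a simple rule that an untracked account becomes a candidate once enough distinct tracked influencers have mentioned it. The names `discover_influencers` and `min_endorsers` are invented for this example.

```python
from collections import defaultdict

def discover_influencers(messages, tracked, min_endorsers=2):
    """Build a mention map for tracked influencers and propose new ones.

    `messages` is an iterable of (author, mentioned_accounts) pairs.
    Returns (edges, candidates): `edges` maps each tracked author to the
    set of accounts they mentioned, and `candidates` lists untracked
    accounts mentioned by at least `min_endorsers` distinct tracked authors.
    """
    edges = defaultdict(set)      # relationship map: author -> mentioned accounts
    endorsers = defaultdict(set)  # untracked account -> tracked authors mentioning it
    for author, mentions in messages:
        if author not in tracked:
            continue  # only listen to influencers already being followed
        for account in mentions:
            if account == author:
                continue  # ignore self-mentions
            edges[author].add(account)
            if account not in tracked:
                endorsers[account].add(author)
    candidates = sorted(
        account for account, who in endorsers.items() if len(who) >= min_endorsers
    )
    return dict(edges), candidates

# Made-up sample data.
stream = [
    ("ann", ["bob", "cara"]),
    ("dev", ["cara"]),
    ("eve", ["ann"]),   # "eve" is not tracked, so this message is skipped
    ("ann", ["dev"]),
]
edges, candidates = discover_influencers(stream, tracked={"ann", "dev"})
```

In this sample, "cara" is mentioned by both tracked influencers and becomes a candidate, while "bob" is mentioned by only one and does not. Candidates could then be added to the tracked set and the process repeated, which is one plausible reading of how such a system expands with little manual work.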
While Pulse was still under development, it had already yielded results for the team: we had a good handle on key influencers in the emerging technologies space, for instance. We were also using the tool to monitor for new technologies and trends as they emerged in our community of interest. And we were able to view, at any given point in time, the most influential users (based not on our subjective rankings, but on how much chatter those users were causing within our community) and the keywords of highest interest, and even tie those to specific events (new technology announcements, conferences, current events). This gave us an unprecedented, world-wide view, "the pulse", if you will, of our key market. Plans for Pulse included using the data we had been gathering over the previous year to create baselines on which we could build predictive analysis and project the probabilities of each new technology--and even influencer--through time. It's not just about the past, or the present. For us, it was about understanding the pulse of the future.
Resources + Links
Related concepts & technologies
- Pulse measures influence within context--someone who is influential in one area may not be influential in another.
- Pulse is a cloud application that leverages the latest in open source and research technologies.
- While Pulse today understands and tracks technologies as they surface in real time, future versions of Pulse will add a predictive analytics perspective.