


graduated technologies

technologies that jStart is no longer actively working on



Using semantic enrichment to enhance big data solutions

How can semantics be applied to big data technologies to enhance and enrich the solutions being created today? jStart has been examining this question and applying what it learns to the business challenges facing its clients.

Semantics is not exactly new ground for the team. After all, text analytics has components of semantics built into its very concept, and jStart has worked with semantic technologies for many years. The team, however, has been examining a broader application of semantics and how it can be used to enrich and enhance the analytics performed on big data.

The vision of semantics and its application to big data

The concept of the semantic web has been around since 1999, when Tim Berners-Lee described his vision of a future web in which computers could grasp the context of human speech and thought, and so "understand" what we mean when we express ourselves. Today we can approach this ideal by applying sophisticated analytics to tremendous amounts of data; that is how IBM's Watson supercomputer managed to compete successfully on the television game show Jeopardy! in 2011. In the same way, semantic concepts can be used to enrich and enhance today's big data technologies.

Using semantic concepts with today's big data solutions

If you think about it, almost any big data solution already uses many of the components that lend themselves to semantic enrichment: sophisticated data analytics, text analytics, and very large amounts of data. Perhaps your big data solutions already use some form of semantic enrichment today, by:

  • understanding the context of incoming data to generate intelligent responses
  • creating proactive systems and responses based on aggregated data
  • increasing the accuracy of projections and trends through a better understanding of the context of data from multiple sources
  • improving reaction times through an understanding of context that evolves with continuous data stream feeds

And those are just some of the ways semantic enrichment can be leveraged with big data.

A jStart Graduated Technology

What do we mean by Semantic Enrichment?

Fundamentally, we mean enriching the content and context of data by tagging, categorizing, and/or classifying data items in relation to each other, to dictionaries, and/or to other base reference sources. At its simplest, this means adding contextual information to an existing data set (think of adding traffic data to road maps, where the traffic data provides context on road conditions, probability of delay, projected length of obstructions, etc.). A more advanced way of thinking about semantic enrichment is to imagine a relatively primitive form of machine learning in which a system automatically enriches its own understanding of the context and content of the data it receives, by comparing that data to its existing knowledge store and then building upon that store. Each day the system would enrich its understanding of its environment, leading to deeper insights and, hopefully, to more accurate projections and analysis.
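To make the idea concrete, here is a minimal Python sketch of dictionary-based tagging combined with a knowledge store that grows as data arrives. It is an illustration only, not jStart's implementation: the REFERENCE_TERMS dictionary, the KnowledgeStore class, the enrich function, and the sample road reports are all hypothetical names and data invented for this example.

```python
from dataclasses import dataclass, field

# Hypothetical reference dictionary mapping terms to categories.
# A real system would draw on much richer reference sources.
REFERENCE_TERMS = {
    "congestion": "traffic",
    "accident": "traffic",
    "rain": "weather",
    "ice": "weather",
}


@dataclass
class KnowledgeStore:
    """Counts how often each category has been observed for each road."""
    seen: dict = field(default_factory=dict)

    def update(self, road, tags):
        counts = self.seen.setdefault(road, {})
        for tag in tags:
            counts[tag] = counts.get(tag, 0) + 1


def enrich(record, store):
    """Return a copy of the record with category tags added,
    and fold the new tags into the knowledge store."""
    words = record["text"].lower().split()
    tags = sorted({REFERENCE_TERMS[w] for w in words if w in REFERENCE_TERMS})
    store.update(record["road"], tags)
    return {**record, "tags": tags}


store = KnowledgeStore()
reports = [
    {"road": "I-40", "text": "Heavy congestion after accident"},
    {"road": "I-40", "text": "Rain and ice reported"},
]
enriched = [enrich(r, store) for r in reports]
```

Each report comes back with a "tags" field, and the store accumulates per-road context that later analysis could draw on. Matching whole words against a flat dictionary is the simplest possible choice; it stands in for the text analytics a production system would use.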

"I have a dream for the Web [in which computers] become capable of analyzing all the data on the Web--the content, links, and transactions between people and computers. A 'Semantic Web', which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines."

Tim Berners-Lee, Weaving the Web (1999)

Practical Business Value

As you might imagine, the business value created by such systems would be significant. Imagine a climate system that pulls data from every weather station on the globe and keeps gathering other data that may be relevant to environmental forecasts: the extent of man-made structures from mapping software, pollution projections based on demographics and power generation requirements, and cross-referenced data that combines information from numerous sources (perhaps salinity levels, ocean albedo/diffuse reflectivity, etc.) to project evaporation trends and add them as a factor in a precipitation forecast.

In short: it's not just about the amount of data you have. It's about understanding what that data is trying to tell you and revealing the non-obvious factors that influence that understanding. Semantic enrichment is one more tool for doing this, and it can be applied in almost any business scenario in which big data is part of the equation.
