About Oozie

Oozie is an open source project that simplifies workflow and coordination between jobs. It lets users define actions and the dependencies between those actions. Oozie then schedules each action to execute once its required dependencies have been met.
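The idea of running an action only after its dependencies have been met can be sketched in a few lines of Python. This is an illustration of the scheduling principle, not Oozie's implementation or API: the function `run_when_ready` and the action names (`load`, `clean`, `aggregate`, `report`) are hypothetical, and the ordering logic comes from Python's standard `graphlib` module.

```python
from graphlib import TopologicalSorter


def run_when_ready(dependencies):
    """Return the order in which actions run.

    `dependencies` maps each action name to the set of actions that must
    finish before it may start. (Hypothetical helper, not an Oozie API.)
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a loop
    executed = []
    while sorter.is_active():
        # get_ready() yields only actions whose dependencies are all done.
        for action in sorted(sorter.get_ready()):
            executed.append(action)  # stand-in for launching the real job
            sorter.done(action)      # mark finished so dependents unblock
    return executed


# Hypothetical actions: each entry lists what must complete first.
deps = {
    "load": set(),
    "clean": {"load"},
    "aggregate": {"clean"},
    "report": {"aggregate", "clean"},
}
order = run_when_ready(deps)
print(order)
```

In this sketch `report` is never started before both `clean` and `aggregate` have been marked done, which mirrors the behavior described above.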

A workflow in Oozie is defined as a directed acyclic graph (DAG). Acyclic means there are no loops in the graph: there is a starting point and an ending point, and all tasks and dependencies point from start to end without going back. A DAG is made up of action nodes and flow control nodes. An action node can be a MapReduce job, a Pig application, a file system task, or a Java application. Flow control nodes provide logic based on the input from the preceding task in the graph. Examples of flow control nodes are decision, fork, and join nodes.
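The acyclic requirement can be checked mechanically. The sketch below is an assumption-laden illustration rather than how Oozie validates workflows: `is_valid_workflow` is a hypothetical helper, the node names are only illustrative, and the edges between them are assumed for the example. It uses the fact that Python's `graphlib` raises `CycleError` when no start-to-end ordering exists.

```python
from graphlib import CycleError, TopologicalSorter


def is_valid_workflow(predecessors):
    """Return True if the graph has no loops.

    `predecessors` maps each node to the set of nodes that point to it.
    (Hypothetical helper for illustration, not part of Oozie.)
    """
    try:
        # static_order() needs a full start-to-end ordering, which
        # exists only when the graph is acyclic.
        tuple(TopologicalSorter(predecessors).static_order())
    except CycleError:
        return False
    return True


# Assumed edges: a fork starts two branches that a join waits on.
fork_join = {
    "fork": {"start"},
    "java": {"fork"},
    "hdfs": {"fork"},
    "join": {"java", "hdfs"},
    "end": {"join"},
}

# A graph that points back on itself is not a valid workflow.
looped = {"a": {"b"}, "b": {"a"}}

print(is_valid_workflow(fork_join), is_valid_workflow(looped))
```

The fork and join pair shows the flow control idea in miniature: both branches depend on the fork, and the join depends on both branches, so nothing after the join can start until each branch has finished.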

Figure: An Oozie workflow. Nodes shown: Start, Pig, Decision, MR1 Job, MR2 Job, Fork, Java, HDFS, Join, MR3 Job, End.

