University of Ulster

Teaching data mining to students

Published on 01-Oct-2010

Validated on 04 Sep 2012

"…Students are now much more confident about where they are in the mining process and what is required of them at any stage. Yet CRISP-DM, while providing this support, does not stifle creativity – as some methodologies in other areas do. Many students reported that they found this coursework very stimulating and rewarding." - Ray Hickey, Lecturer in Computing Science, University of Ulster

Customer:
University of Ulster

Industry:
Education

Deployment country:
Ireland

Solution:
Business-to-Consumer, Business Analytics, Business Intelligence

Overview

The multi-campus University of Ulster is the largest in Ireland – north or south – and its Faculty of Computing and Engineering is one of the most substantial providers of undergraduate Computing Science courses in the UK.

Business need:
With the business community taking an increased interest in data mining, the University of Ulster decided to introduce formal training for students, who as graduates were likely to encounter it at some point in their careers.

Solution:
IBM SPSS Modeler fitted all the university’s requirements. It was a natural choice because university staff were already using it, and it would run on standard machines used by undergraduates.

Benefits:
• A Machine Learning and Data Mining course, launched in 1998 for Computing Science students, was an immediate success.
• Students found it fascinating that a computer could discover knowledge that seemed hidden in data.
• Students found IBM SPSS Modeler very easy to use and quickly became quite independent.
• The CRISP-DM (CRoss-Industry Standard Process for Data Mining) methodology was introduced as a protocol for undertaking data mining tasks.

Case Study

The situation
The multi-campus University of Ulster is the largest in Ireland – north or south – and its Faculty of Computing and Engineering is one of the most substantial providers of undergraduate Computing Science courses in the UK. The Faculty also has a large Data Mining Research Group, and there is considerable collaboration with blue-chip companies as well as with other universities in Europe, government agencies and hospitals.

The challenge
Throughout the 1990s, limited aspects of machine learning and data mining were taught to undergraduates as part of their work on artificial intelligence. However, there was still a substantial shortfall in students’ knowledge and skill levels in the area of data mining. Ray Hickey, Lecturer in Computing Science at the University of Ulster, explains:

“It was decided in 1998, that with the much increased interest in Data Mining in the business community, the likelihood was that graduates everywhere would encounter it at some point in their future careers, whether as software engineers required to build systems or to evaluate a package, or as managers required to implement data mining for more improved business financial performance.”

The solution
A committee at the University of Ulster decided it would be appropriate, and beneficial to the development of student skills, to devote a full final-year module to the subject of data mining. The module would have to be pitched in a way that made it accessible and rewarding for a typical undergraduate majoring in Computing Science. Such students vary considerably in their mathematical and statistical backgrounds, as well as in the business options they may have taken.

The overriding aims were that students should be able to assess the suitability of a data mining solution for a business problem; that they should have a good understanding of a range of commonly used techniques; and that they should be able to interpret and evaluate the results of mining activity.

In addition, it was felt to be important that they should have practical experience of data mining gained through use of industry standard software. Various staff in the faculty were already using IBM® SPSS® Modeler and had been undertaking development work aimed at enhancing its capabilities: therefore, it seemed a natural choice. A further major requirement of the software was that it would run on standard lab machines as used by undergraduates. Again, IBM SPSS Modeler fitted that requirement nicely.

The results
The module, called Machine Learning and Data Mining, began in the autumn of 1998 as an option for students on the main Computing Science degree at the Coleraine campus of the university. “It was an immediate success with about 60 students enrolling – about 80 percent of the year group – and has remained so since then”, says Mr Hickey. Right from the start, many students found it fascinating that, in a variety of application areas, a computer could discover or manufacture knowledge that seemed impossibly hidden in data – even from the sight of human experts in those fields.

Students found the IBM SPSS Modeler interface very easy to use and quickly became quite independent and adventurous. They were encouraged to find datasets in application areas that interested them, and to experiment with them. A major additional benefit was that IBM SPSS Modeler helped the students gain understanding of the mining process and get a feel for how the main routines work.

The CRISP-DM (CRoss-Industry Standard Process for Data Mining) methodology was introduced as a major feature of the module. For the coursework element, students undertake a complete data mining task using a very substantial dataset, following the guidelines provided by CRISP-DM right through from the formulation of business goals to suggestions for deployment of the results of mining.

Commenting on what has been a very successful project, Mr Hickey concludes: “This use of CRISP-DM has been very successful. Students are now much more confident about where they are in the mining process and what is required of them at any stage. Yet CRISP-DM, while providing this support, does not stifle creativity – as some methodologies in other areas do. Many students reported that they found this coursework very stimulating and rewarding.”

Products and services used

IBM products and services that were used in this case study.

Software:
SPSS Modeler

Legal Information

© Copyright IBM Corporation 2010

IBM Corporation
Route 100
Somers, NY 10589

US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Produced in the United States of America
May 2010
All Rights Reserved

IBM, the IBM logo, ibm.com, WebSphere, InfoSphere and Cognos are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. If these and other IBM trademarked terms are marked on their first occurrence in this information with a trademark symbol (® or TM), these symbols indicate U.S. registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the web at “Copyright and trademark information” at www.ibm.com/legal/copytrade.shtml.

SPSS is a trademark of SPSS, Inc., an IBM Company, registered in many jurisdictions worldwide.

Other company, product or service names may be trademarks or service marks of others.