Data mining

Competence
Data mining refers to finding new and useful regularities in large masses of data. VTT offers deep and comprehensive competence in all areas of data mining, especially numerical and textual data mining. We also have expertise in risk mining, bioinformation mining and knowledge representation using ontologies. We develop innovations and solutions that help companies make better use of the information they already have. Our application areas include, but are not limited to, Analytical Customer Relationship Management (aCRM), bioinformatics, risk mining, telecommunications and fault diagnostics.
We analyse real-world cases using real-world data, and we implement applications and complex systems that draw on a wide variety of data mining methods. Rather than limiting our solutions to a single data mining method, we select the methods that suit the problem at hand. By combining data and text mining methods with knowledge representation techniques, we aim to stay at the forefront of research in several domains.
In our research we seek innovations and technical breakthroughs together with companies from different fields. We also help companies apply current technology as effectively as possible in confidential consulting projects and assignments. In short, we refine data and deliver the benefits to our customers.
Challenges
The amount of data available for data mining grows rapidly every year, and this data holds information that can be put to use in many ways. The first challenge in data mining is to find new and interesting regularities in the data and to exploit them. A second challenge is the sheer number of different data stores that can be mined: to use all the available data, it must be made commensurable and coherent, and integrating data from different sources yields more comprehensive results.
A further challenge is how to store and represent the knowledge gained from the mining process. Because data mining can produce complex results about the relationships between different entities, conventional approaches based on simple databases are not sufficient.
The rise of social media, such as blogs and social sites like Facebook and YouTube, creates new challenges for text mining. One challenge is that the text is usually written in an informal style and is full of abbreviations. Nevertheless, social media mining will become an important area of text mining in the near future. One example is automatically learning consumers’ opinions of a given product from blogs.
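The opinion-mining example above can be illustrated with a small text classifier. This is a minimal sketch under stated assumptions, not VTT's method: it assumes a handful of hand-labelled posts, a hypothetical abbreviation table (`ABBREVIATIONS`) for normalising informal spellings, and a multinomial naive Bayes classifier with add-one smoothing. The names `tokenize` and `NaiveBayesOpinion` and all example texts are illustrative only; real social media data would need far more training data and richer preprocessing.

```python
import math
import re
from collections import Counter

# Hypothetical abbreviation table used to normalise informal spellings.
ABBREVIATIONS = {"gr8": "great", "luv": "love", "thx": "thanks", "bc": "because"}


def tokenize(text):
    """Lower-case, split into word tokens and expand known abbreviations."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return [ABBREVIATIONS.get(w, w) for w in words]


class NaiveBayesOpinion:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(doc_count / n_docs)  # log prior of the class
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical hand-labelled blog comments about a product.
texts = [
    "luv this phone, gr8 screen",
    "great battery, love it",
    "terrible battery, broke fast",
    "awful screen, hate it",
]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayesOpinion().fit(texts, labels)
print(model.predict("gr8 phone luv the battery"))  # pos
print(model.predict("hate the awful battery"))     # neg
```

Expanding abbreviations before counting lets "gr8" and "great" share evidence, which is one simple way to soften the informal-style problem described above.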
Solutions
Having faced many of the problems in this research area, we have built up expertise in several areas of data mining. Our research includes the following:
- Mining associations between entities from numerical and textual data.
- Finding relationships between biological entities in medical publications. We use ontologies to represent the knowledge mined from the publications. The results can be used, for example, in drug discovery (Transcendo; the software, called OAT, is part of VTT’s software family).
- Integrating data from different sources for business intelligence. We use data and text mining methods to find relationships between, for example, employees and projects, companies, and patents. We also use ontologies to integrate the data (BICase; part of VTT’s software family).
- Data commensuration for building models that compare different attributes of cars. We used self-organizing maps and mathematical models together with fuzzy logic to make the attributes comparable (SmartPick).
- Anomaly detection, for example in server log data and network security.
- Research on recommendation algorithms (content-based and collaborative) in ubiquitous environments, including cultural event recommendations on Web pages.
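Mining associations between entities, the first item above, can be illustrated with a minimal pairwise co-occurrence sketch. It assumes each document or record has already been reduced to a set of recognised entities; the function name `mine_associations`, the support and confidence thresholds and the example entities are hypothetical. Support is the share of documents that contain both entities, and the confidence of a rule a → b is the share of documents containing a that also contain b.

```python
from collections import Counter
from itertools import combinations


def mine_associations(transactions, min_support=0.3, min_confidence=0.6):
    """Return pairwise rules (lhs, rhs, support, confidence), strongest first.

    transactions: list of sets of entities (e.g. the entities found in one document).
    support(a, b) = share of transactions that contain both a and b.
    confidence(a -> b) = support(a, b) / support(a).
    """
    n = len(transactions)
    item_counts = Counter()  # in how many transactions each entity appears
    pair_counts = Counter()  # in how many transactions each entity pair co-occurs
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue  # pair too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first; ties broken alphabetically for stable output.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


# Hypothetical entities recognised in four documents.
docs = [
    {"geneA", "proteinX", "diseaseY"},
    {"geneA", "proteinX"},
    {"geneA", "diseaseY"},
    {"proteinX", "diseaseZ"},
]
for lhs, rhs, sup, conf in mine_associations(docs):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
# diseaseY -> geneA: support=0.50, confidence=1.00
# geneA -> diseaseY: support=0.50, confidence=0.67
# geneA -> proteinX: support=0.50, confidence=0.67
# proteinX -> geneA: support=0.50, confidence=0.67
```

Here every document mentioning diseaseY also mentions geneA, so that rule has confidence 1.00. The exhaustive pair count is only practical for small vocabularies; larger collections would typically use an algorithm such as Apriori or FP-growth.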
Benefits
With our extensive experience in knowledge representation and in data and text mining methods, we can provide both consulting and implementation assistance in practically all data mining tasks. Our expertise in several application domains is a further asset in any project.
References and merits
Contract projects in several fields of data mining, for example classifying feedback messages using text mining methods.
Several scientific papers published at international conferences, as well as master's theses, for example:
- Mika Timonen, Antti Pesonen, "Combining Context and Existing Knowledge When Recognizing Biological Entities - Early Results", PAKDD 2008, Osaka, Japan, May 2008.
- Teemu Mutanen, Jussi Ahola, Sami Nousiainen, "Customer Churn Prediction - A Case Study in Retail Banking", Workshop on Practical Data Mining: Applications, Experiences and Challenges, The 17th European Conference on Machine Learning and the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD), Berlin, Germany, 18-22 September 2006.
- Teemu Mutanen, "Consumer Data and Privacy in Ubiquitous Computing", Master's thesis, to be completed and submitted at the beginning of 2007.
- Mika Timonen, "Implementation of Ontology-Based Biological Knowledge Base", Master's thesis, completed and submitted February 2007.
- Jussi Ahola, Mikko Hiirsalmi, "Using SOM-Based Resampling Techniques to Boost MLP Training Performance", Proceedings of the Ninth Scandinavian Conference on Artificial Intelligence (SCAI 2006), Espoo, Finland, 25-27 October 2006.
- I. Karanta, A. Pesonen, L. Seitsonen, P. Silvonen, "A Text Mining System for Bioinformatics: Requirements and Architecture", ICDM 2006, Leipzig, Germany.
Additional information
Olli Saarela
Senior Research Scientist, Team Leader
+358 20 722 7497
Antti Pesonen
Senior Research Scientist
+358 20 722 3867
Sami Nousiainen
Senior Research Scientist
+358 20 722 6529
