Data Science for Innovation

The Data Science for Innovation research unit develops digital tools for knowledge transfer, thereby strengthening the innovative capacity of industry and research. To this end, the researchers are building an extensive database and developing intelligent applications on top of it using machine learning and network analysis. With their expertise in data science and artificial intelligence, they support other researchers and companies and can contribute significantly to the successful transformation of economic and social structures.

Our methods

The researchers use advanced data science methods such as natural language processing (NLP), large language models (LLMs), retrieval-augmented generation (RAG), network analysis, and graph machine learning (GraphML).

  • Natural language processing (NLP) is a branch of artificial intelligence that aims to enable machines to read, understand, and interpret human language. Building on this, large language models (LLMs) are advanced AI models trained on massive text datasets to understand and generate human-like text. They can handle diverse and complex tasks such as conversational AI and content creation.

    Retrieval-augmented generation (RAG) improves the output of LLMs by retrieving relevant passages from an external, authoritative knowledge base before a response is generated. As a result, the generated information is based not only on the model’s training data but is also supplemented by current and domain-specific facts.

    The research unit combines these methods to tap into its extensive database of publications, patents, and websites. Together, the methods form the core of intelligent applications that structure knowledge and make it usable for knowledge transfer. RAG is used specifically to ground the LLMs’ responses in validated information from the constructed graph databases, which improves the precision and relevance of the results.

  • Network analysis is a method rooted in graph theory for examining relational structures. It represents a complex system as a network of nodes (actors) and edges (connections) in order to analyze social structures and interactions, and it helps identify patterns, clusters, and key actors within that network.

    Topic modeling is a statistical method for identifying and extracting thematic structures in large text collections. Algorithms such as Latent Dirichlet Allocation (LDA) or modern neural network approaches uncover hidden topics in documents and analyze their distributions. This method enables the automatic recognition of semantic relationships and the thematic structuring of large text corpora.

    The research unit uses network analysis for the regional and thematic analysis of innovation ecosystems. By visualizing and examining actor networks, the researchers identify key organizations and their collaborations. Building on this, they use topic modeling to automate and optimize the matching of actors and content from academia, industry, and politics by uncovering thematic patterns and potential synergies in the data.

  • A graph database is a specialized database system that stores and manages data as a network of nodes and edges. This structure makes it possible to efficiently model and query complex relationships between entities. Graph databases are particularly well-suited for applications that need to process highly interconnected data and perform complex relationship analyses.

    Recommender systems are algorithms that provide users with personalized recommendations for products, content, or services. They analyze a user’s past behavior as well as similarities to other users to generate relevant and appropriate suggestions. The goal is to improve the user experience and proactively provide relevant information.

    For the research unit, graph databases form the technological foundation for building robust and interconnected data infrastructures. They structure knowledge from publications, patents, and websites and make it accessible for AI applications. Building on this, recommender systems are developed that efficiently match actors and content from science, industry, politics, and society, thereby actively supporting knowledge transfer.
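
The ideas in the list above can be illustrated with a minimal, self-contained sketch. It is not the research unit's implementation: it assumes a toy actor-topic graph held in memory rather than in a graph database, uses degree centrality as one simple way to find key nodes, and uses Jaccard similarity over shared topics as one simple content-based matching rule. All actor names, topics, and function names (`build_graph`, `degree_centrality`, `recommend_partners`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (actor, topic) edges of the kind that might be
# extracted from publications, patents, and websites. Illustrative only.
EDGES = [
    ("University A", "battery recycling"),
    ("University A", "graph machine learning"),
    ("Company B", "battery recycling"),
    ("Company B", "supply chains"),
    ("Institute C", "graph machine learning"),
    ("Institute C", "battery recycling"),
    ("Agency D", "supply chains"),
]


def build_graph(edges):
    """Store the actor-topic graph as an undirected adjacency map."""
    graph = defaultdict(set)
    for actor, topic in edges:
        graph[actor].add(topic)
        graph[topic].add(actor)
    return graph


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def recommend_partners(graph, actor, actors):
    """Rank the other actors by Jaccard similarity of their topic sets."""
    own = graph[actor]
    scores = []
    for other in actors:
        if other == actor:
            continue
        union = own | graph[other]
        score = len(own & graph[other]) / len(union) if union else 0.0
        scores.append((other, score))
    # Highest similarity first; ties are broken alphabetically.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


graph = build_graph(EDGES)
actors = sorted({actor for actor, _ in EDGES})
centrality = degree_centrality(graph)
recommendations = recommend_partners(graph, "University A", actors)
```

In this toy data, "Institute C" ranks first for "University A" because the two share both of their topics, and "battery recycling" is the most central topic because three of the four actors are linked to it. A production system would more plausibly query a graph database and combine several signals, so the sketch only shows the shape of the computation.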