Publicly funded research project aiming at the development of innovation metrics for measuring novelty based on real-time text mining methods
Supported by the German Federal Ministry of Education and Research (BMBF)
Research and development are of great importance to Germany, one of the world's leading countries in science and industry. To secure this leading position for the future, innovative activities must be supported as effectively as possible. This requires metrics that measure the success of innovation activities. This project addresses that need: it aims to further develop existing innovation and R&D measurement methods and to fundamentally supplement established output indicators. Output metrics are particularly crucial for determining the effectiveness of funding measures. But established output metrics, e.g. the share of sales from new product launches, suffer from three core problems:
In many cases, they are based on key figures that capture only the economic success of innovations. It often remains unclear to what extent market success can be attributed to concrete technical progress over the status quo, and to what extent it is explained by advertising or by market trends.
Established metrics measure the degree of novelty of a specific innovation, i.e. the level of scientific and technical progress, only indirectly at best. A valid assessment of the degree of innovation (e.g. from incremental to radical) generally requires qualitative methods that are rather subjective and hard to scale.
Moreover, established output indicators often capture innovations only after market launch, which makes it difficult to systematically measure the degree of novelty in earlier phases of the research and development process. They are also less suitable for new types of innovation such as business model innovations and social innovations.
Because suitable indicators for measuring novelty along the entire innovation process, from research to development to market launch, are lacking, the innovation level of ideas often remains hidden from decision-makers in research funding, science and practice. This makes it particularly difficult to evaluate efforts to promote innovation.
For internal corporate and regulatory control, finer measurement instruments are needed that shed light on the novelty of an innovation and make its degree of novelty measurable. To identify radical innovations and support the development process, the degree of novelty must be made visible in every development phase.
The research approach presented here is positioned at the interface of innovation research and text mining and explicitly addresses the research gap described above, the measurement of novelty. Thanks to considerable progress in data analysis, new data sources can now be exploited to measure levels of innovation.
As society becomes increasingly digital, the amount of available data is growing rapidly. Such data is often unstructured by nature. In the context of innovation research, this includes, for example, research reports, specialist articles, patent descriptions, press releases and product reviews. Despite their potential, such unstructured data sources have rarely been used to measure novelty to date. This project aims to exploit this information to measure the degree of novelty.
Current progress in the area of Big Data and Data Mining, together with growing computing capacities, now makes it possible to process large amounts of unstructured data automatically using natural language processing techniques. Because they can be automated, the procedures are not only cost-effective and scalable, but can potentially also be carried out in real time. All in all, the purpose of the project is to make unstructured information from texts that has previously gone unnoticed, such as research reports, patent descriptions and press releases, usable for measuring the novelty of an innovation.
The underlying idea is based on the following assumption: the higher the degree of novelty of an idea, the more the texts describing it differ from all existing texts. The project thus aims to make the novelty of innovations measurable through the associated linguistic change. This is grounded in early work by Fleck and Kuhn, who argue that scientific and technical progress produces changes in the use of language. Current advances in Big Data and Data Mining allow these interrelationships to be investigated on a large scale. This is done via text clustering, a method for the automated grouping of large amounts of text. Here, a text corpus is divided into groups of similar documents, so-called clusters, based on the words the documents contain. In this project, a method called topic modeling will be used. It was developed in 2003 and has undergone several improvements and validations to date. For some time now, it has also been used successfully in social science research to group large amounts of text, for example for the analysis of scientific literature. The method has also been used to investigate whether and how knowledge is recombined in patents. However, it has not yet been used to measure the degree of innovation. The present project aims to develop a corresponding approach.
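The grouping step described above, dividing a corpus into clusters of similar documents based on the words they contain, can be illustrated with a minimal sketch. It is not the project's topic modeling method: as a simplified stand-in it uses plain word counts, cosine similarity and a greedy assignment rule. The function names, the threshold of 0.3 and the four toy documents are assumptions made purely for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def greedy_clusters(texts, threshold=0.3):
    """Put each text into the most similar existing cluster,
    or open a new cluster if no similarity reaches the threshold.
    (Illustrative stand-in for topic modeling, not the real method.)"""
    centroids = []  # summed word counts per cluster
    labels = []
    for text in texts:
        vec = bag_of_words(text)
        sims = [cosine(vec, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            centroids[best].update(vec)  # add counts to the cluster
            labels.append(best)
        else:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
    return labels


# Hypothetical toy corpus: two battery texts, two machine learning texts.
docs = [
    "battery cell lithium anode battery",
    "lithium battery anode coating",
    "neural network training data",
    "training a neural network on data",
]
labels = greedy_clusters(docs)
print(labels)  # [0, 0, 1, 1]
```

A topic model would replace the hard cluster labels with assignment probabilities per document, which is what the proposed metric builds on.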
At its core, documents such as research reports that were created up to a defined point in time are grouped thematically using the clustering algorithms mentioned above. Documents created after that point in time can then be checked to see how well they fit into the existing groupings. The cluster assignment probabilities of the newer documents are the starting point: what matters is the "linguistic" distance between newer documents and already existing documents. Documents with a large linguistic distance from the existing body of text, i.e. the existing state of research, are candidates for particularly novel innovations. The calculated distances form the basis for the proposed new metric and allow the calculation of a degree of novelty and thus the positioning of an innovation on the spectrum from incremental to radical. Hence, this approach makes use of the fact that new things tend to be difficult to grasp in existing linguistic categories.
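The idea of scoring a newer document by its linguistic distance from existing groupings can be sketched as follows. This is a hedged illustration under stated assumptions, not the metric developed in the project: instead of topic-model assignment probabilities it compares raw word distributions, and it uses the Jensen-Shannon divergence (base 2, bounded between 0 and 1) as the distance. The names `word_distribution`, `js_divergence` and `novelty_score`, as well as the toy clusters, are hypothetical.

```python
import math
import re
from collections import Counter


def word_distribution(texts):
    """Relative word frequencies over one or more texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    def kl_to_mixture(a):
        # KL divergence from `a` to the 50/50 mixture of p and q.
        return sum(
            pa * math.log2(pa / (0.5 * (p.get(w, 0.0) + q.get(w, 0.0))))
            for w, pa in a.items()
        )
    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


def novelty_score(new_text, reference_clusters):
    """Distance of a new document to its closest existing cluster.
    Near 0: fits an existing grouping; 1: shares no vocabulary."""
    new_dist = word_distribution([new_text])
    return min(
        js_divergence(new_dist, word_distribution(cluster))
        for cluster in reference_clusters
    )


# Hypothetical clusters built from documents before the cut-off date.
clusters = [
    ["battery cell lithium anode", "lithium battery anode coating"],
    ["neural network training data", "training a neural network"],
]
incremental = novelty_score("lithium battery anode", clusters)
radical = novelty_score("quantum photonic sensor", clusters)
print(incremental < radical)
```

Taking the minimum over clusters means a document counts as novel only if it is far from every existing grouping; other aggregations are conceivable and would need to be validated on real data.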
This procedure can be carried out on the basis of any type of text generated along the entire research and development process. Thus, the level of innovation can be recorded systematically and potentially in real time in various process phases, depending on whether draft proposals (research funding), scientific publications (scientific research), patents (development), or texts in connection with the market launch such as product announcements, press releases and product reviews (innovation diffusion) are used as a text basis. The new method thus also contributes to systematically recording innovative developments with a shorter time delay (early indicators).
In keeping with the project's orientation towards basic research, the development aims to make the results available to the public. The central idea of the novelty measurement method is to be made accessible to expert audiences from business and society in a way that allows them to implement the new method according to their own needs. Following the Open Source philosophy, the prototype of the method is to be published as source code.
Thus, the project advances current social science research in the field of innovation in two ways: by opening up new data sources and by providing a new method to measure novelty. The new output indicator provides a new tool for investigating the determinants of levels of innovation. Making the algorithm available as a software program under an open source license should above all ensure the further development of the method beyond the funding period and allow other researchers to test the procedure on a wide variety of data sources.
In addition, the new methodology offers companies and public institutions an instrument for controlling and evaluating their innovation activities. It can be used to assess entire research programs or research on specific technologies in a defined industry.