The scientific article “Authorship identification of documents with high content similarity”, authored by researchers from the Knowledge Discovery Team at Know-Center, has been published in the journal Scientometrics.

Scientometrics is an international journal for all quantitative aspects of the Science of Science, Communication in Science and Science Policy. Emphasis is placed on investigations in which the development and mechanism of science are studied by statistical mathematical methods. The impact factor of this Q1 journal is 2.147.

The work by Andi Rexha, Mark Kroell, Hermann Ziak and Roman Kern (from the Knowledge Discovery area at Know-Center) is inspired by the task of associating segments of text with their real authors. The paper focuses on analyzing how humans judge different writing styles. This analysis can help to better understand that process and thus to simulate such behavior. The majority of work in this field (e.g. authorship attribution and plagiarism detection) uses content features, whereas this contribution focuses only on the stylometric, i.e. content-agnostic, characteristics of authors. Two pilot studies were conducted to determine whether humans can identify authorship among documents with high content similarity. The authors’ findings could (1) help to improve algorithms for automatic authorship attribution and plagiarism detection, (2) assist forensic experts or linguists in creating profiles of writers, or (3) support intelligence applications that analyze aggressive and threatening messages.

We congratulate all authors on their achievement!

You can download the full paper here.

Complete bibliography:

Rexha, A., Kröll, M., Ziak, H. and Kern, R. (2018) Authorship identification of documents with high content similarity. Scientometrics.

