Simple Definition of Distances between Texts from Rank-frequency Distributions. A Case of Ukrainian Long Prose Works by Ivan Franko (2024)

Glottometrics 46, 2019, RAM-Verlag. ISSN 1617-8351, e-ISSN 2625-8226

Glottometrics is indexed in ESCI (Clarivate Analytics) and Scopus (Elsevier).

Glottometrics is a scientific journal for quantitative research on language and text. It is published at irregular intervals (2-3 issues a year). Contributions in English or German should be written with a common text processing system (preferably WORD) and sent to one of the editors. Glottometrics can be downloaded from the Internet (Open Access), obtained on CD-ROM (as a PDF file), or ordered in printed form.

Editors
G. Altmann, Univ. Bochum (Germany), ram-verlag@t-online.de
S. Andreev, Univ. Smolensk (Russia), smol.an@mail.ru
K.-H. Best, Univ. Göttingen (Germany), kbest@gwdg.de
R. Čech, Univ. Ostrava (Czech Republic), cechradek@gmail.com
E. Kelih, Univ. Vienna (Austria), emmerich.kelih@univie.ac.at
R. Köhler, Univ. Trier (Germany), koehler@uni-trier.de
H. Liu, Univ. Zhejiang (China), lhtzju@gmail.com
J. Mačutek, Univ. Bratislava (Slovakia), jmacutek@yahoo.com
A. Mehler, Univ. Frankfurt (Germany), amehler@em.uni-frankfurt.de
M. Místecký, Univ. Ostrava (Czech Republic), MMistecky@seznam.cz
G. Wimmer, Univ. Bratislava (Slovakia), wimmer@mat.savba.sk
P. Zörnig, Univ. Brasilia (Brazil), peter@unb.br

External Academic Peers for Glottometrics
Prof. Dr. Haruko Sanada, Rissho University, Tokyo, Japan (http://www.ris.ac.jp/en/); link to Prof. Dr. Sanada: http://researchmap.jp/read0128740/?lang=english; mailto:hsanada@ris.ac.jp
Prof. Dr. Thorsten Roelcke, TU Berlin, Berlin, Germany (http://www.tu-berlin.de/); link to Prof. Dr. Roelcke: http://www.daf.tu-berlin.de/menue/deutsch_als_fremd_und_fachsprache/mitarbeiter/professoren_und_pds/prof_dr_thorsten_roelcke; mailto:roelcke@tu-berlin.de

Orders for CD-ROM or printed copies: RAM-Verlag, RAM-Verlag@t-online.de
Downloading: https://www.ram-verlag.eu/journals-e-journals/glottometrics/

Die Deutsche Bibliothek, CIP catalogue record: Glottometrics. 46 (2019), Lüdenscheid: RAM-Verlag, 2019. Published irregularly. This electronic resource is available on the Internet (Open Access) at https://www.ram-verlag.eu/journals-e-journals/glottometrics/. Bibliographic description based on 46 (2019). Online/e-version ISSN 2625-8226 (print version ISSN 1617-8351)

Related Papers

Text length and the thematic concentration of text

Mathematical Linguistics, 2016

Miroslav Kubát

Text length very often biases the results of stylometric indices based on the rank-frequency distribution (e.g., type-token ratio, repeat rate, entropy). The aim of the article is to examine the relation between text size and the thematic concentration indicators TC and STC. The corpus consists of 1471 English texts of various genres. The results show that thematic concentration is independent of text length in the interval <200; 6500>. Since the analysis corroborates the findings of previous research on Czech, TC and STC appear to be reliable stylometric indicators applicable to texts in different languages.

Comparison of distance and similarity measures for stylometric analysis of Lithuanian texts

Daumantas Stanikūnas

Ongoing developments in information and computer technologies make it possible to handle ever-increasing amounts of data, thereby expanding research possibilities. In this article, we discuss and compare distance and similarity measures used in stylometric analysis that could be applied to Lithuanian texts. As the corpus for the analysis, we chose transcripts of parliamentary debates by two politicians of the Lithuanian Parliament. We then compared distance measures, performed stylometric analysis, and visualized the results. The objective of the experiment was to identify which measures perform better in stylometric analysis of Lithuanian texts and to explore where the differences in performance occur. Based on the results, we recommend the following: the number of Most Frequent Words used should be at least 1000, Eder’s Simple Delta measure can be used in general stylometric analysis of transcriptions of parliamentary debates...

Word Length and Frequency Distributions

Peter Grzybek

Measuring Structural Distances between Texts

Fabrizio Biondi

Rank Distance as a Stylistic Similarity

Liviu P. Dinu

Probing the Statistical Properties of Unknown Texts: Application to the Voynich Manuscript

PLoS ONE, 2013

Eduardo Altmann

On variation of word frequencies in Russian literary texts

Physica A: Statistical Mechanics and its Applications

Vladislav Kargin

On the nature of long-range letter correlation in texts

Dmitrii Manin

The origin of long-range letter correlations in natural texts is studied using random walk analysis and the Jensen–Shannon divergence. It is concluded that they result from slow variations in the letter frequency distribution, which are a consequence of slow variations in lexical composition within the text. These correlations are preserved by random letter shuffling within a moving window. As such, they do reflect structural properties of the text, but in a very indirect manner.

Analytical Distribution Model for Syntactic Variables Average Values in Russian Literary Texts

Communications in Computer and Information Science, 2019

Tatiana Sherstinova

Quantitative Text Typology: The Impact of Word Length

Peter Grzybek

Analysis of Stylometric Variables in Long and Short Texts

Procedia - Social and Behavioral Sciences, 2013

Gerardo Sierra

The Rank-Frequency Analysis for the Functional Style Corpora in the Ukrainian Language

Computing Research Repository, 2003

Andrij Rovenchak

Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

PLOS ONE, 2015

Alvaro Corral

Zipf’s Law for Word Frequencies: Word Forms versus Lemmas in Long Texts

Ramon Ferrer-i-Cancho

A geometrical approach to literary text analysis

The Workshop Programme

Roberto Basili

Quantitative Analysis of Poetic Texts

Quantitative Analysis of Poetic Texts, 2015

Mihaiela Lupea

Rank-Frequency Analysis for Functional Style Corpora of Ukrainian

Journal of Quantitative Linguistics, 2004

Andrij Rovenchak

Improved distance measures for ‘fixed-content miscellanies’: an adaptation for the collections of sayings of the desert fathers and mothers

Digital Scholarship in the Humanities (Advance Article), 2022

Elisabet Göransson, Britt Dahlman

Competition between two kinds of correlations in literary texts

Physical Review E, 2005

Vladyslav Golyk

On the origin of long-range correlations in texts

Proceedings of the National Academy of Sciences, 2012

Eduardo Altmann, Giampaolo Cristadoro

Comparative Computational Analysis of Global Structure in Canonical, Non-Canonical and Non-Literary Texts

arXiv, 2020

Volker Gast

Improved distance measures for ‘fixed-content miscellanies’: an adaptation for the collections of sayings of the desert fathers and mothers

Digital Scholarship in the Humanities

Karine Åkerman Sarkisian

The Problems of Measuring Sentence-Length in Classical Texts

Studia Linguistica, 1964

Tore Janson

Short Text Coherence Hypothesis

Marios Poulos, Sozon Papavlasopoulos

Total rank distance and scaled total rank distance: two alternative metrics in computational linguistics

Proceedings of the Workshop on Linguistic Distances, 2006

Anca Dinu

Word-length Entropies and Correlations of Natural Language Written Texts

Journal of Quantitative Linguistics, 2015

Vassilios Constantoudis

Textual characteristics of different-sized corpora

R. Remus

The Dynamics of Extensive Text Variables in Russian Short Stories

Tatiana Sherstinova

The NP-INDEX. A user-friendly method for quantitative textual analysis [SOC 17.2 (2013)]

Valerio Polidori
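
Several abstracts in the list above mention indices computed from a text's rank-frequency distribution (type-token ratio, repeat rate, entropy) and the Jensen–Shannon divergence between frequency distributions. The Python sketch below computes these standard quantities for word tokens. It is an illustration under stated assumptions, not the distance defined in the Franko paper named in the title, whose definition is not reproduced on this page. Assumptions: `tokenize` is a hypothetical helper that treats lowercase runs of letters as words (so apostrophes split words), repeat rate is taken as the sum of squared relative frequencies, and entropy and divergence use base-2 logarithms, i.e. bits.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer (an assumption, not from the source):
    # lowercase runs of Unicode letters, Cyrillic included; digits,
    # underscores, punctuation and apostrophes act as separators.
    return re.findall(r"[^\W\d_]+", text.lower())


def rank_frequency(tokens: list[str]) -> list[tuple[str, int]]:
    # Word types sorted by descending frequency (rank 1 first);
    # ties are broken alphabetically so the order is deterministic.
    return sorted(Counter(tokens).items(), key=lambda kv: (-kv[1], kv[0]))


def type_token_ratio(tokens: list[str]) -> float:
    # Number of distinct types divided by number of tokens.
    return len(set(tokens)) / len(tokens)


def repeat_rate(tokens: list[str]) -> float:
    # Sum of squared relative frequencies, sum_r (f_r / N)^2.
    n = len(tokens)
    return sum((f / n) ** 2 for f in Counter(tokens).values())


def entropy(tokens: list[str]) -> float:
    # Shannon entropy in bits, -sum_r p_r * log2(p_r).
    n = len(tokens)
    return -sum((f / n) * math.log2(f / n) for f in Counter(tokens).values())


def jensen_shannon(tokens_a: list[str], tokens_b: list[str]) -> float:
    # Jensen-Shannon divergence in bits between the two word distributions:
    # the average KL divergence of each distribution from their mixture M.
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    na, nb = len(tokens_a), len(tokens_b)
    vocab = set(ca) | set(cb)

    def kl_from_mixture(counts: Counter, n: int) -> float:
        total = 0.0
        for w in vocab:
            p = counts[w] / n  # Counter returns 0 for unseen words
            if p > 0:
                m = 0.5 * (ca[w] / na + cb[w] / nb)
                total += p * math.log2(p / m)
        return total

    return 0.5 * kl_from_mixture(ca, na) + 0.5 * kl_from_mixture(cb, nb)


if __name__ == "__main__":
    a = tokenize("The cat sat on the mat.")
    b = tokenize("The dog sat on the log.")
    print(rank_frequency(a)[:2])           # [('the', 2), ('cat', 1)]
    print(round(type_token_ratio(a), 4))   # 0.8333
    print(round(repeat_rate(a), 4))        # 0.2222
    print(round(entropy(a), 4))            # 2.2516
    print(round(jensen_shannon(a, b), 4))  # 0.3333
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared word types). In the toy example the two sentences share three of their five types, and the sketch gives 1/3. Comparing real texts would also require decisions about tokenization and about text length, which the first abstract in the list notes can bias such indices.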

