Koleva, Mariya
Farasyn, Melissa
Desmet, Bart
Breitbarth, Anne
Hoste, Veronique
Syntactically annotated corpora are highly important for large-scale diachronic and diatopic language research. Such corpora have recently been developed, or are still under development, for a variety of historical languages. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which aims to facilitate research into the under-researched diachronic syntax of Low German. The present paper reports on a crucial step in building the corpus, viz. the development of a part-of-speech (POS) tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalised spelling. We outline the major issues faced in developing the tagger and present our solutions to them.
In spite of growing interest in recent years, the syntax of Middle Low German (MLG) remains an extremely under-researched area. In light of recent research showing early North West Germanic languages to be partial null subject languages (Axel 2005; Walkden 2014; Kinn 2016; Volodina/Weiss 2016), the question arises of where MLG is positioned in this respect. This article presents novel data showing that MLG had referential null subjects (RNS) and can be classified as a partial null subject language. Based on a quantitative and qualitative corpus analysis of the syntactic distribution of RNS, we argue that two types must be distinguished in MLG: null topics in SpecCP and null clitics on C.