ID: Korlex-Croatian-Resource

RESOURCE DESCRIPTION

The lexical resource Korlex-Croatian-Resource provides a list of 118,252
Croatian lemmas, i.e., words in canonical form, each annotated with a
part-of-speech (POS) tag and lexical features.

The resource is a flat text file in which each line contains information
about one lemma. The format of a line can be captured with the following
Perl regular expression:

    /^(.*\S)\t+(:\w+)(.*)$/;

where $1 is the lemma, $2 is the POS tag, and $3 is a concatenated list of
features. The lemma is separated from the tags by one or more tab
characters. For example, in:

    automobil    :nn:m

the lemma is "automobil", the POS tag is ":nn", and the lemma is annotated
with one feature, ":m".

The lemma field may contain a hash sign (#), in which case the entry
records a frequently misspelled form. For example, in

    mijesec#mjesec    :nn:m:x

"mijesec" is an incorrect form, followed by the correct form "mjesec".
Additionally, incorrect forms are marked with the feature ":x".

The resource is encoded in ISO-8859-2 and sorted according to the standard
Croatian lexicographic order.

The resource statistics are presented below:

Table of POS Tags

Tag    Part of speech                                 Count
------------------------------------------------------------
cc     coordinate conjunction                             3
cd     cardinal number                                   98
cs     subordinate conjunction                           33
etc    list continuation (etc.)                           5
in     preposition                                       73
jj     adjective                                      25619
jjr    adjective, comparative                          7772
jjs    adjective, superlative                          7770
nn     noun                                           48025
nnc    collective noun                                  112
nnp    proper noun                                     4230
nns    noun, plural (no regular singular form)           83
od     ordinal number                                    70
pr     pronoun                                          302
rb     adverb                                          8985
spec   special syntactic tag (e.g., clitic               71
       and auxiliary verbs)
uh     interjection                                      64
vb     verb                                           14937
------------------------------------------------------------
                                                     118252

Table of Features

Tag:Features   Description          Count
-----------------------------------------------
:in:x          incorrect form           1
:jj:f          gender feminine         50
   :m          gender masculine     25464
   :n          gender neuter          105
   :pl         plural                   6
   :x          incorrect form          64
:jjr:f         gender feminine          3
    :m         gender masculine      7769
    :pl        plural                   1
    :x         incorrect form           4
:jjs:m         gender masculine      7770
    :x         incorrect form           3
:nn:dim        diminutive             248
   :f          gender feminine      17831
   :m          gender masculine     17675
   :n          gender neuter        12519
   :pl         plural                  61
   :x          incorrect form         143
:nnc:f         gender feminine         25
    :m         gender masculine         4
    :n         gender neuter           83
:nnp:dim       diminutive               1
    :f         gender feminine        838
    :m         gender masculine      3326
    :n         gender neuter           66
    :pl        plural                  62
:nns:dim       diminutive              17
    :f         gender feminine         57
    :m         gender masculine        11
    :n         gender neuter           15
:od:x          incorrect form           1
:pr:dt         demonstrative           52
   :f          gender feminine         95
   :m          gender masculine       117
   :n          gender neuter           67
   :pl         plural                 104
   :sg         singular               180
   :wh         interrogative           22
:rb:wh         interrogative           13
   :x          incorrect form           3
:vb:x          incorrect form         128
-----------------------------------------------
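
USAGE SKETCH

The line format given in the resource description can be illustrated with a small reader. The following is a minimal sketch in Python, not part of the resource itself: the regular expression is a direct transliteration of the Perl pattern quoted in the description, while the function names (parse_line, read_resource, count_pos), the dictionary keys, and the file path passed to read_resource are hypothetical names chosen for illustration. It assumes the file is read with the ISO-8859-2 encoding stated in the description.

```python
import re
from collections import Counter

# Transliteration of the Perl pattern from the description:
#   $1 = lemma, $2 = POS tag, $3 = concatenated features
LINE_RE = re.compile(r"^(.*\S)\t+(:\w+)(.*)$")


def parse_line(line):
    """Parse one resource line; return None if it does not match the format."""
    m = LINE_RE.match(line.rstrip("\r\n"))
    if m is None:
        return None
    lemma, pos, feats = m.group(1), m.group(2), m.group(3)
    # A '#' in the lemma field separates a misspelled form from the correct one.
    misspelled, correct = None, None
    if "#" in lemma:
        misspelled, correct = lemma.split("#", 1)
    return {
        "lemma": lemma,            # raw lemma field, as captured by $1
        "misspelled": misspelled,  # None unless the lemma field contains '#'
        "correct": correct,        # None unless the lemma field contains '#'
        "pos": pos.lstrip(":"),    # e.g. "nn"
        "features": [f for f in feats.split(":") if f],  # e.g. ["m", "x"]
    }


def read_resource(path):
    """Yield parsed entries from a resource file (path is supplied by the caller)."""
    with open(path, encoding="iso-8859-2") as fh:
        for line in fh:
            entry = parse_line(line)
            if entry is not None:
                yield entry


def count_pos(lines):
    """Tally POS tags over an iterable of raw lines, skipping non-matching lines."""
    counts = Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            counts[entry["pos"]] += 1
    return counts
```

For the two example lines in the description, parse_line yields the POS tag "nn" in both cases, with features ["m"] for "automobil" and ["m", "x"] for "mijesec#mjesec". Running count_pos over all lines of the file should, under these assumptions, give per-tag counts comparable to the Table of POS Tags.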