In the next part of the study, we aimed to obtain information on the frequency of various grammatical features of Russian nouns, based on the disambiguated subcorpus of the Russian National Corpus (RNC). One goal was to determine how frequent noun forms of different genders, cases and numbers, and animate and inanimate nouns, are; how these characteristics depend on inflectional paradigms (on the declension and the stem type); and how they correlate with each other. The second goal was to determine the frequency of forms with different endings (both depending on case, number, gender and declension and regardless of them).
These data are stored in a small database that can be downloaded.
Obviously, if one needs to compare, for example, the frequency of two cases, this information can easily be obtained from the Russian National Corpus without any database. The database is needed to get an overall picture (for example, the frequency of two cases of interest compared to all other cases) and to be able to add new factors later, for example, number or animacy. Although various parsers and similar programs developed for Russian rely on such statistics, the relevant information has not yet been made publicly available in summarized form. It is also important to note that several projects study the frequency of case forms and other forms in Russian (for example, Kopotev 2008; Lyashevskaya 2013). However, they primarily aim to describe unusual features of the paradigms of individual words.
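The kind of summary described above (the share of each case among all cases, optionally broken down by a further factor such as number) can be illustrated with a minimal Python sketch. The records, the field names and the function relative_frequencies are hypothetical and serve only as an illustration. They reflect neither the actual structure of the database nor real counts from the RNC.

```python
from collections import Counter

# Hypothetical toy records (case, number, animacy).
# These are invented for illustration and are NOT real RNC counts.
TOKENS = [
    ("nom", "sg", "inan"),
    ("gen", "sg", "inan"),
    ("gen", "pl", "anim"),
    ("acc", "sg", "anim"),
    ("nom", "pl", "anim"),
    ("gen", "sg", "anim"),
    ("loc", "sg", "inan"),
    ("nom", "sg", "anim"),
]
FIELDS = ("case", "number", "animacy")


def relative_frequencies(tokens, factors):
    """Return the share of tokens for each combination of the chosen factors."""
    positions = [FIELDS.index(name) for name in factors]
    counts = Counter(tuple(tok[i] for i in positions) for tok in tokens)
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()}


# Overall picture: every case relative to all other cases.
by_case = relative_frequencies(TOKENS, ["case"])

# The same summary after adding a new factor (number).
by_case_number = relative_frequencies(TOKENS, ["case", "number"])

for key, share in sorted(by_case.items()):
    print(key, round(share, 3))
```

Because the factors are passed as a list, a further feature such as animacy can be added to the breakdown without changing the counting logic.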
Data on the frequency of word forms that take into account different inflectional classes, and on the frequency of affixes, are required for theoretical and experimental linguistic research, especially for the full range of usage-based approaches (Baayen 2003; Bybee 2006; Dressler 1985; Milin et al. 2009; Moscoso del Prado Martín et al. 2004 and others) and for all mental lexicon models: whatever approach they adopt, frequency always plays a particular role. These data are important in themselves (for example, to explore how the grammatical categories of gender, number and case are represented in the mental lexicon, we need to know the frequency of various features) and also for solving auxiliary problems, for example, selecting stimuli for psycholinguistic experiments.