StimulStat is a lexical database for the Russian language. It lets you select words and word forms by a range of parameters and look up parameter values for a list of words or word forms. The resource was created primarily for experimental studies of Russian. Such studies often require stimuli in which one target property varies (e.g. stress), while many other properties (e.g. length, frequency, part of speech) either coincide or are carefully balanced. The database can also serve many other purposes: for example, selecting words with particular characteristics for teaching Russian, for all sorts of tests and assignments, and for scientific papers.

The database includes parameters associated with lemma and form frequency, orthographic representation, ideal and real phonemic representation, prosodic features, grammatical features of word forms and lemmas, polysemy, homonymy, homography, orthographic and phonological neighbourhoods (groups of words with similar spelling and pronunciation), as well as subjective age of acquisition and imageability. All parameters can be found on the search pages, and a complete list of them, with the necessary comments, is included in the instructions. Values of some parameters were taken from various sources (see the list below); for these, the advantage of the database is that they can all be taken into account simultaneously. Other parameters were calculated specifically for the database.

The "Additional Resources" page presents a separate project in which the frequencies of various grammatical characteristics and inflectional affixes of Russian nouns were obtained from the Russian National Corpus.

The main part of the project is now complete, but we continue to correct errors and optimize the interface. We will be grateful for comments or suggestions. We also plan to add a few more parameters to the database.

The project is being developed at the Laboratory for Cognitive Studies, St. Petersburg State University.


Database sources:

Frequency values are drawn from Lyashevskaya and Sharov's dictionary (for lemmas), the project "Frequency grammar of Russian" (for word forms), the CORPRES database (for phonemic representations of forms).

The following parameters were calculated by the authors based on the sources mentioned above:

How to refer to the project:

Only the initial stage of the project is described in publications so far. If you use the database, please refer to this website and to the following article:

If you use the information on frequency from the page "Additional Resources", please refer to the site and to this article:

We would be grateful if you wrote to us about studies in which these resources are used; this is important and interesting to us.

Please note that in certain cases you also need to refer to the original source of the information found with the help of our database (e.g. to the "Frequency dictionary of modern Russian language" for lemma frequencies).

Students who helped us