The history of the application of mathematical methods in linguistics. Prospects for the application of mathematical methods in linguistics

The penetration of mathematical methods and the "mathematical spirit" into linguistics has contributed to its development toward accuracy and objectivity. However, serious obstacles stand in the way of further development in this direction. The author reflects on the reasons for the convergence of linguistics and mathematics, on the limits of applicability of mathematical methods in linguistics, and on the nature of the factors preventing mutual understanding between mathematicians and linguists.

When, in the second half of the 1950s, some young linguists thought about applying mathematical methods to the study of language structure and began to collaborate with mathematicians, this caused surprise and even shock among a great many of their colleagues. After all, they had been convinced since childhood that the humanities, of which linguistics is one, do not and cannot have anything in common with mathematics and the other "exact" sciences.

Meanwhile, the existence of a close connection between natural language and mathematics was by no means a new discovery at that time. L. S. Vygotsky wrote in his book “Thinking and Speech”, published in 1934: “The first to see in mathematics a kind of thinking that originates from language but overcomes it was, apparently, Descartes”, and continued: “Our ordinary colloquial language, owing to its inherent fluctuations and inconsistencies of a grammatical and psychological nature, is in a state of mobile equilibrium between the ideals of mathematical and fantastic harmony, and in unceasing movement, which we call evolution.”

The doctrine of grammatical categories that arose in Ancient Greece was already a description of a number of the most important aspects of language structure by means of abstract models close in style to the models that ancient Greek mathematicians created to describe spatial forms. Only the familiarity of such concepts as case, gender, etc., which have become, as H. Steinthal wrote, “our second nature”, prevents us from understanding what a high level of abstract thinking their creation required. What should surprise us, then, is rather that the first attempts to use real mathematical means to describe the linguistic “ideal of mathematical harmony” were made only in the middle of the twentieth century.

There are two reasons for this "delay". Firstly, the science of language, after the significant steps taken in antiquity, began to develop in earnest again only in the 19th century. Throughout that century, however, the main attention of linguists was directed to the history of language, and only in the next century, which was in general the age of structuralism for the humanities, did linguistics turn, for the first time since antiquity, to the study of language structures, now at a new level. When linguists realized that language is, in the words of F. de Saussure, a “system of pure relations”, i.e. a system of signs whose physical nature is insignificant and in which only the relations between them matter, the parallel between language and mathematical constructions, which are also "systems of pure relations", became quite obvious. Already at the beginning of the twentieth century, de Saussure himself dreamed of studying language by mathematical means.

Secondly, quantitative methods came to the fore in mathematics at the beginning of the early modern period, and only in the 19th century did mathematicians again begin to build non-quantitative abstract models. These differed from the ancient ones by a higher level of abstraction and also, which is especially important for our topic, by the fact that they could be used to describe a much wider range of phenomena than spatial forms. Often such models turned out to be convenient and even necessary means for studying phenomena that the mathematicians who built them had not thought about at all and whose existence they had not even suspected. Among these models were those that later found application in linguistics; the mathematical disciplines concerned with constructing them developed especially intensively in the first half of the twentieth century. Therefore, the meeting of mathematics and linguistics in the middle of that century was quite natural.

One of the results of this meeting was the emergence of a new mathematical discipline, mathematical linguistics, whose subject is the development of a mathematical apparatus for linguistic research. The central place in mathematical linguistics is occupied by the theory of formal grammars, which, by the nature of its apparatus, is related to mathematical logic and, in particular, to the theory of algorithms. It provides formal methods for describing correct language units of various levels and also, which is especially important, formal methods for describing transformations of language units, both within one level and between levels. Adjoining the theory of formal grammars is the theory of syntactic structures, which is much simpler in its apparatus but no less important for linguistic applications. Mathematical linguistics also develops analytical models of language, in which formal constructions are carried out on the basis of certain data on “correct texts” (taken as known), resulting in a description of some “constituent parts” of the language mechanism. In this way, one can obtain a formal description of some traditional grammatical concepts. The description of sentence meaning by means of intensional logic (“Montague semantics”) also belongs here.
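To make the idea of a formal grammar describing "correct language units" more tangible, here is a minimal sketch in Python. The toy grammar, its category names, and the function name `is_grammatical` are illustrative assumptions and do not come from the text; the recognition procedure is the standard CYK algorithm for grammars in Chomsky normal form.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (an assumption for illustration).
# Binary rules: (left category, right category) -> categories they can form.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
# Lexical rules: word -> categories it can belong to.
LEXICAL_RULES = {
    "the": {"Det"},
    "linguist": {"N"},
    "model": {"N"},
    "builds": {"V"},
}


def is_grammatical(words, start="S"):
    """CYK recognition: True if the toy grammar derives the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds every category that derives words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICAL_RULES.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between the two sub-spans
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    print(is_grammatical("the linguist builds the model".split()))
    print(is_grammatical("builds the linguist the model".split()))
```

The grammar separates "correct" from "incorrect" word sequences purely by rule application, without any appeal to meaning, which is the sense in which such a description is formal.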

Of course, only one of the two ideals of language that Vygotsky spoke about can be described with the help of the mathematical apparatus. Therefore, the often-heard objections to the use of one or another mathematical model (or of mathematical models in general) on the grounds that it does not cover such and such special cases make no sense: to describe the “fluctuations and inconsistencies” inherent in language, one needs entirely different, non-mathematical means, and it is precisely a clear description of the "mathematical ideal" that could help to find them, since it would make it possible to delimit clearly the "fantastic" from the "mathematical" in language. But this is still a matter for the future.

No less, and perhaps more important than the emergence of mathematical linguistics, was the direct penetration into linguistics of fundamental mathematical ideas and concepts - such as set, function, isomorphism. In modern linguistic semantics, the concepts of predicate and quantifier, which came from mathematical logic, play an important role. (The first of them arose in logic even when it was not distinguished from linguistics, and now it has returned to linguistics in a generalized and mathematically processed form.)

And finally, of very great importance is the refinement of the language of linguistic research that results from the penetration of the “mathematical spirit” into linguistics, and not only in those areas where mathematical ideas and methods can be used. All this can be briefly summarized as follows: linguistics is becoming an ever more exact and more objective science, without ceasing, of course, to be one of the humanities.

However, on this natural path the development of linguistics faces serious obstacles that can slow it down for a long time. The main one is the “separation of faculties” that arose at the beginning of the early modern period: natural scientists and mathematicians, on the one hand, and scholars in the humanities, on the other, are not interested in the work of colleagues “in the other faculty” and, moreover, despise them deep down, and often openly. Mathematicians and natural scientists (and “techies” even more so) tend to see research in the humanities as just a kind of “decoration” or even “idle chatter”, while “humanities people” are ready to tolerate mathematics and the natural sciences only for the sake of practical benefit and are convinced that these can do nothing to help us understand the nature of the human spirit.

Only in the middle of the 19th century was the first breach made in what the great biologist and great thinker Konrad Lorenz called "the evil wall between the natural sciences and the humanities (die böse Mauer zwischen Natur- und Geisteswissenschaften)", at its thinnest point, the one separating logic from mathematics. In the 20th century other gaps appeared, among them the one punched from both sides by mathematicians and linguists, but they are still few, the wall is still strong, and there is no shortage of efforts on both sides to strengthen it further and patch up the holes. Often these efforts are quite successful. The latest "achievement" in this direction, "profile education" in high school, which already in childhood divides capable and interested people into "faculties" and teaches them to be proud of their ignorance of "foreign" sciences, can greatly hinder the further convergence of the natural sciences and the humanities, which is urgently necessary for the normal development of both. One consequence of erecting the wall is that "humanities people", including the vast majority of linguists, know nothing even about the basics of precisely those branches of mathematics that are of the greatest value for the humanities (and imagine a mathematician as a person occupied exclusively with calculations).

Another obstacle is the frantic race characteristic of the current state of science, the non-stop pursuit of ever new “results”, which narrows the horizon and leaves no time to think about deeper problems or to engage in serious study of a related, let alone a not quite related, scientific discipline. This applies equally to linguists and mathematicians, as indeed to all those who are professionally engaged in science.

And the third is inertia, or, more simply, laziness. At first glance, laziness and a frantic race are incompatible, but in reality they get along well and, moreover, support and stimulate each other. When a person is too lazy to take on a difficult task, he grabs an easier and more “reliable” one, and success in it justifies and encourages his inertia. An arrogant attitude toward the "lesser brethren" swarming on the other side of the wall also encourages laziness and is encouraged by it. When, for example, a mathematician proposes to reconsider all ideas about ancient history without taking the trouble to become even slightly acquainted with the ancient languages, that same laziness is to a very large extent responsible.

The danger that these obstacles pose to the development of science is much more serious than it might seem at first glance. When ignorance of "foreign" sciences becomes a matter of pride, this naturally leads to superficiality and ignorance in "our own" as well. There have long been many more than two "faculties"; their number grows from year to year, and each is fenced off from the others by a wall; walls appear inside the faculties as well. The horizons of researchers are gradually narrowing. It is true that the apparatus of research is becoming ever more subtle and refined, but it is applied almost exclusively to small objects, and the notion is reinforced that these are the only ones worth studying. There is every reason to speak of a crisis in science, and linguistics is no exception. Now, it seems to me, is the time to look back and think.

Gathered here are linguists of the school associated with the "Meaning-Text" model. This model, created in the 1960s, was one of the first and best results of the meeting of linguistics and mathematics, after which two generations of linguists have grown up accustomed to precise thinking from their student years. But they, unfortunately, are not free from the inertia that prevents them from realizing that a crisis exists and from thinking about ways to overcome it. Meanwhile, among all linguists, and perhaps even among all those working in the humanities, they have the greatest objective opportunities for such an understanding, and I would like to hope that they will use them.

The text of the report was kindly provided by A.V. Gladkiy and the publishing house

The history of the application of mathematical methods in linguistics LECTURE No. 1

Plan

Formation of structural linguistics at the turn of the 19th and 20th centuries.
Application of mathematical methods in linguistics in the second half of the twentieth century.
Prospects for the application of mathematical methods in linguistics.

Ferdinand de Saussure (1857-1913): language as a system

language proper - langue
speech - parole
speech activity - langage

I.A. Baudouin de Courtenay (1845-1929)

“Sounds are the “atoms” of the language system, which have a limited number of easily measurable properties. This is the most convenient material for formal, strict methods of description.”

Structural linguistics

is a set of views on language and methods of its study based on the understanding of language as a sign system with clearly distinguishable structural elements (units of language, their classes, etc.) and on the striving for a strict formal description of language, approaching that of the exact sciences.

The Leningrad phonological school (L.V. Shcherba) used, as the main criterion for generalizing a sound as a phoneme, a psycholinguistic experiment based on the analysis of the speech of native speakers.
The Prague Linguistic Circle (N.S. Trubetskoy) developed the theory of oppositions: its members described the semantic structure of language as a set of oppositionally constructed semantic units, semes.

Application of mathematical methods in linguistics in the second half of the twentieth century

American descriptivism (L. Bloomfield and E. Sapir). Descriptivists viewed language as a collection of speech utterances.
The formal grammar of N. Chomsky.
The Moscow phonological school, whose representatives were A.A. Reformatsky, V.N. Sidorov, P.S. Kuznetsov, A.M. Sukhotin, and R.I. Avanesov.

Machine translation systems

Algorithms of sequential translation, word by word and phrase by phrase.
T-systems (from the English word "transfer", i.e. transformation), in which translation is carried out at the level of syntactic structures.
I-systems (from the word "interlingua"): a semantic representation of the input sentence is obtained through its semantic analysis, and the output sentence is synthesized from the resulting semantic representation.

Applied linguistics

studies not language as a state (i.e. a system) but language in action (i.e. in communication);
solves a specific applied problem, creating language models without claiming to explain the facts of language (as theoretical linguistics does);
targets specific sublanguages (i.e. selective knowledge of the language) rather than the whole language.

Quantitative linguistics

is an interdisciplinary direction in applied research in which quantitative or statistical methods of analysis are used as the main tool for studying language and speech.

Computational linguistics

is the development of methods, technologies, and specific systems that provide communication between a person and a computer in natural or restricted natural language. Its tasks include:

creation of natural language processing systems (for example, systems for processing connected text);
development of information retrieval systems (documentary, i.e. those that store texts, and factual, i.e. those that store facts presented not only in textual form but also as tables, formulas, etc.);
creation of hypertext systems (i.e. sets of texts together with the relations linking them);
development of computer technologies for compiling and using dictionaries.

Reports:

Laws of nature and "humanitarian" laws.
The mathematical revolution in linguistics.
The Copenhagen School of structural linguistics.
The formation of applied linguistics as a scientific discipline.

Practical session:

A description of the history of the application of mathematical methods in linguistics from antiquity to the present day.
The manifestation of trends toward integrating mathematical, linguistic, and other knowledge in the history of the science of language.
Comparative characteristics of applied and theoretical linguistics (fill in the table "Comparative characteristics of applied and theoretical linguistics").
Corpus linguistics as an applied branch of linguistics.
Applied aspects of quantitative linguistics.
Computational linguistics and its tools.

During the last century, linguistics was always cited as an example of a science that developed rapidly and very quickly reached methodological maturity. Already in the middle of the last century, the young science confidently took its place among sciences with a thousand-year tradition, and one of its most prominent representatives, A. Schleicher, had the courage to believe that with his works he was already drawing the final line. The history of linguistics, however, has shown that such an opinion was too hasty and unjustified. At the end of the century, linguistics underwent its first great shock, associated with the criticism of Neogrammarian principles, and others followed. It should be noted that the crises we can find in the history of the science of language did not, as a rule, shake its foundations; on the contrary, they strengthened them and ultimately brought with them a refinement and improvement of the methods of linguistic research, while also broadening its range of topics and problems.

But alongside linguistics, other sciences also lived and developed, including a large number of new ones. The physical, chemical, and technical (so-called "exact") sciences have developed especially rapidly in our time, and their theoretical basis, mathematics, has come to reign over all of them. The exact sciences have not only greatly crowded all the humanities but are now striving to "convert them to their faith", to subordinate them to their customs, and to impose their research methods on them. In the current situation, to use a Japanese expression, one can say that linguists and philologists are now defiling the very edge of the mat on which the exact sciences, headed by mathematics, have settled triumphantly and at ease.

Would it not be more expedient, from the point of view of general scientific interests, to capitulate to mathematics, to surrender entirely to the power of its methods, as some voices openly urge [59], and thereby, perhaps, gain new strength? To answer these questions, we must first look at what mathematics lays claim to in this case, in which areas of linguistics mathematical methods find application, to what extent they are consistent with the specifics of linguistic material, and whether they are able to give, or even just suggest, answers to the questions that the science of language poses.

From the very beginning, it should be noted that among the enthusiasts of the new, mathematical trend in linguistics there is no unanimity regarding its goals and objectives in research. Acad. A. A. Markov, who was the first to apply mathematical methods to language, as well as Boldrini, Yule, and Mariotti, treat language elements as suitable illustrative material for constructing quantitative methods or for statistical theorems, without at all asking whether the results of such a study are of interest to linguists [60]. Ross believes that probability theory and mathematical statistics provide a tool or, as it is now preferred to say, a mathematical model for testing and confirming those linguistic conclusions that allow numerical interpretation. Thus, mathematical methods are conceived only as auxiliary means of linguistic research [61]. Herdan claims much more: in his book he not only summed up and systematized all attempts at the mathematical study of language problems but also tried to give them a clear orientation for further work. He focuses the presentation of all the material of his book on “understanding literary statistics (as he calls the study of texts by the methods of mathematical statistics. - V. Z.) as an integral part of linguistics” [62], and formulates the essence and tasks of this new section of linguistics in the following words: “Literary statistics as a quantitative philosophy of language is applicable to all branches of linguistics. In our opinion, literary statistics is structural linguistics raised to the level of a quantitative science or a quantitative philosophy. Thus, it is equally wrong to define its results as lying outside the scope of linguistics or to treat it as an auxiliary tool for research” [63].

It is hardly advisable to theorize about whether it is legitimate in this case to speak of the emergence of a new branch of linguistics, or to settle the question of its claims, without first considering what has actually been done in this area and clarifying in what direction the application of the new methods is heading [64]. This will help us make sense of the differences of opinion.

The use of mathematical (or, more precisely, statistical) criteria for solving linguistic problems is by no means new to the science of language; to one degree or another, linguists have long used them. In fact, such traditional concepts of linguistics as the phonetic law (and the related notion of an exception to the law), the productivity of grammatical elements (for example, derivational suffixes), or even the criteria of kinship between languages rest, to a certain extent, on relative statistical features. The sharper and more distinct the statistical opposition of the observed cases, the more reason we have to speak of productive and unproductive suffixes, of a phonetic law and exceptions to it, of the presence or absence of kinship between languages. But whereas in such cases the statistical principle was used more or less spontaneously, later it began to be applied consciously and with a definite purpose. Thus, in our time, so-called frequency dictionaries of the vocabulary and expressions of individual languages [65], or even of the meanings of polysemous words with a "general focus on reality" [66], have become widespread. The data of these dictionaries are used to compile foreign-language textbooks (whose texts are built on the most commonly used vocabulary) and minimum dictionaries. Statistical calculation found a special linguistic use in M. Swadesh's method of lexicostatistics, or glottochronology, where statistical formulas that take into account the disappearance of basic-vocabulary words from languages make it possible to establish the absolute chronology of the splitting of language families [67].
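The text does not reproduce the glottochronological formulas themselves. As a hedged illustration, the sketch below uses the formula commonly associated with the method, t = ln c / (2 ln r), where c is the share of basic-vocabulary cognates two languages still have in common and r is the assumed share of basic vocabulary retained per millennium. The function name and the default retention rate of 0.86 (a value often quoted for the 100-word list) are assumptions made here for illustration, not data from the text.

```python
import math


def divergence_millennia(shared_cognate_share, retention_rate=0.86):
    """Estimate the time since two languages split, in millennia.

    Uses t = ln(c) / (2 * ln(r)); both arguments must lie in (0, 1],
    and the retention rate must be strictly below 1.
    The default retention rate is an assumption, not a value from the text.
    """
    if not 0 < shared_cognate_share <= 1:
        raise ValueError("shared_cognate_share must be in (0, 1]")
    if not 0 < retention_rate < 1:
        raise ValueError("retention_rate must be in (0, 1)")
    return math.log(shared_cognate_share) / (2 * math.log(retention_rate))


if __name__ == "__main__":
    # Hypothetical pair of languages sharing 74% of the test list.
    print(round(divergence_millennia(0.74), 2))
```

The factor of 2 reflects the assumption that both languages lose basic vocabulary independently at the same rate, which is exactly the kind of premise on which the reliability of the resulting chronology depends.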

In recent years, cases of applying mathematical methods to linguistic material have multiplied significantly, and within the mass of such attempts more or less definite directions have emerged. Let us consider them in turn, without going into details.

Let us start with the direction that has been given the name of stylostatistics. Here the aim is to define and characterize the stylistic features of individual works or authors through the quantitative relations of the linguistic elements used. The statistical approach to the study of stylistic phenomena rests on the understanding of literary style as an individual way of mastering the means of language. The researcher disregards entirely the qualitative significance of the linguistic elements being counted and concentrates solely on the quantitative side; the semantic side of the language units studied, their emotional and expressive load, as well as their share in the fabric of a work of art, are all left out of account and relegated to so-called redundant phenomena. Thus, a work of art appears as a mechanical aggregate whose specific construction is expressed only through the numerical relations of its elements. Representatives of stylostatistics do not turn a blind eye to all these circumstances; against the methods of traditional stylistics, which undoubtedly include elements of subjectivity, they set one single quality of the mathematical method that, in their opinion, compensates for all its shortcomings: the objectivity of the results achieved. “We strive,” writes, for example, V. Fuchs, “... to characterize the style of linguistic expression by mathematical means. For this purpose, methods should be created whose results have the same objectivity as the results of the exact sciences ... This means that we, at least initially, will deal only with formal structural qualities, and not with the semantic content of linguistic expressions. In this way we will obtain a system of ordinal relations, which in its totality will be the basis and starting point of the mathematical theory of style” [68].

The simplest type of statistical approach to the study of the language of writers or of individual works is to count the words used, since the richness of the vocabulary should, apparently, characterize the author in a certain way. However, such counts yield somewhat unexpected results in this regard and contribute nothing to the aesthetic understanding and evaluation of a literary work, which is not the least of the tasks of stylistics. Here are some data on the total number of words used in a number of works:

Bible (Latin): 5649 words
Bible (Hebrew): 5642 words
Demosthenes (speeches): 4972 words
Sallust: 3394 words
Horace: 6084 words
Dante (Divine Comedy): 5860 words (including 1615 proper names and geographical names)
Tasso (Orlando Furioso): 8474 words
Milton: 8000 words (approximate figure)
Shakespeare: 15000 words (approximately; according to other sources, 20,000 words)

O. Jespersen points out that the vocabulary of Zola, Kipling, and Jack London significantly exceeds that of Milton, i.e. the figure of 8000 [69]. A count of the vocabulary of the speeches of US President W. Wilson found that it is richer than Shakespeare's. To this should be added the data of psychologists. Thus, Terman, on the basis of observations of a large number of cases, found that the vocabulary of an average child is about 3600 words, and at the age of 14 already 9000. The average adult uses 11700 words, and a person of "increased intelligence" up to 13500 [70]. Thus, such numerical data in themselves provide no grounds for identifying the stylistic qualities of works; they only "objectively" record that different authors use different numbers of words, which, as the above counts show, is unrelated to the relative artistic value of their works.

Calculations of the relative frequency of word use by individual authors are constructed somewhat differently. Here not only the total number of words is taken into account but also the frequency of use of individual words. Statistical processing of the material obtained in this way consists in grouping words with equal frequency of use into classes (or ranks), which establishes the frequency distribution of all the words used by a given author. A special case of this kind of calculation is the determination of the relative frequency of special words (for example, Romance vocabulary in Chaucer's works, as was done by Mersand [71]). The relative frequency of the words used by authors contains the same objective information about the style of individual authors as the total counts above, the only difference being that the resulting numerical data are more precise. But it is also used to date individual works of the same author on the basis of a preliminary calculation of the relative frequency of his use of words in different periods of his life (according to works dated by the author himself). Another use of the data from such calculations is to establish the authorship of works for which it seems doubtful [72]. In this last case, everything rests on a comparison of statistical formulas for the frequency of use in genuine and disputed works. There is no need to dwell on how relative and approximate the results obtained by such methods are. After all, the relative frequency of use changes not only with the age of the author but also depending on the genre, the plot, and the historical setting of the work (compare, for example, "Bread" and "Peter I" by A. Tolstoy).
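The grouping described above, in which words of equal frequency are collected into classes, can be sketched in a few lines of Python. The function names and the sample token list are illustrative assumptions; real studies would also need decisions about tokenization and word forms that the text does not specify.

```python
from collections import Counter, defaultdict


def relative_frequencies(tokens):
    """Map each word to its share of all word occurrences in the text."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def frequency_classes(tokens):
    """Group words with equal frequency of use into classes.

    Returns a dict ordered from the most frequent class to the least:
    {number of occurrences: alphabetically sorted words with that count}.
    """
    counts = Counter(tokens)
    classes = defaultdict(list)
    for word, count in counts.items():
        classes[count].append(word)
    return {count: sorted(classes[count]) for count in sorted(classes, reverse=True)}


if __name__ == "__main__":
    # Hypothetical miniature "text", used only to show the shape of the result.
    sample = "a b a c b a d".split()
    print(frequency_classes(sample))
    print(relative_frequencies(sample))
```

The number of words in each class, taken across all classes, gives the frequency distribution of an author's vocabulary that the comparisons of dated or disputed works rely on.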

Taking the method described above further, stylostatistics began to use, as a characteristic of style, the criterion of the stability of the relative frequency of the most commonly used words. The method can be illustrated by the statistical processing of Pushkin's story "The Captain's Daughter" carried out by Esselson and Epstein at the Institute of Slavic Languages at the University of Detroit (USA) [73]. The entire text of the story (about 30,000 word occurrences) was examined, and then passages containing about 10,000 and 5,000 occurrences. Next, to determine the stability of the relative frequency of word use, for the 102 most commonly used words (with frequencies from 1,160 down to 35) the frequency calculated for the sample passages was compared with the actual one. For example, the conjunction "and" was used 1,160 times in the whole story. In a passage containing 5,000 occurrences of all words, this conjunction should be expected 5,000 x 1,160 : 30,000 times, or, discarding the fraction, 193 times, and in a passage containing 10,000 occurrences of all words, 10,000 x 1,160 : 30,000, or 386 times. Comparison of the figures obtained by such calculations with the actual data shows a very slight deviation (within 5%). On the basis of such calculations it was found that in this story by Pushkin the preposition "k" is used twice as often as "u", the pronoun "you" three times as often as "them", and so on. Thus, despite all the vicissitudes of the plot, the relative frequency of word use remains stable both throughout the story and in its individual parts. What is observed for some (the most common) words is presumably applicable to all the words used in the work.
It follows that an author's style can be characterized by a certain ratio of the variability of the average frequency of a word's use to the general frequency of its use in the given language. This ratio is regarded as an objective quantitative characteristic of the author's style.
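The arithmetic of the "The Captain's Daughter" example can be written out as a short Python sketch. The figures 1,160, 30,000, 5,000, and 10,000 come from the text; the function names and the observed count used in the deviation check are hypothetical and serve only to show how the 5% criterion would be applied.

```python
def expected_occurrences(total_count, text_size, sample_size):
    """Occurrences expected in a sample if the word's relative frequency is stable."""
    return sample_size * total_count / text_size


def relative_deviation(observed, expected):
    """Deviation of the observed count from the expected one, as a share of the expected."""
    return abs(observed - expected) / expected


if __name__ == "__main__":
    # Figures from the text: "and" occurs 1,160 times in about 30,000 word occurrences.
    for sample_size in (5_000, 10_000):
        expected = expected_occurrences(1_160, 30_000, sample_size)
        # int() discards the fraction, as the text does.
        print(sample_size, int(expected))

    # Hypothetical observed count in a 5,000-word passage (not from the text).
    observed = 200
    expected = expected_occurrences(1_160, 30_000, 5_000)
    print(relative_deviation(observed, expected) <= 0.05)
```

The expected values are about 193.3 and 386.7, so the figures in the text correspond to discarding the fractional part rather than rounding to the nearest whole number.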

Other formal elements of language structure are studied in a similar way. Thus, for example, V. Fuchs subjected the metrical features of the works of Goethe, Rilke, Caesar, Sallust, and others to comparative statistical examination [74].

The criterion of the stability of the relative frequency of word use, while refining the technique of quantitative characterization of style, introduces nothing fundamentally new in comparison with the more primitive methods analyzed above. All methods of stylostatistics ultimately produce equally dispassionate "objective" results, gliding over the surface of language and clinging only to purely external features. Quantitative methods, apparently, are unable to take account of qualitative differences in the material under study and in fact level all the objects studied.

Where maximum specificity is needed, the most generalized criteria are offered; qualitative characteristics are expressed in the language of quantity. This is not only a logical contradiction but also a disagreement with the nature of things. Indeed, what happens if we try to obtain a comparative stylistic (and therefore qualitative) characterization of the works of Alexander Gerasimov and Rembrandt from the quantitative ratio of red and black paint on their canvases? Absolute nonsense, it would seem. To what extent can completely "objective" quantitative information about a person's physical data give us an idea of everything that characterizes that person and constitutes his true essence? Obviously, to no extent at all. Such data can serve only as an individual mark distinguishing one person from another, like the pattern of ridges in a thumbprint. The situation is similar with the quantitative characteristics of literary style. On closer inspection, they provide data for judging the actual stylistic qualities of an author's language that are just as meager as a description of the ridges on a finger is for the study of human psychology.

To all that has been said, it should be added that in the past, in the so-called formal school of literary criticism, an attempt was already made to quantitatively study the style of writers, when epithets, metaphors, and rhythmic-melodic elements of verse were counted. However, this attempt was not further developed.

Another area in which mathematical methods are applied to the study of linguistic phenomena can be grouped under the name of linguistic statistics. It seeks to intrude into the fundamental questions of the theory of language and thus to win recognition in the linguistic sphere proper. To become acquainted with this direction, it is best to turn to the already mentioned work of Herdan, which, in the words of one of its many reviewers, is "a monstrously pretentious book" [75], but which has nevertheless met with a wide response among linguists [76]. Since Herdan (as already mentioned above) sought to gather in his book everything of significance in the application of mathematical methods to linguistic problems, we are in fact dealing in his book not so much with Herdan as with a whole trend. As the very title of the book, "Language as Choice and Chance," shows, its main concern is to clarify what in language is left to the free choice of the speaker and what is determined by the immanent structure of the language, and likewise to determine the quantitative ratio of elements of the first and second kind. Herdan's book provides almost exhaustive information about all the work in this area carried out by representatives of various specialties (philosophers, linguists, mathematicians, engineers), but it is not limited to this and includes many original observations, considerations and conclusions of the author himself. As a summarizing work, it gives a good idea both of the quantitative methods used and of the results achieved with their help. The questions that we provisionally group under the heading of linguistic statistics are treated in the second and fourth parts of the book.

Of the many applications of the methods of mathematical statistics to linguistic questions, we will dwell on the most general ones, which can at the same time be regarded as the most typical. Using data from other authors - Boldrini [77], Mathesius [78], Mariotti [79], Zipf [80], Dewey [81] and others - as well as his own studies of the relative frequency distribution of phonemes, letters, word length (measured in letters and syllables), grammatical forms and metrical elements in the Latin and Greek hexameter, Herdan establishes the stability of the relative frequency of linguistic elements as a characteristic common to all linguistic structures. He derives the following rule: “The proportions of linguistic elements belonging to one or another level or sphere of linguistic coding - phonology, grammar, metrics - remain more or less constant for a given language, in a given period of its development and within the limits of sufficiently extensive and impartially conducted observations” [82]. This rule, which Herdan calls the basic law of language, he seeks to interpret and extend in a certain way. “It,” Herdan writes of this law, “is an expression of the fact that even here, where the human will and freedom of choice are granted the broadest scope, where conscious choice and carefree play vividly alternate with each other, there is on the whole considerable stability... in grammar, but also in relation to the frequency of use of specific phonemes, lexical units (words) and grammatical phonemes and constructions; in other words, the similarity is not only in what is used, but also in how often it is used” [83]. This state of affairs has understandable causes, but it gives rise to new conclusions.
When different texts or segments of a given language are examined, it is found, for example, that the relative frequencies with which different people use a particular phoneme (or other speech element) remain basically the same. This leads to the interpretation of individual forms of speech as fluctuations around a constant probability of using the phoneme in question in the given language. Thus it turns out that in his speech activity a person is subject to certain laws of probability with respect to the number of linguistic elements used. And when we observe a huge number of linguistic elements in a large body of texts or speech segments, we get the impression of causal dependence, in the sense that here too the use of certain linguistic elements appears to be determined. In other words, it turns out to be admissible to assert that what seems from an intuitive point of view to be a causal relation is, quantitatively, a probability [84]. It is clear that the larger the total body of examined texts or speech segments, the more clearly the stability of the relative frequency of linguistic elements will also show itself in individual usage (the law of large numbers). From this a new general conclusion is drawn: language is a mass phenomenon and should be treated as such.
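The appeal to the law of large numbers can be made concrete with a small simulation. The sketch below is an assumption-laden toy model, not anything taken from Herdan: it treats each position in a text as an independent draw in which a given phoneme appears with a fixed probability `P_ELEMENT` (a hypothetical value), and records the relative frequency observed in "texts" of increasing length.

```python
import random


def relative_frequency(probability: float, length: int, rng: random.Random) -> float:
    """Relative frequency of an element in a simulated text of the given length."""
    hits = sum(1 for _ in range(length) if rng.random() < probability)
    return hits / length


P_ELEMENT = 0.10         # hypothetical probability of the element in the "language"
rng = random.Random(0)   # fixed seed so that the run is repeatable

# Observed relative frequencies for simulated texts of growing size.
frequencies = {
    length: relative_frequency(P_ELEMENT, length, rng)
    for length in (100, 1_000, 10_000, 100_000)
}

for length, freq in frequencies.items():
    print(f"{length:>7} elements: relative frequency {freq:.4f}, "
          f"deviation {abs(freq - P_ELEMENT):.4f}")
```

In such a model the deviation from the underlying probability tends to shrink as the text grows, which is all that the stability claimed for large bodies of text amounts to statistically. Real texts violate the independence assumption, so the sketch illustrates the statistical claim rather than confirming it for language.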

These conclusions, reached on the basis of frequency counts of the phonetic elements, words and grammatical forms that together constitute a language, are then applied to a "statistical interpretation" of Saussure's division into "language" (la langue) and "speech" (la parole). According to Saussure, "language" is a set of linguistic habits that make communication possible between the members of a given linguistic community. It is a social reality, a "mass phenomenon", obligatory for everyone who speaks the language. Herdan, as indicated, argues that the members of a single language community resemble one another not only in that they use the same phonemes, lexical units and grammatical forms, but also in that all these elements are used with the same frequency. Thus his statistical definition of "language" takes the following form: "language" (la langue) is the totality of common linguistic elements plus the relative probability of their use.

This definition of "language" is also the starting point for the corresponding statistical interpretation of "speech", which, according to Saussure, is an individual utterance. Contrasting "language" as a social phenomenon with "speech" as an individual phenomenon, Saussure wrote: “Speech is an individual act of will and understanding, in which it is necessary to distinguish: 1. the combinations by which the speaking subject uses the language code in order to express his personal thought; 2. the psychophysical mechanism that allows him to objectify these combinations” [85]. Since "language" in linguistic statistics is regarded as a set of elements with a certain relative probability of use, it includes the statistical population (ensemble) as its most essential characteristic and can be considered in this aspect. Accordingly, "speech" becomes an individual sample taken from "language" as a statistical population. The probability in this case is determined by the relation of "speech" to "language" (in their "quantitative" understanding), and the distribution of the relative frequency of use of the different elements of the language is interpreted as the result of a collective "choice" in a certain chronological period of the language's existence. Realizing that such an interpretation of the difference between "language" and "speech" nevertheless rests on completely different grounds from Saussure's, Herdan writes in this connection: “This apparently minor modification of Saussure's concept has the important consequence that "language" (la langue) now acquires an essential characteristic in the form of a statistical population. This population is characterized by certain relative frequencies or fluctuation probabilities, meaning that each linguistic element belongs to a certain linguistic level.
In this case "speech" (la parole), in accordance with its meaning, turns out to be a term for statistical samples taken from "language" as a statistical population. It becomes obvious that choice appears here in the form of the relation of "speech" to "language", being the relation of a sample taken at random to a statistical population. The very order of the frequency distribution, as a deposit of the speech activity of a linguistic community over the centuries, is an element of choice - not of individual choice, as in style, but of collective choice. Using a metaphor, we can speak here of the choice made by the spirit of the language, if we understand by this the principles of linguistic communication, which are in accordance with the complex of mental data of the members of a particular linguistic community. The stability of series is the result of probability (chance)” [86].

A special case of the application of this principle is the delimitation of normative phenomena in language from "exceptions" (deviations). Linguistic statistics claims that the statistical method makes it possible to eliminate the vagueness surrounding this issue and to establish clear criteria for distinguishing between these phenomena. If the norm is understood as a statistical population (in the above sense), and the exception (or error) as a deviation from the frequencies shown by the statistical population, then a quantitative solution of the question suggests itself. It all comes down to a statistical relationship between "population" and "outlier". If the frequencies observed in an individual sample deviate from the probabilities given by the statistical population by more than is determined by a series of sample counts, then we have reason to conclude that the demarcation line between "the same" (norm) and "not the same" (exception) has been crossed.
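One way to read the criterion just described is as an ordinary test of a sample count against a population probability. The sketch below is an assumption about how such a test might look, not Herdan's own procedure: it models the count as binomial, measures the deviation in standard errors, and uses a threshold of two standard errors, which is a conventional but arbitrary choice. The population probability reuses the 1,160-in-30,000 figure quoted earlier; the observed counts are hypothetical.

```python
import math


def deviation_in_standard_errors(observed_count: int, sample_size: int,
                                 population_prob: float) -> float:
    """Distance between observed and expected count, in binomial standard errors."""
    expected = sample_size * population_prob
    std_error = math.sqrt(sample_size * population_prob * (1.0 - population_prob))
    return (observed_count - expected) / std_error


def is_exception(observed_count: int, sample_size: int, population_prob: float,
                 threshold: float = 2.0) -> bool:
    """Flag a sample whose deviation exceeds the (assumed) threshold."""
    z = deviation_in_standard_errors(observed_count, sample_size, population_prob)
    return abs(z) > threshold


POPULATION_PROB = 1160 / 30_000   # relative frequency taken as the "norm"
SAMPLE_SIZE = 5_000

# Hypothetical observed counts in two samples of 5,000 words.
for observed in (200, 260):
    z = deviation_in_standard_errors(observed, SAMPLE_SIZE, POPULATION_PROB)
    verdict = "exception" if is_exception(observed, SAMPLE_SIZE, POPULATION_PROB) else "norm"
    print(f"observed {observed}: {z:+.2f} standard errors -> {verdict}")
```

The sketch also shows what the criterion cannot do: it separates frequent from infrequent usage, but says nothing about whether an infrequent form is an error or a legitimate rarity.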

Quantitative differences between "language" and "speech" are also used to distinguish two types of linguistic elements: grammatical and lexical. The starting point for solving this problem, which often presents great difficulties from the linguistic point of view, is the assumption that the frequency of grammatical elements differs from that of lexical units. This is allegedly connected with the "generalized" character of grammatical elements, which distinguishes them from the concepts fixed by lexical units. In addition, grammatical elements are supposedly, as a rule, much smaller in size: as independent words (these include pronouns, prepositions, conjunctions and auxiliary words) they usually consist of a small number of phonemes, and as "bound forms" of one or two phonemes [87]. The smaller the linguistic element, the less its "length" (the quantitative aspect) can serve as a defining characteristic, and the more important the "quality" of its phonemes becomes for this purpose. What methods are proposed for solving the problem under consideration? It is solved by resorting to the purely quantitative concept of grammatical load. “Suppose,” Herdan writes in this connection, “that we are interested in comparing two languages in this respect. How do we determine with a certain degree of objectivity the "grammatical load" that a language carries? It is clear that this load will depend on the position of the demarcation line separating grammar from vocabulary. The first consideration that may come to mind is to determine how "complex" the grammar of a given language is. But "complexity" is a qualitative characteristic, whereas the concept of "grammatical load" is a quantitative one. True, the load depends to a certain extent on the complexity, but not entirely. A language may be endowed with an extremely complex grammar, yet only a comparatively small part of it may be used when the language is in operation.
We define "grammatical load" as the totality of grammar that a language carries when it is in action, which immediately brings our problem into the realm of structural linguistics in the sense in which the discipline was defined by Saussure. In the following presentation, quantitative methods are used to determine the difference between languages according to where the boundary separating grammar from vocabulary lies” [88]. In other words, the differences between languages are in this case to be reduced to differences in the numerical relations between grammatical and lexical elements.

The materials at our disposal paint the following picture. In English (only "grammatical words" were taken into account: pronouns, or, as they are also called, "substitutes", prepositions, conjunctions and auxiliary verbs), a segment comprising 78,633 occurrences of all words (1,027 different words) was found to contain 53,102 occurrences of grammatical elements, or, more precisely, of "grammatical words" (149 different words), which amounts to 67.53% of occurrences with 15.8% of different words. Such are Dewey's data [89]. Other data show a different percentage ratio: 57.1% with 5.4% of different words [90]. This significant discrepancy is explained by the difference between written and spoken language. The written forms of the language (the first set of data) supposedly use more grammatical elements than the oral ones (the second set). In Dante's Divine Comedy (in the Italian original), Mariotti found 54.4% of occurrences to be "grammatical words".
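The word-count version of the "grammatical load" is simply a pair of percentages, one over occurrences and one over different words. The sketch below recomputes them from the counts quoted for English; the function name is hypothetical. Note that with the counts as given here the share of different words comes out near 14.5% rather than the quoted 15.8%, so one of the figures in this passage may have been corrupted in transmission.

```python
def percentage(part: int, whole: int) -> float:
    """Share of part in whole, expressed as a percentage."""
    return 100.0 * part / whole


# Counts quoted in the text for the English sample.
ALL_OCCURRENCES = 78_633
GRAMMATICAL_OCCURRENCES = 53_102
ALL_DIFFERENT_WORDS = 1_027
GRAMMATICAL_DIFFERENT_WORDS = 149

occurrence_share = percentage(GRAMMATICAL_OCCURRENCES, ALL_OCCURRENCES)
different_word_share = percentage(GRAMMATICAL_DIFFERENT_WORDS, ALL_DIFFERENT_WORDS)

print(f"grammatical words, share of occurrences:     {occurrence_share:.2f}%")
print(f"grammatical words, share of different words: {different_word_share:.2f}%")
```

The calculation presupposes that the list of "grammatical words" has already been drawn up; the percentages measure that list, they do not justify it.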

Another and apparently more refined way to determine the grammatical load of a language is to count the phonemes that make up the grammatical elements. In this case not only independent grammatical words are taken into account, but also bound forms. Various options are possible here. For example, one can determine the relative frequency of individual consonant phonemes in grammatical elements and compare it with the frequency of the same phonemes in total use (for English the final figures of this comparison give a proportion of 99.9% to 100,000 in total use); or make a similar comparison of consonants by separate classification groups (labial, palatal, velar and other phonemes), where the final ratio takes the form of 56.47% (in grammatical elements) to 60.25% (in total use); or make the same comparison for initial consonant phonemes (in this case the ratio was 100.2% in grammatical words to 99.95 in total use). Other, more complex statistical operations are also possible, which, however, yield similar quantitative expressions of the problem under study.

These quantitative data serve as the basis for a general conclusion. It comes down to the claim that the distribution of phonemes in grammatical elements determines the character of the distribution (in numerical terms, of course) of phonemes in the language as a whole. And this, in turn, permits the conclusion that the use of grammatical elements depends least of all on individual choice and constitutes that part of linguistic expression which is controlled by probability. This speculative conclusion is said to be confirmed by the count of grammatical forms in Russian made by Esselson [91]. The study covered 46,896 words taken from 11 sources (works by Griboedov, Dostoevsky, Goncharov, Saltykov-Shchedrin, Garshin, Belinsky, Amfiteatrov, Gusev-Orenburgsky, Ehrenburg, Simonov and N. Ostrovsky). They were divided into colloquial words (17,756 words, or 37.9%) and non-colloquial words (29,140 words, or 62.1%). Then the entire set of words was divided into 4 groups according to their grammatical nature: the 1st group included nouns, adjectives, adjectives functioning as nouns, pronouns and inflected numerals; the 2nd group, verbs; the 3rd group, verbal participles, participles functioning as adjectives and nouns, and gerunds; the 4th group, invariable forms: adverbs, prepositions, conjunctions and particles. The summary results (tables with data for the individual authors are also given) show the following ratio:

[Table: shares of the 1st, 2nd, 3rd and 4th groups in colloquial and non-colloquial speech; the numerical values are missing from this copy.]

Herdan sums up the quantitative data thus obtained in the following words: “They justify the conclusion that grammatical elements should be considered as a factor that determines the likelihood of a linguistic expression. Such a conclusion avoids the burdensome qualification of each word used. It is clear that, since grammar and vocabulary are not stored in watertight shells, neither is pure "choice" nor pure "chance". Both grammar and vocabulary contain both elements, although in significantly varying proportions” [92].

A large section of Herdan's book is devoted to the study of duality in language, and the very concept of duality is based on mathematical characteristics.

Thus, theorems in projective geometry can be arranged in two series, so that each theorem of one series can be obtained from some theorem of the other series by interchanging the words "point" and "line". For example, given the statement "any two different points belong to one and only one line," we can derive from it the corresponding statement "any two different lines belong to one and only one point." Another way of defining duality is to plot different aspects of the phenomenon under study along the abscissa and the ordinate. Thus, as Yule does, for example [93], the different frequencies of use are plotted along the abscissa, and the number of lexical units having each frequency along the ordinate. This is how the concept of duality is interpreted, a concept supposedly equally applicable to linguistic research.
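The second construction, with frequencies of use on one axis and the number of lexical units having each frequency on the other, is what is usually called a frequency spectrum. A minimal sketch follows; the function name and the toy sentence are illustrative assumptions, not material from Yule or Herdan.

```python
from collections import Counter


def frequency_spectrum(tokens: list[str]) -> dict[int, int]:
    """Map each frequency of use to the number of distinct words having that frequency."""
    word_counts = Counter(tokens)              # word -> how often it occurs
    spectrum = Counter(word_counts.values())   # frequency -> how many words have it
    return dict(sorted(spectrum.items()))


# Toy text, used only to show the shape of the result.
tokens = "the cat saw the dog and the dog saw a bird".split()
spectrum = frequency_spectrum(tokens)

for frequency, number_of_words in spectrum.items():
    print(f"frequency {frequency}: {number_of_words} word(s)")
```

Reading the same table in the other direction (how many occurrences fall to words of a given frequency) gives the second of the two "planes" that the notion of duality sets against each other.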

Under the concept of duality defined in this way, which in all cases actually has the character of a binary code and which is also regarded as the most essential feature of linguistic structure, phenomena of extremely diverse kinds are subsumed, provided they admit of opposition along two planes: the distribution of word use according to the nature of lexical units and the distribution of lexical units according to frequency of use; written and spoken forms of speech; lexical and grammatical elements; synonyms and antonyms; the phoneme and its graphic representation; signifier and signified (Saussure's signifiant and signifié), etc.

After a quantitative study of the duality of one particular linguistic phenomenon or of a limited "text", a conclusion is drawn, as a rule, to which the quality of linguistic universality is attributed. The nature of such conclusions and the way they are justified can be seen from the study of the duality of word and concept (in fact it concerns the relation between the length of a word and the extension of a concept; it must be borne in mind that the extremely free use of linguistic and other terms in such works often makes them very difficult to understand). It is important to note that the material serving as the source of observations on this type of linguistic duality was the international nomenclature of diseases (about 1,000 names) and the general register of diseases in England and Wales for 1949. From this the following general conclusion is drawn: “Every concept denoting a general idea has what may be called a "sphere" or "extension." It allows us, through its medium, to think about many objects or other concepts that lie within its "sphere". On the other hand, all the items needed to define a concept constitute what is called its "content". Extension and content are mutually correlated - the smaller the content and, accordingly, the more abstract the concept, the larger its sphere or extension, i.e., the more objects are brought under it. This can be seen as an analogy (in the conceptual sphere) to the principles of coding, according to which the length of a symbol and its frequency of use are interdependent” [94].

The principle of duality is applied to particular problems as well, for example in establishing the equivalence of the meanings of two words in different languages. A study of the Muret-Sanders English-German dictionary by the mathematical method of iteration leads to the conclusion that the probability of an English word having one or more meanings in German translation remains constant for each initial letter throughout the dictionary [95]. Consideration of the order of words in Chinese dictionaries leads to the conclusion that it is taxonomic in nature, since the number of strokes in a character indicates its place (as an independent radical or as a certain subclass subordinate to a radical). Taxonomy is the subordinating principle of classification used in zoology and botany. Herdan claims that the foundations of Chinese lexicography are likewise built on the principles of taxonomy [96], etc.

In making a general assessment of this area of application of mathematical methods to linguistic problems (i.e., linguistic statistics), we must apparently proceed from the position formulated by Oettinger: “Mathematics can be effectively used in the service of linguistics only when linguists are clear about the real limits of its application, as well as about the possibilities of the mathematical models used” [97]. In other words, we can speak of mathematical linguistics when mathematical methods prove their suitability for solving those properly linguistic problems which in their totality constitute the science of language. If this is not so, then, although new aspects of scientific research may be opened up, one can speak of anything one likes but not of linguistics; what is meant here is not the various kinds of applied linguistics (these will be discussed below) but scientific, or theoretical, linguistics. From this standpoint it must be said that, from a linguist's point of view, much in linguistic statistics is doubtful and even bewildering.

Let us turn to the analysis of only two examples (so as not to clutter the exposition), with the reservation that very substantial objections can be raised against each of the others as well. Take the quantitative distinction between grammatical and lexical units. It turns out that in order to make such a distinction one must already know in advance what belongs to grammar and what to vocabulary, since the "grammatical load" of a language (i.e., the totality of grammatical elements used in speech), as stated in the passage quoted above, "depends on the line of demarcation that separates vocabulary from grammar." Without knowing where this line lies, it is therefore impossible to draw the distinction in question. What, then, is the point of a quantitative method of distinguishing the lexical from the grammatical? Herdan, however, does not trouble himself much over this question and boldly classifies linguistic elements, assigning to the grammatical elements "bound forms", which, judging by the exposition, should be understood as external inflection, and "grammatical words", which include prepositions, conjunctions, auxiliary verbs and pronouns - the last by virtue of the fact that they are "substitutes". But if we consider only this quality of pronouns and on this basis class them as grammatical elements, then obviously such words as "aforementioned", "named", "given", etc., should also be assigned to them, since they too act as substitutes. In connection with the method of isolating grammatical elements used in linguistic statistics, the question naturally arises of how to deal with such "non-formal" grammatical phenomena as word order, tones, zero morphemes and paradigmatic relations (some of these phenomena, incidentally, are found in the very languages that are studied by mathematical methods).
How is the distinction to be drawn in languages with rich internal inflection (as, for example, in the Semitic languages), where it not only modifies the root (radical) grammatically but also gives it lexical existence, since the root without such alternations has no real existence in the language? What is to be understood by the grammatical complexity of a language, and by what criterion is it determined? If by the quantitative aspect, which in this case is emphasized in every possible way, then one of the most grammatically complex languages will be English, which has such constructions as "I shall have been calling" or "He would have been calling". In these sentences only "call" can be classed as lexical, and everything else must therefore be considered grammatical. What grounds are there for linking the frequency of use of grammatical elements with the generality or abstractness of the meanings of grammatical words? After all, it is quite obvious that the relatively high frequency of grammatical elements is determined by their function in the construction of sentences; and as for abstractness of meaning, it is very easy to find a large number of lexical elements that can readily compete with grammatical elements in this respect while being far inferior to them in frequency (for example, being, existence, extension, space, substance, etc.).

An absurdity of a similar kind confronts us in the definition of the duality of word and concept. One must have an extremely peculiar understanding of the structural essence of language to investigate it by means of the nomenclature of diseases and the hospital register of diseases, which, as indicated above, served as the source material for very important linguistic conclusions. Without dwelling on the entirely obscure use of terms that have no linguistic standing, such as the sphere, extension and content of a concept (incidentally, the lexical meaning of a word and the concept denoted by a scientific term are grossly confused), let us turn to the conclusion drawn in this case. As stated above, we are dealing with the assertion that "extension and content are mutually correlated." The whole course of reasoning that gives grounds for such a conclusion, as well as the method of mathematical manipulation of linguistic facts, clearly shows that one very essential property of language is completely ignored here, a property which upsets all the calculations: the ability to express the same "content" by linguistic units of different "extension", which moreover undoubtedly differ in their relative frequency of use. Thus we can designate one and the same person as Petrov, my acquaintance, he, a Muscovite, a young man, a university employee, my wife's brother, the man we met on the bridge, etc. In the light of such facts, doubt is cast not only on the particular conclusions, to which, as has been pointed out, universal significance is nevertheless attached, but also on the expediency of applying quantitative methods themselves to such linguistic problems.

But sometimes linguists are offered conclusions whose validity is not in doubt. Such is the "basic law of language", which consists in the fact that in language there is a certain stability of its elements and of the relative frequency of their use. The trouble with discoveries of this kind, however, is that they have long been known to linguists. After all, it is quite obvious that if language did not possess a certain stability and each member of a given linguistic community freely varied the elements of the language, then mutual communication would be impossible and the very existence of language would become meaningless. As for the distribution of the relative frequency of use of individual elements of the language, it has found expression in linguistics in the distinction between passive and active vocabulary and grammar, to which L. V. Shcherba paid so much attention. Here statistical methods can only help linguists to distribute specific linguistic elements among the categories of relative frequency of use, but they have no grounds for claiming to discover any new regularities of value to theoretical linguistics.

On the other hand, linguistic statistics offers a number of truly "original" conclusions that are extremely revealing of the nature of the scientific thinking of its adherents. Thus, the "political vocabulary" in the works of Churchill, Benes, Halifax, Stresemann and others is studied by complex statistical methods, and for the non-English-speaking authors translations of their works into English are used in the calculations. The results are presented in the form of numerous tables, mathematical formulas and equations. The linguistic interpretation of the quantitative data in this case comes down to the claim that Churchill's use of "political vocabulary" is the most typical (?) for this group of authors, and that Churchill's use of words when he deals with political issues is typical of the English speech group [98].

In another case, after appropriate statistical manipulations, it is concluded that in the usage of Nazi Germany Hitler violated the duality between "language" and "speech" in the quantitative sense of these terms. A particular case of the destruction of this duality is the literal understanding of metaphorical expressions (for example, "to pour salt into open wounds"). Nazi Germany branded itself with so many inhuman acts that there is hardly any need to convict it of this linguistic atrocity as well [99]. According to Herdan, Marx's definition of language as the immediate reality of thought also leads to a violation of linguistic duality, and the law of dialectics concerning the transition of a phenomenon into its opposite is, in his opinion, a misunderstood linguistic law of the duality of language. Such interpretations speak for themselves.

Finally, a common shortcoming inherent in all the above applications of the quantitative method to linguistic material, and one that thereby acquires a methodological character, is the treatment of linguistic elements as a mechanical aggregate of facts absolutely independent of one another; accordingly, if any regularities are revealed, they concern only the numerical relations in the distribution of autonomous facts, outside their systemic dependencies. True, J. Whatmough tries in every way to assure us that it is precisely mathematics that is better able than any kind of linguistic structural analysis to reveal the structural features of a language. “Modern mathematics,” he writes, “does not deal with measurement and calculus, the accuracy of which is limited by their very nature, but primarily with structure. This is why mathematics is highly conducive to the accuracy of language learning - to the extent that a separate description, even more limited in nature, is not capable of ... Just as in physics mathematical elements are used to describe the physical world, since they are assumed to correspond to elements of the physical world, so in mathematical linguistics the mathematical elements are supposed to correspond to the elements of the world of speech” [101]. But such a formulation of the question by no means saves the situation, since at best it can give an analysis of language either as a physical structure, which is still far from sufficient for language and in the final analysis is of the same mechanistic character, or as a logical-mathematical structure, and this transfers language to a different plane, one in many respects alien to it. It is worth noting that Whatmough foresees the successes of mathematical linguistics only in the future, and as for its real results, he evaluates them in the following words: “...
almost all the work done to date by Herdan, Zipf, Yule, Guiraud and others is by no means beyond criticism from the side of both linguistics and mathematics; it smacks to a great extent of amateurishness” [103]. Thus, if we do not try to predict the future of mathematical methods in linguistic research but try to assess what we have today, we shall have to admit that in the field of linguistics mathematics has in fact been limited to "measurement and counting" and has not been able to give a qualitative analysis of language that penetrates into its structure.

Let us try to be as objective as possible. To a certain extent quantitative data can apparently be used by linguistics, but only as auxiliary material and mainly in problems with a practical orientation. With regard to most quantitative methods of studying individual linguistic phenomena, R. Brown's general conclusion is undoubtedly justified: “They can be counted as Herdan counts them, but what is the meaning of all this?” 104. Let us imagine that we ask: "What kind of trees are in this garden?" And in response we get: "There are a hundred trees in this garden." Is this an answer to our question, and does it really make sense? Yet with regard to many linguistic questions, mathematical methods give just such answers.

However, there is a wide field of research that relies mainly on mathematical methods while directing them at linguistic material, where the expediency of such a combination is beyond doubt. The "meaning" of this research, its significance, is determined by the goals it pursues, and it has already been tested in practice. Here we are talking about the problems associated with the creation of information machines, systems for the machine translation of written scientific texts, the automation of the translation of oral speech from one language to another, and the whole complex of tasks united under the linguistic questions of cybernetics. The whole set of such problems is usually given the general name of applied linguistics. It is thereby distinguished from so-called mathematical linguistics, which includes those areas of work designated above as stylostatistics and linguistic statistics, although applied linguistics by no means avoids the statistical processing of linguistic material. Perhaps the most important feature separating applied linguistics from mathematical linguistics, as outlined above, is that the former has the opposite direction: not mathematics for linguistics, but linguistics (formalized by mathematical methods) for a wide range of practical problems.

There is no need to set out the content of the individual problems now included in the extremely wide area of applied linguistics. Unlike mathematical linguistics, these problems are actively discussed in Soviet linguistic literature and are rightly beginning to occupy an increasingly prominent place in the research agenda of scientific institutes 105. They are thus already well known to our linguistic community. This circumstance, however, does not relieve us of the need to reflect on them, in particular from the point of view of the principles of the science of language. Doing so will undoubtedly help to eliminate the misunderstandings that arise more and more often between representatives of sciences very distant from each other who take part in the work on the problems of applied linguistics, and will outline ways both for their convergence and for the delimitation of their areas of research. It goes without saying that the following considerations represent the point of view of a linguist, and it is necessary that mathematicians not only try to assimilate it but also give their own interpretation of the questions raised.

The theoretical linguist cannot in any way be satisfied with the fact that in every case of studying language for the purposes set by applied linguistics the basis is a mathematical model. Accordingly, observations of linguistic phenomena and the results obtained from them are expressed in the terms and concepts of mathematics, i.e., through mathematical equations and formulas. Let us look at an example for clarity. Condon 106 and Zipf 107 established that the logarithms of the frequency (f) of occurrence of words in a large text lie almost on a straight line when plotted against the logarithms of the rank (r) of these words. The equation f = c/r, where c is a constant, reflects this relationship in the limited sense that c/r, for a given value of r, reproduces the observed frequency to a close approximation. The relationship between f and r, expressed by a mathematical formula, is a model of the relationship between the observed values of the frequency of use and the rank of words. This is one case of mathematical modeling.
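The rank-frequency relation described above can be illustrated with a short sketch in Python. It is only an illustration under stated assumptions: the function names and the toy word list are hypothetical and not taken from Condon or Zipf, and the straight line is fitted by ordinary least squares, which is one possible choice among several.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, rank 1 being the most frequent word."""
    counts = Counter(tokens)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))


def fit_log_log_line(pairs):
    """Least-squares fit of log f = intercept + slope * log r."""
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    n = len(pairs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def zipf_prediction(c, r):
    """Frequency predicted by the model f = c / r for a word of rank r."""
    return c / r


# Toy data (hypothetical, not a real corpus): word i occurs round(120 / i) times.
tokens = [f"w{i}" for i in range(1, 11) for _ in range(round(120 / i))]
pairs = rank_frequency(tokens)
slope, intercept = fit_log_log_line(pairs)
c = pairs[0][1]  # under f = c / r the constant equals the frequency at rank 1
```

A slope close to -1 on the log-log scale corresponds to f = c/r. On a real text the fit is only approximate, which is exactly the "limited sense" in which the formula is said to model the observations.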

The whole of information theory rests on the mathematical model of the communication process developed by C. Shannon 108. It is defined as "a mathematical discipline devoted to the methods of calculating and estimating the amount of information contained in any data, and to the study of the processes of storing and transmitting information" (Great Soviet Encyclopedia, vol. 51, p. 128). Accordingly, the basic concepts of information theory receive a mathematical expression. Information is measured in binits, or binary units (a code, which is likened to a language, with two conventional, equally probable signals transmits one binary unit of information with the transmission of each symbol). “Redundancy is defined as the difference between the theoretical transmitting capacity of any code and the average amount of information transmitted. Redundancy is expressed as a percentage of the total transmitting capacity of the code” 109, etc. In the same way, machine translation requires the algorithmic elaboration of the mapping of the elements of one language onto another, etc. 110. These are further cases of modeling.
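The two quantities mentioned here, the amount of information in binary units and redundancy as a percentage of a code's capacity, can be sketched as follows. This is a minimal illustration that assumes the standard Shannon definitions (average information as entropy, capacity as the entropy of equally probable signals); the function names and the sample probabilities are hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Average information per symbol, in binary units (bits)."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def redundancy_percent(probabilities):
    """Redundancy as a percentage of the code's maximum capacity."""
    # Assumed capacity: all symbols of the code equally probable.
    capacity = math.log2(len(probabilities))
    return 100.0 * (capacity - entropy_bits(probabilities)) / capacity


fair_binary = [0.5, 0.5]    # two equally probable signals
skewed_binary = [0.9, 0.1]  # hypothetical unequal signal probabilities
```

With two equally probable signals each symbol carries exactly one binary unit and the redundancy is zero; as soon as the signals are unequally probable, the average information per symbol falls below the capacity and the redundancy becomes positive.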

The use of models can, without any doubt, be of very significant help, in particular, in all likelihood, in solving the problems that applied linguistics sets itself. For theoretical linguistics, however, it is very important that an abstract model, as a rule, does not reproduce all the features of a real phenomenon, all its functional qualities. Thus an architect, before building a house, can create a model of it that reproduces the designed house in all its smallest details, and this helps him solve a number of practical questions connected with the construction of the house itself. But such a model of a house, however accurate, lacks the “function” and purpose for which houses are built at all: it cannot provide a person with a dwelling. The situation is similar with language, where a model is not always able to reproduce all its qualities. Here the matter is further complicated by the fact that mathematical rather than linguistic criteria are used to build the model. “Mathematical models ...,” writes A. Oettinger, “play an extremely important role in all areas of technology, but since they are a tool of synthesis, their significance for linguistics, which is primarily a historical and descriptive discipline, is naturally limited” 111.

Mathematical modeling of a language is in fact applicable only to its static state, which for a linguist is a convention and actually stands in direct conflict with the basic quality of language, whose very form of existence is development. It goes without saying that the static study of a language is by no means excluded from linguistics; it is the basis for compiling normative grammars and dictionaries, descriptive grammars, practical grammars and dictionaries that serve as aids in the practical study of foreign languages, etc. However, in all such works, which are predominantly applied in nature, linguists consciously limit the field of research and by no means close their eyes to other aspects of language 112. In a static examination, qualities of language associated with its dynamic nature, such as productivity, dependence on forms of thinking, and extensive interaction with cultural, social, political, historical, and other factors, disappear completely from the researcher's field of vision. Only on the synchronic plane can language be considered as a system of conventional signs or codes, and this view turns out to be completely unjustified as soon as we adopt the dynamic point of view that suits language better. It is in the processes of development that such qualities of language are manifested as motivation, the polysemy of words with no stable boundaries, the non-autonomy of the meaning of a word and its sound shell, and the creative potential of a word in context, and all this is in sharp contradiction with the main characteristics of a code or a sign 113. Obviously, in applied linguistics one can also set all these qualities aside and, for practical purposes, be content with, so to speak, a “snapshot” of the language, which is still capable of giving a fairly approximate idea of the mechanism of its functioning.
However, each such "snapshot", if considered as a fact of language and not as a fact of a system of conventional codes, must be included in the endless process of movement in which language always exists 114. It cannot be studied outside the concrete conditions that characterize this movement, which leaves its imprint on the given state of the language and determines the potential for its further development. The difference here is the same as between a snapshot photograph of a person and a portrait of him painted by the brush of a true artist. In the artist's work we have before us a generalizing image of a person in all the originality not only of his physical appearance but also of his inner spiritual content. From an artistic portrait we can also read the past of the person depicted and determine what he is capable of in his actions. A snapshot, although capable of giving a more accurate image of the original's appearance, lacks these qualities and often captures both an accidental pimple that has come up on the nose and a completely uncharacteristic pose or expression, which ultimately distorts the original.

It should be noted that the method of "snapshots" can, of course, also be applied to the facts of language development. But in this case we will in fact be dealing only with separate states of the language, which, in their quantitative characterization, turn out to be no more closely connected than the comparative quantitative characteristics of different languages. Quantitative "dynamics" of this kind will contain nothing organic, and the connection between the individual states of the language will rest only on a comparison of numerical relations. If we resort to an analogy here too, we can refer to the growth of a child. His development can, of course, be represented as the dynamics of numerical data about his weight, height, and the changing proportions of the parts of his body, but all these data are absolutely detached from everything that primarily constitutes the individual essence of a person: his character, inclinations, habits, tastes, etc.

Another negative side of the mathematical "modeling" of language is that it cannot serve as the general principle on which a comprehensive, that is, systematic, description of a language can be based. A purely mathematical approach to the phenomena of language will not, for example, make it possible to answer even such fundamental questions (without which the very existence of the science of language is unthinkable) as: what is language, which phenomena should be classed as properly linguistic, how a word or a sentence is defined, what the basic concepts and categories of language are, etc. Before turning to mathematical methods of studying language, one must already have answers to all these questions, even if only in the form of a working hypothesis. There is no need to turn a blind eye to the fact that in all the cases known to us of the study of linguistic phenomena by mathematical methods, all these concepts and categories inevitably had to be accepted as they were defined by traditional or, relatively speaking, qualitative methods.

This feature of mathematical methods in their linguistic application was noted by Spang-Hanssen when he wrote: “It should be borne in mind that observed facts that receive a quantitative expression ... have no value if they do not form part of a description, and for linguistic purposes this should be a systematic description, closely related to a qualitative linguistic description and theory” 115. In another statement by Spang-Hanssen we find a clarification of this idea: “Until the possibility of constructing a quantitative system is proved, and as long as there is a generally accepted qualitative system for a given field of study, frequency counts and other numerical characteristics make no sense from a linguistic point of view” 116. Similar ideas are expressed by Uldall, who somewhat unexpectedly connects them with the development of the general theoretical foundations of glossematics: “When a linguist counts or measures, what he counts and measures is not in itself defined quantitatively; for example, words, when they are counted, are defined, if they are defined at all, in quite different terms.”

Thus, it turns out that both in theoretical terms and in their practical application, mathematical methods are directly dependent on linguistic concepts and categories defined by traditional, philological, or, as mentioned above, qualitative methods. In terms of applied linguistics, it is important to realize this dependence, and, consequently, to get acquainted with the totality of the main categories of traditional linguistics.

True, there is no reason to reproach representatives of the exact sciences working in applied linguistics for not using the data of modern linguistics. That would not correspond to the actual state of affairs. They not only know perfectly well but also widely use in their work the systems of distinctive features established by linguists for different languages, the distribution and arrangement of linguistic elements within specific language systems, the achievements of acoustic phonetics, etc. But here a very significant reservation is necessary. In fact, representatives of the exact sciences use the data of only one direction in linguistics, so-called descriptive linguistics, which has deliberately set itself apart from the traditional problems of theoretical linguistics, is far from covering the entire field of linguistic research, has, from a properly linguistic point of view, significant methodological shortcomings that have led it to its recently revealed crisis 118, and, moreover, has a purely practical orientation corresponding to the interests of applied linguistics. All the reservations and reproaches made above against the static consideration of language apply to descriptive linguistics. Such a one-sided approach on the part of descriptive linguistics can, consequently, be justified only by the tasks that applied linguistics sets itself, and it is far from exhausting the entire content of the science of language.

In the process of working on questions of applied linguistics, new theoretical problems may arise, and in fact have already arisen. Some of these problems are closely tied to the specific tasks of applied linguistics and are aimed at overcoming the difficulties that arise in solving them. Others bear directly on theoretical linguistics, allowing a new perspective on traditional ideas or opening up new areas of linguistic research, new concepts and theories. Among the latter, for example, is the problem of creating a "machine" language (or intermediary language), which is most closely connected with a complex set of cardinal questions of theoretical linguistics, such as the relationship between concepts and lexical meanings, logic and grammar, diachrony and synchrony, the sign nature of language, the essence of linguistic meaning, the principles of constructing artificial languages, etc. 119. Here it is especially important to establish mutual understanding and cooperation in the common work of representatives of the linguistic disciplines and the exact sciences. As for the linguistic side, the point, apparently, should not be to limit in advance the efforts of, for example, the designers of translation machines, or to try to establish the working capabilities of such machines on the verses of N. Gribachev or the prose of V. Kochetov 120. The machine itself will find the limits of its capabilities, and profitability will find the limits of its use. But linguists, as their contribution to the common cause, must bring their knowledge of the features of the structure of language, its many-sidedness, the internal intersecting relations of its elements, and the wide and multilateral connections of language with physical, physiological, mental, and logical phenomena, as well as of the specific patterns of the functioning and development of language.
The totality of this knowledge is necessary for the designers of the corresponding machines so that they do not wander in the wrong directions but make their search purposeful and clearly oriented. Even the very brief review of cases of the application of mathematical methods to linguistic problems given in this essay shows convincingly that such knowledge will by no means be superfluous for representatives of the exact sciences.

On the basis of all the above considerations, one can obviously come to some general conclusions.

So, mathematical linguistics? If this means the use of mathematical methods as a universal master key for solving all linguistic problems, then such claims should be recognized as absolutely unjustified. Everything done in this direction so far has contributed very little, or nothing at all, to solving the traditional problems of the science of language. At worst, the application of mathematical methods is accompanied by obvious absurdities or is absolutely meaningless from a linguistic point of view. At best, mathematical methods can be used as auxiliary methods of linguistic research, placed at the service of specific and limited linguistic problems. There can be no question of any "quantitative philosophy of language" here. Physics, psychology, physiology, logic, sociology, and ethnology each in their time encroached on the independence of the science of language, but they could not subjugate linguistics. The opposite happened: linguistics took advantage of the achievements of these sciences and, to the extent it needed, began to use their help, thereby enriching the arsenal of its research methods. Now, apparently, it is the turn of mathematics. It is to be hoped that this new partnership will also contribute to the strengthening of the science of language, the improvement of its working methods, and the increase in their diversity. It is, therefore, just as legitimate to speak of mathematical linguistics as of physical linguistics, physiological linguistics, logical linguistics, psychological linguistics, etc. There are no such linguistics; there is only one linguistics, which profitably uses the data of other sciences as auxiliary research tools. Thus, there is no reason to retreat before the onslaught of the new science and readily cede to it the positions that have been won. Here it is very appropriate to recall the words of A. Martinet: “Perhaps it is tempting to join one or another major movement of thought by using a few well-chosen terms, or to proclaim the rigor of one's reasoning with some mathematical formula. However, the time has come for linguists to realize the independence of their science and to free themselves from that inferiority complex that makes them associate any of their actions with one or another general scientific principle, as a result of which the contours of reality always become only more vague instead of becoming clearer” 121.

Therefore, mathematics is one thing and linguistics another. This by no means excludes their mutual assistance or a friendly meeting in joint work on common problems. The place where the concerted efforts of the two sciences can be applied is the whole wide range of problems that belong to applied linguistics and are of great national economic importance. One can only wish that in their joint work both sciences show maximum mutual understanding, which would undoubtedly also contribute to the maximum fruitfulness of their cooperation.

Table of contents
Introduction
Chapter 1. The history of the application of mathematical methods in linguistics
1.1. The Formation of Structural Linguistics at the Turn of the 19th – 20th Centuries
1.2. Application of mathematical methods in linguistics in the second half of the twentieth century
Conclusion
Literature
Introduction
The 20th century saw a continuing trend toward the interaction and interpenetration of various fields of knowledge. The boundaries between individual sciences are gradually blurring; there are more and more branches of intellectual activity that lie "at the junction" of the humanities, engineering, and the natural sciences.
Another obvious feature of modernity is the desire to study structures and their constituent elements. Therefore, mathematics is given an ever larger place in scientific theory and practice. Coming into contact with logic and philosophy on the one hand and with statistics (and, consequently, with the social sciences) on the other, mathematics penetrates deeper and deeper into areas long considered purely "humanitarian", expanding their heuristic potential (the answer to the question "how much" often helps to answer the questions "what" and "how"). Linguistics is no exception. The goal of my term paper is to outline briefly the connection between mathematics and linguistics. Since the 1950s, mathematics has been used in linguistics to create a theoretical apparatus for describing the structure of languages (both natural and artificial). It should be said, however, that it did not immediately find practical application there. Initially, mathematical methods were used in linguistics to clarify the basic concepts of the discipline; with the development of computer technology, however, this theoretical groundwork began to be applied in practice. Solving tasks such as machine translation, machine information retrieval, and automatic text processing required a fundamentally new approach to language. Linguists faced the question of how to represent linguistic patterns in a form in which they can be applied directly in technology. The term “mathematical linguistics”, popular in our time, refers to any linguistic research that uses exact methods (and the concept of exact methods in science is always closely related to mathematics).
Some scholars of earlier years held that the expression itself cannot be elevated to the rank of a term, since it denotes not a special “linguistics” but only a new direction focused on improving the methods of language research and increasing their accuracy and reliability. Linguistics uses both quantitative (algebraic) and non-quantitative methods, which brings it closer to mathematical logic and, consequently, to philosophy and even to psychology. Schlegel already noted the interaction of language and consciousness, and the prominent linguist of the early twentieth century Ferdinand de Saussure (I will discuss his influence on the development of mathematical methods in linguistics later) connected the structure of a language with its belonging to a people. The modern researcher L. Perlovsky goes further, identifying quantitative characteristics of a language (for example, the number of genders and cases) with peculiarities of the national mentality (more on this in Section 2.2, "Statistical Methods in Linguistics").
The interaction of mathematics and linguistics is a multifaceted topic, and in my work I will dwell not on all of its aspects but primarily on the applied ones.
Chapter I. History of the Application of Mathematical Methods in Linguistics
1.1 The formation of structural linguistics at the turn of the XIX - XX centuries
The mathematical description of language is based on the idea of language as a mechanism, which goes back to the famous Swiss linguist of the early twentieth century, Ferdinand de Saussure.
The initial link of his concept is the theory of language as a system consisting of three parts (language itself - langue, speech - parole, and speech activity - langage), in which each word (member of the system) is considered not in itself, but in connection with others. ...

