Stochastic texts

Although program-controlled electronic data processors were initially developed to satisfy the needs of applied mathematics and computational engineering, it soon became apparent that their possible applications could far exceed these limits. Today the possibilities seem endless. Nonetheless, many scientists are still under the false impression that the use of electronic data processors is bound to the use of numbers. A variety of programs has shown that this assumption is incorrect. At a conference on questions of information theory recently held in Paris, American scientists reported on a program based on logical examination which was capable of determining very quickly whether a theorem of Euclidean geometry is true or false; in other words, a program to prove elementary geometric theorems. In addition, programs which can translate texts into another language have been around for some time. An American office equipment company has reported that it has a program which can produce a précis of a given scientific text. The existence of such programs clearly demonstrates that
the use of program-controlled electronic data processors is not
confined to problems concerning the concept of numbers. Such programs
give the concept of "computing" a fundamentally more general
meaning. For the users of such a system, what the machine does is not
decisive; what matters is how the functions of the machine are
interpreted. Thus it is imperative for the modern scientist
to know how to program an electronic data processor and to understand
the nature of its structures. It is his task to interpret those structures
according to his science. We report here on a program
which the author recently ran on the ZUSE Z 22 electronic computer
at the T.H. Stuttgart computer center. The machine was used to
generate stochastic texts, i.e. sentences whose words are determined
randomly. The Z 22 is especially suited to applications in non-mathematical
areas. It is particularly suited to programs with a highly logical structure,
i.e. programs containing many logical decisions. The machine's ability
to print results immediately, on demand, on a teleprinter
is ideal for scientific problems.

Random numbers are generated as follows. A new number is formed from an initial number by an arithmetic operation, and digits are taken from this number by intersection; these digits are then treated as a random number. The number generated by this operation serves as the initial number for determining the next random number. By continuing this process, a sequence of numbers is obtained. The random nature of these numbers is verified empirically by generating a sufficiently large quantity of numbers and counting them: the random numbers must be uniformly distributed over the given range.

The existence of such a random generator essentially solves the problem of stochastic texts. The machine stores a certain number of subjects, predicates, logical operators, logical constants and the word "IST" (English: "IS"), coded as binary numbers. Using the first random number, the machine forms the address (i.e. the position number in the store) of a subject by adding a constant, and so has this subject at its disposal. In the next memory cell the program finds a code number which it interprets as the gender of the subject in question, e.g. 0 = masculine, 1 = feminine and 2 = neuter. The machine then determines a logical operator using a new random number and makes it agree with the gender of the subject, using the code number it has found. At this stage output is printed for the first time, e.g. the teleprinter prints:

NICHT JEDER BLICK

Then the word "IST" is printed and, using the random generator, a predicate and a logical constant are selected and printed. Thus the machine has formed, for example, the sentence:

NICHT JEDER BLICK IST NAH

and has determined a logical constant, i.e. a conjunction which connects this elementary sentence with further elementary sentences, e.g.
with

KEIN DORF IST SPAET

The result is a pair of elementary sentences connected by a logical constant:

NICHT JEDER BLICK IST NAH UND KEIN DORF IST SPAET

The program ends here and returns to the beginning, forming further pairs of elementary sentences. The machine continues to work until it is turned off.

For the following random text the machine stored 16 subjects and 16 predicates in all, selected from F. Kafka's "Das Schloss" ("The Castle"):

DER GRAF
DER FREMDE
DER BLICK
DIE KIRCHE

Each of the given subjects or predicates is to appear with the same frequency, i.e. with the same probability. The two elementary sentences in a pair are to be linked using the following logical constants:

a) with "UND" ("AND") with a relative frequency
of 1/8

The following logical operators have been used with the same frequency:

the particularizer "EIN, EINE, EIN" ("A, AN")

In addition, it should be pointed out that with these sentence constituents 4 × 16 × 16 = 1024 different elementary sentences can be formed. These can be combined into pairs of elementary sentences in (1024)² different ways; if we take into account that there are 4 different modes of connection, the given set of sentence constituents allows 4 × (1024)² = 4194304 different pairs of elementary sentences. The machine randomly determined approx. 50 such pairs, and 35 of them are listed below.

It should be pointed out that this program, which consisted of approx. 50 single commands not counting the text, can be extended in various ways. For example, within the pre-defined set of subjects and predicates, certain words can be made to occur more frequently by storing them several times; the resulting text will contain these words with a corresponding frequency. Furthermore, the basic set of words can be selected with regard to a specific language. The machine then produces sentences in this language.

It seems very significant that the underlying set of words can be turned into a "word field" using an assigned probability matrix, and that the machine can be required to print only those sentences in which the probability linking subject and predicate exceeds a certain value. In this way it is possible to produce a text which is "meaningful" in relation to the underlying matrix. Such a rectangular matrix contains, for example, the so-called transition probability from subject m to predicate n at position (m, n), i.e. a correlation number between these two constituent parts of a sentence.
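The procedure described above (feedback random generator, empirical counting check, gender agreement, sentence pairs, and the optional probability matrix) can be sketched in a modern language. This is only an illustration under stated assumptions, not a reconstruction of the Z 22 program: the arithmetic step and its constants, the name FeedbackGenerator, the operator "JEDER" with its inflected forms, and the use of a full stop for every pair not joined by "UND" are assumptions made here. The word lists contain only a few of the words quoted in the text.

```python
from collections import Counter


class FeedbackGenerator:
    """Each number is formed from the previous one, then digits are masked out."""

    def __init__(self, seed=1):
        self.state = seed

    def next(self, bits):
        """Return a number in range(2**bits); intended for bits <= 8."""
        # Arithmetic operation on the previous number (constants are an assumption).
        self.state = (1664525 * self.state + 1013904223) % 2**32
        # "Intersection": a logical AND with a mask cuts out the wanted digits.
        return (self.state >> 24) & ((1 << bits) - 1)


def histogram(rng, bits, draws):
    """Empirical check: count how often each value occurs."""
    return Counter(rng.next(bits) for _ in range(draws))


def pick(rng, n):
    """Uniform index in range(n); values outside the range are redrawn."""
    bits = max(1, (n - 1).bit_length())
    while True:
        value = rng.next(bits)
        if value < n:
            return value


# Store: each subject is followed by its gender code (0 masc., 1 fem., 2 neuter).
SUBJECTS = [("GRAF", 0), ("BLICK", 0), ("KIRCHE", 1), ("DORF", 2)]
PREDICATES = ["NAH", "SPAET"]
# One row per logical operator, one column per gender ("JEDER" is assumed).
OPERATORS = [
    ("JEDER", "JEDE", "JEDES"),
    ("KEIN", "KEINE", "KEIN"),
    ("NICHT JEDER", "NICHT JEDE", "NICHT JEDES"),
    ("EIN", "EINE", "EIN"),
]


def elementary_sentence(rng, matrix=None, threshold=0.0):
    """Form one elementary sentence; with a matrix, keep only pairs above threshold."""
    if matrix is not None and not any(p > threshold for row in matrix for p in row):
        raise ValueError("no subject/predicate pair exceeds the threshold")
    while True:
        m = pick(rng, len(SUBJECTS))
        noun, gender = SUBJECTS[m]
        operator = OPERATORS[pick(rng, len(OPERATORS))][gender]
        n = pick(rng, len(PREDICATES))
        if matrix is None or matrix[m][n] > threshold:
            return f"{operator} {noun} IST {PREDICATES[n]}"


def sentence_pair(rng):
    """Join two elementary sentences; "UND" is chosen with relative frequency 1/8."""
    first = elementary_sentence(rng)
    connective = " UND " if rng.next(3) == 0 else ". "
    second = elementary_sentence(rng)
    return f"{first}{connective}{second}."


if __name__ == "__main__":
    rng = FeedbackGenerator(seed=1959)
    print(sorted(histogram(rng, 4, 16000).items()))  # counts should be roughly equal
    for _ in range(5):
        print(sentence_pair(rng))
```

Passing a matrix with one row per subject and one column per predicate, together with a threshold, restricts the output to the "meaningful" combinations in the sense described above; storing a word several times in SUBJECTS or PREDICATES raises its frequency accordingly.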
If one extends the program by a super-program capable of increasing the transition probabilities between subject and predicate in those sentences found to be "meaningful", and of reducing the other probabilities in accordance with the mathematical relationship, then the machine has "learned" in a certain way: in the course of time it comes to prefer certain subject/predicate combinations.

The results so far give hope that program-controlled electronic data processors can be used with great success in language research and in analytical areas of language study. It is to be hoped that the distrust of some more traditionally minded philologists towards the achievements of modern technology will soon give way to widespread and fruitful co-operation.

Stochastic texts. A selection

NICHT JEDER BLICK IST NAH. KEIN DORF IST SPAET.
(Translation by Helen MacCormac, 2005)