Theo Lutz: "Stochastische Texte", in: augenblick 4 (1959), H. 1, S. 3-9

Stochastic texts


by Theo Lutz (23.7.1932 - 31.1.2010)


Although program-controlled electronic data processors were initially developed to meet the needs of applied mathematics and computational engineering, it soon became apparent that their possible applications far exceed these limits. Today the range of applications seems endless. Nonetheless, many scientists are still under the false impression that the use of electronic data processors is bound to the use of numbers. A variety of programs has shown that this assumption is incorrect.

At a conference on questions of information theory recently held in Paris, American scientists reported on a program based on logical examination which could determine very quickly whether a theorem of Euclidean geometry is true or false. It is, in other words, a program for proving elementary geometric theorems. In addition, programs which translate texts into another language have been around for some time. An American office-machine company has reported that it has a program which can summarize a given scientific text.

The existence of such programs demonstrates that the use of program-controlled electronic data processors is not confined to problems involving numbers. Such programs give the concept of "computing" a fundamentally more general meaning. For the users of such a system, what the machine does is not decisive; what matters is how the functions of the machine are interpreted. It is therefore imperative for the modern scientist to know how to program an electronic data processor and to understand the nature of its structures. It is his task to interpret those structures according to his science.

Here we report on a program which the author recently ran on the ZUSE Z 22 electronic mainframe at the T.H. Stuttgart computer center. The machine was used to generate stochastic texts, i.e. sentences whose words are determined randomly. The Z 22 is especially suited to applications in extra-mathematical areas, in particular to programs with a strongly logical structure, i.e. programs containing many logical decisions. The machine's ability to print results immediately, on demand, on a teleprinter is ideal for scientific problems.

Our program's task was to take over the laborious production of stochastic texts. In the past such texts were produced by selecting sentences or sentence constituents by throwing dice, or by some other random process, and then joining them. For the program-controlled data processor it seemed reasonable to use so-called random numbers as the stochastic process. Basically, a random generator works in the following manner:

A new number is formed from an initial number by an arithmetic operation, and certain digits are extracted from this new number; these digits are taken as the random number. The number generated by the operation then serves as the initial number for determining the next random number. Continuing this process yields a sequence of numbers. The random nature of these numbers is checked empirically by generating a sufficiently large quantity of numbers and counting how often each occurs: the random numbers must be uniformly distributed over the given range.
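The scheme just described can be sketched in Python. The article names neither the arithmetic operation nor the digits that are extracted, so the linear congruential step, its constants, the choice of the top four binary digits, and all function names below are assumptions made purely for illustration.

```python
from collections import Counter

# Assumed constants (not from the article): a classic linear congruential step.
MULTIPLIER = 1103515245
INCREMENT = 12345
MODULUS = 2 ** 31


def next_state(state):
    """Form a new number from the previous one by an arithmetic operation."""
    return (MULTIPLIER * state + INCREMENT) % MODULUS


def extract_digits(state, bits=4):
    """Take the top `bits` binary digits of the 31-bit state as the random number."""
    return state >> (31 - bits)


def random_sequence(seed, count, bits=4):
    """Generate `count` random numbers; each state feeds the next step."""
    state = seed
    numbers = []
    for _ in range(count):
        state = next_state(state)
        numbers.append(extract_digits(state, bits))
    return numbers


def frequency_table(numbers):
    """Count how often each value occurs, as an empirical uniformity check."""
    return Counter(numbers)


if __name__ == "__main__":
    table = frequency_table(random_sequence(seed=1959, count=16000))
    for value in range(16):
        print(value, table[value])
```

With four extracted binary digits the values 0 to 15 can index 16 stored words directly; a roughly equal count per value is what the empirical check looks for.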

The existence of such a random generator essentially solves the problem of stochastic texts. The machine stores a certain number of subjects, predicates, logical operators, logical constants and the word "IST" (Engl.: "IS"), coded as binary numbers. Using the first random number, the machine forms the address (i.e. the position number in the store) of a subject by adding a constant which the machine has at its disposal. In the following memory cell the program finds a code number which it evaluates as the gender of the subject in question, e.g. 0 = masculine, 1 = feminine and 2 = neuter. The machine then determines a logical operator using a new random number and matches it to the gender of the subject, using that code number. At this stage output is printed for the first time, e.g. the teleprinter prints:

NICHT JEDER BLICK
(engl.: NOT EVERY LOOK)

Then the word "IST" is printed and, using the random generator, a predicate and a logical constant are selected and printed. Thus the machine has formed e.g. the sentence:

NICHT JEDER BLICK IST NAH
(NOT EVERY LOOK IS NEAR)

and has determined a logical constant i.e. a conjunction which connects this elementary sentence with further elementary sentences e.g. with

KEIN DORF IST SPAET
(NO VILLAGE IS LATE).

The result is a pair of elementary sentences connected by a logical constant:

NICHT JEDER BLICK IST NAH UND KEIN DORF IST SPAET
(NOT EVERY LOOK IS NEAR AND NO VILLAGE IS LATE)

The program ends here and returns to the beginning, forming further pairs of elementary sentences. The machine continues to work until it is turned off.
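The address calculation and the gender lookup in the walk-through above can be sketched as follows. The flat list STORE, the base address SUBJECT_BASE, the reduced vocabulary of three subjects and three predicates, and the use of Python's random module in place of the machine's own generator are all assumptions for illustration. The neuter form of the negative particularizer follows the printed example "KEIN DORF".

```python
import random

# Hypothetical store layout: each subject occupies two consecutive cells,
# the word itself followed by its gender code
# (0 = masculine, 1 = feminine, 2 = neuter).
STORE = [
    "BLICK", 0,
    "KIRCHE", 1,
    "DORF", 2,
]
SUBJECT_BASE = 0   # assumed address of the first subject cell
SUBJECT_COUNT = 3

PREDICATES = ["NAH", "SPAET", "STILL"]

# Each logical operator in its masculine, feminine and neuter form.
OPERATORS = [
    ("EIN", "EINE", "EIN"),
    ("JEDER", "JEDE", "JEDES"),
    ("KEIN", "KEINE", "KEIN"),
    ("NICHT JEDER", "NICHT JEDE", "NICHT JEDES"),
]


def elementary_sentence(rng):
    """Build one elementary sentence such as 'NICHT JEDER BLICK IST NAH'."""
    # Form the subject address from a random number plus a constant base.
    address = SUBJECT_BASE + 2 * rng.randrange(SUBJECT_COUNT)
    subject = STORE[address]
    gender = STORE[address + 1]               # code in the following cell
    operator = rng.choice(OPERATORS)[gender]  # match operator to gender
    predicate = rng.choice(PREDICATES)
    return f"{operator} {subject} IST {predicate}"


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        print(elementary_sentence(rng))
```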

For the following random text the machine stored 16 subjects and 16 predicates in all, selected from F. Kafka's "Das Schloss" ("The Castle"):

DER GRAF DER FREMDE DER BLICK DIE KIRCHE
DAS SCHLOSS DAS BILD DAS AUGE DAS DORF
DER TURM DER BAUER DER WEG DER GAST
DER TAG DAS HAUS DER TISCH DER KNECHT
OFFEN STILL STARK GUT SCHMAL NAH NEU
LEISE FERN TIEF SPAET DUNKEL FREI
GROSS ALT WUETEND 1

Each of the given subjects or predicates is to appear with the same frequency i.e. with the same probability. The two elementary sentences in a pair are to be linked using the following logical constants:

a) with "UND" ("AND") with a relative frequency of 1/8
b) with "ODER" ("OR") with a relative frequency of 1/8
c) with "SO GILT" ("THEREFORE") with a relative frequency of 1/8
d) with a full stop "." with a relative frequency of 5/8.
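One simple way to obtain these relative frequencies is to map a uniformly distributed number from 0 to 7 onto a table of eight entries. The article gives only the frequencies; the table layout and the function names below are assumptions.

```python
import random
from fractions import Fraction

# Assumed mapping from a random number 0..7 to a logical constant:
# one slot each for UND, ODER and SO GILT, five slots for the full stop.
CONNECTIVE_TABLE = ["UND", "ODER", "SO GILT", ".", ".", ".", ".", "."]


def pick_connective(rng):
    """Select a logical constant with the relative frequencies of the table."""
    return CONNECTIVE_TABLE[rng.randrange(8)]


def join_pair(first, second, connective):
    """Link two elementary sentences into one printed line."""
    if connective == ".":
        return f"{first}. {second}."
    return f"{first} {connective} {second}."


def table_frequencies():
    """Relative frequency of each logical constant implied by the table."""
    return {
        c: Fraction(CONNECTIVE_TABLE.count(c), len(CONNECTIVE_TABLE))
        for c in set(CONNECTIVE_TABLE)
    }


if __name__ == "__main__":
    rng = random.Random()
    print(join_pair("NICHT JEDER BLICK IST NAH",
                    "KEIN DORF IST SPAET",
                    pick_connective(rng)))
```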

The following logical operators were used, each with the same frequency:

the particularizer "EIN, EINE, EIN" ("A, AN")
the generalizer "JEDER, JEDE, JEDES" ("EVERY")
the negative particularizer "KEIN, KEINE, KEIN" ("NO") and
the negative generalizer "NICHT JEDER, NICHT JEDE, NICHT JEDES" ("NOT EVERY").

It should also be pointed out that with these sentence constituents 4 x 16 x 16 = 1024 different elementary sentences can be formed. These can be combined into pairs of elementary sentences in (1024)² different ways; taking the 4 different logical constants into account, the given stock of sentence constituents yields 4 x (1024)² = 4194304 different pairs of elementary sentences. The machine randomly determined approx. 50 such pairs, 35 of which are listed below.
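The count can be checked by direct enumeration. The short sketch below assumes only the quantities stated in the text (4 operators, 16 subjects, 16 predicates, 4 logical constants); the variable names are illustrative.

```python
from itertools import product

OPERATORS = 4      # logical operators
SUBJECTS = 16
PREDICATES = 16
CONNECTIVES = 4    # UND, ODER, SO GILT and the full stop

# Enumerate every (operator, subject, predicate) triple.
elementary_sentences = sum(
    1 for _ in product(range(OPERATORS), range(SUBJECTS), range(PREDICATES))
)

# Ordered pairs of elementary sentences, each joined by one logical constant.
sentence_pairs = CONNECTIVES * elementary_sentences ** 2

print(elementary_sentences)  # 1024
print(sentence_pairs)        # 4194304
```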

It should be pointed out that this program, which consisted of approx. 50 single commands without text, can be extended in various ways. For example, within the given set of subjects and predicates, individual words can be made to occur more frequently by storing them several times; the resulting text will contain these words with a correspondingly higher frequency. Furthermore, the basic vocabulary can be chosen for a specific language, and the machine then produces sentences in that language.
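Storing a word several times is a simple way to weight a uniform draw. A minimal sketch, assuming a four-word vocabulary and a hypothetical helper weighted_store that takes the desired number of copies per word:

```python
import random
from collections import Counter

BASE_SUBJECTS = ["GRAF", "SCHLOSS", "DORF", "TURM"]


def weighted_store(words, copies):
    """Store each word once, or as many times as `copies` requests."""
    store = []
    for word in words:
        store.extend([word] * copies.get(word, 1))
    return store


if __name__ == "__main__":
    # SCHLOSS is stored three times, so it fills 3 of the 6 cells.
    store = weighted_store(BASE_SUBJECTS, {"SCHLOSS": 3})
    rng = random.Random()
    draws = Counter(rng.choice(store) for _ in range(6000))
    print(store)
    print(draws)
```

The draw itself stays uniform over the memory cells; only the contents of the store change, which fits a program that selects words by address.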

It seems very significant that the underlying vocabulary can be turned into a "word field" by means of an assigned probability matrix, and that the machine can be required to print only those sentences in which the probability between subject and predicate exceeds a certain value. In this way it is possible to produce a text which is "meaningful" in relation to the underlying matrix.
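The filtering idea can be sketched as follows. The matrix values, the three-word vocabulary and the function names are invented for illustration, and logical operators are left out to keep the example short.

```python
import random

SUBJECTS = ["SCHLOSS", "BAUER", "TAG"]
PREDICATES = ["ALT", "STARK", "SPAET"]

# Hypothetical probability matrix: MATRIX[m][n] relates subject m
# to predicate n. The values are made up for this sketch.
MATRIX = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
]


def meaningful_pairs(threshold):
    """All subject/predicate pairs whose matrix entry exceeds the threshold."""
    return [
        (SUBJECTS[m], PREDICATES[n])
        for m in range(len(SUBJECTS))
        for n in range(len(PREDICATES))
        if MATRIX[m][n] > threshold
    ]


def generate(rng, threshold, attempts):
    """Draw random pairs and print-worthy ones only: keep entries above threshold."""
    kept = []
    for _ in range(attempts):
        m = rng.randrange(len(SUBJECTS))
        n = rng.randrange(len(PREDICATES))
        if MATRIX[m][n] > threshold:
            kept.append(f"{SUBJECTS[m]} IST {PREDICATES[n]}")
    return kept


if __name__ == "__main__":
    print(generate(random.Random(), threshold=0.5, attempts=20))
```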

Such a rectangular matrix contains, at position (m, n), e.g. the so-called transition probability from subject m to predicate n, i.e. a correlation number between these two sentence constituents. If the program is extended by a super program capable of increasing the transition probabilities between subject and predicate in those sentences found to be "meaningful", and of reducing the other probabilities in accordance with the mathematical relationship, then the machine has "learned" in a certain sense: over time it comes to prefer certain subject/predicate combinations. The results so far give reason to hope that program-controlled electronic data processors can be used with great success in language research and the analysis of language. It is to be hoped that the distrust of some more traditionally minded philologists towards the achievements of modern technology will soon give way to widespread and fruitful co-operation.
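The "learning" step mentioned above can be illustrated with a small update rule. The article does not specify the rule; the sketch assumes that each row of the matrix is a probability distribution over predicates, so that raising one entry and renormalising the row lowers the others. The function name reinforce and the step size are hypothetical.

```python
def reinforce(matrix, m, n, step=0.1):
    """Return a copy of `matrix` with entry (m, n) raised and row m renormalised.

    Assumes every row sums to 1, so the other entries of row m shrink
    when the reinforced entry grows.
    """
    updated = [row[:] for row in matrix]
    updated[m][n] += step
    total = sum(updated[m])
    updated[m] = [p / total for p in updated[m]]
    return updated


if __name__ == "__main__":
    matrix = [
        [0.5, 0.5],
        [0.5, 0.5],
    ]
    # The sentence built from subject 0 and predicate 1 was judged "meaningful".
    for _ in range(3):
        matrix = reinforce(matrix, 0, 1)
    print(matrix)
```

Repeated reinforcement shifts the probability mass of a row towards the preferred predicate while the row still sums to 1.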

Stochastic texts. A selection

NICHT JEDER BLICK IST NAH. KEIN DORF IST SPAET.
EIN SCHLOSS IST FREI UND JEDER BAUER IST FERN.
JEDER FREMDE IST FERN. EIN TAG IST SPAET.
JEDES HAUS IST DUNKEL. EIN AUGE IST TIEF.
NICHT JEDES SCHLOSS IST ALT. JEDER TAG IST ALT.
NICHT JEDER GAST IST WUETEND. EINE KIRCHE IST SCHMAL.
KEIN HAUS IST OFFEN UND NICHT JEDE KIRCHE IST STILL.
NICHT JEDES AUGE IST WUETEND. KEIN BLICK IST NEU.
JEDER WEG IST NAH. NICHT JEDES SCHLOSS IST LEISE.
KEIN TISCH IST SCHMAL UND JEDER TURM IST NEU.
JEDER BAUER IST FREI. JEDER BAUER IST NAH.
KEIN WEG IST GUT ODER NICHT JEDER GRAF IST OFFEN.
NICHT JEDER TAG IST GROSS. JEDES HAUS IST STILL.
EIN WEG IST GUT. NICHT JEDER GRAF IST DUNKEL.
JEDER FREMDE IST FREI. JEDES DORF IST NEU.
JEDES SCHLOSS IST FREI. NICHT JEDER BAUER IST GROSS.
NICHT JEDER TURM IST GROSS ODER NICHT JEDER BLICK IST FREI.
EINE KIRCHE IST STARK ODER NICHT JEDES DORF IST FERN.
JEDER FREMDE IST NAH SO GILT KEIN FREMDER IST ALT.
EIN HAUS IST OFFEN. KEIN WEG IST OFFEN.
EIN TURM IST WUETEND. JEDER TISCH IST FREI.
EIN FREMDER IST LEISE UND NICHT JEDES SCHLOSS IST FREI.
EIN TISCH IST STARK UND EIN KNECHT IST STILL.
NICHT JEDES AUGE IST ALT. JEDER TAG IST GROSS.
KEIN AUGE IST OFFEN. EIN BAUER IST LEISE.
NICHT JEDER BLICK IST STILL. NICHT JEDER TURM IST STILL.
KEIN DORF IST SPAET ODER JEDER KNECHT IST GUT.
NICHT JEDER BLICK IST STILL. EIN HAUS IST DUNKEL.
KEIN GRAF IST LEISE SO GILT NICHT JEDE KIRCHE IST WUETEND.
EIN BILD IST FREI ODER EIN FREMDER IST TIEF.
EIN GAST IST TIEF UND KEIN TURM IST FERN.
EIN GAST IST LEISE. JEDES BILD IST FERN.
EIN TISCH IST OFFEN. JEDER KNECHT IST FREI.
JEDER TURM IST NEU UND EIN BILD IST ALT.
NICHT JEDER TISCH IST GROSS ODER JEDES DORF IST ALT. 2


Notes

1
THE COUNT THE STRANGER THE LOOK THE CHURCH
THE CASTLE THE PICTURE THE EYE THE VILLAGE
THE TOWER THE FARMER THE WAY THE GUEST
THE DAY THE HOUSE THE TABLE THE LABOURER
OPEN SILENT STRONG GOOD NARROW NEAR NEW
QUIET FAR DEEP LATE DARK FREE
LARGE OLD ANGRY

2
NOT EVERY LOOK IS NEAR. NO VILLAGE IS LATE.
A CASTLE IS FREE AND EVERY FARMER IS FAR.
EVERY STRANGER IS FAR. A DAY IS LATE.
EVERY HOUSE IS DARK. AN EYE IS DEEP.
NOT EVERY CASTLE IS OLD. EVERY DAY IS OLD.
NOT EVERY GUEST IS ANGRY. A CHURCH IS NARROW.
NO HOUSE IS OPEN AND NOT EVERY CHURCH IS SILENT.
NOT EVERY EYE IS ANGRY. NO LOOK IS NEW.
EVERY WAY IS NEAR. NOT EVERY CASTLE IS QUIET.
NO TABLE IS NARROW AND EVERY TOWER IS NEW.
EVERY FARMER IS FREE. EVERY FARMER IS NEAR.
NO WAY IS GOOD OR NOT EVERY COUNT IS OPEN.
NOT EVERY DAY IS LARGE. EVERY HOUSE IS SILENT.
A WAY IS GOOD. NOT EVERY COUNT IS DARK.
EVERY STRANGER IS FREE. EVERY VILLAGE IS NEW.
EVERY CASTLE IS FREE. NOT EVERY FARMER IS LARGE.
NOT EVERY TOWER IS LARGE OR NOT EVERY LOOK IS FREE.
A CHURCH IS STRONG OR NOT EVERY VILLAGE IS FAR.
EVERY STRANGER IS NEAR THEREFORE NO STRANGER IS OLD.
A HOUSE IS OPEN. NO WAY IS OPEN.
A TOWER IS ANGRY. EVERY TABLE IS FREE.
A STRANGER IS QUIET AND NOT EVERY CASTLE IS FREE.
A TABLE IS STRONG AND A LABOURER IS SILENT.
NOT EVERY EYE IS OLD. EVERY DAY IS LARGE.
NO EYE IS OPEN. A FARMER IS QUIET.
NOT EVERY LOOK IS SILENT. NOT EVERY TOWER IS SILENT.
NO VILLAGE IS LATE OR EVERY LABOURER IS GOOD.
NOT EVERY LOOK IS SILENT. A HOUSE IS DARK.
NO COUNT IS QUIET THEREFORE NOT EVERY CHURCH IS ANGRY.
A PICTURE IS FREE OR A STRANGER IS DEEP.
A GUEST IS DEEP AND NO TOWER IS FAR.
A GUEST IS QUIET. EVERY PICTURE IS FAR.
A TABLE IS OPEN. EVERY LABOURER IS FREE.
EVERY TOWER IS NEW AND A PICTURE IS OLD.
NOT EVERY TABLE IS LARGE OR EVERY VILLAGE IS OLD.

(Translation by Helen MacCormac, 2005)