The Understanding Machine

A Story by Budimir Zdravkovic
"

Questions whether artificially intelligent machines can understand.

"






I’ve had an ongoing interest in the topic of artificial intelligence and in the idea that machines might one day be able to think and understand. I’ve written several blogs in which I compare the biology of thinking organisms with computers. In those blogs I tend to highlight the differences between the two, and I tend to criticize computationally intelligent machines. This blog will be no different in that respect, but here I would like to delve into the more philosophical subject of language acquisition. I want to highlight some properties of language that are necessary for thinking machines to understand it.

The question that comes to mind is whether it matters at all that machines actually understand language. Some engineers and researchers of intelligent machines might say that whether a machine understands has no bearing on the pragmatic function of intelligent machines. But I think that if machines understood, they could probably do a lot more, computationally. In my opinion only an understanding machine has any chance of passing the Turing Test, because only an understanding machine would be able to consistently compose meaningful sentences and communicate with us efficiently. If a machine does not possess any semantic properties, it must rely on syntax alone; such a machine would only be able to compose a limited number of intelligent sentences, since any enduring linguistic communication is primarily determined by the semantic composition of words and sentences.


 


 


Words and Associations


 


 


In the Chinese Room thought experiment, John Searle argued that artificially intelligent machines, even if they were conscious, would not be able to understand language because of how they function. Computers manipulate symbols and words based on syntactic instructions which are pre-written in the program. The computer is only following instructions, and it does that very well, but there is little learning or understanding involved when the computer processes these instructions and generates an output. It is hard to explain how an intelligent being begins to understand language, and this is not surprising, because the understanding of language relies on the emergence of meaning: one has to grasp the meaning of words in order to understand language. So we can think about how the meaning of words begins to emerge. We can take a simple example from our everyday experience and look at how children learn a new language. Children learn to speak before they learn how to read; they don’t have the convenience of consulting dictionaries. We learn very few words by consulting dictionaries, and if we think about the emergence of language in general, language must have originated before words were institutionally defined. I am suggesting that we learn language by associating words with experiences, phenomena and concepts.


 


Because words are defined by associations, there are debates over the meaning of words; this may also be why words change over time and across cultural contexts. Further, because words are defined by associations, they have an abstract property: one can describe several different phenomena with the same word, as long as one can conceptually associate those phenomena with the given word. For instance, whether we observe an actual cat directly or a picture of a cat, we would associate both objects with the word "cat". If someone asked us to identify the cat, we would identify both the cat in the picture and the actual cat in front of us. The word "cat" in this example, which I think can be generalized to how we use all words, is associated both with the picture and with the actual cat. We also see a connection between the picture and the actual cat, so we are associating those two phenomena as well. Finally, both phenomena and the word "cat" are going to be associated with our general concept of a cat.


 


Such associations, and the ambiguity in language, provide the right amount of abstraction, which makes definitions, meanings and words generalizable to a wide range of phenomena. It also appears that one cannot separate the meaning of a word from the concept, because the concept provides the necessary abstraction that makes the association between various concrete objects and the word possible; for instance, both the picture of a cat and the actual cat are associated with each other and with the word "cat" because they are associated with a common concept.
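As a rough illustration of this idea, here is a minimal sketch, entirely my own toy example rather than a claim about how brains store associations, in which two very different phenomena are linked to the same word only through a shared concept:

    # Toy illustration: phenomena are associated with concepts, and concepts
    # are associated with words. A photo and a live animal end up linked to
    # the same word "cat" because they share the concept, not because they
    # resemble each other as objects.
    phenomenon_to_concept = {
        "photo_of_a_cat": "CAT_CONCEPT",
        "live_animal_on_the_sofa": "CAT_CONCEPT",
        "oak_in_the_yard": "TREE_CONCEPT",
    }

    concept_to_word = {
        "CAT_CONCEPT": "cat",
        "TREE_CONCEPT": "tree",
    }

    def word_for(phenomenon):
        """Name a phenomenon by going through its associated concept."""
        return concept_to_word[phenomenon_to_concept[phenomenon]]

    print(word_for("photo_of_a_cat"))           # cat
    print(word_for("live_animal_on_the_sofa"))  # cat (same word, different object)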


 


If what I have written so far seems plausible, then we can conclude several things. If we learn what words mean by association, and if words and their meanings originated through association, then associations are necessary for the emergence of meaning in words. If meaningful words are necessary for the understanding of words and language, then we can also conclude that association is necessary for the understanding of language. However, association may not be sufficient for the emergence of meaning and the understanding of language.


 


The Problem of Precise Definitions


 


The meaning of a word is associated with the phenomenon it describes. However, we can also semantically associate a given word with other words which describe or define the word in question. Oftentimes we attempt to provide a precise definition of a word in terms of other words. For instance, we can identify the word "tree" and the object it is associated with, but we can’t describe precisely what the word "tree" means unless we use other words to provide an accurate description. We can say that a tree is a living organism, that it has branches and leaves, and so on.


 


However, language acquisition through precise definitions alone is impossible, because if we fail to associate words with anything other than other words, we are stuck in a cycle of ignorance. If we start looking for the precise definition of any given word, we have to understand it in terms of other words; but since we don’t understand what those words mean, we have to look up their definitions as well. We endlessly keep chasing definitions through words we don’t understand. If we fail to associate words with anything other than words, we fail to understand what the words mean.


 


Further, the concept of a precise definition is itself problematic, because the precise definition of a word is stated in terms of other words. Consider a given word W; this word is precisely defined by another set of words W1. However, because all the words in W1 also have to be precisely defined, there has to be another set of words W2 which defines all the words in W1. And since the words in W2 also need to be precisely defined, there has to be yet another set of words W3 that defines all the words in W2.
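The regress is easy to see if we imagine a dictionary as a lookup table and try to expand a definition all the way down. The following sketch uses a made-up three-entry dictionary, purely for illustration; it keeps substituting definitions for words and never bottoms out in anything other than more words:

    # Toy dictionary: every definition is made of other words that are
    # themselves defined by further words, so expansion never grounds itself.
    dictionary = {
        "tree":     ["organism", "with", "branches"],
        "organism": ["living", "tree", "or", "animal"],
        "branches": ["parts", "of", "a", "tree"],
    }

    def expand(word, depth):
        """Replace a word by its definition, up to a fixed depth."""
        if depth == 0 or word not in dictionary:
            return [word]
        result = []
        for defining_word in dictionary[word]:
            result.extend(expand(defining_word, depth - 1))
        return result

    # The expansion only grows; without the artificial depth limit it would
    # recurse forever, because "tree" is defined via "organism" and
    # "organism" is defined via "tree".
    for depth in range(4):
        print(depth, len(expand("tree", depth)))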


 


The problem here is that once we start to define words in terms of other words, we fall into an infinite regress of definitions in our search for the precise definition of any given word. We endlessly define without ever arriving at a precise definition. Words are naturally vague and ambiguous; a fully precise definition of a word cannot exist.


 


 


Neural Networks and Computation


 


 


Previously I highlighted some of the differences between brains and computers. Here I am going to talk about yet another difference: the one between centralized and decentralized processing. Many computers utilize central processing units (CPUs); in the human brain, however, there is no such thing as central processing. The processing that occurs in the brain is highly decentralized: it is done at the level of individual nerve cells. As I mentioned in an earlier blog, we can think of every neuron in the human brain as a microscopic computer or processor.


 


This kind of decentralized processing may offer certain computing advantages over CPUs; architectures which mimic how biological neural networks work often offer some advantage over traditional computing for tasks such as recognition. Recently Stanford's Artificial Intelligence Lab constructed the largest artificial neural network to date, engineered to mimic the function of a human brain. A smaller model that Google built was able to identify cats in YouTube videos the way a person would. This operation mimics a person’s ability to identify objects in videos through association and abstraction. If one were to extend this computational model, one could try, for instance, to build a computer that recognizes cats in videos, in pictures, and in the flesh. This kind of identification would require a whole other level of abstraction, one more akin to human abstraction. Then we could take this kind of computing even further and build computers that associate the many representations of cats with the word "cat". It seems that neural network architecture is a move in the right direction towards building artificially intelligent machines.
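For readers who have not seen one, here is a minimal sketch of the kind of learned association such a network performs: a tiny feedforward network trained to attach the label "cat" to an input pattern. This is my own toy example, not the Stanford or Google system; the "images" are synthetic feature vectors invented purely for illustration:

    # A toy feedforward network that learns to associate an input pattern
    # with the label "cat" (1) or "not cat" (0). Real systems learn from
    # millions of video frames; here the data is two synthetic clusters.
    import numpy as np

    rng = np.random.default_rng(0)

    n_per_class = 100
    cats     = rng.normal(loc=1.0,  scale=1.0, size=(n_per_class, 20))
    not_cats = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, 20))
    X = np.vstack([cats, not_cats])
    y = np.concatenate([np.ones(n_per_class), np.zeros(n_per_class)])

    # One hidden layer, trained with plain gradient descent on a logistic loss.
    W1 = rng.normal(scale=0.1, size=(20, 8)); b1 = np.zeros(8)
    W2 = rng.normal(scale=0.1, size=(8, 1));  b2 = np.zeros(1)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for step in range(500):
        h = np.tanh(X @ W1 + b1)                 # hidden activations
        p = sigmoid(h @ W2 + b2).ravel()         # predicted probability of "cat"
        grad_out = (p - y)[:, None] / len(y)     # gradient of the mean logistic loss
        dW2 = h.T @ grad_out
        db2 = grad_out.sum(axis=0)
        dh = (grad_out @ W2.T) * (1.0 - h ** 2)  # backpropagate through tanh
        dW1 = X.T @ dh
        db1 = dh.sum(axis=0)
        for param, grad in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            param -= grad                        # gradient step, learning rate 1.0

    accuracy = ((p > 0.5) == y).mean()
    print(f"training accuracy: {accuracy:.2f}")

The network ends up associating one region of its input space with the word "cat", which is exactly the kind of association-by-abstraction described above, and nothing more.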


 


But we are concerned with whether artificial neural networks understand what cats are. Is the change in computing power or processing architecture that results from neural networks going to change the computer’s inability to understand? If we assume the computer is running on pre-written software and we apply Searle’s Chinese Room thought experiment, we quickly come to the conclusion that the computer does not understand what cats are, although the fact that it can recognize cats would be a great feat of computational achievement. The computing power or processing architecture that results from switching to artificial neural networks is not going to change the instructions in the software. The artificial networks would still follow instructions and generate an output, and the machine does not need to understand what cats are in order to do that. The computer simply needs to follow the instructions in the program.


 


We can imagine a computer that makes excellent associations but is still following instructions, and therefore meaning does not emerge and neither does understanding. The computer makes precise associations that mimic human associations, but because it runs on a pre-written program it does not understand what it is identifying. An artificial neural network like the one Google constructed does have the capacity to learn and teach itself, so it is not merely following instructions; yet it still appears that association by itself is not sufficient for the emergence of meaning and understanding. We can conclude that association is necessary but not sufficient for the emergence of meaning and understanding.


 


 


The Learning Machine


 


 


Learning is crucial for language semantics. I say this because, from what we know about language acquisition, one cannot develop the semantics and the understanding of a language without first learning the language. The process of learning a language is also the process which allows us to start understanding it. Language acquisition is therefore developmental by nature; it is something that must develop through interaction between an organism and its environment.


 


Learning by association also fits into this developmental property of language. The organism learns language by associating experiences and phenomena with words and concepts. We know from observation that an organism must first learn a language before it can understand it. Learning the meaning of words through association with other words alone, however, is problematic; this method of learning will not allow for understanding, for the reasons stated above. Similarly, we can think of a Turing machine which has a database of all the words, books and encyclopedias in the world. This machine has an endless supply of literature and words. It has enough computing power to associate words with words and words with sentences, and from these correlations it is able to compose intelligible sentences. But for the same reasons stated above, such a computer cannot understand the meaning of the words. It can only correlate words with words, and words with sentences. Under these conditions semantics and understanding cannot emerge. The machine knows how words are correlated with each other, but it does not know how those words are correlated with the phenomena they are supposed to describe.
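A minimal sketch of such a machine is easy to write: it builds a table of which words follow which other words in its corpus and composes sentences from those correlations alone. The three-sentence corpus below is my own stand-in for the imagined database of all books and encyclopedias; nothing in the program ever touches a cat, a mat, or a mouse outside the text itself:

    # The "database machine": it only knows which words tend to follow which
    # other words, and it composes sentences from those correlations alone.
    import random
    from collections import defaultdict

    corpus = (
        "the cat sat on the mat . the cat chased the mouse . "
        "the mouse ran under the mat ."
    ).split()

    # Build a word-to-word (bigram) co-occurrence table.
    follows = defaultdict(list)
    for current_word, next_word in zip(corpus, corpus[1:]):
        follows[current_word].append(next_word)

    def compose(start="the", length=8, seed=0):
        """Generate a word sequence purely from word-word correlations."""
        random.seed(seed)
        words = [start]
        for _ in range(length - 1):
            words.append(random.choice(follows[words[-1]]))
        return " ".join(words)

    # Intelligible-looking output, but no word in it is grounded in any
    # phenomenon; the machine only knows words in terms of other words.
    print(compose())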


 


What if we added pictures to the database and programmed the machine to associate those pictures with words? In that case the computer might make intelligible associations between pictures and words, but as stated above we come across a problem: the computer might associate pictures with words very accurately, yet it fails to understand because it is running on a static, pre-determined code. Organisms that understand language cannot rely on a static pre-determined code, because one cannot program, using static pre-written code, the inductive knowledge which is necessary for language acquisition. The machine might have an infinite number of words and images in its dataset, but if it cannot encounter the actual phenomena those words are meant to describe, it cannot have perception and understanding of what those phenomena mean. It cannot understand, for instance, how the words and images stored in its database are related to existing non-virtual objects. Even if the computer did have pre-written code to respond to an infinite number of non-virtual objects and to associate them with pictures and words, it still could not understand, because it lacks the capacity to acquire inductive knowledge.


 


If we posit that a machine can understand without acquiring inductive knowledge, that would be similar to stating that the machine can somehow know or understand facts about the world without encountering or observing them. It seems a little ridiculous to claim that a machine can acquire such “magical” or “psychic” knowledge, especially when we are discussing the kind of knowledge which is necessary for language acquisition.


 


A machine that runs on static pre-written code is incapable of acquiring inductive knowledge, because upon observing or encountering a particular fact the machine needs to undergo a change in its processing or operation. The state of the machine has to transition from “not knowing that particular fact” to “knowing that particular fact.” Such changes are regularly observed in organisms that have the capacity to acquire inductive knowledge and learn. Learning a language requires some degree of creative self-processing, because learning something new alters processing in a way which is not predictable by a static pre-written program.
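To make the transition concrete, here is a deliberately simple sketch, under the crude and admittedly contestable simplification that “knowing a fact” just means holding it in an internal store: the static machine’s state is fixed when it is written, while the learning machine’s state changes at the moment of the encounter:

    # Toy contrast between a machine fixed at "programming time" and one
    # whose internal state changes when it observes something new.
    class StaticMachine:
        def __init__(self, facts):
            self._facts = frozenset(facts)   # fixed forever at construction

        def knows(self, fact):
            return fact in self._facts

    class LearningMachine:
        def __init__(self):
            self._facts = set()              # starts out knowing nothing

        def encounter(self, fact):
            # Observation changes the machine's internal state.
            self._facts.add(fact)

        def knows(self, fact):
            return fact in self._facts

    static = StaticMachine({"cats have whiskers"})
    learner = LearningMachine()

    fact = "this cat is on this mat"
    print(static.knows(fact))   # False, and nothing it encounters can change that
    learner.encounter(fact)     # transition from "not knowing" to "knowing"
    print(learner.knows(fact))  # True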


 


I will admit that the term creative self-processing is somewhat vague. A creative self-processing machine would, for instance, alter its own processing. But we could always ask at what point the machine is altering its own processing, and at what point the environment is altering the processing for the machine. Still, we can say that any machine which evolves and acquires linguistic skills through association with its environment is probably learning, because this is exactly how people learn languages.


 


Such a machine doesn’t have a pre-written program for language; it picks up language as it interacts with the speakers it comes across and with the surrounding environment. We could say that such a machine is probably learning. But if it is learning by association, can we assume that the machine understands? We could assume that it does, but that interpretation of understanding is a little simplistic, for several reasons I will go into later.


 


Other important aspects of human learning are related to the plasticity of the brain. During learning, the brain’s connectivity undergoes various changes; new connections and new synapses are formed. As new synapses form, the nature of processing at individual neurons is altered. In the human brain, not only is processing decentralized, but the processing done by the processing units themselves changes as the brain learns. There are also many other factors involved in human learning, such as epigenetic factors which alter gene expression in neural cells, once again altering the nature of the processing.


 


I would like to emphasize that human learning is a dynamic system in which constant changes occur in the processing, and these changes might have an effect on how humans come to learn and understand language. These are the kinds of changes that we would expect a creative self-processing machine to undergo.


 


How Do We Know When a Machine Begins to Learn?


 


 


Does the machine begin to learn while we are building it? Or does it learn once all the hardware is in place and we upload the software and spark life into the dead piece of silicon? There is no way to answer these questions directly, because we do not even have a clear model of how understanding emerges in complicated systems like the human brain. We can only rely on certain brain structures and functions that correlate with the emergence of understanding in humans, and we can hypothesize that such structures and functions are necessary for processing new knowledge. Analogously, once we manage to build structural and dynamic functional units in a machine that correlate with learning behavior, we can say that the machine is acquiring new knowledge. If we do not have evidence of such structural and functional units in the machine, we have no good reason to assume that the machine is capable of acquiring inductive knowledge.


 


 


Is the Learning Machine an Understanding Machine?


 


 


Previously I stated that association is necessary for understanding language but is probably not sufficient; learning is required as well. Now we can ask whether learning by association is sufficient for understanding. Once again the answer is probably no. We know that both learning and association are important for language acquisition, but we are still a long way from understanding the understanding machine. For instance, there are computers which play chess and which have learning algorithms, but is it valid to claim that these computers understand what chess is? I don’t think we can say that chess-learning computers understand chess, because they don’t have a wide enough understanding of the cultural context in which chess is played, or of the context in which chess is defined as a leisure game or a sport. The chess-playing computer therefore cannot understand chess, or at least it cannot understand chess the way we understand it.


 


For us, understanding appears to emerge from perceiving an integrated network of phenomena and experiences. A machine which lacks such perception will not be able to understand things the way we do. And it would be ridiculous for us to expect the machine to understand phenomena the way we do if we do not share a common means of perception with it.


 


If we want to build a machine which understands language, we have to ask ourselves what else the machine needs to understand besides language. Like chess, language is probably not an isolated phenomenon; it is integrated with many other aspects of our phenomenal experience, which we must be aware of and understand before we can begin to develop our understanding of language. To conclude, I have outlined two necessary conditions that must be satisfied before linguistic understanding can emerge in a machine:


 


1. The machine must be able to associate experiences and phenomena with words and concepts.

2. The machine must be able to learn.


 


The list of conditions that might eventually allow us to build an understanding machine is probably much longer than this, as the understanding of language probably requires the understanding of a wider and more integrated phenomenal experience.






© 2013 Budimir Zdravkovic


Author's Note

If you read this, it helps to be familiar with basic AI philosophy, John Searle, Alan Turing, etc.
