A recent experience reminded me that if I want to get anything done in life that requires significant brainpower (at least the kind that allows one to develop new abstractions), I had better do it while I’m still relatively young. It seems like people’s thought patterns follow a predictable trajectory over the course of their lifetimes: first, as infants, they form seemingly random connections; then, through childhood and adolescence, they’re prone to generalization; then, through adulthood and early middle age, abstraction; and finally, in late middle and old age, they become less likely to form new abstractions and instead focus on concrete matters. This reflects the development of the brain, which starts out with a vast number of connections that are progressively winnowed down over the course of a lifetime. My dad compared this to sculpture, where excess material is gradually removed until only the target form remains. This suggests two things. First, if possible, one must direct the course of sculpture towards a goal state that will be of use in later years. One finds the best examples of this by considering careers that lend themselves towards working well beyond the normal retirement age–judges and teachers, for example. Second, one should take full advantage of the period in which one’s capacity for abstraction is at its peak. I suspect this is the reason why mathematicians are widely expected to produce their best work before they turn 30; mathematics is abstraction in its most pure form.
After a lot of thinking and Wikipedia research, I’ve realized that I was somewhat naive in my choice of test bed. Handling the real-valued inputs and outputs of the agents in the previous post poses a challenge beyond that of handling discrete values–for example, in storage, how does one group similar values in order to avoid creating a new mapping for every unique number, no matter how slight the difference between it and its neighbors? The easiest solution would be to use uniform quantization, though that would require us to know in advance the desired granularity. In researching alternatives that would remove that requirement, I learned about the general problem of clustering, and algorithms such as k-means clustering, which groups similar values together into a predetermined number (k) of clusters using a simple iterative process. An extension of this is fuzzy c-means clustering, which removes the hard boundaries between clusters in favor of granting each point a weighted degree of membership in each cluster. An agent that learns over time would require an online/sequential version of the algorithm. For example, consider a “fuzzy map” data structure with capacity k, requiring a distance function for keys and blend functions for keys and values. The first k values would be inserted unchanged, with retrieved values blended according to key distance. Subsequent insertions would adjust the key/value pairs according to their distance to the new key, the total weight of values already inserted, and perhaps an additional factor to allow more recent mappings to overcome the inertia of historical ones. An advantage of this approach is that it sets a bound on the amount of memory (and lookup time, insertion time, etc.) required for the mapping.
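To make the idea concrete, here’s a rough sketch of such a fuzzy map in Python (a thinking aid, not the eventual implementation; the Gaussian affinity kernel and the weight-based inertia formula are assumptions of mine, and the recency factor mentioned above is omitted):

```python
import math

class FuzzyMap:
    """Bounded fuzzy key/value store: at most k entries, each a
    (key, value, weight) centroid that drifts toward new insertions."""

    def __init__(self, k, distance, blend_key, blend_value, width=1.0):
        self.k = k
        self.distance = distance      # distance(a, b) -> float >= 0
        self.blend_key = blend_key    # blend(a, b, t): a for t=0, b for t=1
        self.blend_value = blend_value
        self.width = width            # kernel width (an assumed parameter)
        self.entries = []             # list of [key, value, weight]

    def _affinity(self, a, b):
        # Gaussian kernel: 1 at zero distance, falling off with separation.
        return math.exp(-(self.distance(a, b) / self.width) ** 2)

    def insert(self, key, value):
        # The first k pairs are inserted unchanged...
        if len(self.entries) < self.k:
            self.entries.append([key, value, 1.0])
            return
        # ...after which every centroid is pulled toward the new pair,
        # most strongly the nearest ones; accumulated weight is inertia.
        for e in self.entries:
            a = self._affinity(e[0], key)
            rate = a / (e[2] + 1.0)   # heavily-weighted entries move less
            e[0] = self.blend_key(e[0], key, rate)
            e[1] = self.blend_value(e[1], value, rate)
            e[2] += a

    def get(self, key):
        # Retrieve a value blended from all entries by key affinity.
        total = sum(self._affinity(e[0], key) for e in self.entries)
        if total == 0:
            return None
        result, acc = None, 0.0
        for e in self.entries:
            w = self._affinity(e[0], key)
            acc += w
            result = e[1] if result is None else self.blend_value(result, e[1], w / acc)
        return result
```

With scalar keys and values and linear interpolation as both blend functions, this behaves like a capacity-bounded, self-adjusting lookup table; supporting recency would be a matter of scaling `rate` by the age of each entry.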
On considering how the agent must process values, it becomes clear that derived data–values obtained through transformations of the raw stream–are often more significant than the originals. For example, when a mouse sniffs around to locate the source of a scent, the important datum is whether the intensity of the smell increases or decreases with each movement: if the smell decreases when the mouse rotates left, then the source of the smell is to the right. This suggests providing the scent delta instead of or in addition to the absolute intensity. In general, the problem of augmenting a stream of raw values with layers of derived information suitable for input into a higher level cognitive algorithm is likely to require both simple transformations like differences and more advanced analyses such as the identification of repeated patterns, even when they occur at different scales or contain loops or gaps. This pattern recognition requirement is one of the most compelling reasons to think that the animal brain, with its robust pattern matching ability enabled by massive parallelism, may have a distinct advantage over any currently realizable artificial system–at least in terms of interacting with the real world, or any artificial environment that approaches its complexity. For instance, consider that the sparse coding believed to be employed by the visual cortex uses a large number of neurons to analyze small sections of the visual field in a manner that is robust to input noise, scaling, and translation.
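As a minimal illustration of the simplest such transformation, a delta layer just augments each raw reading with its difference from the previous one (the generator below is a toy example of my own, not part of the planned agent code):

```python
def deltas(readings):
    """Augment a stream of raw intensities with their first differences."""
    previous = None
    for intensity in readings:
        delta = 0.0 if previous is None else intensity - previous
        previous = intensity
        yield intensity, delta

# A mouse rotating away from the source sees negative deltas after the
# first reading, e.g. for the raw stream [0.5, 0.4, 0.3].
```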
Apart from simple transformations and pattern identification, it may be necessary to process input further in order to create mental models of context. That is, the brain’s representation of the current environment may not be created–or created entirely–by the principal cognitive process, but rather by a preprocessing phase that specializes in determining spatial relationships. The equivalent in simulation terms would be to provide the locations, shapes, movement vectors, etc., of objects in the environment (including the agent itself) instead of or in addition to the raw sensory inputs. Indeed, assuming that more information is always more useful than less (assuming effective means of determining relevance) and that an emergent process would have to derive such spatial information anyway, it would seem that providing the information explicitly is a no-brainer (so to speak). However, considering this and the many other layers of derived data with which we plan to augment the raw input, we rapidly encounter problems relating to the curse of dimensionality. For example, were we to use the k-nearest neighbors algorithm to find historical contexts similar to the current one in order to find outputs likely to result in reward, the proximity in relevant inputs would likely be drowned out by differences in irrelevant ones.
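A quick numerical sketch of that drowning-out effect: under Euclidean distance, two contexts that agree exactly on the single relevant input can still end up “farther apart” than contexts that differ on it, once enough irrelevant noisy dimensions are added (the dimension counts here are arbitrary):

```python
import math, random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

random.seed(0)
# Two contexts that agree exactly on the one relevant input...
a = [0.5] + [random.random() for _ in range(99)]
b = [0.5] + [random.random() for _ in range(99)]
# ...and a third that differs in the relevant input but shares a's noise.
c = [0.9] + a[1:]

# Despite perfect agreement on the input that matters, a is judged
# closer to c (distance 0.4) than to b (distance ~4 from the noise).
assert euclidean(a, c) < euclidean(a, b)
```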
A 2D environment seems like a reasonable test bed for simulating animal behavior: simple to implement, efficient to execute, and providing ample opportunities to tweak and extend both agents and environment. Following the scientific tradition of running rodents through mazes, I am imagining the agents as “mice” pursuing food “pellets” within a bounded space, receiving input from their virtual senses and producing movement controls as output. At least for the first experiments, the cognitive model (and the simulation) will execute in discrete time steps. At each step, the mice will collect input from their senses and provide it to their cognitive model, which will update its state and provide output. The simulation will enact the chosen output, and report back to the model with reinforcement. For now, input will consist of “smell” derived from the proximity of nearby pellets; output will consist of rotation and forward advancement; and reinforcement will consist of a unit of “pleasure” when the mouse consumes a pellet.
The first goal will simply be to have the mice learn to follow smells in order to consume pellets more efficiently than they would by moving randomly. Assuming that can be accomplished, further experiments would have them learn to navigate both static and moving obstacles, and eventually to traverse complicated courses such as mazes. In the course of experimentation, I expect to expand their sensory repertoire to include sight and touch.
Here’s the control case: entirely random movement.
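In code, the control case reduces to the discrete step loop described above with a mouse that ignores its senses entirely (the smell falloff, step geometry, and constants below are placeholders of my own, and the bounded space is omitted for brevity):

```python
import math, random

class RandomMouse:
    """Control agent: ignores its senses and moves entirely at random."""
    def act(self, smell):
        rotation = random.uniform(-math.pi / 4, math.pi / 4)
        advance = random.uniform(0.0, 1.0)
        return rotation, advance

def step(mouse, pos, heading, pellets, eat_radius=0.5):
    """One discrete time step: sense, think, act, then reinforce."""
    # Sense: "smell" falls off with distance to the nearest pellet.
    smell = max((1.0 / (1.0 + math.dist(pos, p)) for p in pellets), default=0.0)
    # Think/act: the cognitive model produces rotation and advancement.
    rotation, advance = mouse.act(smell)
    heading += rotation
    pos = (pos[0] + advance * math.cos(heading), pos[1] + advance * math.sin(heading))
    # Reinforce: a unit of "pleasure" for each pellet consumed.
    eaten = [p for p in pellets if math.dist(pos, p) < eat_radius]
    pleasure = float(len(eaten))
    pellets = [p for p in pellets if p not in eaten]
    return pos, heading, pellets, pleasure
```

A learning mouse would simply replace `RandomMouse` with a model whose `act` actually consults the smell (and, eventually, its history of reinforcement).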
Recently, I’ve been trying to develop a low-level model of animal behavior in the hopes of expanding it into a general model of feedback-based cognition, since higher cognition presumably evolved from simpler processes. The goal of this exercise is to find a model capable of “learning” at the most basic and natural level: e.g., “If I push this button, I receive immediate pleasure; therefore, I shall push the button as often as possible,” or “when I experience something like this sequence of inputs, and I subsequently produce something like this sequence of outputs, I will receive pleasure in roughly this amount of time.” Note that “pleasure” in these cases is provided externally; it is simply the abstract input whose maximization is the goal of the model, though it can be assumed to represent the satisfaction of physical desires, such as hunger. Also note that the model is probabilistic and “fuzzy”: rather than learning and reproducing exact sequences, it recognizes input with relative certainty and may produce output with stochastic variations.
The essential form of the model requires that it generate output (O) over time (t), based on input (I) and pleasure (P). While P is assumed to be real-valued, I and O are generic, and may assume any type that provides the set of functions, such as random() and distance(A, B), required by the model. To simplify the definition of prospective models, it’s useful to divide them into two separate categories: discrete time models, wherein t progresses in unit steps (0, 1, 2, 3…) and at each step the model accepts values for I and P and generates a value for O; and continuous time models, in which I and P events may be received, and O events generated, at any real-valued t. Discrete models are easier to generate and analyze, but continuous models are better representations of real world behavior.
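Sketched in Python rather than any formal notation, the discrete-time division of labor might look like this (the channel interface follows the description above; the class names and the particular method signatures are mine):

```python
import random

class Channel:
    """Contract for a generic I or O type: any type works so long as it
    supplies the functions the model requires."""
    def random(self):
        raise NotImplementedError
    def distance(self, a, b):
        raise NotImplementedError

class ScalarChannel(Channel):
    """Real-valued channel as the simplest instance of the contract."""
    def random(self):
        return random.uniform(-1.0, 1.0)
    def distance(self, a, b):
        return abs(a - b)

class DiscreteModel:
    """Discrete-time skeleton: at each unit step (t = 0, 1, 2, 3...),
    accept values for I and P and generate a value for O."""
    def __init__(self, output_channel):
        self.output_channel = output_channel
        self.history = []  # (I, O, P) triples, one per step

    def step(self, i, p):
        o = self.choose(i, p)
        self.history.append((i, o, p))
        return o

    def choose(self, i, p):
        # Placeholder policy; prospective models override this.
        return self.output_channel.random()
```

A continuous-time model would instead expose event handlers for I and P arriving at arbitrary real-valued t, and schedule O events of its own.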
In both cases, the model must choose O to maximize expected P, E(P). A perfectly rational model would presumably attempt to maximize total E(P) (the sum of all expected pleasure events over the lifetime of the model), but assuming that the model’s lifetime can’t be known in advance, and considering that the animal behavior I’m attempting to model seems generally to prefer immediate over delayed gratification, it seems reasonable instead to have the model attempt to maximize E(P) within a finite span ahead of the current time, weighting more immediate expected pleasure events higher than more distant ones. I’ll refer to this metric as the immediacy-weighted expected pleasure (IWEP). In the discrete case, the goal of the model is to select O at each time step to maximize IWEP, given the historical values for I, O, and P at previous time steps, and the current I.
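One simple way to realize the immediacy weighting is geometric discounting over a finite horizon; the discount factor and horizon length below are assumptions on my part, not part of the definition:

```python
def iwep(expected_pleasures, discount=0.9, horizon=10):
    """Immediacy-weighted expected pleasure: sum E(P) over the next
    `horizon` steps, weighting a pleasure event d steps ahead by
    discount**d, so more immediate events count for more."""
    return sum(discount ** d * ep
               for d, ep in enumerate(expected_pleasures[:horizon]))

# The same total pleasure scores higher when it arrives sooner.
assert iwep([1.0, 0.0, 0.0]) > iwep([0.0, 0.0, 1.0])
```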
As a baseline for comparison, one approach would simply be to choose randomly: O = random(). More usefully, one could assume that the best way to predict the optimal output is to compare the current context (the sequence of preceding I, O, and P, and the current I) to historical contexts, producing a new output by combining previous outputs according to the total immediacy-weighted pleasure that followed them, including random changes to avoid stagnating in local maxima. This approach is analogous to optimization via genetic algorithms: the principal force is the combination of previously successful candidates via a crossover function that merges their attributes, while a mutate function serves to introduce variation. The model would start by generating random output, only attempting to reproduce earlier outputs when they seemed sufficiently likely to result in higher pleasure. In all likelihood, the model would periodically have to return to experimenting with (increasingly) random output in order to expand the search space. This “experimentation” mode or factor would have to take effect or become dominant either when the model can afford to take the risk (because immediate pleasure is guaranteed, perhaps), or when it must (because no suitable existing solution is known).
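The genetic-algorithm analogy can be sketched as follows; the similarity weighting, the choice of the top two candidates for crossover, and the fixed exploration probability are all illustrative assumptions rather than a worked-out design:

```python
import random

def choose_output(history, context, similarity, crossover, mutate,
                  random_output, explore=0.1):
    """Pick the next output by blending the historical outputs whose
    contexts most resembled the current one, weighted by the
    immediacy-weighted pleasure that followed them; fall back to (and
    occasionally inject) randomness to expand the search space.

    history: list of (context, output, iwep) records.  similarity(a, b)
    is in [0, 1]; crossover(a, b, t) merges two outputs; mutate(o)
    perturbs one; random_output() generates one from scratch.
    """
    scored = [(similarity(context, c) * p, o) for c, o, p in history]
    scored = [(w, o) for w, o in scored if w > 0]
    if not scored or random.random() < explore:
        # Experimentation mode: no promising precedent exists, or the
        # model deliberately takes a risk to escape local maxima.
        return mutate(random_output())
    # Combine previously successful outputs, as in a genetic algorithm.
    scored.sort(key=lambda t: t[0], reverse=True)
    w1, o1 = scored[0]
    w2, o2 = scored[1] if len(scored) > 1 else scored[0]
    return mutate(crossover(o1, o2, w2 / (w1 + w2)))
```

A fuller version would also make `explore` adaptive, rising when immediate pleasure is guaranteed or when no suitable existing solution is known.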
One of the most interesting further steps in developing such a model is the incorporation of feedback and the subsequent development of a search process within the optimization framework. The next step towards true cognition is the ability to traverse networks of associations formed by past experience in order to find paths representing better solutions to current problems than those that may be generated through simple combination and mutation of previous solutions.
Although I’ve decided to put Witgap on hold while I attempt to write some actual code in support of my AI investigations, I wanted to document one additional aspect of the game: its theme. I’ve given a fair amount of thought to the technical basis of Witgap, but fairly little to its content. One of the crucial factors determining the success of any game is the combination of imagined settings, characters, plots, and devices that transform a set of data artifacts into a virtual world.
One thing I want to avoid is leaving the theme completely open, like that of Whirled. In the same way that complete freedom of gameplay saps players’ motivation, complete freedom of theming starves their imaginations. People thrive under constraints. However, the themes typical of online games–notably, those derived from fantasy and science fiction–can lead to overconstraint. For instance, in a world of orcs and wizards, a spaceship would seem out of place; likewise, a “hard” science fiction setting has no room for gods and magic. More importantly, the theme of a world tends to limit the stories one can tell or that audiences are willing to accept. This is the classic problem with “genre fiction”: the demands of the genre, be they establishing a unique and consistent fantasy world or following a formula like that of the mystery, precolor the plot and characters of the story. In particular, genre trappings tend to expand the scale of drama at the expense of subtlety: when murder is afoot or the fate of the universe is at stake, the minute drama that we experience every day falls beneath the noise floor. Fiction without genre, on the other hand–in particular, fiction that employs as its setting the world that its readers live in, at least as a baseline–starts with a neutral white, and provides a greater opportunity to explore subtlety of experience. Additionally, if one is to compare actual present reality to any imagined fictional world or historical setting, it becomes clear that the real world has vastly more source material with which to build.
One could argue that science fiction subsumes present reality, particularly when it employs devices like Doctor Who’s time travel, Star Trek’s holodeck excursions, or the simulated world of The Matrix. Similarly, “urban fantasy” settings augment reality with monsters and magic. Unfortunately, all of these devices end up trivializing the real world: again, our day to day experiences become inconsequential in the face of attacks by aliens and robots and vampires. Conversely, one could explore fantastic settings in dreams, hallucinations, or stories embedded within a container of realism–but that would in turn trivialize the fantasy worlds. The “magic realism” approach, in contrast, manages to blend fantastic elements into actual reality without sacrificing subtlety, but it does so at the expense of consistency. Where fantasy and science fiction thrive by constructing and obeying novel sets of rules for their environments, making them well-suited to the algorithmic world of gaming, magic realism relies on the unpredictable whims of the author. As soon as a magic realist world solidifies and becomes consistent in the mind of the reader, it becomes “mere fantasy.”
Still, magic realism–or some internally consistent approximation thereof–is the most promising basis for the theme of Witgap. Starting with a representation of actual, current reality as a baseline will provide the greatest level of freedom to expand into fantastic and speculative branches. The ideal is “reality++”: a versatile version of our present reality that will support excursions into other realities and inclusions of their aspects and elements without trivializing or rendering unreal either the primary or the alternate dimensions, all while maintaining a set of consistent game rules and a continuity of experience. Part of the challenge will be that of “scale reset”: because stakes may vary greatly from one story to the next, there must exist some mechanism to ensure that the epic scope of one story doesn’t overshadow the subtle realism of the next.
The practicality of realism is one of the major factors separating games as a medium from books and movies: realistic novels are just as expensive to produce as fantastic ones, and realistic films are cheaper, but the cost of producing games increases dramatically with increased graphical realism. This is one of the reasons that independent films tend to focus on realistic stories, whereas independent games tend towards abstract representation and cartoonishness, applying their innovation towards mechanics. A game like Witgap, however, that avoids or limits graphical representation, is free to represent both realistic and fantastic settings to an extent that graphical games cannot match.
Recently, I’ve been trying to think of applications to drive my AI research. Firstly, I know from experience that developing technology without an application (or set of applications) in mind leads to a state of aimless, meandering expansion. When I worked on NPSNET-V, for instance, I never managed to come up with a “killer app” to drive development of the framework. While I had the pie-in-the-sky ideal end state in mind (that of the Metaverse, or at least an equivalent in terms of networked military simulations), I didn’t bother to envision the intermediate state: a modestly scoped, easily demonstrable application based on the framework that would clearly show the abilities of the architecture. Without such an application in mind, I simply added features according to what seemed interesting at the time: an HTTP server module exposing application state as serialized XML, a Telnet server module allowing remote manipulation, etc. In contrast, when I created Clyde, I had Spiral Knights in mind at all times. Although I intended the library to be generalizable to other games, having Spiral Knights as a baseline was crucial as a source of inspiration and focus.
Secondly, I’ve been considering how to reach an optimal state in my own life, and it seems like the ideal situation would allow me to develop my AI project in a setting that fulfills my other desire: to work in a close-knit group of smart and motivated people in a sustainable fashion. It seems like one possible way to effect that scenario would be to design a product based on the technology and either form a start-up to develop it or convince an existing organization to take it on.
So far, the most promising application idea I’ve had is a combination virtual pet and recommendation system (and/or other systems providing “useful” functionality on top of the virtual pet experience). The trick with getting users to train an AI system is that they have to put up with and push past the early stages of learning, where the output will consist primarily of useless (but perhaps entertaining) babble. Children seem to manage by being cute and triggering an emotional reaction, and, if I had to guess, the appeal of pets is related to that response. As human beings, we seem to have a preformed slot for endearingly ignorant creatures, and it’s at that slot that one must aim if one is to take advantage of the human tendency to educate without external incentive.
Virtual pets have been done many times before, in many formats, but they tend not to use any “real” intelligence, nor to provide any “real” utility; instead, they consist of simple rule-based systems in an entertainment-oriented environment, typically providing a set of games for users to play with their pets (example: NeoPets). My idea is something of an inversion of the process of gamification: rather than shoehorning game-derived techniques into non-game applications, I want to take a game environment and add aspects of practical utility to it. The eventual goal, after users have trained their pets sufficiently, is to encourage a symbiotic relationship between pets and their owners: that is, one in which both parties rely on each other to their mutual benefit.
One of the most fundamental decisions for such a system is the extent to which the “mind” of each symbiont is separate from the others. On the one hand, there’s an undeniable appeal to starting each new pet with a blank mental slate: a completely fresh start for the user and their pet. On the other hand, that approach will lead to a large amount of redundancy, and won’t allow users to benefit directly from the training supplied by other users. It may be that the answer is to allow pets to communicate with one another directly to share information, or it may be that the ideal form of the software is like that of the mythical Hydra: a single entity presenting a separate face to each user.
Over the past several months, I’ve come up with a few rules of thumb to guide my AI research. In no particular order:
Limbs, not wheels. This concept comes from a Straight Dope article: “Why has no animal species ever evolved wheels?” The presumed answer is that evolution requires continuity of function–that is, in order to evolve a new feature over the course of successive generations, each stage of that feature’s development must provide an increased advantage over the one before. Members of a lineage evolving legs can use half-limbs to propel themselves, but a half-wheel is useless. Mutations allow for sudden jumps of form, but the likelihood of forming something as complex as a wheel from a mutation is infinitesimal (unless that wheel was already present as a latent form in the genome). Similarly, our mental apparatus is not likely to contain complex features unless their partial implementation was present in and useful to our ancestors. Non-human animals are the most obvious source of inspiration to consider when attempting to trace human intelligence back to its more primitive forms in order to simplify the process of modeling it. Similarly, it’s useful to consider the development of human society (language, etc.) as a continuous process made possible by progressive mental advancement.
Time and space are fundamental. This is related to the concept of embodied cognition that I mentioned previously. For evolutionary reasons, human-like (or animal-like) thought is inextricably dependent on the perception of our spatial and temporal environments. We understand concepts as basic as containment (as of a member in a set), proximity, sequentiality, and causality in terms of association with our perceptions of and interactions with the physical world in time and space. A model of human consciousness would require equivalents to the mechanisms that allow us to understand time and space in order to relate to us in a meaningful fashion.
Unguided learning is preferable (if possible). This is mostly a practical concern. I don’t have access to a stable of graduate students to act as trainers, so I am particularly interested in learning processes that do not require manual control: for instance, creating associations by freely exploring an environment.
The Internet is the (an) environment. Never before has there existed such a vast body of information accessible freely and instantly to any connected program. It seems clear that any successful attempt to model intelligence would do well to make use of this resource to create associations equivalent to the ones that drive human consciousness. It makes sense to think of the Internet as an environment not unlike a physical one: programs “sense” and “act” through application protocols much as they would interact with objects and agents occupying a real or simulated space.
The crowd are the (some of the) trainers. The Internet makes instantly available not only a huge body of information, but also a huge number of potential human interactors. If they can be motivated to do so, they will act as guides to learning.
The trainers are part of the equation. The assumption that the model to be trained mimics the process of human cognition implies that the behavior of the human trainers can also be understood in terms of the model. This is important when determining how to motivate the trainers and guide them into teaching the models correctly.
The (first) language of the machine may not be English. The idea of a “chat bot” that speaks intelligible English may be somewhat specious. While English synthesis is eventually desirable, there are many other format possibilities for input and output, such as images, sounds, or computer languages like HTML or Scheme. Some of these may prove more fruitful for experimentation, particularly in the early stages, since they have qualities different from English: simplicity, for instance, or amenability to gene-like “crossover” synthesis.
Dreaming, “free” association, and normal thought aren’t separate cases; they are variants of the same process. My suspicion is that normal thought can best be described as “constrained association,” and that dreaming is a more intense, closed-loop form of thought wherein the stream of associations assumes the quality of real, waking experience. I think that I’ve experienced a halfway point to this at the threshold of sleep, where thoughts seem especially vivid and automatic, and the outside world shrinks in significance, but a flicker of waking consciousness remains, along with the understanding that the experiences aren’t real.
Synthesis/Analysis Feedback Architecture Lab (SAFA Lab) is what I’m calling the software that I’m building to experiment with the process that I described in the previous post. As with Witgap, I’m writing it in C++ and making heavy use of Qt. Also, like Witgap, it uses a client/server model. In the case of SAFA Lab, this allows the cognitive process (the server) to run in a prolonged continuous fashion (most likely on one or more dedicated hosts, such as EC2 instances) with sensors, effectors, and administrative tools (the clients) being attached and detached as necessary.
Rather than having each server process run a single experiment, making instance management a matter of Linux daemon administration, the server will be able to host an arbitrary number of separate instances concurrently. Instances will be identified by name, and client applications will be able to enumerate, create, destroy, start, stop, archive, and fork instances, as well as modify instance parameters. I plan to supply an administrative client with a Qt GUI, likely also including basic input and output channels.
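SAFA Lab itself is C++/Qt, but the intended instance lifecycle can be sketched language-neutrally; the Python below only illustrates the operations listed above (archiving and parameter modification omitted, and all names are mine, not the actual API):

```python
import copy

class Instance:
    """A single experiment: one named cognitive-process instance."""
    def __init__(self, name, params=None):
        self.name = name
        self.params = dict(params or {})
        self.running = False
        self.state = {}  # stands in for the instance's mental state

class InstanceHost:
    """Hosts an arbitrary number of named instances in one server process."""
    def __init__(self):
        self.instances = {}

    def create(self, name, params=None):
        if name in self.instances:
            raise ValueError("instance exists: " + name)
        self.instances[name] = Instance(name, params)
        return self.instances[name]

    def enumerate(self):
        return sorted(self.instances)

    def start(self, name):
        self.instances[name].running = True

    def stop(self, name):
        self.instances[name].running = False

    def fork(self, name, new_name):
        """Duplicate an instance's state to pursue a separate path."""
        source = self.instances[name]
        clone = self.create(new_name, source.params)
        clone.state = copy.deepcopy(source.state)
        return clone

    def destroy(self, name):
        del self.instances[name]
```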
Each server process will distribute computation amongst a number of worker threads to take advantage of multiple processors, and in order to scale to multiple hosts, servers will communicate with one another using a fully connected peer network. In this model, which we used to good effect in Spiral Knights and which I am also using for Witgap, peers are largely interchangeable, making it easy to expand or shrink the network by adding or removing EC2 instances. The peer protocol that I developed for Witgap and plan to extend to this project uses Qt streaming along with a custom preprocessor (modeled after Qt’s Meta Object Compiler) that generates streaming code for classes based on header file annotations. It employs a remote procedure call mechanism (with callbacks) built upon Qt’s dynamic method invocation.
The core process that I want to explore with my AI project is the cognitive feedback loop wherein external stimuli and internally synthesized sensations are combined, analyzed, and used to update the mental state, which in turn governs the production of both external actions and new internal syntheses. For lack of a better term, I think of this as the “synthesis/analysis feedback loop.” My assumptions about the system are that the same apparatus is used to process internal and external stimuli, that the same mechanism is used to generate both imagined and enacted responses, that the granules into which input is decomposed become the quanta of mental state and the building blocks of synthesis, and that the fundamental process of cognition can be described as the continuous operation of such a system in response to changing stimuli as it attempts to optimize for maximum pleasure (as determined by innate drives and the associations built upon them). Note that these assumptions do not address how input is granulated (or perhaps regranulated in the case of synthesis input), how mental state is represented, how input affects mental state, or how output is synthesized. These are all areas for exploration.
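The control flow of the loop, with every open question above left as a pluggable function, might look like this (a skeleton only; it answers none of the questions about granulation, state representation, or synthesis):

```python
def feedback_turn(granulate, analyze, synthesize, state, external_input,
                  synthesized=()):
    """One turn of the synthesis/analysis feedback loop: external stimuli
    and internally synthesized sensations are granulated and analyzed
    together, the mental state is updated, and the updated state drives
    both external action and the next round of internal synthesis."""
    # Decompose combined input into granules -- the quanta of mental state.
    granules = granulate(list(external_input) + list(synthesized))
    # Analysis updates the mental state from the granules.
    state = analyze(state, granules)
    # The same mechanism generates enacted and imagined responses.
    action, imagined = synthesize(state)
    return state, action, imagined

def run(granulate, analyze, synthesize, state, input_stream):
    """Continuous operation: imagined output is fed back in as input."""
    imagined = ()
    for external in input_stream:
        state, action, imagined = feedback_turn(
            granulate, analyze, synthesize, state, external, imagined)
        yield action
```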
However, with a high level outline of the process that I want to simulate, I can start to design a cognitive architecture that will realize the process and facilitate experimentation with the undetermined components. Aside from being flexible in terms of these components, my requirements for the architecture software are that it not be limited in terms of the type or level of granularity of the input it accepts, nor in terms of the type of output it can generate; that it support prolonged continuous operation and dynamic attachment and removal of input and output channels; that it allow simultaneous execution of multiple concurrent experiments; that it allow mental states to be archived and “forked” to pursue different paths of experimentation; that it scale efficiently to take advantage of available computing power; and that it be network-transparent in order, for example, to host input or output channels on a different machine than that hosting an ongoing experiment.
Back when I was at Santa Cruz, I took a class taught by David Cope in which he described an experimental chat bot of his that interpreted input by updating levels of association between words and phrases based on their relative proximity in each sentence, then generated output stochastically according to those levels. For example, entering the sentence “The cat is dead and orange” might create a strong association between the words “cat” and “dead” as well as a weaker one between “cat” and “orange.” Entering another sentence containing the word “cat” would then be very likely to provoke a response containing the word “dead,” with responses containing “orange” being slightly less likely. As forms of positive and negative reinforcement, the operator could react to each response with a simple “YES” or “NO” evaluation, strengthening or weakening the associations used to synthesize the response. Cope experimented with various weighting strategies, attempting to find language-independent rules that would lead to intelligible output. Over time and with extended training, he managed to get the system to produce what he considered interestingly human-like responses, though the database of associations would grow to the point of requiring minutes to generate each response.
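A minimal reconstruction of that scheme from memory (not Cope’s actual program; the inverse-distance weighting and the YES/NO scaling factors are guesses of mine) might look like this:

```python
import random
from collections import defaultdict

class AssociationBot:
    """Word-association chat sketch: proximity-weighted associations,
    stochastic responses, and YES/NO reinforcement."""
    def __init__(self):
        self.assoc = defaultdict(float)  # (word, word) -> strength
        self.last_used = []              # associations behind the last reply

    def hear(self, sentence):
        # Closer words within a sentence form stronger associations.
        words = sentence.lower().split()
        for i, a in enumerate(words):
            for j, b in enumerate(words):
                if i != j:
                    self.assoc[(a, b)] += 1.0 / abs(i - j)

    def respond(self, sentence, length=3):
        words = sentence.lower().split()
        # Score candidate response words by association with the input.
        scores = defaultdict(float)
        for w in words:
            for (a, b), s in self.assoc.items():
                if a == w:
                    scores[b] += s
        candidates = [w for w in scores if w not in words]
        if not candidates:
            return ""
        # Stochastic choice weighted by association strength.
        chosen = random.choices(candidates,
                                [scores[w] for w in candidates],
                                k=min(length, len(candidates)))
        self.last_used = [(w, c) for w in words for c in chosen]
        return " ".join(chosen)

    def reinforce(self, yes):
        # "YES" strengthens the associations behind the last response;
        # "NO" weakens them.
        factor = 1.5 if yes else 0.5
        for pair in self.last_used:
            if pair in self.assoc:
                self.assoc[pair] *= factor
```

Even this toy shows the scaling problem Cope ran into: `respond` walks the entire association table for every input word, so response time grows with the database.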
That experiment was representative of a family of approaches that the AI community refers to as connectionism: the idea that complex cognitive models can be formed from networks of association between simple discrete nodes. Contrast that approach with a triplestore, whose representation of knowledge requires, in addition to the two connected nodes, a third element: the nature of the connection, as opposed to the strength. The most prominent form of connectionism is the neural network, which takes inspiration from the biological structure of the brain. However, the brain is prohibitively complex to simulate at the neuronal level, as it has billions of neurons, each with thousands of connections to others, all operating in parallel. Some of this complexity may be unnecessary for a machine intelligence. For example, a computer doesn’t need our visual apparatus to recognize a written word; we can simply provide the word as a direct input. Likewise, it may be that much of the complexity of the human brain arises from the necessity of storing data at the neuronal level that could be represented more efficiently using standard machine encodings: ASCII, JPEG, MP3, etc. This suggests a connectionist approach at a higher level than neural networks, or a hybrid approach combining neural networks with symbolic representations.
The reason Cope’s experiment has stuck with me through the years is simply that it feels right. It seems natural that that kind of associative process would underlie cognition in some form, given that the subjective experience of conscious thought often feels like traversing a graph of connected concepts. This is not a new idea; in fact, an early school of psychology known as associationism attempted to explain all manner of cognitive processes in terms of association. This approach fell out of favor as a sole means of explaining cognition, but it’s hard to deny its intuitive appeal. For instance, how do we understand the concept of one thing being inside another (for instance, an element being a member of a set)? An associationist might say that we link the word “inside” to our various memories of physical containment, and perhaps especially to some early “ur-inside” memory that we formed as children when we first came to understand the concept. Drawing on these memories, we somehow create templates to be applied in the synthesis of new ideas, allowing us effectively to imagine anything inside of anything else.
But how would a machine come to know the definition of “inside”? In order to create the formative associations allowing it to understand basic concepts in the way that we do, it would have to develop in an environment approaching the richness of the physical world, if not in the physical world itself. Even our abstract thought is deeply rooted in physical metaphors, such that it’s hard to imagine a machine attaining human-like intelligence without sharing at some level our physical experience. This observation drives the study of embodied cognition, which posits that human cognition is shaped by our physical form and interactions with the world. However, it seems likely that a virtual world could be substituted for the real one if it provided equivalent opportunities for perception and interaction. If so, what is the minimum form that such a world should take?