After a lot of thinking and Wikipedia research, I’ve realized that I was somewhat naive in my choice of test bed. Handling the real-valued inputs and outputs of the agents in the previous post poses a challenge beyond that of handling discrete values–for example, in storage, how does one group similar values in order to avoid creating a new mapping for every unique number, no matter how slight the difference between it and its neighbors? The easiest solution would be to use uniform quantization, though that would require us to know the desired granularity in advance. In researching alternatives that would remove that requirement, I learned about the general problem of clustering, and algorithms such as k-means clustering, which groups similar values into a predetermined number (k) of clusters using a simple iterative process. An extension of this is fuzzy c-means clustering, which removes the hard boundaries between clusters in favor of granting each point a weighted degree of membership in every cluster. An agent that learns over time would require an online/sequential version of the algorithm. For example, consider a “fuzzy map” data structure with capacity k, requiring a distance function for keys and blend functions for keys and values. The first k key/value pairs would be stored unchanged, and lookups would return a blend of stored values weighted by each key’s proximity to the query. Subsequent insertions would adjust the stored pairs according to their distance from the new key, the total weight of values already accumulated, and perhaps an additional factor to allow more recent mappings to overcome the inertia of historical ones. An advantage of this approach is that it sets a bound on the amount of memory (and lookup time, insertion time, etc.) required for the mapping.
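As a rough illustration, here is a minimal Python sketch of such a fuzzy map, assuming scalar keys and values and caller-supplied distance and blend functions. The class name, the 1/(1+d) proximity measure, and the weighting scheme are placeholders of my own choosing, not a reference implementation:

```python
class FuzzyMap:
    """Fixed-capacity mapping that merges nearby keys instead of storing every one.

    Hypothetical sketch: `distance`, `blend_keys`, and `blend_values` are
    supplied by the caller (e.g. absolute difference and linear interpolation
    for scalar keys and values).
    """

    def __init__(self, capacity, distance, blend_keys, blend_values):
        self.capacity = capacity
        self.distance = distance
        self.blend_keys = blend_keys      # blend(a, b, t) -> key moved fraction t from a toward b
        self.blend_values = blend_values  # blend(a, b, t) -> value moved fraction t from a toward b
        self.entries = []                 # each entry is [key, value, accumulated_weight]

    def insert(self, key, value):
        if len(self.entries) < self.capacity:
            # The first k pairs are stored unchanged.
            self.entries.append([key, value, 1.0])
            return
        # Afterwards, pull each stored pair toward the new one: nearer keys are
        # affected more, and heavily weighted (historical) entries resist change.
        for entry in self.entries:
            proximity = 1.0 / (1.0 + self.distance(entry[0], key))
            t = proximity / (entry[2] + proximity)
            entry[0] = self.blend_keys(entry[0], key, t)
            entry[1] = self.blend_values(entry[1], value, t)
            entry[2] += proximity

    def lookup(self, key):
        # Blend all stored values, weighted by proximity of their keys to the query.
        # (Assumes numeric values; a general version would fold with blend_values.)
        weights = [1.0 / (1.0 + self.distance(k, key)) for k, _, _ in self.entries]
        total = sum(weights)
        return sum(v * (w / total) for (_, v, _), w in zip(self.entries, weights))


# A natural instantiation for scalar data: absolute difference and linear interpolation.
def lerp(a, b, t):
    return a + t * (b - a)

fm = FuzzyMap(capacity=4, distance=lambda a, b: abs(a - b),
              blend_keys=lerp, blend_values=lerp)
for k, v in [(0.1, 1.0), (0.2, 2.0), (0.9, 9.0), (0.8, 8.0), (0.15, 1.5)]:
    fm.insert(k, v)
print(fm.lookup(0.12))
```

The accumulated weight per entry is what provides the “inertia” mentioned above; the recency factor could be added by decaying those weights over time.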
On considering how the agent must process values, it becomes clear that derived data–values obtained through transformations of the raw stream–are often more significant than the originals. For example, when a mouse sniffs around to locate the source of a scent, the important datum is whether the intensity of the smell increases or decreases with each movement: if the smell decreases when the mouse rotates left, then the source of the smell is to the right. This suggests providing the scent delta instead of or in addition to the absolute intensity. In general, the problem of augmenting a stream of raw values with layers of derived information suitable for input into a higher level cognitive algorithm is likely to require both simple transformations like differences and more advanced analyses such as the identification of repeated patterns, even when they occur at different scales or contain loops or gaps. This pattern recognition requirement is one of the most compelling reasons to think that the animal brain, with its robust pattern matching ability enabled by massive parallelism, may have a distinct advantage over any currently realizable artificial system–at least in terms of interacting with the real world, or any artificial environment that approaches its complexity. For instance, consider that the sparse coding believed to be employed by the visual cortex uses a large number of neurons to analyze small sections of the visual field in a manner that is robust to input noise, scaling, and translation.
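As a trivial sketch of such a derived layer (the function and variable names are purely illustrative), augmenting a raw intensity stream with its step-to-step delta might look like this:

```python
def with_delta(stream):
    """Yield (intensity, delta) pairs from a stream of raw intensities.

    Illustrative sketch: the delta lets a downstream process react to whether
    the signal is rising or falling rather than to its absolute level.
    """
    previous = None
    for intensity in stream:
        delta = 0.0 if previous is None else intensity - previous
        previous = intensity
        yield intensity, delta


# e.g. a mouse whose last two movements carried it back toward the scent source:
readings = [0.52, 0.49, 0.45, 0.46, 0.50]
for intensity, delta in with_delta(readings):
    print(f"{intensity:.2f}  delta={delta:+.2f}")
```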
Apart from simple transformations and pattern identification, it may be necessary to process input further in order to create mental models of context. That is, the brain’s representation of the current environment may not be created–or at least not created entirely–by the principal cognitive process, but rather by a preprocessing phase that specializes in determining spatial relationships. The equivalent in simulation terms would be to provide the locations, shapes, movement vectors, etc., of objects in the environment (including the agent itself) instead of or in addition to the raw sensory inputs. Indeed, if more information is always more useful than less (given effective means of determining relevance), and if an emergent process would have to derive such spatial information anyway, then providing the information explicitly would seem to be a no-brainer (so to speak). However, considering this and the many other layers of derived data with which we plan to augment the raw input, we rapidly encounter problems relating to the curse of dimensionality. For example, were we to use the k-nearest neighbors algorithm to find historical contexts similar to the current one, and thereby outputs likely to result in reward, the proximity in relevant inputs would likely be drowned out by differences in irrelevant ones.
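To make that concrete, here is a small illustrative snippet (the numbers and dimension counts are arbitrary choices of mine) showing how a single relevant input can be swamped by unrelated ones under the plain Euclidean distance a naive k-nearest-neighbors lookup would use:

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def make_context(relevant, n_irrelevant, rng):
    # One relevant input followed by many inputs of unrelated noise.
    return [relevant] + [rng.random() for _ in range(n_irrelevant)]

rng = random.Random(0)
current   = make_context(0.50, 100, rng)
similar   = make_context(0.51, 100, rng)   # nearly identical in the relevant input
different = make_context(0.90, 100, rng)   # very different in the relevant input

print(euclidean(current, similar))    # often no smaller...
print(euclidean(current, different))  # ...than this, despite the relevant match
```

With a hundred noisy dimensions, the two distances are frequently indistinguishable, even though the “similar” context matches the current one almost exactly in the dimension that matters.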