[Figure: the 4D/RCS hierarchy, with columns for Sensory Processing, World Modeling / Value Judgment and Behavior Generation, spanning levels from SERVO (50 ms planning horizon) through PRIMITIVE, VEHICLE and SECTION (10 min planning horizon), fed by ladar, stereo CCD, stereo FLIR, color CCD, radar, navigational and actuator signals.]
Fig. 4.12: Albus's perceptual, motor and modeling hierarchies

4.5.6 Joshua Blue

Sam Adams and his colleagues at IBM have created a cognitive architecture called Joshua Blue [AABL02], which has some significant similarities to CogPrime. Similar to our current research direction with CogPrime, Joshua Blue was created with loose emulation of child cognitive development in mind; and, also similar to CogPrime, it features a number of cognitive processes acting on a common neural-symbolic knowledge store. The specific cognitive processes involved in Joshua Blue and CogPrime are not particularly similar, however. At the time of writing (2012)
Joshua Blue is not under active development and has not been for some time; however, the project may be reanimated in the future.

Joshua Blue's core knowledge representation is a semantic network of nodes connected by links along which activation spreads. Although many of the nodes have specific semantic referents, as in a classical semantic net, the spread of activation through the network is designed to lead to the emergence of "assemblies" (which could also be thought of as dynamical attractors) in a manner more similar to an attractor neural network.

A major difference from typical semantic or neural network models is the central role that affect plays in the system's dynamics. The weights of the links in the knowledge base are adjusted dynamically based on the emotional context — a very direct way of ensuring that cognitive processes and mental representations are continuously influenced by affect. Qualitatively, this mimics the way that particular emotions in the human brain correlate with the dissemination throughout the brain of particular neurotransmitters, which then affect synaptic activity. A result of this architecture is that in Joshua Blue, emotion directs attention in a very direct way: affective weighting is important in determining which associated objects will become part of the focus of attention, or will be retrieved from memory.

A notable similarity between CogPrime and Joshua Blue is that in both systems, nodes are assigned two quantitative attention values, one governing allocation of current system resources (mainly processor time; this is CogPrime's ShortTermImportance) and one governing the long-term allocation of memory (CogPrime's LongTermImportance).

The concrete work done with Joshua Blue involved using it to control a simple agent in a simulated world, with the goal that via human interaction, the agent would develop a complex and humanlike emotional and motivational structure from its simple in-built emotions and drives, and would then develop complex cognitive capabilities as part of this development process.

4.5.7 LIDA

The LIDA architecture developed by Stan Franklin and his colleagues [BF09] is based on the concept of the "cognitive cycle" - a notion that is important to nearly every BICA (Biologically Inspired Cognitive Architecture) and also to the brain, but that plays a particularly central role in LIDA. As Franklin says, "as a matter of principle, every autonomous agent, be it human, animal, or artificial, must frequently sample (sense) its environment, process (make sense of) this input, and select an appropriate response (action). The agent's 'life' can be viewed as consisting of a continual sequence of iterations of these cognitive cycles. Such cycles constitute the indivisible elements of attention, the least sensing and acting to which we can attend. A cognitive cycle can be thought of as a moment of cognition, a cognitive 'moment'."

4.5.8 The Global Workspace

LIDA is heavily based on the "global workspace" concept developed by Bernard Baars. As this concept is also directly relevant to CogPrime it is worth briefly describing here. In essence Baars' Global Workspace Theory (GWT) is a particular hypothesis about how working memory works and the role it plays in the mind. Baars conceives working memory as the
"inner domain in which we can rehearse telephone numbers to ourselves or, more interestingly, in which we carry on the narrative of our lives. It is usually thought to include inner speech and visual imagery." Baars uses the term "consciousness" to refer to the contents of working memory — a theoretical commitment that is not part of the CogPrime design. In this section we will use the term "consciousness" in Baars' way, but not throughout the rest of the book.

Baars conceives working memory and consciousness in terms of a "theater metaphor" — according to which, in the "theater of consciousness" a "spotlight of selective attention" shines a bright spot on stage. The bright spot reveals the global workspace — the contents of consciousness, which may be metaphorically considered as a group of actors moving in and out of consciousness, making speeches or interacting with each other. The unconscious is represented by the audience watching the play ... and there is also a role for the director (the mind's executive processes) behind the scenes, along with a variety of helpers like stage hands, script writers, scene designers, etc.

GWT describes a fleeting memory with a duration of a few seconds. This is much shorter than the 10-30 seconds of classical working memory — according to GWT there is a very brief "cognitive cycle" in which the global workspace is refreshed, and the time period an item remains in working memory generally spans a large number of these elementary "refresh" actions.

GWT contents are proposed to correspond to what we are conscious of, and are said to be broadcast to a multitude of unconscious cognitive brain processes. Unconscious processes, operating in parallel, can form coalitions which can act as input processes to the global workspace. Each unconscious process is viewed as relating to certain goals, and as seeking to get involved with coalitions that will gain enough importance to become part of the global workspace — because once they're in the global workspace they'll be allowed to broadcast out across the mind as a whole, which includes broadcasting to the internal and external actuators that allow the mind to do things. Getting into the global workspace is a process's best shot at achieving its goals.

Obviously, the theater metaphor used to describe GWT is evocative but limited; for instance, the unconscious in the mind does a lot more than the audience in a theater. The unconscious comes up with complex creative ideas sometimes, which feed into consciousness — almost as if the audience is also the scriptwriter. Baars' theory, with its understanding of unconscious dynamics in terms of coalition-building, fails to describe the subtle dynamics occurring within the various forms of long-term memory, which result in subtle nonlinear interactions between long term memory and working memory. But nevertheless, GWT successfully models a number of characteristics of consciousness, including its role in handling novel situations, its limited capacity, its sequential nature, and its ability to trigger a vast range of unconscious brain processes. It is the framework on which LIDA's theory of the cognitive cycle is built.

4.5.9 The LIDA Cognitive Cycle

The simplest cognitive cycle is that of an animal, which senses the world, compares sensation to memory, and chooses an action, all in one fluid subjective moment. But the same cognitive cycle structure/process applies to higher-level cognitive processes as well.
The LIDA architecture is based on the LIDA model of the cognitive cycle, which posits a particular structure underlying the cognitive cycle that possesses the generality to encompass both simple and complex cognitive moments.
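To make the skeleton of such a cycle concrete, here is a toy sketch in Python. This is purely illustrative pseudocode made runnable; the function and variable names are our own assumptions, not LIDA's actual data structures, and a real LIDA cycle involves far richer machinery (coalitions, codelets, broadcast) than this single loop.

```python
# Toy sketch of a sense / understand / attend / act cycle; all names here are
# illustrative assumptions, not LIDA's actual implementation.
def cognitive_cycle(sense, act, perceptual_memory, procedural_memory, salience):
    stimuli = sense()                                     # sample the environment
    percepts = [perceptual_memory[s] for s in stimuli if s in perceptual_memory]
    if not percepts:
        return None                                       # nothing recognized this moment
    focus = max(percepts, key=lambda p: salience.get(p, 0.0))  # attend to one percept
    action = procedural_memory.get(focus, "idle")         # select a matching response
    act(action)                                           # complete one cognitive "moment"
    return action

log = []
cognitive_cycle(sense=lambda: ["red_blob"],
                act=log.append,
                perceptual_memory={"red_blob": "apple"},
                procedural_memory={"apple": "grasp"},
                salience={"apple": 0.9})
print(log)  # ['grasp']
```

Each call corresponds to one cognitive "moment"; an agent's life, in Franklin's sense, is a long iteration of such calls.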
The LIDA cognitive cycle itself is a theoretical construct that can be implemented in many ways, and indeed other BICAs like CogPrime and Psi also manifest the LIDA cognitive cycle in their dynamics, though utilizing different particular structures to do so. Figure 4.13 shows the cycle pictorially, starting in the upper left corner and proceeding clockwise.

At the start of a cycle, the LIDA agent perceives its current situation and allocates attention differentially to various parts of it. It then broadcasts information about the most important parts (which constitute the agent's consciousness), and this information gets features extracted from it, which then get passed along to episodic and semantic memory, which interact in the "global workspace" to create a model of the agent's current situation. This model then, in interaction with procedural memory, enables the agent to choose an appropriate action and execute it - the critical "action-selection" phase!

[Figure: the LIDA cognitive cycle; perceptual associative memory appears as a Slip Net.]
Fig. 4.13: The LIDA Cognitive Cycle

The LIDA Cognitive Cycle in More Depth

(This subsection paraphrases heavily from [Fra06].)

We now run through the cognitive cycle in more detail. It begins with sensory stimuli from the agent's external and internal environment. Low-level feature detectors in sensory memory begin the process of making sense of the incoming stimuli. These low-level features are passed to perceptual memory where higher-level features, objects, categories, relations, actions, situations,
etc. are recognized. These recognized entities, called percepts, are passed to the workspace, where a model of the agent's current situation is assembled.

Workspace structures serve as cues to the two forms of episodic memory, yielding both short and long term remembered local associations. In addition to the current percept, the workspace contains recent percepts that haven't yet decayed away, and the agent's model of the then-current situation previously assembled from them. The model of the agent's current situation is updated from the previous model using the remaining percepts and associations. This updating process will typically require looking back to perceptual memory and even to sensory memory, to enable the understanding of relations and situations. This assembled new model constitutes the agent's understanding of its current situation within its world. Via constructing the model, the agent has made sense of the incoming stimuli.

Now attention allocation comes into play, because a real agent lacks the computational resources to work with all parts of its world-model with maximal mental focus. Portions of the model compete for attention. These competing portions take the form of (potentially overlapping) coalitions of structures comprising parts of the model. Once one such coalition wins the competition, the agent has decided what to focus its attention on.

And now comes the purpose of all this processing: to help the agent to decide what to do next. The winning coalition passes to the global workspace, the namesake of Global Workspace Theory, from which it is broadcast globally. Though the contents of this conscious broadcast are available globally, the primary recipient is procedural memory, which stores templates of possible actions including their context and possible results.

Procedural memory also stores an activation value for each such template — a value that attempts to measure the likelihood of an action taken within its context producing the expected result. It's worth noting that LIDA makes a rather specific assumption here. LIDA's "activation" values are like the probabilistic truth values of the implications in CogPrime's Context ∧ Procedure → Goal triples. However, in CogPrime this probability is not the same as the ShortTermImportance "attention value" associated with the Implication link representing that implication. Here LIDA merges together two concepts that in CogPrime are separate.

Templates whose contexts intersect sufficiently with the contents of the conscious broadcast instantiate copies of themselves with their variables specified to the current situation. These instantiations are passed to the action selection mechanism, which chooses a single action from these instantiations and those remaining from previous cycles. The chosen action then goes to sensorimotor memory, where it picks up the appropriate algorithm by which it is then executed. The action so taken affects the environment, and the cycle is complete.

The LIDA model hypothesizes that all human cognitive processing is via a continuing iteration of such cognitive cycles. It acknowledges that other cognitive processes may also occur, refining and building on the knowledge used in the cognitive cycle (for instance, the cognitive cycle itself doesn't mention abstract reasoning or creativity). But the idea is that these other processes occur in the context of the cognitive cycle, which is the main loop driving the internal and external activities of the organism.
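As a concrete illustration of the template-matching and activation-based selection steps just described, consider the following minimal sketch. The class and function names are our own inventions for this example, not LIDA's API; in particular, real LIDA instantiates templates with bound variables rather than merely filtering them.

```python
from dataclasses import dataclass

# Illustrative sketch of procedural-memory templates and activation-based
# action selection; field names are invented for this example.
@dataclass
class ActionTemplate:
    context: frozenset      # conditions under which the action applies
    action: str
    result: frozenset       # expected outcome of taking the action
    activation: float       # estimated likelihood of producing the expected result

def select_action(templates, broadcast, overlap=0.5):
    # Keep templates whose context sufficiently intersects the conscious broadcast...
    candidates = [t for t in templates
                  if t.context and len(t.context & broadcast) / len(t.context) >= overlap]
    # ...then choose the candidate with the highest activation.
    return max(candidates, key=lambda t: t.activation, default=None)

t = ActionTemplate(frozenset({"door_closed", "near_door"}), "open_door",
                   frozenset({"door_open"}), activation=0.7)
print(select_action([t], broadcast={"near_door", "door_closed", "hungry"}))
```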
4.5.9.1 Avoiding Combinatorial Explosion via Adaptive Attention Allocation

LIDA avoids combinatorial explosions in its inference processes via two methods, both of which are also important in CogPrime:

• combining reasoning via association with reasoning via deduction
• foundational use of uncertainty in reasoning

One can create an analogy between LIDA's workspace structures and codelets and a logic-based architecture's assertions and functions. However, LIDA's codelets only operate on the structures that are active in the workspace during any given cycle. This includes recent perceptions, their closest matches in other types of memory, and structures recently created by other codelets. The results with the highest estimate of success, i.e. activation, will then be selected.

Uncertainty plays a role in LIDA's reasoning in several ways, most notably through the base activation of its behavior codelets, which depends on the model's estimated probability of the codelet's success if triggered. LIDA observes the results of its behaviors and updates the base activation of the responsible codelets dynamically.

We note that for this kind of uncertain inference/activation interplay to scale well, some level of cognitive synergy must be present; and based on our understanding of LIDA it is not clear to us whether the particular inference and association algorithms used in LIDA possess the requisite synergy.

4.5.9.2 LIDA versus CogPrime

The LIDA cognitive cycle, broadly construed, exists in CogPrime as in other cognitive architectures. To see how, it suffices to map the key LIDA structures into corresponding CogPrime structures, as is done in Table 4.1. Of course this table does not cover all CogPrime processes, as LIDA does not constitute a thorough explanation of CogPrime structure and dynamics. And in most cases the corresponding CogPrime and LIDA processes don't work in exactly the same way; for instance, as noted above, LIDA's action selection relies solely on LIDA's "activation" values, whereas CogPrime's action selection process is more complex, relying on aspects of CogPrime that lack LIDA analogues.

4.5.10 Psi and MicroPsi

We have saved for last the architecture that has the most in common with CogPrime: Joscha Bach's MicroPsi architecture, closely based on Dietrich Dörner's Psi theory. CogPrime has borrowed substantially from Psi in its handling of emotion and motivation; but Psi also has other aspects that differ considerably from CogPrime. Here we will focus more heavily on the points of overlap, but will mention the key points of difference as well.

The overall Psi cognitive architecture, which is centered on the Psi model of the motivational system, is roughly depicted in Figure 4.14.

Psi's motivational system begins with Demands, which are the basic factors that motivate the agent. For an animal these would include things like food, water, sex, novelty, socialization, protection of one's children, and so forth. For an intelligent robot they might include things like electrical power, novelty, certainty, socialization, well-being of others and mental growth. Psi also specifies two fairly abstract demands and posits them as psychologically fundamental (see Figure 4.15):

• competence, the effectiveness of the agent at fulfilling its Urges
• certainty, the confidence of the agent's knowledge
LIDA | CogPrime
Declarative memory | Atomspace
attentional codelets | Schema that adjust importance of Atoms explicitly
coalitions | maps
global workspace | attentional focus
behavior codelets | schema
procedural memory (scheme net) | procedures in the ProcedureRepository, and the network of SchemaNodes in the Atomspace
action selection (behavior net) | propagation of STICurrency from goals to actions, and the action selection process
transient episodic memory | perceptual Atoms entering the AT with high STI, which rapidly decreases in most cases
local workspaces | bubbles of interlinked Atoms with moderate importance, focused on by a subset of MindAgents (defined in Chapter 19 of Part 2) for a period of time
perceptual associative memory | HebbianLinks in the AT
sensory memory | spaceserver/timeserver, plus auxiliary stores for other senses
sensorimotor memory | Atoms storing records of actions taken, linked in with Atoms indexed in sensory memory

Table 4.1: CogPrime Analogues of Key LIDA Features

Each demand is assumed to come with a certain "target level" or "target range" (and these may fluctuate over time, or may change as a system matures and develops). An Urge is said to develop when a demand deviates from its target range: the urge then seeks to return the demand to its target range. For instance, in an animal-like agent the demand related to food is more clearly described as "fullness," and there is a target range indicating that the agent is neither too hungry nor too full of food. If the agent's fullness deviates from this range, an Urge to return the demand to its target range arises. Similarly, if an agent's novelty deviates from its target range, this means the agent's life has gotten either too boring or too disconcertingly weird, and the agent gets an Urge for either more interesting activities (in the case of below-range novelty) or more familiar ones (in the case of above-range novelty).

There is also a primitive notion of Pleasure (and its opposite, displeasure), which is considered as different from the complex emotion of "happiness." Pleasure is understood as associated with Urges: pleasure occurs when an Urge is (at least partially) satisfied, whereas displeasure occurs when an urge gets increasingly severe. The degree to which an Urge is satisfied is not necessarily defined instantaneously; it may be defined, for instance, as a time-decaying weighted average of the proximity of the demand to its target range over the recent past. So, for instance, if an agent is bored and gets a lot of novel stimulation, then it experiences some pleasure. If it's bored and then the monotony of its stimulation gets even more extreme, then it experiences some displeasure.

Note that, according to this relatively simplistic approach, any decrease in the amount of dissatisfaction causes some pleasure; whereas if everything always continues within its acceptable range, there isn't any pleasure. This may seem a little counterintuitive, but it's important to understand that these simple definitions of "pleasure" and "displeasure" are not intended to fully capture the natural language concepts associated with those words. The natural language terms are used here simply as heuristics to convey the general character of the processes involved. These are very low level processes whose analogues in human experience are largely below the conscious level.
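To make the demand/Urge/Pleasure relationships concrete, here is a minimal toy sketch (our own illustrative code, not Psi's or MicroPsi's actual implementation): an Urge measures the deviation of a demand from its target range, and pleasure tracks decreases in that tension.

```python
# Toy sketch of Psi-style demands, urges and pleasure; all names are
# illustrative assumptions, not MicroPsi's actual API.
class Demand:
    def __init__(self, name, low, high):
        self.name, self.low, self.high = name, low, high
        self.level = (low + high) / 2.0   # start inside the target range

    def urge_strength(self):
        """Urge strength is the deviation of the demand from its target range."""
        if self.level < self.low:
            return self.low - self.level
        if self.level > self.high:
            return self.level - self.high
        return 0.0

def pleasure_signal(prev_urge, curr_urge):
    """Pleasure accompanies a drop in urge strength; displeasure a rise.
    A demand sitting comfortably in range yields no pleasure at all."""
    return prev_urge - curr_urge

fullness = Demand("fullness", low=0.4, high=0.8)
fullness.level = 0.1                      # very hungry: strong urge
before = fullness.urge_strength()
fullness.level = 0.5                      # the agent eats; urge is satisfied
print(pleasure_signal(before, fullness.urge_strength()))  # positive -> pleasure
```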
[Figure: components include Protocol and Situation Memory, Perception, Action execution and active Urges (Drives).]
Fig. 4.14: High-Level Architecture of the Psi Model

A Goal is considered as a statement that the system may strive to make true at some future time. A Motive is an (urge, goal) pair, consisting of a goal whose satisfaction is predicted to imply the satisfaction of some urge. In fact one may consider Urges as top-level goals, and the agent's other goals as their subgoals.

In Psi an agent has one "ruling motive" at any point in time, but this seems an oversimplification more applicable to simple animals than to human-like or other advanced AI systems. In general one may think of different motives having different weights indicating the amount of resources that will be spent on pursuing them.

Emotions in Psi are considered as complex systemic response-patterns rather than explicitly constructed entities. An emotion is the set of mental entities activated in response to a certain set of urges. Dörner conceived theories about how various common emotions emerge from the dynamics of urges and motives as described in the Psi model. "Intentions" are also considered as composite entities: an intention at a given point in time consists of the active motives, together with their related goals, behavior programs and so forth.
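The weighted-motives reading suggested above is easy to render concretely; the following sketch is our own illustration, not part of the Psi or MicroPsi codebases:

```python
# Illustrative sketch: motives as (urge, goal) pairs with weights that
# determine resource allocation, generalizing Psi's single "ruling motive".
def allocate_resources(motives, total_budget=1.0):
    """motives: list of (urge_strength, goal) pairs; returns goal -> budget share."""
    if not motives:
        return {}
    total = sum(urge for urge, _ in motives)
    if total == 0.0:
        return {goal: total_budget / len(motives) for _, goal in motives}
    return {goal: total_budget * urge / total for urge, goal in motives}

print(allocate_resources([(0.6, "find_food"), (0.3, "explore"), (0.1, "socialize")]))
# {'find_food': 0.6, 'explore': 0.3, 'socialize': 0.1}
```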
The basic logic of action in Psi is carried out by "triples" that are very similar to CogPrime's Context ∧ Procedure → Goal triples. However, an important role is played by four modulators that control how the processes of perception, cognition and action selection are regulated at a given time:

• activation, which determines the degree to which the agent is focused on rapid, intensive activity versus reflective, cognitive activity
• resolution level, which determines how accurately the system tries to perceive the world
• certainty, which determines how hard the system tries to achieve definite, certain knowledge
• selection threshold, which determines how willing the system is to change its choice of which goals to focus on

These modulators characterize the system's emotional and cognitive state at a very abstract level; they are not emotions per se, but they have a large effect on the agent's emotions. Their intended interaction is depicted in Figure 4.15.

[Figure: the urges for competence and uncertainty reduction, fed by efficiency/inefficiency signals and by certainty/uncertainty signals (confirmation or disconfirmation of expectations), modulate exploration, specific and unspecific securing behavior, and further behavior modulation.]
Fig. 4.15: Primary Interrelationships Between Psi Modulators
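For concreteness, the four modulators can be pictured as a simple parameter block that the rest of the system reads when perceiving, deliberating and selecting actions. The following sketch is our own illustrative rendering; the value ranges are assumptions, and neither Dörner's Psi nor Bach's MicroPsi prescribes this code.

```python
from dataclasses import dataclass

# Illustrative stand-in for Psi's four modulators; the [0, 1] ranges are
# assumptions made for this sketch, not Dörner's or Bach's actual values.
@dataclass
class Modulators:
    activation: float           # rapid, intensive action vs. reflective cognition
    resolution_level: float     # how accurately the world is perceived
    certainty: float            # how hard the system pushes for definite knowledge
    selection_threshold: float  # how reluctant the system is to switch goals

    def clamp(self):
        """Keep all modulator values inside the assumed [0, 1] range."""
        for f in ("activation", "resolution_level", "certainty", "selection_threshold"):
            setattr(self, f, min(1.0, max(0.0, getattr(self, f))))
```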
4.5.11 The Emergence of Emotion in the Psi Model

We now briefly review the specifics of how Psi models the emergence of emotion. The basic idea is to define a small set of proto-emotional dimensions in terms of basic Urges and modulators. Then, emotions are identified with regions in the space spanned by these dimensions. The simplest approach uses a six-dimensional continuous space:

1. pleasure
2. arousal
3. resolution level
4. selection threshold (i.e. degree of dominance of the leading motive)
5. level of background checks (the rate of the securing behavior)
6. level of goal-directed behavior

Figure 4.16 shows how the latter five of these dimensions are derived from underlying urges and modulators. Note that these dimensions are not orthogonal; for instance resolution is mainly inversely related to arousal. Additional dimensions are also discussed; for instance, it is postulated that to deal with social emotions one may wish to introduce two more demands corresponding to inner and outer obedience to social norms, and then define dimensions in terms of these.

[Figure: resolution level, arousal, selection threshold, goal directedness and background checking derived from the importance of the leading motive versus all motives, and from competence specific to the current task versus general competence.]
Fig. 4.16: Five Proto-Emotional Dimensions Implicit in the Psi Model

Specific emotions are then characterized in terms of these dimensions. According to [Bac09], for instance, "Anger ... is characterized by high arousal, low resolution, strong motive dominance, few background checks and strong goal-orientedness; sadness by low arousal, high resolution, strong dominance, few background-checks and low goal-orientedness."

I'm a bit skeptical of the contention that these dimensions fully characterize the relevant emotions. Anger, for instance, seems to have some particular characteristics not implied by the above list of dimensional values. The list of dimensional values associated with anger doesn't tell us that an angry person is more likely to punch someone than to bounce up and down, for example. However, it does seem that the dimensional values associated with an emotion are informative about the emotion, so that positioning an emotion on the given dimensions tells one a lot.
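As an illustration of "emotions as regions in dimension space", here is a toy classifier built from the anger and sadness characterizations quoted above. The binary 0.5 threshold and the dictionary encoding are our own simplifying assumptions; the Psi model treats the dimensions as continuous and the regions as fuzzy.

```python
# Toy sketch: emotions as regions of the proto-emotional dimension space,
# using the anger/sadness characterizations from [Bac09]; the 0.5 threshold
# and dimension encoding are illustrative assumptions.
def classify_emotion(dims):
    """dims: dict with keys 'arousal', 'resolution', 'dominance',
    'background_checks', 'goal_directedness', each in [0, 1]."""
    hi = lambda k: dims[k] > 0.5
    lo = lambda k: dims[k] <= 0.5
    if hi("arousal") and lo("resolution") and hi("dominance") \
            and lo("background_checks") and hi("goal_directedness"):
        return "anger"
    if lo("arousal") and hi("resolution") and hi("dominance") \
            and lo("background_checks") and lo("goal_directedness"):
        return "sadness"
    return "unlabeled region"

print(classify_emotion({"arousal": 0.9, "resolution": 0.2, "dominance": 0.8,
                        "background_checks": 0.1, "goal_directedness": 0.9}))  # anger
```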
4.5.12 Knowledge Representation, Action Selection and Planning in Psi

In addition to the basic motivation/emotion architecture of Psi, which has been adopted (with some minor changes) for use in CogPrime, Psi has a number of other aspects that are somewhat different from their CogPrime analogues.

First of all, on the micro level, Psi represents knowledge using structures called "quads." Each quad is a cluster of five neurons containing a core neuron, and four other neurons representing before/after and part-of/has-part relationships in regard to that core neuron. Quads are naturally assembled into spatiotemporal hierarchies, though they are not required to form part of such a structure. Psi stores knowledge using quads arranged in three networks, which are conceptually similar to the networks in Albus's 4D/RCS and Arel's DeSTIN architectures:

• A sensory network, which stores declarative knowledge: schemas representing images, objects, events and situations as hierarchical structures.
• A motor network, which contains procedural knowledge by way of hierarchical behavior programs.
• A motivational network handling demands.

Perception in Psi, which is centered in the sensory network, follows principles similar to DeSTIN's (which are shared also by other systems), for instance the principle of perception as prediction. Psi's "HyPercept" mechanism performs hypothesis-based perception: it attempts to predict what is there to be perceived and then attempts to verify these predictions using sensation and memory. Furthermore HyPercept is intimately coupled with actions in the external world, according to the concept of "Neisser's perceptual cycle," the cycle between exploration and representation of reality. Perceptually acquired information is translated into schemas capable of guiding behaviors, and these are enacted (sometimes affecting the world in significant ways) and in the process used to guide further perception. Imaginary perceptions are handled via a "mental stage" analogous to CogPrime's internal simulation world.

Action selection in Psi works based on what are called "triplets," each of which consists of

• a sensor schema (pre-conditions, "condition schema"; like CogPrime's "context")
• a subsequent motor schema (action, effector; like CogPrime's "procedure")
• a final sensor schema (post-conditions, expectations; like a CogPrime predicate or goal)

What distinguishes these triplets from classic production rules as used in (say) Soar and ACT-R is that the triplets may be partial (some of the three elements may be missing) and may be uncertain. However, there seems no fundamental difference between these triplets and CogPrime's context/procedure/goal triplets, at a high level; the difference lies in the underlying knowledge representation used for the schemata, and the probabilistic logic used to represent the implication.
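A minimal sketch of such a triplet, allowing missing elements and an uncertainty weight, might look as follows (illustrative code only; the field names are ours, and MicroPsi's actual quad-based representation is considerably richer):

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative sketch of a Psi-style triplet: condition schema, motor schema
# and expectation schema, any of which may be missing, plus an uncertain
# strength. This is not MicroPsi's actual data structure.
@dataclass
class Triplet:
    condition: Optional[set]     # pre-conditions ("context"); None if unspecified
    action: Optional[str]        # motor schema ("procedure"); None if unspecified
    expectation: Optional[set]   # post-conditions ("goal"); None if unspecified
    confidence: float = 0.5      # uncertain strength of the implication

def applicable(triplet, situation):
    """A triplet with no condition is trivially applicable (a partial rule)."""
    return triplet.condition is None or triplet.condition <= situation

grasp = Triplet({"object_visible", "hand_free"}, "grasp", {"object_held"}, 0.8)
print(applicable(grasp, {"object_visible", "hand_free", "light_on"}))  # True
```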
The work of figuring out what schema to execute to achieve the chosen goal in the current context is done in Psi using a combination of processes called the "Rasmussen ladder" (named after the Danish psychologist Jens Rasmussen). The Rasmussen ladder describes the organization of action as a movement between the stages of skill-based behavior, rule-based behavior and knowledge-based behavior, as follows:

• If a given task amounts to a trained routine, an automatism or skill is activated; it can usually be executed without conscious attention and deliberative control.
• If there is no automatism available, a course of action might be derived from rules; before a known set of strategies can be applied, the situation has to be analyzed and the strategies have to be adapted.
• In those cases where the known strategies are not applicable, a way of combining the available manipulations (operators) into reaching a given goal has to be explored first. This stage usually requires a recomposition of behaviors, that is, a planning process.

The planning algorithm used in the Psi and MicroPsi implementations is a fairly simple hill-climbing planner. While it's hypothesized that a more complex planner may be needed for advanced intelligence, part of the Psi theory is the hypothesis that most real-life planning an organism needs to do is fairly simple, once the organism has the right perceptual representations and goals.

4.5.13 Psi versus CogPrime

On a high level, the similarities between Psi and CogPrime are quite strong:

• interlinked declarative, procedural and intentional knowledge structures, represented using neural-symbolic methods (though the knowledge structures have somewhat different high-level structures and low-level representational mechanisms in the two systems)
• perception via prediction, and perception/action integration
• action selection via triplets that resemble uncertain, potentially partial production rules
• a similar motivation/emotion framework, since CogPrime incorporates a variant of Psi for this

On the nitty-gritty level there are many differences between the systems, but on the big-picture level the main difference lies in the way the cognitive synergy principle is pursued in the two different approaches. Psi and MicroPsi rely on very simple learning algorithms that are closely tied to the "quad" neurosymbolic knowledge representation, and hence interoperate in a fairly natural way without need for subtle methods of "synergy engineering." CogPrime uses much more diverse and sophisticated learning algorithms, which thus require more sophisticated methods of interoperation in order to achieve cognitive synergy.
Chapter 5
A Generic Architecture of Human-Like Cognition

5.1 Introduction

When writing the first draft of this book, some years ago, we had the idea to explain CogPrime by aligning its various structures and processes with the ones in the "standard architecture diagram" of the human mind. After a bit of investigation, though, we gradually came to the realization that no such thing existed. There was no standard flowchart or other sort of diagram explaining the modern consensus on how human thought works. Many such diagrams existed, but each one seemed to represent some particular focus or theory, rather than an overall integrative understanding.

Since there are multiple opinions regarding nearly every aspect of human intelligence, it would be difficult to get two cognitive scientists to fully agree on every aspect of an overall human cognitive architecture diagram. Prior attempts to outline detailed mind architectures have tended to follow highly specific theories of intelligence, and hence have attracted only moderate interest from researchers not adhering to those theories. An example is Minsky's work presented in The Emotion Machine [Min07], which arguably does constitute an architecture diagram for the human mind, but which is only loosely grounded in current empirical knowledge and stands more as a representation of Minsky's own intuitive understanding.

But nevertheless, it seemed to us that a reasonable attempt at an integrative, relatively theory-neutral "human cognitive architecture diagram" would be better than nothing. So naturally, we took it on ourselves to create such a diagram. This chapter is the result — it draws on the thinking of a number of cognitive science and AGI researchers, integrating their perspectives in a coherent, overall architecture diagram for human, and human-like, general intelligence. The specific architecture diagram of CogPrime, given in Chapter 6 below, may then be understood as a particular instantiation of this generic architecture diagram of human-like cognition.

There is no getting around the fact that, to a certain extent, the diagram presented here reflects our particular understanding of how the mind works. However, it was intentionally constructed with the goal of not being just an abstracted version of the CogPrime architecture diagram! It does not reflect our own idiosyncratic understanding of human intelligence, as much as a combination of understandings previously presented by multiple researchers (including ourselves), arranged according to our own taste in a manner we find conceptually coherent. With this in mind, we call it the "Integrative Human-Like Cognitive Architecture Diagram," or for short "the integrative diagram." We have made an effort to ensure that as many pieces of the integrative diagram as possible are well grounded in psychological and even neuroscientific
data, rather than mainly embodying speculative notions; however, given the current state of knowledge, this could not be done to a complete extent, and there is still some speculation involved here and there.

While based on understandings of human intelligence, the integrative diagram is intended to serve as an architectural outline for human-like general intelligence more broadly. For example, CogPrime is explicitly not intended as a precise emulation of human intelligence, and does many things quite differently than the human mind, yet can still fairly straightforwardly be mapped into the integrative diagram.

The integrative diagram focuses on structure, but this should not be taken to represent a valuation of structure over dynamics in our approach to intelligence. Following chapters treat various dynamical phenomena in depth.

5.2 Key Ingredients of the Integrative Human-Like Cognitive Architecture Diagram

The main ingredients we've used in assembling the integrative diagram are as follows:

• Our own views on the various types of memory critical for human-like cognition, and the need for tight, "synergetic" interactions between the cognitive processes focused on these.
• Aaron Sloman's high-level architecture diagram of human intelligence [Slo01], drawn from his CogAff architecture, which strikes us as a particularly clear embodiment of "modern common sense" regarding the overall architecture of the human mind. We have added only a few items to Sloman's high-level diagram, which we felt deserved an explicit high-level role that he did not give them: emotion, language and reinforcement.
• The LIDA architecture diagram presented by Stan Franklin and Bernard Baars [BF09]. We think LIDA is an excellent model of working memory and what Sloman calls "reactive processes", with well-researched grounding in the psychology and neuroscience literature. We have adapted the LIDA diagram only very slightly for use here, changing some of the terminology on the arrows, and indicating where parts of the LIDA diagram indicate processes elaborated in more detail elsewhere in the integrative diagram.
• The architecture diagram of the Psi model of motivated cognition, presented by Joscha Bach in [Bac09] based on prior work by Dietrich Dörner [Dör02]. This diagram is presented without significant modification; however it should be noted that Bach and Dörner present this diagram in the context of larger and richer cognitive models, the other aspects of which are not all incorporated in the integrative diagram.
• James Albus's three-hierarchy model of intelligence [AM01], involving coupled perception, action and reinforcement hierarchies. Albus's model, utilized in the creation of intelligent unmanned automated vehicles, is a crisp embodiment of many ideas emergent from the field of intelligent control systems.
• Deep learning networks as a model of perception (and action and reinforcement learning), as embodied for example in the work of Itamar Arel [ARC09] and Jeff Hawkins [HB06]. The integrative diagram adopts this as the basic model of the perception and action subsystems of human intelligence. Language understanding and generation are also modeled according to this paradigm.
One possible negative reaction to the integrative diagram might be to say that it's a kind of Frankenstein monster diagram, piecing together aspects of different theories in a way that violates the theoretical notions underlying all of them! For example, the integrative diagram takes LIDA as a model of working memory and reactive processing, but from the papers on LIDA it's unclear whether the creators of LIDA construe it more broadly than that. The deep learning community tends to believe that the architecture of current deep learning networks, in itself, is close to sufficient for human-level general intelligence — whereas the integrative diagram appropriates the ideas from this community mainly for handling perception, action and language, etc.

On the other hand, in a more positive perspective, one could view the integrative diagram as consistent with LIDA, but merely providing much more detail on some of the boxes in the LIDA diagram (e.g. dealing with perception and long-term memory). And one could view the integrative diagram as consistent with the deep learning paradigm — via viewing it, not as a description of components to be explicitly implemented in an AGI system, but rather as a description of the key structures and processes that must emerge in a deep learning network, based on its engagement with the world, in order for it to achieve human-like general intelligence.

Our own view, underlying the creation of the integrative diagram, is that different communities of cognitive science researchers have focused on different aspects of intelligence, and have thus each created models that are more fully fleshed out in some aspects than others. But these various models all link together fairly cleanly, which is not surprising as they are all grounded in the same data regarding human intelligence. Many judgment calls must be made in fusing multiple models in the way that the integrative diagram does, but we feel these can be made without violating the spirit of the component models. In assembling the integrative diagram, we have made these judgment calls as best we can, but we're well aware that different judgments would also be feasible and defensible. Revisions are likely as time goes on, not only due to new data about human intelligence but also to evolution of understanding regarding the best approach to model integration.

Another possible argument against the ideas presented here is that there's nothing new — all the ingredients presented have been given before elsewhere. To this our retort is to quote Pascal: "Let no one say that I have said nothing new ... the arrangement of the subject is new." The various architecture diagrams incorporated into the integrative diagram are either extremely high level (Sloman's diagram) or focus primarily on one aspect of intelligence, treating the others very concisely by summarizing large networks of distinct structures and processes in small boxes. The integrative diagram seeks to cover all aspects of human-like intelligence at a roughly equal granularity — a different arrangement.

This kind of high-level diagramming exercise is not precise enough, nor dynamics-focused enough, to serve as a guide for creating human-level or more advanced AGI. But it can be a useful tool for explaining and interpreting a concrete AGI design, such as CogPrime.
5.3 An Architecture Diagram for Human-Like General Intelligence

The integrative diagram is presented here in a series of seven Figures.

Figure 5.1 gives a high-level breakdown into components, based on Sloman's high-level cognitive-architectural sketch [Slo01]. This diagram represents, roughly speaking, "modern common sense" about how a human-like mind is architected. The separation between structures
and processes, embodied in having separate boxes for Working Memory vs. Reactive Processes, and for Long Term Memory vs. Deliberative Processes, could be viewed as somewhat artificial, since in the human brain and most AGI architectures, memory and processing are closely integrated. However, the tradition in cognitive psychology is to separate out Working Memory and Long Term Memory from the cognitive processes acting thereupon, so we have adhered to that convention. The other changes from Sloman's diagram are the explicit inclusion of language, representing the hypothesis that language processing is handled in a somewhat special way in the human brain; and the inclusion of a reinforcement component parallel to the perception and action hierarchies, as inspired by intelligent control systems theory (e.g. Albus as mentioned above) and deep learning theory. Of course Sloman's high level diagram in its original form is intended as inclusive of language and reinforcement, but we felt it made sense to give them more emphasis.

[Figure: high-level mind architecture, with boxes for metacognitive processes, self/social modeling, deliberative processes, long-term memory, emotion, language, motivation/action selection, reactive processes, working memory and reinforcement, linked via perception and action subsystems to the environment.]
Fig. 5.1: High-Level Architecture of a Human-Like Mind

Figure 5.2, modeling working memory and reactive processing, is essentially the LIDA diagram as given in prior papers by Stan Franklin, Bernard Baars and colleagues [BF09]. The boxes in the upper left corner of the LIDA diagram pertain to sensory and motor processing, which LIDA does not handle in detail, and which are modeled more carefully by deep learning theory. The bottom left corner box refers to action selection, which in the integrative diagram is modeled in more detail by Psi. The top right corner box refers to Long-Term Memory, which the integrative diagram models in more detail as a synergetic multi-memory system (Figure 5.4).

The original LIDA diagram refers to various "codelets", a key concept in LIDA theory. We have replaced "attention codelets" here with "attention flow", a more generic term. We suggest one can think of an attention codelet as: a piece of information stating that, for a certain group of items, it's currently pertinent to pay attention to this group as a collective.
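Read this way, an "attention flow" item is straightforward to render concretely; the following toy sketch uses our own invented names and is not LIDA's or CogPrime's actual representation:

```python
from dataclasses import dataclass

# Toy rendering of the "attention flow" reading of an attention codelet: a
# record asserting that a group of items currently merits collective attention.
@dataclass(frozen=True)
class AttentionFlowItem:
    items: frozenset        # the group of items to be attended to as a collective
    pertinence: float       # how strongly attention to the group is currently warranted
    source: str = ""        # optional provenance, e.g. "novel percept cluster"

def next_focus(flow_items):
    """Pick the group whose collective attention is currently most pertinent."""
    return max(flow_items, key=lambda f: f.pertinence, default=None)
```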
[Figure: working memory and reactive processing, interfacing with the perception/action subsystems.]
Fig. 5.2: Architecture of Working Memory and Reactive Processing, closely modeled on the LIDA architecture

Figure 5.3, modeling motivation and action selection, is a lightly modified version of the Psi diagram from Joscha Bach's book Principles of Synthetic Intelligence [Bac09]. The main difference from Psi is that in the integrative diagram the Psi motivated action framework is embedded in a larger, more complex cognitive model. Psi comes with its own theory of working and long-term memory, which is related to but different from the one given in the integrative diagram — it views the multiple memory types distinguished in the integrative diagram as emergent from a common memory substrate. Psi comes with its own theory of perception and action, which seems broadly consistent with the deep learning approach incorporated in the integrative diagram. Psi's handling of working memory lacks the detailed, explicit workflow of LIDA, though it seems broadly conceptually consistent with LIDA.

In Figure 5.3, the box labeled "Other portions of working memory" is labeled "Protocol and situation memory" in the original Psi diagram. The Perception, Action Execution and Action Selection boxes have fairly similar semantics to the similarly labeled boxes in the LIDA-like Figure 5.2, so that these diagrams may be viewed as overlapping. The LIDA model doesn't explain action selection and planning in as much detail as Psi, so the Psi-like Figure 5.3 could be viewed as an elaboration of the action-selection portion of the LIDA-like Figure 5.2. In Psi, reinforcement is considered as part of the learning process involved in action selection and planning; in Figure 5.3 an explicit "reinforcement box" has been added to the original Psi diagram, to emphasize this.

Figure 5.4, modeling long-term memory and deliberative processing, is derived from our own prior work studying the "cognitive synergy" between different cognitive processes associated with different types of memory. The division into types of memory is fairly standard. Declarative, procedural, episodic and sensorimotor memory are routinely distinguished; we like to distinguish attentional memory and intentional (goal) memory as well, and view these as the interface between long-term memory and the mind's global control systems. One focus of our AGI design work has been on designing learning algorithms, corresponding to these various types of memory,
that interact with each other in a synergetic way [Goe09c], helping each other to overcome their intrinsic combinatorial explosions. There is significant evidence that these various types of long-term memory are differently implemented in the brain, but the degree of structure and dynamical commonality underlying these different implementations remains unclear.

[Figure: motivated action, with perception and action execution coupled to other portions of working memory, active motives and reinforcement.]
Fig. 5.3: Architecture of Motivated Action

[Figure: deliberative processes (concept formation, Hebbian learning, reinforcement, reasoning, pattern mining, story-telling, and plan learning and optimization) acting on declarative, attentional, procedural, sensory and episodic memory; all the deliberative processes are regulated by emotion and attention, and interface with simulation and working memory.]
Fig. 5.4: Architecture of Long-Term Memory and Deliberative and Metacognitive Thinking
Each of these long-term memory types has its analogue in working memory as well. In some cognitive models, the working memory and long-term memory versions of a memory type, and the corresponding cognitive processes, are basically the same thing. CogPrime is mostly like this — it implements working memory as a subset of long-term memory consisting of items with particularly high importance values. The distinctive nature of working memory is enforced via using slightly different dynamical equations to update the importance values of items with importance above a certain threshold. On the other hand, many cognitive models treat working and long term memory as more distinct than this, and there is evidence for significant functional and anatomical distinctness in the brain in some cases. So for the purpose of the integrative diagram, it seemed best to leave working and long-term memory subcomponents as parallel but distinguished.

Figure 5.4 also encompasses metacognition, under the hypothesis that in human beings and human-like minds, metacognitive thinking is carried out using basically the same processes as plain ordinary deliberative thinking, perhaps with various tweaks optimizing them for thinking about thinking. If it turns out that humans have, say, a special kind of reasoning faculty exclusively for metacognition, then the diagram would need to be modified. Modeling of self and others is understood to occur via a combination of metacognition and deliberative thinking, as well as via implicit adaptation based on reactive processing.

[Figure: perceptual subsystems: olfaction, somatosensory, vision and audition hierarchies, linked to more abstract aspects of sensorimotor memory and to the action hierarchy.]
Fig. 5.5: Architecture for Multimodal Perception

Figure 5.5 models perception, according to the basic ideas of deep learning theory. Vision and audition are modeled as deep learning hierarchies, with bottom-up and top-down dynamics. The lower layers in each hierarchy refer to more localized patterns recognized in, and abstracted from, sensory data. Output from these hierarchies to the rest of the mind is not just through the top layers, but via some sort of sampling from various layers, with a bias toward the top layers. The different hierarchies cross-connect, and are hence to an extent dynamically coupled together.
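The bottom-up/top-down interplay described here can be sketched very crudely as follows. This is illustrative code only; the layer structure, pattern dictionaries and expectation mechanism are our own stand-ins, not the actual machinery of DeSTIN, HTM or any other deep learning system.

```python
# Toy sketch of a perceptual hierarchy with bottom-up recognition and a
# top-down expectation signal; entirely illustrative.
class Layer:
    def __init__(self, patterns):
        # patterns: maps a frozenset of lower-level features -> an abstraction
        self.patterns = patterns
        self.expectation = None   # set top-down by the layer above

    def recognize(self, features):
        matches = [(k, v) for k, v in self.patterns.items() if k <= features]
        if not matches:
            return set()
        # Prefer a match that confirms the top-down expectation, else the largest.
        matches.sort(key=lambda kv: (kv[1] == self.expectation, len(kv[0])))
        return {matches[-1][1]}

edges = Layer({frozenset({"dark_pixel_run"}): "edge"})
shapes = Layer({frozenset({"edge"}): "contour"})
out = shapes.recognize(edges.recognize({"dark_pixel_run", "noise"}))
edges.expectation = "edge"   # top-down: the contour layer expects edges next frame
print(out)                   # {'contour'}
```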
It is also recognized that there are some sensory modalities that aren't strongly hierarchical, e.g. touch and smell (the latter being better modeled as something like an asymmetric Hopfield net, prone to frequent chaotic dynamics [LLW+05]) — these may also cross-connect with each other and with the more hierarchical perceptual subnetworks. Of course the suggested architecture could include any number of sensory modalities; the diagram is restricted to four just for simplicity.

The self-organized patterns in the upper layers of perceptual hierarchies may become quite complex and may develop advanced cognitive capabilities like episodic memory, reasoning, language learning, etc. A pure deep learning approach to intelligence argues that all the aspects of intelligence emerge from this kind of dynamics (among perceptual, action and reinforcement hierarchies). Our own view is that the heterogeneity of human brain architecture argues against this perspective, and that deep learning systems are probably better as models of perception and action than of general cognition. However, the integrative diagram is not committed to our perspective on this — a deep-learning theorist could accept the integrative diagram, but argue that all the other portions besides the perceptual, action and reinforcement hierarchies should be viewed as descriptions of phenomena that emerge in these hierarchies due to their interaction.

[Figure: the action and reinforcement subsystem: motor planning and action selection coupled to motivation, sensory-motor memory, higher-level hierarchies, and per-limb motor hierarchies, with a parallel reinforcement hierarchy.]
Fig. 5.6: Architecture for Action and Reinforcement

Figure 5.6 shows an action subsystem and a reinforcement subsystem, parallel to the perception subsystem. Two action hierarchies, one for an arm and one for a leg, are shown for
concreteness, but of course the architecture is intended to be extended more broadly. In the hierarchy corresponding to an arm, for example, the lowest level would contain control patterns corresponding to individual joints, the next level up to groupings of joints (like fingers), the next level up to larger parts of the arm (hand, elbow). The different hierarchies corresponding to different body parts cross-link, enabling coordination among body parts; and they also connect at multiple levels to perception hierarchies, enabling sensorimotor coordination. Finally there is a module for motor planning, which links tightly with all the motor hierarchies, and also overlaps with the more cognitive, inferential planning activities of the mind, in a manner that is modeled different ways by different theorists. Albus [AM01] has elaborated this kind of hierarchy in considerable detail. The reward hierarchy in Figure 5.6 provides reinforcement to actions at various levels on the hierarchy, and includes dynamics for propagating information about reinforcement up and down the hierarchy.

[Figure: the language subsystem, coupled to the perceptual hierarchy and the motor hierarchy.]
Fig. 5.7: Architecture for Language Processing

Figure 5.7 deals with language, treating it as a special case of coupled perception and action. The traditional architecture of a computational language comprehension system is a pipeline [JM09] [Goe10d], which is equivalent to a hierarchy with the lowest-level linguistic features (e.g. sounds, words) at the bottom, the highest-level features (semantic abstractions) at the top, and syntactic features in the middle. Feedback connections enable semantic and cognitive modulation of lower-level linguistic processing. Similarly, language generation is commonly modeled hierarchically, with the top levels being the ideas needing verbalization, and the bottom level corresponding to the actual sentence produced. In generation the primary flow is top-down, with bottom-up flow providing modulation of abstract concepts by linguistic surface forms.

So, that's it — an integrative architecture diagram for human-like general intelligence, split among seven different pictures, formed by judiciously merging together architecture diagrams produced by a number of cognitive theorists with different, overlapping foci and research paradigms. Is anything critical left out of the diagram? A quick perusal of the table of contents of cognitive psychology textbooks suggests to me that if anything major is left out, it's also unknown to current cognitive psychology. However, one could certainly make an argument for explicit inclusion of certain other aspects of intelligence, that in the integrative diagram are
left as implicit emergent phenomena. For instance, creativity is obviously very important to intelligence, but there is no "creativity" box in any of these diagrams — because in our view, and the view of the cognitive theorists whose work we've directly drawn on here, creativity is best viewed as a process emergent from other processes that are explicitly included in the diagrams.

5.4 Interpretation and Application of the Integrative Diagram

A tongue-partly-in-cheek definition of a biological pathway is "a subnetwork of a biological network that fits on a single journal page." Cognitive architecture diagrams have a similar property — they are crude abstractions of complex structures and dynamics, sculpted in accordance with the size of the printed page, and the tolerance of the human eye for absorbing diagrams, and the tolerance of the human author for making diagrams.

However, sometimes constraints — even arbitrary ones — are useful for guiding creative efforts, due to the fact that they force choices. Creating an architecture for human-like general intelligence that fits in a few (okay, seven) fairly compact diagrams requires one to make many choices about what features and relationships are most essential. In constructing the integrative diagram, we have sought to make these choices, not purely according to our own tastes in cognitive theory or AGI system design, but according to a sort of blend of the taste and judgment of a number of scientists whose views we respect, and who seem to have fairly compatible, complementary perspectives.

What is the use of a cognitive architecture diagram like this? It can help to give newcomers to the field a basic idea about what is known and suspected about the nature of human-like general intelligence. Also, it could potentially be used as a tool for cross-correlating different AGI architectures. If everyone who authored an AGI architecture would explain how their architecture accounts for each of the structures and processes identified in the integrative diagram, this would give a means of relating the various AGI designs to each other.

The integrative diagram could also be used to help connect AGI and cognitive psychology to neuroscience in a more systematic way. In the case of LIDA, a fairly careful correspondence has been drawn up between the LIDA diagram nodes and links and various neural structures and processes [FB08]. Similar knowledge exists for the rest of the integrative diagram, though not organized in such a systematic fashion. A systematic curation of links between the nodes and links in the integrative diagram and current neuroscience knowledge would constitute an interesting first approximation of the holistic cognitive behavior of the human brain.

Finally (and harking forward to later chapters), the big omission in the integrative diagram is dynamics. Structure alone will only get you so far, and you could build an AGI system with reasonable-looking things in each of the integrative diagram's boxes, interrelating according to the given arrows, and yet still fail to make a viable AGI system. Given the limitations the real world places on computing resources, it's not enough to have adequate representations and algorithms in all the boxes, communicating together properly and capable of doing the right things given sufficient resources.
Rather, one needs to have all the boxes filled in properly with structures and processes that, when they act together using feasible computing resources, will yield appropriately intelligent behaviors via their cooperative activity. And this has to do with the complex interactive dynamics of all the processes in all the different boxes — which is
something the integrative diagram doesn’t touch at all. This brings us again to the network of ideas gathered under the name of "cognitive synergy," to be discussed later on.

It might be possible to make something similar to the integrative diagram on the level of dynamics rather than structures, complementing the structural integrative diagram given here; but this would seem significantly more challenging, because there is no standard set of tools for depicting complex system dynamics. Most cognitive theorists and AGI architects describe their structural ideas using boxes-and-lines diagrams of some sort, but no comparable convention exists for dynamics. So to make a dynamical analogue to the integrative diagram, via a similar integrative methodology, one would first need to create appropriate diagrammatic formalizations of the dynamics of the various cognitive theories being integrated — a fascinating but onerous task.

When we first set out to make an integrated cognitive architecture diagram, via combining the complementary insights of various cognitive science and AGI theorists, we weren’t sure how well it would work. But now we feel the experiment was generally a success — the resultant integrated architecture seems sensible and coherent, and reasonably complete. It doesn’t come close to telling you everything you need to know to understand or implement a human-like mind — but it tells you the various processes and structures you need to deal with, and which of their interrelations are most critical. And, perhaps just as importantly, it gives a concrete way of understanding the insights of a specific but fairly diverse set of cognitive science and AGI theorists as complementary rather than contradictory. In a CogPrime context, it provides a way of tying the specific structures and dynamics involved in CogPrime to a more generic portrayal of the structures and dynamics of human-like intelligence.
Chapter 6
A Brief Overview of CogPrime

6.1 Introduction

Just as there are many different approaches to human flight — airplanes, helicopters, balloons, spacecraft, and doubtless many methods no person has thought of yet — there are likely many different approaches to advanced artificial general intelligence. All the different approaches to flight exploit the same core principles of aerodynamics in different ways; and similarly, the various approaches to AGI will exploit the same core principles of general intelligence in different ways.

In the chapters leading up to this one, we have taken a fairly broad view of the project of engineering AGI. We have presented a conception and formal model of intelligence, and described environments, teaching methodologies and cognitive and developmental pathways that we believe are collectively appropriate for the creation of AGI at the human level and ultimately beyond, and with a roughly human-like bias to its intelligence. These ideas stand alone and may be compatible with a variety of approaches to engineering AGI systems. However, they also set the stage for the presentation of CogPrime, the particular AGI design on which we are currently working.

The thorough presentation of the CogPrime design is the job of Part 2 of this book — where not only are the algorithms and structures involved in CogPrime reviewed in more detail, but their relationship to the theoretical ideas underlying CogPrime is pursued more deeply. The job of this chapter is a smaller one: to give a high-level overview of some key aspects of the CogPrime architecture at a mostly nontechnical level, so as to enable you to approach Part 2 with a little more idea of what to expect. The remainder of Part 1, following this chapter, will present various theoretical notions enabling the particulars, intent and consequences of the CogPrime design to be more thoroughly understood.

6.2 High-Level Architecture of CogPrime

Figures 6.1, 6.2, 6.4 and 6.5 depict the high-level architecture of CogPrime, which involves the use of multiple cognitive processes associated with multiple types of memory to enable an intelligent agent to execute the procedures that it believes have the best probability of working toward its goals in its current context. In a robot preschool context, for example, the
top-level goals will be simple things such as pleasing the teacher, learning new information and skills, and protecting the robot’s body. Figure 6.3 shows part of the architecture via which cognitive processes interact with each other, by commonly acting on the AtomSpace knowledge repository.

Comparing these diagrams to the integrative human cognitive architecture diagrams given in Chapter 5, one sees that the main difference is that the CogPrime diagrams commit to specific structures (e.g. knowledge representations) and processes, whereas the generic integrative architecture diagram refers merely to types of structures and processes. For instance, the integrative diagram refers generally to declarative knowledge and learning, whereas the CogPrime diagram refers to PLN, a specific system for reasoning and learning about declarative knowledge. Table 6.1 articulates the key connections between the components of the CogPrime diagram and those of the integrative diagram, thus indicating the general cognitive functions instantiated by each of the CogPrime components.

6.3 Current and Prior Applications of OpenCog

Before digging deeper into the theory, and elaborating some of the dynamics underlying the above diagrams, we pause to briefly discuss some of the practicalities of work done with the OpenCog system currently implementing parts of the CogPrime architecture.

OpenCog, the open-source software framework underlying the “OpenCogPrime” (currently partial) implementation of the CogPrime architecture, has been used for commercial applications in the area of natural language processing and data mining; for instance, see [GPPG06], where OpenCogPrime’s PLN reasoning and RelEx language processing are combined to do automated biological hypothesis generation based on information gathered from PubMed abstracts. Most relevantly to the present work, it has also been used to control virtual agents in virtual worlds [GEA08].

Prototype work done during 2007-2008 involved using an OpenCog variant called the OpenPetBrain to control virtual dogs in a virtual world (see Figure 6.6 for a screenshot of an OpenPetBrain-controlled virtual dog). While these OpenCog virtual dogs did not display intelligence closely comparable to that of real dogs (or human children), they did demonstrate a variety of interesting and relevant functionalities, including:

• learning new behaviors based on imitation and reinforcement
• responding to natural language commands and questions, with appropriate actions and natural language replies
• spontaneous exploration of their world, remembering their experiences and using them to bias future learning and linguistic interaction

One current OpenCog initiative involves extending the virtual dog work by using OpenCog to control virtual agents in a game world inspired by the game Minecraft. These agents are initially specifically concerned with achieving goals in the game world by constructing structures with blocks and carrying out simple English communications. Representative example tasks would be:

• Learning to build steps or ladders to get desired objects that are high up
• Learning to build a shelter to protect itself from aggressors
• Learning to build structures resembling structures that it’s shown (even if the available materials are a bit different)
• Learning how to build bridges to cross chasms

Fig. 6.1: High-Level Architecture of CogPrime. This is a conceptual depiction, not a detailed flowchart (which would be too complex for a single image). Figures 6.2, 6.4 and 6.5 highlight specific aspects of this diagram.

Of course, the AI significance of learning tasks like this all depends on what kind of feedback the system is given, and how complex its environment is. It would be relatively simple to make an AI system do things like this in a trivial and highly specialized way, but that is not the intent of the project: the goal is to have the system learn to carry out tasks like this using general learning mechanisms and a general cognitive architecture, based on embodied experience and
only scant feedback from human teachers. If successful, this will provide an outstanding platform for ongoing AGI development, as well as a visually appealing and immediately meaningful demo for OpenCog.

Specific, particularly simple tasks that are the focus of this project team’s current work at time of writing include:

• Watch another character build steps to reach a high-up object
• Figure out via imitation of this that, in a different context, building steps to reach a high-up object may be a good idea
• Also figure out that, if it wants a certain high-up object but there are no materials for building steps available, finding some other way to get elevated will be a good idea that may help it get the object

6.3.1 Transitioning from Virtual Agents to a Physical Robot

Preliminary experiments have also been conducted using OpenCog to control a Nao robot as well as a virtual dog [GdG08]. This involves hybridizing OpenCog with a separate (but interlinked) subsystem handling low-level perception and action. In the experiments done so far, this has been accomplished in an extremely simplistic way. How to do this right is a topic treated in detail in Chapter 26 of Part 2.

We suspect that a reasonable level of capability will be achievable by simply interposing DeSTIN (or some other system in its place) as a perception/action “black box” between OpenCog and a robot. Some preliminary experiments in this direction have already been carried out, connecting the OpenPetBrain to a Nao robot using simpler, less capable software than DeSTIN in the intermediary role (off-the-shelf speech-to-text, text-to-speech and visual object recognition software).

However, we also suspect that to achieve robustly intelligent robotics we must go beyond this approach, and connect robot perception and actuation software with OpenCogPrime in a “white box” manner that allows intimate dynamic feedback between perceptual, motoric, cognitive and linguistic functions. We will achieve this via the creation and real-time utilization of links between the nodes in CogPrime’s and DeSTIN’s internal networks (a topic to be explored in more depth later in this chapter).

6.4 Memory Types and Associated Cognitive Processes in CogPrime

Now we return to the basic description of the CogPrime approach, turning to aspects of the relationship between structure and dynamics. Architecture diagrams are all very well, but ultimately it is dynamics that makes an architecture come alive. Intelligence is all about learning, which is by definition about change: about dynamical response to the environment and internal self-organizing dynamics.

CogPrime relies on multiple memory types and, as discussed above, is founded on the premise that the right course in architecting a pragmatic, roughly human-like AGI system is to handle different types of memory differently in terms of both structure and dynamics.
CogPrime’s memory types are the declarative, procedural, sensory, and episodic memory types that are widely discussed in cognitive neuroscience [TC05], plus attentional memory for allocating system resources generically, and intentional memory for allocating system resources in a goal-directed way. Table 6.2 overviews these memory types, giving key references and indicating the corresponding cognitive processes, and also indicating which of the generic patternist cognitive dynamics each cognitive process corresponds to (pattern creation, association, etc.). Figure 6.7 illustrates the relationships between several of the key memory types in the context of a simple situation involving an OpenCogPrime-controlled agent in a virtual world.

In terms of patternist cognitive theory, the multiple types of memory in CogPrime should be considered as specialized ways of storing particular types of patterns, optimized for spacetime efficiency. The cognitive processes associated with a certain type of memory deal with creating and recognizing patterns of the type for which the memory is specialized. While in principle all the different sorts of pattern could be handled in a unified memory and processing architecture, the sort of specialization used in CogPrime is necessary in order to achieve acceptably efficient general intelligence using currently available computational resources. And as we have argued in detail in Chapter 7, efficiency is not a side-issue but rather the essence of real-world AGI (since, as Hutter has shown, if one casts efficiency aside, arbitrary levels of general intelligence can be achieved via a trivially simple program).

The essence of the CogPrime design lies in the way the structures and processes associated with each type of memory are designed to work together in a closely coupled way, yielding cooperative intelligence going beyond what could be achieved by an architecture merely containing the same structures and processes in separate “black boxes.” The inter-cognitive-process interactions in OpenCog are designed so that

• conversion between different types of memory is possible, though sometimes computationally costly (e.g. an item of declarative knowledge may with some effort be interpreted procedurally or episodically, etc.)
• when a learning process concerned centrally with one type of memory encounters a situation where it learns very slowly, it can often resolve the issue by converting some of the relevant knowledge into a different type of memory: i.e. cognitive synergy

6.4.1 Cognitive Synergy in PLN

To put a little meat on the bones of the "cognitive synergy" idea, discussed repeatedly in prior chapters and more extensively in later chapters, we now elaborate a little on the role it plays in the interaction between procedural and declarative learning.

While MOSES handles much of CogPrime’s procedural learning, and CogPrime’s internal simulation engine handles most episodic knowledge, CogPrime’s primary tool for handling declarative knowledge is an uncertain inference framework called Probabilistic Logic Networks (PLN). The complexities of PLN are the topic of a lengthy technical monograph [GMIH08], and are summarized in Chapter 34; here we will eschew most details and focus mainly on pointing out how PLN seeks to achieve efficient inference control via integration with other cognitive processes.
As a logic, PLN is broadly integrative: it combines certain term logic rules with more standard predicate logic rules, and utilizes both fuzzy truth values and a variant of imprecise probabilities called indefinite probabilities. PLN mathematics tells how these uncertain truth values propagate
through its logic rules, so that uncertain premises give rise to conclusions with reasonably accurately estimated uncertainty values. This careful management of uncertainty is critical for the application of logical inference in the robotics context, where most knowledge is abstracted from experience and is hence highly uncertain.

PLN can be used in either forward or backward chaining mode; and in the language introduced above, it can be used for either analysis or synthesis. As an example, we will consider backward chaining analysis, exemplified by the problem of a robot preschool student trying to determine whether a new playmate “Bob” is likely to be a regular visitor to its preschool or not (evaluating the truth value of the implication Bob → regular_visitor). The basic backward chaining process for PLN analysis looks like:

1. Given an implication L = A → B whose truth value must be estimated (for instance L = Context ∧ Procedure → Goal as discussed above), create a list (A1, ..., An) of (inference rule, stored knowledge) pairs that might be used to produce L
2. Using analogical reasoning to prior inferences, assign each Ai a probability of success

• If some of the Ai are estimated to have a reasonable probability of success at generating reasonably confident estimates of L’s truth value, then invoke Step 1 with Ai in place of L (at this point the inference process becomes recursive)
• If none of the Ai looks sufficiently likely to succeed, then inference has “gotten stuck” and another cognitive process should be invoked, e.g.
  – Concept creation may be used to infer new concepts related to A and B, and then Step 1 may be revisited, in the hope of finding a new, more promising Ai involving one of the new concepts
  – MOSES may be invoked with one of several special goals, e.g. the goal of finding a procedure P so that P(X) predicts whether X → B. If MOSES finds such a procedure P then this can be converted to declarative knowledge understandable by PLN and Step 1 may be revisited....
  – Simulations may be run in CogPrime’s internal simulation engine, so as to observe the truth value of A → B in the simulations; and then Step 1 may be revisited...

The combinatorial explosion of inference control is combated by the capability to defer to other cognitive processes when the inference control procedure is unable to make a sufficiently confident choice of which inference steps to take next. Note that just as MOSES may rely on PLN to model its evolving populations of procedures, PLN may rely on MOSES to create complex knowledge about the terms in its logical implications. This is just one example of the multiple ways in which the different cognitive processes in CogPrime interact synergetically; a more thorough treatment of these interactions is given in [Goe09a].

In the “new playmate” example, the interesting case is where the robot initially seems not to know enough about Bob to make a solid inferential judgment (so that none of the Ai seem particularly promising). For instance, it might carry out a number of possible inferences and not come to any reasonably confident conclusion, so that the reason none of the Ai seem promising is that all the decent-looking ones have been tried already. It might then have recourse to MOSES, simulation or concept creation.
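As a concrete (and deliberately toy) illustration of this control policy, the following sketch shows backward chaining that recurses on promising subproblems and defers to other cognitive processes when stuck. All names and the knowledge-base format here are illustrative assumptions, not the actual OpenCog or PLN API:

# A toy sketch of the backward-chaining control loop described above; the
# knowledge base, rule scoring and fallbacks are stubs, not OpenCog/PLN code.

CONF = 0.5  # minimum acceptable probability of success for an inference step

def candidate_steps(goal, kb):
    # Stand-in for Step 1: enumerate (inference rule, stored knowledge) pairs
    # that might produce the target implication. Here KB entries are simply
    # (premise, conclusion, probability) triples.
    return [(premise, p) for (premise, conclusion, p) in kb if conclusion == goal]

def evaluate(goal, kb, fallbacks=(), depth=0, max_depth=8):
    if depth >= max_depth:
        return None
    promising = [(p, premise) for (premise, p) in candidate_steps(goal, kb)
                 if p >= CONF]                      # Step 2: keep likely steps
    if promising:
        p, premise = max(promising)
        if premise == "TRUE":                       # reached an axiom
            return p
        sub = evaluate(premise, kb, fallbacks, depth + 1, max_depth)
        return None if sub is None else p * sub     # recurse on the subproblem
    for fb in fallbacks:                            # "stuck": defer to MOSES,
        new_facts = fb(goal)                        # simulation, concept creation
        if new_facts:
            return evaluate(goal, kb + new_facts, fallbacks, depth + 1, max_depth)
    return None                                     # no confident estimate

Real PLN inference control is of course far subtler (truth values are not simple products, and step selection is itself learned by analogy to prior inferences), but the stuck-then-defer structure is the point of the sketch.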
For instance, the PLN controller could make a list of everyone who has been a regular visitor, and everyone who has not been, and pose MOSES the task of figuring out a procedure for distinguishing these two categories. This procedure could then be used directly to make the needed assessment, or else be translated into logical rules to be used within PLN inference. For
example, perhaps MOSES would discover that older males wearing ties tend not to become regular visitors. If the new playmate is an older male wearing a tie, this is directly applicable. But if the current playmate is wearing a tuxedo, then PLN may be helpful via reasoning that even though a tuxedo is not a tie, it’s a similar form of fancy dress — so PLN may extend the MOSES-learned rule to the present case and infer that the new playmate is not likely to be a regular visitor.

6.5 Goal-Oriented Dynamics in CogPrime

CogPrime’s dynamics has both goal-oriented and “spontaneous” aspects; here for simplicity’s sake we will focus on the goal-oriented ones. The basic goal-oriented dynamic of the CogPrime system, within which the various types of memory are utilized, is driven by implications known as “cognitive schematics”, which take the form

Context ∧ Procedure → Goal <p>

(summarized C ∧ P → G). Semi-formally, this implication may be interpreted to mean: “If the context C appears to hold currently, then if I enact the procedure P, I can expect to achieve the goal G with certainty p.” Cognitive synergy means that the learning processes corresponding to the different types of memory actively cooperate in figuring out what procedures will achieve the system’s goals in the relevant contexts within its environment.

CogPrime’s cognitive schematic is significantly similar to production rules in classical architectures like SOAR and ACT-R (as reviewed in Chapter 4); however, there are significant differences which are important to CogPrime’s functionality. Unlike with classical production rule systems, uncertainty is core to CogPrime’s knowledge representation, and each CogPrime cognitive schematic is labeled with an uncertain truth value, which is critical to its utilization by CogPrime’s cognitive processes. Also, in CogPrime, cognitive schematics may be incomplete, missing one or two of the terms, which may then be filled in by various cognitive processes (generally in an uncertain way). A stronger similarity is to MicroPsi’s triplets; the differences in this case are more low-level and technical and have already been mentioned in Chapter 4.

Finally, the biggest difference between CogPrime’s cognitive schematics and production rules or other similar constructs is that in CogPrime this level of knowledge representation is not the only important one. CLARION [SZ04], as reviewed above, is an example of a cognitive architecture that uses production rules for explicit knowledge representation and then uses a totally separate subsymbolic knowledge store for implicit knowledge. In CogPrime both explicit and implicit knowledge are stored in the same graph of nodes and links, with

• explicit knowledge stored in probabilistic-logic-based nodes and links such as cognitive schematics (see Figure 6.8 for a depiction of some explicit linguistic knowledge)
• implicit knowledge stored in patterns of activity among these same nodes and links, defined via the activity of the “importance” values associated with nodes and links and propagated by the ECAN attention allocation process (see Figure 6.9 for an illustrative example)

The meaning of a cognitive schematic in CogPrime is hence not entirely encapsulated in its explicit logical form, but resides largely in the activity patterns that ECAN causes its activation or exploration to give rise to.
And this fact is important because the synergetic interactions of system components are in large part modulated by ECAN activity. Without the real-time
combination of explicit and implicit knowledge in the system’s knowledge graph, the synergetic interaction of different cognitive processes would not work so smoothly, and the emergence of effective high-level hierarchical, heterarchical and self structures would be less likely.

6.6 Analysis and Synthesis Processes in CogPrime

We now return to CogPrime’s fundamental cognitive dynamics, using examples from the “virtual dog” application to motivate the discussion. The cognitive schematic Context ∧ Procedure → Goal leads to a conceptualization of the internal action of an intelligent system as involving two key categories of learning:

• Analysis: estimating the probability p of a posited C ∧ P → G relationship
• Synthesis: filling in one or two of the variables in the cognitive schematic, given assumptions regarding the remaining variables, and directed by the goal of maximizing the probability of the cognitive schematic

More specifically, where synthesis is concerned:

• The MOSES probabilistic evolutionary program learning algorithm is applied to find P, given fixed C and G. Internal simulation is also used, for the purpose of creating a simulation embodying C and seeing which P lead to the simulated achievement of G.
  – Example: A virtual dog learns a procedure P to please its owner (the goal G) in the context C where there is a ball or stick present and the owner is saying “fetch”.
• PLN inference, acting on declarative knowledge, is used for choosing C, given fixed P and G (also incorporating sensory and episodic knowledge as appropriate). Simulation may also be used for this purpose.
  – Example: A virtual dog wants to achieve the goal G of getting food, and it knows that the procedure P of begging has been successful at this before, so it seeks a context C where begging can be expected to get it food. Probably this will be a context involving a friendly person.
• PLN-based goal refinement is used to create new subgoals G to sit on the right hand side of instances of the cognitive schematic.
  – Example: Given that a virtual dog has a goal of finding food, it may learn a subgoal of following other dogs, due to observing that other dogs are often heading toward their food.
• Concept formation heuristics are used for choosing G and for fueling goal refinement, but especially for choosing C (via providing new candidates for C). They are also used for choosing P, via a process called “predicate schematization” that turns logical predicates (declarative knowledge) into procedures.
  – Example: At first a virtual dog may have a hard time predicting which other dogs are going to be mean to it. But it may eventually observe common features among a number of mean dogs, and thus form its own concept of “pit bull,” without anyone ever teaching it this concept explicitly.
Where analysis is concerned:

• PLN inference, acting on declarative knowledge, is used for estimating the probability of the implication in the cognitive schematic, given fixed C, P and G. Episodic knowledge is also used in this regard, via enabling estimation of the probability via simple similarity matching against past experience. Simulation is also used: multiple simulations may be run, and statistics may be captured therefrom.
  – Example: To estimate the degree to which asking Bob for food (the procedure P is “asking for food”, the context C is “being with Bob”) will achieve the goal G of getting food, the virtual dog may study its memory to see what happened on previous occasions where it or other dogs asked Bob for food or other things, and then integrate the evidence from these occasions.
• Procedural knowledge, mapped into declarative knowledge and then acted on by PLN inference, can be useful for estimating the probability of the implication C ∧ P → G, in cases where the probability of C ∧ P1 → G is known for some P1 related to P.
  – Example: Knowledge of the internal similarity between the procedure of asking for food and the procedure of asking for toys allows the virtual dog to reason that if asking Bob for toys has been successful, maybe asking Bob for food will be successful too.
• Inference, acting on declarative or sensory knowledge, can be useful for estimating the probability of the implication C ∧ P → G, in cases where the probability of C1 ∧ P → G is known for some C1 related to C.
  – Example: If Bob and Jim have a lot of features in common, and Bob often responds positively when asked for food, then maybe Jim will too.
• Inference can be used similarly for estimating the probability of the implication C ∧ P → G, in cases where the probability of C ∧ P → G1 is known for some G1 related to G. Concept creation can be useful indirectly in calculating these probability estimates, via providing new concepts that can be used to make useful inference trails more compact and hence easier to construct.
  – Example: The dog may reason that because Jack likes to play, and Jack and Jill are both children, maybe Jill likes to play too. It can carry out this reasoning only if its concept creation process has invented the concept of “child” via analysis of observed data.

In these examples we have focused on cases where two terms in the cognitive schematic are fixed and the third must be filled in; but just as often, the situation is that only one of the terms is fixed. For instance, if we fix G, sometimes the best approach will be to collectively learn C and P. This requires either a procedure learning method that works interactively with a declarative-knowledge-focused concept learning or reasoning method, or a declarative learning method that works interactively with a procedure learning method. That is, it requires the sort of cognitive synergy built into the CogPrime design.
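To make the two categories concrete, here is a minimal toy sketch of a store of cognitive schematics supporting both operations; the record format and matching logic are illustrative assumptions, not CogPrime’s actual representation:

# Toy store of cognitive schematics: (context, procedure, goal, probability).
schematics = [
    ("owner_says_fetch", "fetch_ball",  "please_owner", 0.9),
    ("friendly_person",  "beg",         "get_food",     0.7),
    ("near_other_dogs",  "follow_dogs", "find_food",    0.6),
]

def analyze(context, procedure, goal):
    # Analysis: estimate p for a fully specified C & P -> G (exact match here;
    # real PLN would also infer from similar contexts, procedures and goals).
    for c, p, g, prob in schematics:
        if (c, p, g) == (context, procedure, goal):
            return prob
    return None

def synthesize(context=None, procedure=None, goal=None):
    # Synthesis: fill in the unspecified terms, ranked by probability.
    hits = [(prob, c, p, g) for (c, p, g, prob) in schematics
            if context in (None, c) and procedure in (None, p)
            and goal in (None, g)]
    return sorted(hits, reverse=True)

# e.g. synthesize(goal="find_food") ranks contexts/procedures for finding food.

In CogPrime the filling-in is of course done by MOSES, PLN, simulation and concept formation rather than table lookup; the sketch only fixes the shape of the two operations.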
6.7 Conclusion

To thoroughly describe a comprehensive, integrative AGI architecture in a brief chapter would be an impossible task; all we have attempted here is a brief overview, to be elaborated on in the 800-odd pages of Part 2 of this book. We do not expect this brief summary to be enough to convince the skeptical reader that the approach described here has reasonable odds of success at achieving its stated goals, or even of fulfilling the conceptual notions outlined in the preceding chapters. However, we hope to have given the reader at least a rough idea of what sort of AGI design we are advocating, and why and in what sense we believe it can lead to advanced artificial general intelligence.

For more details on the structure, dynamics and underlying concepts of CogPrime, the reader is encouraged to proceed to Part 2 (after completing Part 1, of course). Please be patient — building a thinking machine is a big topic, and we have a lot to say about it!
Fig. 6.2: Key Explicitly Implemented Processes of CogPrime. The large box at the center is the Atomspace, the system’s central store of various forms of (long-term and working) memory, which contains a weighted labeled hypergraph whose nodes and links are "Atoms" of various sorts. The hexagonal boxes at the bottom denote various hierarchies devoted to recognition and generation of patterns: perception, action and linguistic. Intervening between these recognition/generation hierarchies and the Atomspace, we have a pattern mining/imprinting component (that recognizes patterns in the hierarchies and passes them to the Atomspace, and imprints patterns from the Atomspace on the hierarchies); and also OpenPsi, a special dynamical framework for choosing actions based on motivations. Above the Atomspace we have a host of cognitive processes, which act on the Atomspace, some continually and some only as context dictates, carrying out various sorts of learning and reasoning (pertinent to various sorts of memory) that help the system fulfill its goals and motivations.
Fig. 6.3: MindAgents and AtomSpace in OpenCog. This is a conceptual depiction of one way cognitive processes may interact in OpenCog — they may be wrapped in MindAgent objects, which interact via cooperatively acting on the AtomSpace.
Fig. 6.4: Links Between Cognitive Processes and the Atomspace. The cognitive processes depicted all act on the Atomspace, in the sense that they operate by observing certain Atoms in the Atomspace and then modifying (or in rare cases deleting) them, and potentially adding new Atoms as well. Atoms represent all forms of knowledge, but some forms of knowledge are additionally represented by external data stores connected to the Atomspace, such as the Procedure Repository; these are also shown as linked to the Atomspace.
Fig. 6.5: Invocation of Atom Operations by Cognitive Processes. This diagram depicts some of the Atom modification, creation and deletion operations carried out by the abstract cognitive processes in the CogPrime architecture.
Table 6.1: Correspondences between CogPrime components and integrative diagram components.

CogPrime Component                 | Integrative Diagram Component
-----------------------------------|--------------------------------------------------
Procedure Repository               | Long-Term Memory: Procedural
Procedure Repository               | Working Memory: Active Procedural
Associative Episodic Memory        | Long-Term Memory: Episodic
Associative Episodic Memory        | Working Memory: Transient Episodic
Backup Store                       | Long-Term Memory (no correlate: a function not necessarily possessed by the human mind)
Spacetime Server                   | Long-Term Memory: Declarative and Sensorimotor
Dimensional Embedding Space        | no clear correlate: a tool for helping multiple types of long-term memory
Blending                           | Long-Term and Working Memory: Concept Formation
Clustering                         | Long-Term and Working Memory: Concept Formation
PLN Probabilistic Inference        | Long-Term and Working Memory: Reasoning and Plan Learning/Optimization
MOSES / Hillclimbing               | Long-Term and Working Memory: Procedure Learning
World Simulation                   | Long-Term and Working Memory: Simulation
Episodic Encoding / Recall         | Long-Term Memory: Story-telling; Working Memory: Consolidation
Forgetting / Freezing / Defrosting | Long-Term and Working Memory (no correlate: a function not necessarily possessed by the human mind)
Map Formation                      | Long-Term Memory: Concept Formation and Pattern Mining
Attention Allocation               | Long-Term and Working Memory; High-Level Mind Architecture: Attentional Learning, Hebbian Reinforcement
Attention Allocation               | Working Memory: Perceptual Associative Memory and Local Association
AtomSpace                          | High-Level Mind Architecture (no clear correlate: a general tool for representing memory including long-term and working, plus some of perception and action)
AtomSpace (the high-STI portion)   | Working Memory: Global Workspace and other Workspaces
Declarative Atoms                  | Long-Term and Working Memory: Declarative and Sensorimotor
Procedure Atoms                    | Long-Term and Working Memory: Procedural
Hebbian Atoms                      | Long-Term and Working Memory: Attentional
Goal Atoms                         | Long-Term and Working Memory: Intentional
Feeling Atoms                      | Long-Term and Working Memory: spanning Declarative, Intentional and Sensorimotor
OpenPsi                            | High-Level Mind Architecture: Motivation / Action Selection
OpenPsi                            | Working Memory: Action Selection
Pattern Miner                      | High-Level Mind Architecture: arrows between perception and working and long-term memory
Pattern Miner                      | Working Memory: arrows between sensory memory and perceptual associative and transient episodic memory
                                   | arrows between action selection and
Fig. 6.7: Relationship Between Multiple Memory Types. The bottom left corner shows a program tree, constituting procedural knowledge. The upper left shows declarative nodes and links in the Atomspace. The upper right corner shows a relevant system goal. The lower right corner contains an image symbolizing relevant episodic and sensory knowledge. All the various types of knowledge link to each other and can be approximately converted to each other.
Memory Type | Specific Cognitive Processes | General Cognitive Functions
------------|------------------------------|----------------------------
Declarative | Probabilistic Logic Networks (PLN) [GMIH08]; conceptual blending [FT02] | pattern creation
Procedural  | MOSES (a novel probabilistic evolutionary program learning algorithm) [Loo06] | pattern creation
Episodic    | internal simulation engine [GEHA08] | association, pattern creation
Attentional | Economic Attention Networks (ECAN) [GPI+10] | association, credit assignment
Intentional | probabilistic goal hierarchy refined by PLN and ECAN, structured according to MicroPsi [Bac09] | credit assignment, pattern creation
Sensory     | in CogBot, this will be supplied by the DeSTIN component | association, attention allocation, pattern creation, credit assignment

Table 6.2: Memory Types and Cognitive Processes in CogPrime. The third column indicates the general cognitive function that each specific cognitive process carries out, according to the patternist theory of cognition.
Fig. 6.8: Example of Explicit Knowledge in the Atomspace. (The figure shows WordNodes and ConceptNodes joined by ReferenceLinks, ListLinks and HebbianLinks.) One simple example of explicitly represented knowledge in the Atomspace is linguistic knowledge, such as words and the concepts directly linked to them. Not all of a CogPrime system’s concepts correlate to words, but some do.
Fig. 6.9: Example of Implicit Knowledge in the Atomspace. A simple example of implicit knowledge in the Atomspace. The "chicken" and "food" concepts are represented by "maps" of ConceptNodes interconnected by HebbianLinks, where the latter tend to form between ConceptNodes that are often simultaneously important. The bundle of links between nodes in the chicken map and nodes in the food map represents an "implicit, emergent link" between the two concept maps. This diagram also illustrates "glocal" knowledge representation, in that the chicken and food concepts are each represented by individual nodes, but also by distributed maps. The "chicken" ConceptNode, when important, will tend to make the rest of the map important — and vice versa. Part of the overall chicken concept possessed by the system is expressed by the explicit links coming out of the chicken ConceptNode, and part is represented only by the distributed chicken map as a whole.
Section II
Toward a General Theory of General Intelligence
Chapter 7
A Formal Model of Intelligent Agents

7.1 Introduction

The artificial intelligence field is full of sophisticated mathematical models and equations, but most of these are highly specialized in nature — e.g. formalizations of particular logic systems, analyses of the dynamics of specific sorts of neural nets, etc. On the other hand, a number of highly general models of intelligent systems also exist, including Hutter’s recent formalization of universal intelligence [Hut05] and a large body of work in the disciplines of systems science and cybernetics — but these have tended not to yield many specific lessons useful for engineering AGI systems, serving more as conceptual models in mathematical form.

It would be fantastic to have a mathematical theory bridging these extremes — a real "general theory of general intelligence," allowing the derivation and analysis of specific structures and processes playing a role in practical AGI systems, from broad mathematical models of general intelligence in various situations and under various constraints. However, the path to such a theory is not entirely clear at present; and, as valuable as such a theory would be, we don’t believe such a thing to be necessary for creating advanced AGI. One possibility is that the development of such a theory will occur contemporaneously and synergetically with the advent of practical AGI technology.

Lacking a mature, pragmatically useful "general theory of general intelligence," however, we have still found it valuable to articulate certain theoretical ideas about the nature of general intelligence, with a level of rigor a bit greater than the wholly informal discussions of the previous chapters. The chapters in this section of the book articulate some ideas we have developed in pursuit of a general theory of general intelligence; ideas that, even in their current relatively undeveloped form, have been very helpful in guiding our concrete work on the CogPrime design.

This chapter presents a more formal version of the notion of intelligence as “achieving complex goals in complex environments,” based on a formal model of intelligent agents. These formalizations of agents and intelligence will be used in later chapters as a foundation for formalizing other concepts like inference and cognitive synergy. Chapters 8 and 9 pursue the notion of cognitive synergy a little more thoroughly than was done in previous chapters. Chapter 10 sketches a general theory of general intelligence using tools from category theory — not bringing it to the level where one can use it to derive specific AGI algorithms and structures, but still presenting ideas that will be helpful in interpreting and explaining specific aspects of the CogPrime design in Part 2. Finally, Appendix ?? explores an additional theoretical direction, in which the mind of an intelligent system is viewed in terms of certain curved spaces — a novel way of thinking
about the dynamics of general intelligence, which has been useful in guiding development of the ECAN component of CogPrime, and we expect will have more general value in future.

Despite the intermittent use of mathematical formalism, the ideas presented in this section are fairly speculative, and we do not propose them as constituting a well-demonstrated theory of general intelligence. Rather, we propose them as an interesting way of thinking about general intelligence, which appears to be consistent with available data, and which has proved inspirational to us in conceiving concrete structures and dynamics for AGI, as manifested for example in the CogPrime design. Understanding the way of thinking described in these chapters is valuable for understanding why the CogPrime design is the way it is, and for relating CogPrime to other practical and intellectual systems, and extending and improving CogPrime.

7.2 A Simple Formal Agents Model (SRAM)

We now present a formalization of the concept of “intelligent agents” — beginning with a formalization of “agents” in general. Drawing on [Hut05, LH07a], we consider a class of active agents which observe and explore their environment and also take actions in it, which may affect the environment. Formally, the agent sends information to the environment by sending symbols from some finite alphabet called the action space A; and the environment sends signals to the agent with symbols from an alphabet called the perception space, denoted P. Agents can also experience rewards, which lie in the reward space, denoted R, which for each agent is a subset of the rational unit interval.

The agent and environment are understood to take turns sending signals back and forth, yielding a history of actions, observations and rewards, which may be denoted a1 o1 r1 a2 o2 r2 ..., or else a1 x1 a2 x2 ... if x is introduced as a single symbol to denote both an observation and a reward. The complete interaction history up to and including cycle t is denoted ax_{1:t}; and the history before cycle t is denoted ax_{<t} = ax_{1:t-1}.

The agent is represented as a function π which takes the current history as input, and produces an action as output. Agents need not be deterministic: an agent may for instance induce a probability distribution over the space of possible actions, conditioned on the current history. In this case we may characterize the agent by a probability distribution π(a_t | ax_{<t}). Similarly, the environment may be characterized by a probability distribution μ(x_t | ax_{<t} a_t). Taken together, the distributions π and μ define a probability measure over the space of interaction sequences.

Next, we extend this model in a few ways, intended to make it better reflect the realities of intelligent computational agents. The first modification is to allow agents to maintain memories (of finite size), via adding memory actions drawn from a set M into the history of actions, observations and rewards. The second modification is to introduce the notion of goals.
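The turn-taking dynamic just described is easy to render concrete; the following minimal sketch (with an arbitrary toy environment and a random policy, neither of which is part of the formal model) shows the cycle structure:

import random

ACTIONS = ["left", "right"]       # toy action space A
PERCEPTS = ["dark", "light"]      # toy perception space P

def agent_policy(history):
    # pi(a_t | ax_<t): here just uniform random over A
    return random.choice(ACTIONS)

def environment(history, action):
    # mu(x_t | ax_<t a_t): toy environment rewarding "right" in the light
    percept = random.choice(PERCEPTS)
    reward = 1.0 if (action == "right" and percept == "light") else 0.0
    return percept, reward

history = []                      # the interaction sequence a1 x1 a2 x2 ...
for t in range(10):
    a = agent_policy(history)
    o, r = environment(history, a)
    history.append((a, o, r))     # x_t bundles observation o_t and reward r_t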
7.2.1 Goals

We define goals as mathematical functions (to be specified below) associated with symbols drawn from the alphabet G; and we consider the environment as sending goal-symbols to the agent along with regular observation-symbols. (Note however that the presentation of a goal-symbol to an agent does not necessarily entail the explicit communication to the agent of the contents of the goal function. This must be provided by other, correlated observations.) We also introduce a conditional distribution γ(g, μ) that gives the weight of a goal g in the context of a particular environment μ.

In this extended framework, an interaction sequence looks like a1 o1 g1 r1 a2 o2 g2 r2 ..., or else a1 y1 a2 y2 ..., where the g_i are symbols corresponding to goals, and y is introduced as a single symbol to denote the combination of an observation, a reward and a goal.

Each goal function maps each finite interaction sequence I_{g,s,t} = ay_{s:t}, with g_s through g_t corresponding to g, into a value r_g(I_{g,s,t}) ∈ [0,1] indicating the value or “raw reward” of achieving the goal during that interaction sequence. The total reward r_t obtained by the agent is the sum of the raw rewards obtained at time t from all goals whose symbols occur in the agent’s history before t.

This formalism of goal-seeking agents allows us to formalize the notion of intelligence as “achieving complex goals in complex environments” — a direction that is pursued in Section 7.3 below. Note that this is an external perspective on system goals, which is natural from the perspective of formally defining system intelligence in terms of system behavior, but is not necessarily very natural in terms of system design. From the point of view of AGI design, one is generally more concerned with the (implicit or explicit) representation of goals inside an AGI system, as in CogPrime’s Goal Atoms to be reviewed in Chapter 22 below.

Further, it is important to also consider the case where an AGI system has no explicit goals, and the system’s environment has no immediately identifiable goals either. But in this case, we don’t see any clear way to define a system’s intelligence, except via approximating the system in terms of other theoretical systems which do have explicit goals. This approximation approach is developed in Section 7.3.5 below.

The awkwardness of linking the general formalism of intelligence theory presented here with the practical business of creating and designing AGI systems may indicate a shortcoming on the part of contemporary intelligence theory or AGI designs. On the other hand, this sort of situation often occurs in other domains as well — e.g. the leap from quantum theory to the analysis of real-world systems like organic molecules involves a lot of awkwardness and large leaps as well.
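Continuing the toy sketch above, the reward bookkeeping can be rendered as follows; the particular goal function is an arbitrary illustration, not part of the formalism:

def goal_find_food(episode):
    # A toy goal function r_g: raw reward in [0,1], here the fraction of
    # steps in the episode whose observation was "food".
    hits = sum(1 for (a, o, r) in episode if o == "food")
    return hits / max(1, len(episode))

goals = {"find_food": goal_find_food}   # toy goal-symbol alphabet G

def total_reward(history, announced):
    # announced maps each goal-symbol to the time s at which it appeared;
    # total reward at time t sums raw rewards of all goals announced before t.
    t = len(history)
    return sum(goals[g](history[s:t]) for g, s in announced.items() if s < t)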
7.2.2 Memory Stores

As well as goals, we introduce into the model a long-term memory and a workspace. Regarding long-term memory, we assume the agent’s memory consists of multiple memory stores corresponding to various types of memory, e.g. procedural (K_Proc), declarative (K_Dec), episodic (K_Ep), attentional (K_Att) and intentional (K_Int). In Appendix ?? a category-theoretic model of these memory stores is introduced; but for the moment, we need only assume the existence of

• an injective mapping O_Ep : K_Ep → H, where H is the space of fuzzy sets of subhistories (subhistories being “episodes” in this formalism)
• an injective mapping O_Proc : K_Proc × M × W → A, where M is the set of memory states, W is the set of (observation, goal, reward) triples, and A is the set of actions (this maps each procedure object into a function that enacts actions in the environment or memory, based on the memory state and current world-state)
• an injective mapping O_Dec : K_Dec → L, where L is the set of expressions in some formal language (which may for example be a logical language), which possesses words corresponding to the observations, goals, reward values and actions in our agent formalism
• an injective mapping O_Int : K_Int → G, where G is the space of goals mentioned above
• an injective mapping O_Att : K_Int ∪ K_Ep ∪ K_Proc ∪ K_Dec → V, where V is the space of “attention values” (structures that gauge the importance of paying attention to an item of knowledge over various time-scales or in various contexts)

We also assume that the vocabulary of actions contains memory-actions corresponding to the operations of inserting the current observation, goal, reward or action into the episodic and/or declarative memory store. And, we assume that the activity of the agent, at each time-step, includes the enaction of one or more of the procedures in the procedural memory store. If several procedures are enacted at once, then the end result is still formally modeled as a single action a = a_[1] * ... * a_[k], where * is an operator on action-space that composes multiple actions into a single one.

Finally, we assume that, at each time-step, the agent may carry out an external action a_t on the environment, a memory action m_t on the (long-term) memory, and an action b_t on its internal workspace. Among the actions that can be carried out on the workspace is the ability to insert or delete observations, goals, actions or reward-values from the workspace. The workspace can be thought of as a sort of short-term memory, or else in terms of Baars’ “global workspace” concept mentioned above.

The workspace provides a medium of interaction between the different memory types: a mechanism by which declarative, episodic and procedural memory may interact with each other. For this mechanism to work, we must assume that there are actions corresponding to query operations that allow procedures to look into declarative and episodic memory.
The nature of these query operations will vary among different agents, but we can assume that in general an agent has

• one or more procedures Q_Dec(x) serving as declarative queries, meaning that when Q_Dec is enacted on some x that is an ordered set of items in the workspace, the result is that one or more items from declarative memory is entered into the workspace
• one or more procedures Q_Ep(x) serving as episodic queries, meaning that when Q_Ep is enacted on some x that is an ordered set of items in the workspace, the result is that one or more items from episodic memory is entered into the workspace
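A minimal sketch of this memory model, with toy cue-matching standing in for the query procedures Q_Dec and Q_Ep, might look as follows (this is one arbitrary realization, not the formal model itself):

from collections import deque

class AgentMemory:
    def __init__(self, workspace_size=7):
        # long-term stores K_Dec, K_Ep, K_Proc, K_Int (toy: plain lists)
        self.declarative = []      # formal-language expressions
        self.episodic = []         # subhistories ("episodes")
        self.procedural = []       # executable procedures
        self.intentional = []      # goals
        self.workspace = deque(maxlen=workspace_size)   # bounded workspace

    def q_dec(self, cue):
        # Q_Dec(x): pull matching declarative items into the workspace
        for item in self.declarative:
            if cue in item:
                self.workspace.append(item)

    def q_ep(self, cue):
        # Q_Ep(x): pull matching episodes into the workspace
        for episode in self.episodic:
            if any(cue in str(step) for step in episode):
                self.workspace.append(episode)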
One additional aspect of CogPrime’s knowledge representation that is important to PLN is the attachment of nonnegative weights n_i corresponding to elementary observations o_i. These weights denote the amount of evidence contained in the observation. For instance, in the context of a robotic agent, one could use these values to encode the assumption that an elementary visual observation has more evidential value than an elementary olfactory observation.

We now have a model of an agent with long-term memory comprising procedural, declarative and episodic aspects, an internal cognitive workspace, and the capability to use procedures to drive actions based on items in memory and the workspace, and to move items between long-term memory and the workspace.

7.2.2.1 Modeling CogPrime

Of course, this formal model may be realized differently in various real-world AGI systems. In CogPrime we have

• a weighted, labeled hypergraph structure called the AtomSpace, used to store declarative knowledge (this is the representation used by PLN)
• a collection of programs in a LISP-like language called Combo, stored in a ProcedureRepository data structure, used to store procedural knowledge
• a collection of partial “movies” of the system’s experience, played back using an internal simulation engine, used to store episodic knowledge
• AttentionValue objects, minimally containing ShortTermImportance (STI) and LongTermImportance (LTI) values, used to store attentional knowledge
• Goal Atoms for intentional knowledge, stored in the same format as declarative knowledge but whose dynamics involve a special form of artificial currency that is used to govern action selection

The AtomSpace is the central repository, and procedures and episodes are linked to Atoms in the AtomSpace which serve as their symbolic representatives. The “workspace” in CogPrime exists only virtually: each item in the AtomSpace has a “short term importance” (STI) level, and the workspace consists of those items in the AtomSpace with highest STI, and those procedures and episodes whose symbolic representatives in the AtomSpace have highest STI.

On the other hand, as we saw above, the LIDA architecture uses separate representations for procedural, declarative and episodic memory, but also has an explicit workspace component, where the most currently contextually relevant items from all different types of memory are gathered and used together in the course of actions. However, compared to CogPrime, it lacks comparably fine-grained methods for integrating the different types of memory.

Systematically mapping various existing cognitive architectures, or human brain structure, into this formal agents model would be a substantial though quite plausible exercise; but we will not undertake it here.

7.2.3 The Cognitive Schematic

Next we introduce an additional specialization into SRAM: the cognitive schematic, written informally as
Context ∧ Procedure → Goal

and considered more formally as

holds(C) ∧ ex(P) → h,

where h may be an externally specified goal g_i or an internally specified goal h derived as a (possibly uncertain) subgoal of one or more g_i; C is a piece of declarative or episodic knowledge and P is a procedure that the agent can internally execute to generate a series of actions. ex(P) is the proposition that P is successfully executed. If C is episodic then holds(C) may be interpreted as the current context (i.e. some finite slice of the agent's history) being similar to C; if C is declarative then holds(C) may be interpreted as the truth value of C evaluated at the current context. Note that C may refer to some part of the world quite distant from the agent's current sensory observations; but it may still be formally evaluated based on the agent's history.

In the standard CogPrime notation as introduced formally in Chapter 20 (where indentation has function-argument syntax similar to that in Python, and relationship types are prepended to their relata without parentheses), for the case where C is declarative this would be written as

    PredictiveExtensionalImplication
        AND
            C
            Execution P
        G

and in the case where C is episodic, one replaces C in this formula with a predicate expressing C's similarity to the current context. The semantics of the PredictiveExtensionalImplication relation will be discussed below. The Execution relation simply denotes the proposition that procedure P has been executed.

For the class of SRAM agents who (like CogPrime) use the cognitive schematic to govern many or all of their actions, a significant fragment of agent intelligence boils down to estimating the truth values of PredictiveExtensionalImplication relationships. Action selection procedures can then be used, which choose procedures to enact based on which ones are judged most likely to achieve the current external goals g_i in the current context. Rather than enter into the particularities of action selection or other cognitive architecture issues, we will restrict ourselves to PLN inference, which in the context of the present agent model is a method for handling PredictiveImplication in the cognitive schematic.

Consider an agent in a virtual world, such as a virtual dog, one of whose external goals is to please its owner. Suppose its owner has asked it to find a cat, and it can translate this into a subgoal "find cat." If the agent operates according to the cognitive schematic, it will search for P so that

    PredictiveExtensionalImplication
        AND
            C
            Execution P
        Evaluation found cat

holds.
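As a toy illustration of how an agent might use a store of cognitive schematics for action selection, consider the following Python sketch; the Schematic record and select_procedure function are hypothetical stand-ins invented here, and real CogPrime action selection (discussed in later chapters) is considerably subtler.

    from dataclasses import dataclass

    @dataclass
    class Schematic:
        context: str      # label of the context predicate C
        procedure: str    # label of the procedure P
        goal: str         # label of the goal G
        truth: float      # estimated probability of holds(C) & ex(P) -> G

    def select_procedure(schematics, current_context, goal):
        """Pick the procedure whose schematic best predicts achieving
        'goal' in 'current_context'; returns None if nothing applies."""
        candidates = [s for s in schematics
                      if s.context == current_context and s.goal == goal]
        if not candidates:
            return None
        return max(candidates, key=lambda s: s.truth).procedure

    schematics = [
        Schematic("owner_says_fetch", "run_to_ball", "please_owner", 0.8),
        Schematic("owner_says_fetch", "bark", "please_owner", 0.2),
    ]
    print(select_procedure(schematics, "owner_says_fetch", "please_owner"))
    # -> "run_to_ball"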
7.3 Toward a Formal Characterization of Real-World General Intelligence

Having defined what we mean by an agent acting in an environment, we now turn to the question of what it means for such an agent to be "intelligent." As we have reviewed extensively in Chapter 2 above, "intelligence" is a commonsense, "folk psychology" concept, with all the imprecision and contextuality that this generally entails. One cannot expect any compact, elegant formalism to capture all of its meanings. Even in the psychology and AI research communities, divergent definitions abound; Legg and Hutter [LH07a] list and organize 70+ definitions from the literature.

Practical study of natural intelligence in humans and other organisms, and practical design, creation and instruction of artificial intelligences, can proceed perfectly well without an agreed-upon formalization of the "intelligence" concept. Some researchers may conceive their own formalisms to guide their own work; others may feel no need for any such thing. But nevertheless, it is of interest to seek formalizations of the concept of intelligence, which capture useful fragments of the commonsense notion of intelligence, and provide guidance for practical research in cognitive science and AI. A number of such formalizations have been given in recent decades, with varying degrees of mathematical rigor. Perhaps the most carefully-wrought formalization of intelligence so far is the theory of "universal intelligence" presented by Shane Legg and Marcus Hutter in [LH07b], which draws on ideas from algorithmic information theory.

Universal intelligence captures a certain aspect of the "intelligence" concept very well, and has the advantage of connecting closely with ideas in learning theory, decision theory and computation theory. However, the kind of general intelligence it captures best is, in a sense, more general in scope than human-style general intelligence. Universal intelligence does capture the sense in which humans are more intelligent than worms, which are more intelligent than rocks; and the sense in which theoretical AGI systems like Hutter's AIXI or AIXItl [Hut05] would be much more intelligent than humans. But it misses essential aspects of the intelligence concept as it is used in the context of intelligent natural systems like humans or real-world AI systems.

Our main goal in this section is to present variants of universal intelligence that better capture the notion of intelligence as it is typically understood in the context of real-world natural and artificial systems. The first variant we describe is pragmatic general intelligence, which is inspired by the intuitive notion of intelligence as "the ability to achieve complex goals in complex environments," given in [Goe93a]. After assuming a prior distribution over the space of possible environments, and one over the space of possible goals, one then defines the pragmatic general intelligence as the expected level of goal-achievement of a system relative to these distributions. Rather than measuring truly broad mathematical general intelligence, pragmatic general intelligence measures intelligence in a way that is specifically biased toward certain environments and goals. Another variant definition is then presented, the efficient pragmatic general intelligence, which takes into account the amount of computational resources utilized by the system in achieving its intelligence.
Some argue that making efficient use of available resources is a defining characteristic of intelligence, see e.g. [Wan06]. A critical question left open is the characterization of the prior distributions corresponding to everyday human reality; we give a semi-formal sketch of some ideas on this in Chapter 9 below, where we present the notion of a "communication prior," which assigns a probability
weight to a situation S based on the ease with which one agent in a society can communicate S to another agent in that society, using multimodal communication (including verbalization, demonstration, dramatic and pictorial depiction, etc.). Finally, we present a formal measure of the "generality" of an intelligence, which precisiates the informal distinction between "general AI" and "narrow AI."

7.3.1 Biased Universal Intelligence

To define universal intelligence, Legg and Hutter consider the class of environments that are reward-summable, meaning that the total amount of reward they return to any agent is bounded by 1. Where r_i denotes the reward experienced by the agent from the environment at time i, the expected total reward for the agent π from the environment μ is defined as

    $V^\pi_\mu = E\left( \sum_{i=1}^{\infty} r_i \right) \le 1$

To extend their definition in the direction of greater realism, we first introduce a second-order probability distribution ν, which is a probability distribution over the space of environments μ. The distribution ν assigns each environment a probability. One such distribution ν is the Solomonoff-Levin universal distribution, in which one sets $\nu(\mu) = 2^{-K(\mu)}$ (K denoting Kolmogorov complexity); but this is not the only distribution ν of interest. In fact a great deal of real-world general intelligence consists of the adaptation of intelligent systems to particular distributions ν over environment-space, differing from the universal distribution. We then define

Definition 4 The biased universal intelligence of an agent π is its expected performance with respect to the distribution ν over the space of all computable reward-summable environments, E; that is,

    $\Upsilon_\nu(\pi) = \sum_{\mu \in E} \nu(\mu)\, V^\pi_\mu$

Legg and Hutter's universal intelligence is obtained by setting ν equal to the universal distribution.

This framework is more flexible than it might seem. E.g., suppose one wants to incorporate agents that die. Then one may create a special action, say a_die, corresponding to the state of death; to create agents that

• in certain circumstances output the action a_die
• have the property that if their previous action was a_die, then all of their subsequent actions must be a_die

and to define a reward structure so that the action a_die always brings zero reward. It then follows that death is generally a bad thing if one wants to maximize intelligence. Agents that die will not get rewarded after they're dead; and agents that live only 70 years, say, will be restricted from getting rewards involving long-term patterns and will hence have specific limits on their intelligence.
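To make Definition 4 concrete, the following toy Python sketch estimates a biased universal intelligence over a finite, assumed prior ν given as explicit (environment, weight) pairs, with each V_μ estimated by Monte Carlo sampling. Everything here (the coin environments, the horizon, the weights, the agent) is invented purely for illustration.

    import random

    def expected_total_reward(agent, mu, horizon=100, samples=50):
        """Monte Carlo estimate of V_mu = E(sum of rewards) for the given
        agent in environment mu; mu must keep its total reward <= 1."""
        totals = []
        for _ in range(samples):
            total, obs = 0.0, None
            for t in range(horizon):
                action = agent(obs)
                obs, reward = mu(action, t)
                total += reward
            totals.append(total)
        return sum(totals) / len(totals)

    def biased_universal_intelligence(agent, nu):
        """Definition 4: sum over environments mu of nu(mu) * V_mu. Here
        nu is a finite list of (environment, weight) pairs; the universal
        distribution nu(mu) = 2^-K(mu) is just one special case."""
        return sum(w * expected_total_reward(agent, mu) for mu, w in nu)

    def coin_env(bias):
        """A toy reward-summable environment: pays 0.01 per correct guess
        of a biased coin, so total reward over 100 steps is at most 1."""
        def mu(action, t):
            flip = 1 if random.random() < bias else 0
            return flip, (0.01 if action == flip else 0.0)
        return mu

    nu = [(coin_env(0.9), 0.7), (coin_env(0.1), 0.3)]  # an assumed prior
    always_guess_one = lambda obs: 1
    print(biased_universal_intelligence(always_guess_one, nu))  # ~0.66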
7.3.2 Connecting Legg and Hutter's Model of Intelligent Agents to the Real World

A notable aspect of the Legg and Hutter formalism is the separation of the reward mechanism from the cognitive mechanisms of the agent. While commonplace in the reinforcement learning literature, this seems psychologically unrealistic in the context of biological intelligences and many types of machine intelligences. Not all human intelligent activity is specifically reward-seeking in nature; and even when it is, humans often pursue complexly constructed rewards that are defined in terms of their own cognitions rather than separately given. Suppose a certain human's goals are true love, world peace, and the proving of interesting theorems — then these goals are defined by the human herself, and only she knows if she's achieved them. An externally-provided reward signal doesn't capture the nature of this kind of goal-seeking behavior, which characterizes much human goal-seeking activity (and will presumably characterize much of the goal-seeking activity of advanced engineered intelligences also) ... let alone human behavior that is spontaneous and unrelated to explicit goals, yet may still appear commonsensically intelligent.

One could seek to bypass this complaint about the reward mechanisms via a sort of "neo-Freudian" argument, via

• associating the reward signal, not with the "external environment" as typically conceived, but rather with a portion of the intelligent agent's brain that is separate from the cognitive component
• viewing complex goals like true love, world peace and proving interesting theorems as indirect ways of achieving the agent's "basic goals", created within the agent's memory via subgoaling mechanisms

but it seems to us that a general formalization of intelligence should not rely on such strong assumptions about agents' cognitive architectures. So below, after introducing the pragmatic and efficient pragmatic general intelligence measures, we will propose an alternate interpretation wherein the mechanism of external rewards is viewed as a theoretical test framework for assessing agent intelligence, rather than as a hypothesis about intelligent agent architecture.

In this alternate interpretation, formal measures like the universal, pragmatic and efficient pragmatic general intelligence are viewed as not directly applicable to real-world intelligences, because they involve the behaviors of agents over a wide variety of goals and environments, whereas in real life the opportunities to observe agents are more limited. However, they are viewed as being indirectly applicable to real-world agents, in the sense that an external intelligence can observe an agent's real-world behavior and then infer its likely intelligence according to these measures.

In a sense, this interpretation makes our formalized measures of intelligence the opposite of real-world IQ tests. An IQ test is a quantified, formalized test which is designed to approximately predict the informal, qualitative achievement of humans in real life. On the other hand, the formal definitions of intelligence we present here are quantified, formalized tests that are designed to capture abstract notions of intelligence, but which can be approximately evaluated on a real-world intelligent system by observing what it does in real life.
7.3.3 Pragmatic General Intelligence

The above concept of biased universal intelligence is perfectly adequate for many purposes, but it is also interesting to explicitly introduce the notion of a goal into the calculation. This allows us to formally capture the notion presented in [Goe93a] of intelligence as "the ability to achieve complex goals in complex environments." If the agent is acting in environment μ, and is provided with the reward function r_g corresponding to goal g at the start and the end of the time-interval $T = \{s, \dots, t\}$, then the expected goal-achievement of the agent, relative to g, during the interval is the expectation

    $V^\pi_{\mu,g,T} = E\left( \sum_{i=s}^{t} r(I_{g,s,i}) \right)$

where the expectation is taken over all interaction sequences $I_{g,s,t}$ drawn according to μ and π. We then propose

Definition 5 The pragmatic general intelligence of an agent π, relative to the distribution ν over environments and the distribution γ over goals, is its expected performance with respect to goals drawn from γ in environments drawn from ν, over the time-scales natural to the goals; that is,

    $\Pi(\pi) = \sum_{\mu \in E,\; g \in G,\; T} \nu(\mu)\, \gamma(g,\mu)\, V^\pi_{\mu,g,T}$

(in those cases where this sum is convergent).

This definition formally captures the notion that "intelligence is achieving complex goals in complex environments," where "complexity" is gauged by the assumed measures ν and γ. If ν is taken to be the universal distribution, and γ is defined to weight goals according to the universal distribution, then pragmatic general intelligence reduces to universal intelligence. Furthermore, it is clear that a universal algorithmic agent like AIXI [Hut05] would also have a high pragmatic general intelligence, under fairly broad conditions. As the interaction history grows longer, the pragmatic general intelligence of AIXI would approach the theoretical maximum, as AIXI would implicitly infer the relevant distributions via experience. However, if significant reward discounting is involved, so that near-term rewards are weighted much higher than long-term rewards, then AIXI might compare very unfavorably in pragmatic general intelligence to other agents designed with prior knowledge of ν, γ and T in mind.

The most interesting case to consider is where ν and γ are taken to embody some particular bias in a real-world space of environments and goals, and this bias is appropriately reflected in the internal structure of an intelligent agent. Note that an agent need not lack universal intelligence in order to possess pragmatic general intelligence with respect to some non-universal distribution over goals and environments. However, in general, given limited resources, there may be a tradeoff between universal intelligence and pragmatic intelligence. Which leads to the next point: how to encompass resource limitations into the definition.

One might argue that the definition of pragmatic general intelligence is already encompassed by Legg and Hutter's definition, because one may bias the distribution of environments within the latter by considering different Turing machines underlying the Kolmogorov complexity. However, this is not a general equivalence, because the Solomonoff-Levin measure intrinsically
decays exponentially, whereas an assumptive distribution over environments might decay at some other rate. This issue seems to merit further mathematical investigation.

7.3.4 Incorporating Computational Cost

Let $\eta^\pi_{\mu,g,T}$ be a probability distribution describing the amount of computational resources consumed by an agent π while achieving goal g over time-scale T. This is a probability distribution because we want to account for the possibility of nondeterministic agents. So, $\eta^\pi_{\mu,g,T}(Q)$ gives the probability that Q units of resources are consumed. For simplicity we amalgamate space and time resources, energetic resources, etc. into a single number Q, which is assumed to live in some subset of the positive reals. Space resources of course have to do with the size of the system's memory. Then we may define

Definition 6 The efficient pragmatic general intelligence of an agent π with resource consumption $\eta^\pi_{\mu,g,T}$, relative to the distribution ν over environments and the distribution γ over goals, is its expected performance with respect to goals drawn from γ in environments drawn from ν, over the time-scales natural to the goals, normalized by the amount of computational effort expended to achieve each goal; that is,

    $\Pi_{\mathrm{eff}}(\pi) = \sum_{\mu \in E,\; g \in G,\; Q,\; T} \nu(\mu)\, \gamma(g,\mu)\, \eta^\pi_{\mu,g,T}(Q)\, \frac{V^\pi_{\mu,g,T}}{Q}$

(in those cases where this sum is convergent). This is a measure that rates an agent's intelligence higher if it uses fewer computational resources to do its business. Roughly, it measures reward achieved per spacetime computation unit.

Note that, by abandoning the universal prior, we have also abandoned the proof of convergence that comes with it. In general the sums in the above definitions need not converge; and exploration of the conditions under which they do converge is a complex matter.

7.3.5 Assessing the Intelligence of Real-World Agents

The pragmatic and efficient pragmatic general intelligence measures are more "realistic" than the Legg and Hutter universal intelligence measure, in that they take into account the innate biasing and computational resource restrictions that characterize real-world intelligence. But as discussed earlier, they still live in "fantasy-land" to an extent — they gauge the intelligence of an agent via a weighted average over a wide variety of goals and environments; and they presume a simplistic relationship between agents and rewards that does not reflect the complexities of real-world cognitive architectures. It is not obvious from the foregoing how to apply these measures to real-world intelligent systems, which lack the ability to exist in such a wide variety of environments within their often brief lifespans, and mostly go about their lives doing things other than pursuing quantified external rewards. In this brief section we describe an approach to bridging this gap. The treatment is left semi-formal in places.
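Before proceeding, a toy numerical recap of Definitions 5 and 6 may be helpful. In this Python sketch each context (μ, g, T) is reduced to a record holding assumed values of ν(μ), γ(g, μ), V and the resource distribution η; all the numbers are arbitrary inventions.

    def pragmatic_gi(entries):
        """Definition 5 over a finite set of contexts:
        sum of nu * gamma * V."""
        return sum(e["nu"] * e["gamma"] * e["V"] for e in entries)

    def efficient_pragmatic_gi(entries):
        """Definition 6: as above, but each context's V is divided by Q
        and weighted by the probability eta(Q) of consuming Q resources."""
        total = 0.0
        for e in entries:
            for Q, p in e["eta"].items():    # eta given as {Q: probability}
                total += e["nu"] * e["gamma"] * p * e["V"] / Q
        return total

    entries = [  # two hypothetical (environment, goal, time-scale) contexts
        {"nu": 0.6, "gamma": 0.5, "V": 0.8, "eta": {10.0: 0.9, 100.0: 0.1}},
        {"nu": 0.4, "gamma": 0.5, "V": 0.3, "eta": {5.0: 1.0}},
    ]
    print(pragmatic_gi(entries), efficient_pragmatic_gi(entries))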
We suggest viewing the definitions of pragmatic and efficient pragmatic general intelligence in terms of a "possible worlds" semantics — i.e. viewing them as asking, counterfactually, how an agent would perform, hypothetically, on a series of tests (the tests being goals, defined in relation to environments and reward signals). Real-world intelligent agents don't normally operate in terms of explicit goals and rewards; these are abstractions that we use to think about intelligent agents. However, this is no objection to characterizing various sorts of intelligence in terms of counterfactuals like: how would system S operate if it were trying to achieve this or that goal, in this or that environment, in order to seek reward? We can characterize various sorts of intelligence in terms of how it can be inferred an agent would perform on certain tests, even though the agent's real life does not consist of taking these tests.

This conceptual approach may seem a bit artificial, but we don't currently see a better alternative, if one wishes to quantitatively gauge intelligence (which is, in a sense, an "artificial" thing to do in the first place). Given a real-world agent X and a mandate to assess its intelligence, the obvious alternative to looking at possible worlds in the manner of the above definitions is just looking directly at the properties of the things X has achieved in the real world during its lifespan. But this isn't an easy solution, because it doesn't disambiguate which aspects of X's achievements were due to its own actions versus due to the rest of the world that X was interacting with when it made its achievements. To distinguish the amount of achievement that X "caused" via its own actions requires a model of causality, which is a complex can of worms in itself; and, critically, the standard models of causality also involve counterfactuals (asking "what would have been achieved in this situation if the agent X hadn't been there", etc.) [MW07]. Regardless of the particulars, it seems impossible to avoid counterfactual realities in assessing intelligence.

The approach we suggest — given a real-world agent X with a history of actions in a particular world, and a mandate to assess its intelligence — is to introduce an additional player, an inference agent δ, into the picture. The agent π modeled above is then viewed as π_X: the model of X that δ constructs, in order to explore X's inferred behaviors in various counterfactual environments.

In the test situations embodied in the definitions of pragmatic and efficient pragmatic general intelligence, the environment gives π_X rewards, based on specifically configured goals. In X's real life, the relation between goals, rewards and actions will generally be significantly subtler and perhaps quite different. We model the real world similarly to the "fantasy world" of the previous section, but with the omission of goals and rewards. We define a naturalistic context as one in which all goals and rewards are constant, i.e. g_i = g_0 and r_i = r_0 for all i. This is just a mathematical convention for stating that there are no precisely-defined external goals and rewards for the agent. In a naturalistic context, we then have a situation where agents create actions based on the past history of actions and perceptions, and if there is any relevant notion of reward or goal, it is within the cognitive mechanism of some agent.
A naturalistic agent X is then an agent π which is restricted to one particular naturalistic context, involving one particular environment μ (formally, we may achieve this within the framework of agents described above via dictating that X issues constant "null actions" a_0 in all environments except μ). Next, we posit a metric space $(\Sigma_\mu, d)$ of naturalistic agents defined on a naturalistic context involving environment μ, and a subspace $A \subset \Sigma_\mu$ of inference agents, which are naturalistic agents that output predictions of other agents' behaviors (a notion we will not fully formalize here). If agents are represented as program trees, then d may be taken as edit distance on tree space [Bil05]. Then, for each agent $\delta \in A$, we may assess
• the prior probability σ(δ) according to some assumed distribution σ
• the effectiveness ρ(δ, X) of δ at predicting the actions of an agent $X \in \Sigma_\mu$

We may then define

Definition 7 The inference ability of the agent δ, relative to μ and X, is

    $q_{\mu,X}(\delta) = \sigma(\delta)\, \frac{\sum_{Y \in \Sigma_\mu} sim(X,Y)\, \rho(\delta,Y)}{\sum_{Y \in \Sigma_\mu} sim(X,Y)}$

where sim is a specified decreasing function of d(X,Y), such as $sim(X,Y) = \frac{1}{1 + d(X,Y)}$.

To construct π_X, we may then use the model of X created by the agent δ ∈ A with the highest inference ability relative to μ and X (using some specified ordering, in case of a tie). Having constructed π_X, we can then say that

Definition 8 The inferred pragmatic general intelligence (relative to ν and γ) of a naturalistic agent X defined relative to an environment μ is defined as the pragmatic general intelligence of the model π_X of X produced by the agent δ ∈ A with maximal inference ability relative to μ (and in the case of a tie, the first of these in the ordering defined over A). The inferred efficient pragmatic general intelligence of X relative to μ is defined similarly.

This provides a precise characterization of the pragmatic and efficient pragmatic intelligence of real-world systems, based on their observed behaviors. It's a bit messy; but the real world tends to be like that.

7.4 Intellectual Breadth: Quantifying the Generality of an Agent's Intelligence

We turn now to a related question: How can one quantify the degree of generality that an intelligent agent possesses? Above we have discussed the qualitative distinction between AGI and "narrow AI", and intelligence as we have formalized it above is specifically intended as a measure of general intelligence. But quantifying intelligence is different than quantifying generality versus narrowness.

To make the discussion simpler, we introduce the term "context" as a shorthand for "environment/goal/interval triple (μ, g, T)." Given a context (μ, g, T) and a set Σ' of agents, one may construct a fuzzy set $Ag_{\mu,g,T}$ gathering those agents that are intelligent relative to the context; and given a set of contexts, one may also define a fuzzy set $Con_\pi$ gathering those contexts with respect to which a given agent π is intelligent. The relevant formulas are:

    $\chi_{Ag_{\mu,g,T}}(\pi) = \chi_{Con_\pi}(\mu,g,T) = \frac{1}{N} \sum_Q \eta^\pi_{\mu,g,T}(Q)\, \frac{V^\pi_{\mu,g,T}}{Q}$

where N = N(μ, g, T) is a normalization factor defined appropriately, e.g. via $N(\mu,g,T) = \max_\pi V^\pi_{\mu,g,T}$. One could make similar definitions leaving out the computational cost factor Q, but we suspect that incorporating Q is a more promising direction. We then propose
Definition 9 The intellectual breadth of an agent π, relative to the distribution ν over environments and the distribution γ over goals, is

    $H\left(\overline{\chi_{Con_\pi}}(\mu,g,T)\right)$

where H is the entropy and

    $\overline{\chi_{Con_\pi}}(\mu,g,T) = \frac{\nu(\mu)\, \gamma(g,\mu)\, \chi_{Con_\pi}(\mu,g,T)}{\sum_{(\mu_k,g_k,T_k)} \nu(\mu_k)\, \gamma(g_k,\mu_k)\, \chi_{Con_\pi}(\mu_k,g_k,T_k)}$

is the probability distribution formed by normalizing the fuzzy set $\chi_{Con_\pi}(\mu,g,T)$.

A similar definition of the intellectual breadth of a context (μ, g, T), relative to the distribution σ over agents, may be posited. A weakness of these definitions is that they don't try to account for dependencies between agents or contexts; perhaps more refined formulations may be developed that account explicitly for these dependencies.

Note that the intellectual breadth of an agent as defined here is largely independent of the (efficient or not) pragmatic general intelligence of that agent. One could have a rather (efficiently or not) pragmatically generally intelligent system with little breadth: this would be a system very good at solving a fair number of hard problems, yet wholly incompetent on a larger number of hard problems. On the other hand, one could also have a terribly (efficiently or not) pragmatically generally stupid system with great intellectual breadth: i.e. a system roughly equally dumb in all contexts! Thus, one can characterize an intelligent agent as "narrow" with respect to the distribution ν over environments and the distribution γ over goals, based on evaluating it as having low intellectual breadth. A "narrow AI" relative to ν and γ would then be an AI agent with a relatively high efficient pragmatic general intelligence but a relatively low intellectual breadth.

7.5 Conclusion

Our main goal in this chapter has been to push the formal understanding of intelligence in a more pragmatic direction. Much more work remains to be done, e.g. in specifying the environment, goal and efficiency distributions relevant to real-world systems, but we believe that the ideas presented here constitute nontrivial progress.

If the line of research suggested in this chapter succeeds, then eventually one will be able to do AGI research as follows: specify an AGI architecture formally, and then use the mathematics of general intelligence to derive interesting results about the environments, goals and hardware platforms relative to which the AGI architecture will display significant pragmatic or efficient pragmatic general intelligence, and intellectual breadth. The remaining chapters in this section present further ideas regarding how to work toward this goal. For the time being, such a mode of AGI research remains mainly for the future, but we have still found the formalism given in these chapters useful for formulating and clarifying various aspects of the CogPrime design, as will be presented in later chapters.
Chapter 8
Cognitive Synergy

8.1 Cognitive Synergy

As we have seen, the formal theory of general intelligence, in its current form, doesn't really tell us much that's of use for creating real-world AGI systems. It tells us that creating extraordinarily powerful general intelligence is almost trivial if one has unrealistically huge amounts of computational resources; and that creating moderately powerful general intelligence using feasible computational resources is all about creating AI algorithms and data structures that (explicitly or implicitly) match the restrictions implied by a certain class of situations, to which the general intelligence is biased.

We've also described, in various previous chapters, some non-rigorous, conceptual principles that seem to explain key aspects of feasible general intelligence: the complementary reliance on evolution and autopoiesis, the superposition of hierarchical and heterarchical structures, and so forth. These principles can be considered as broad strategies for achieving general intelligence in certain broad classes of situations. Admittedly, a lot of research needs to be done to figure out nice ways to describe, for instance, in what class of situations evolution is an effective learning strategy, in what class of situations dual hierarchical/heterarchical structure is an effective way to organize memory, etc.

In this chapter we'll dig deeper into one of the "general principles of feasible general intelligence" briefly alluded to earlier: the cognitive synergy principle, which is both a conceptual hypothesis about the structure of generally intelligent systems in certain classes of environments, and a design principle used to guide the architecting of CogPrime.

We will focus here on cognitive synergy specifically in the case of "multi-memory systems," which we define as intelligent systems (like CogPrime) whose combination of environment, embodiment and motivational systems make it important for them to possess memories that divide into partially but not wholly distinct components corresponding to the categories of:

• Declarative memory
• Procedural memory (memory about how to do certain things)
• Sensory and episodic memory
• Attentional memory (knowledge about what to pay attention to in what contexts)
• Intentional memory (knowledge about the system's own goals and subgoals)

In Chapter 9 below we present a detailed argument as to how the requirement for a multi-memory underpinning for general intelligence emerges from certain underlying assumptions
regarding the measurement of the simplicity of goals and environments; but the points made here do not rely on that argument. What they do rely on is the assumption that, in the intelligence in question, the different components of memory are significantly but not wholly distinct. That is, there are significant "family resemblances" between the memories of a single type, yet there are also thoroughgoing connections between memories of different types.

The cognitive synergy principle, if correct, applies to any AI system demonstrating intelligence in the context of embodied, social communication. However, one may also take the theory as an explicit guide for constructing AGI systems; and of course, the bulk of this book describes one AGI architecture, CogPrime, designed in such a way.

It is possible to cast these notions in mathematical form, and we make some efforts in this direction in Appendix ??, using the languages of category theory and information geometry. However, this formalization has not yet led to any rigorous proof of the generality of cognitive synergy nor any other exciting theorems; with luck this will come as the mathematics is further developed. In this chapter the presentation is kept on the heuristic level, which is all that is critically needed for motivating the CogPrime design.

8.2 Cognitive Synergy

The essential idea of cognitive synergy, in the context of multi-memory systems, may be expressed in terms of the following points:

1. Intelligence, relative to a certain set of environments, may be understood as the capability to achieve complex goals in these environments.
2. With respect to certain classes of goals and environments (see Chapter 9 for a hypothesis in this regard), an intelligent system requires a "multi-memory" architecture, meaning the possession of a number of specialized yet interconnected knowledge types, including: declarative, procedural, attentional, sensory, episodic and intentional (goal-related). These knowledge types may be viewed as different sorts of patterns that a system recognizes in itself and its environment. Knowledge of these various different types must be interlinked, and in some cases may represent differing views of the same content (see Figure ??).
3. Such a system must possess knowledge creation (i.e. pattern recognition / formation) mechanisms corresponding to each of these memory types. These mechanisms are also called "cognitive processes."
4. Each of these cognitive processes, to be effective, must have the capability to recognize when it lacks the information to perform effectively on its own; and in this case, to dynamically and interactively draw information from knowledge creation mechanisms dealing with other types of knowledge.
5. This cross-mechanism interaction must have the result of enabling the knowledge creation mechanisms to perform much more effectively in combination than they would if operated non-interactively. This is "cognitive synergy."

While these points are implicit in the theory of mind given in [Goe06a], they are not articulated in this specific form there.

Interactions as mentioned in Points 4 and 5 in the above list are the real conceptual meat of the cognitive synergy idea. One way to express the key idea here is that most AI algorithms suffer from combinatorial explosions: the number of possible elements to be combined in a
synthesis or analysis is just too great, and the algorithms are unable to filter through all the possibilities, given the lack of intrinsic constraint that comes along with a "general intelligence" context (as opposed to a narrow-AI problem like chess-playing, where the context is constrained and hence restricts the scope of possible combinations that needs to be considered). In an AGI architecture based on cognitive synergy, the different learning mechanisms must be designed specifically to interact in such a way as to palliate each other's combinatorial explosions - so that, for instance, each learning mechanism dealing with a certain sort of knowledge must synergize with learning mechanisms dealing with the other sorts of knowledge, in a way that decreases the severity of combinatorial explosion.

Fig. 8.1: Illustrative example of the interactions between multiple types of knowledge, in representing a simple piece of knowledge. Generally speaking, one type of knowledge can be converted to another, at the cost of some loss of information. The synergy between cognitive processes associated with corresponding pieces of knowledge, possessing different types, is a critical aspect of general intelligence.

One prerequisite for cognitive synergy to work is that each learning mechanism must recognize when it is "stuck," meaning it's in a situation where it has inadequate information to make a confident judgment about what steps to take next. Then, when it does recognize that it's stuck, it may request help from other, complementary cognitive mechanisms.

A theoretical notion closely related to cognitive synergy is the cognitive schematic, formalized in Chapter 7 above, which states that the activity of the different cognitive processes involved in an intelligent system may be modeled in terms of the schematic implication

Context ∧ Procedure → Goal
where the Context involves sensory, episodic and/or declarative knowledge; and attentional knowledge is used to regulate how much resource is given to each such schematic implication in memory. Synergy among the learning processes dealing with the context, the procedure and the goal is critical to the adequate execution of the cognitive schematic using feasible computational resources.

Finally, drilling a little deeper into Point 3 above, one arrives at a number of possible knowledge creation mechanisms (cognitive processes) corresponding to each of the key types of knowledge. Figure ?? below gives a high-level overview of the main types of cognitive process considered in the current version of Cognitive Synergy Theory, categorized according to the type of knowledge with which each process deals.

8.3 Cognitive Synergy in CogPrime

Different cognitive systems will use different processes to fulfill the various roles identified in Figure ?? above. Here we briefly preview the basic cognitive processes that the CogPrime AGI design uses for these roles, and the synergies that exist between these.

8.3.1 Cognitive Processes in CogPrime

(The tables in this subsection draw on the paper "...a Cognitive Synergy Based Architecture..." from ICCI 2009.)

Table 8.1: [Table will go here]

Table 8.2: The OpenCogPrime data structures used to represent the key knowledge types involved

Table 8.3: [Table will go here]

Table 8.4: Key cognitive processes, and the algorithms that play their roles in CogPrime

Tables 8.1 and 8.3 present the key structures and processes involved in CogPrime, identifying each one with a certain memory/process type as considered in cognitive synergy theory. That is: each of these cognitive structures or processes deals with one or more types of memory — declarative, procedural, sensory, episodic or attentional. Table 8.5 describes the key CogPrime
Fig. 8.2: High-level overview of the key cognitive dynamics considered here in the context of cognitive synergy. The cognitive synergy principle describes the behavior of a system as it pursues a set of goals (which in most cases may be assumed to be supplied to the system "a priori", but then refined by inference and other processes). The assumed intelligent agent model is roughly as follows: At each time the system chooses a set of procedures to execute, based on its judgments regarding which procedures will best help it achieve its goals in the current context. These procedures may involve external actions (e.g. involving conversation, or controlling an agent in a simulated world) and/or internal cognitive actions. In order to make these judgments it must effectively manage declarative, procedural, episodic, sensory and attentional memory, each of which is associated with specific algorithms and structures as depicted in the diagram. There are also global processes spanning all the forms of memory, including the allocation of attention to different memory items and cognitive processes, and the identification and reification of system-wide activity patterns (the latter referred to as "map formation").

Table 8.5: [Table will go here]

Table 8.6: Key OpenCogPrime cognitive processes categorized according to knowledge type and process type
processes in terms of the "analysis vs. synthesis" distinction. Finally, Tables ?? and ?? exemplify these structures and processes in the context of embodied virtual agent control.

In the CogPrime context, a procedure in this cognitive schematic is a program tree stored in the system's procedural knowledge base; and a context is a (fuzzy, probabilistic) logical predicate stored in the AtomSpace, that holds, to a certain extent, during each interval of time. A goal is a fuzzy logical predicate that has a certain value at each interval of time, as well.

Attentional knowledge is handled in CogPrime by the ECAN artificial economics mechanism, which continually updates ShortTermImportance and LongTermImportance values associated with each item in the CogPrime system's memory; these control the amount of attention other cognitive mechanisms pay to the item, and how much motive the system has to keep the item in memory. HebbianLinks are then created between knowledge items that often possess ShortTermImportance at the same time; this is CogPrime's version of traditional Hebbian learning. ECAN has deep interactions with other cognitive mechanisms as well, which are essential to its efficient operation; for instance, PLN inference may be used to help ECAN extrapolate conclusions about what is worth paying attention to, and MOSES may be used to recognize subtle attentional patterns. ECAN also handles "assignment of credit", the figuring-out of the causes of an instance of successful goal-achievement, drawing on PLN and MOSES as needed when the causal inference involved here becomes difficult.

The synergies between CogPrime's cognitive processes are summarized in the tables below, which together form a 16x16 matrix summarizing a host of interprocess interactions generic to CST.

One key aspect of how CogPrime implements cognitive synergy is PLN's sophisticated management of the confidence of judgments. This ties in with the way OpenCogPrime's PLN inference framework represents truth values in terms of multiple components (as opposed to the single probability values used in many probabilistic inference systems and formalisms): each item in OpenCogPrime's declarative memory has a confidence value associated with it, which tells how much weight the system places on its knowledge about that memory item. This assists with cognitive synergy as follows: a learning mechanism may consider itself "stuck", generally speaking, when it has no high-confidence estimates about the next step it should take. Without reasonably accurate confidence assessment to guide it, inter-component interaction could easily lead to increased rather than decreased combinatorial explosion. And of course there is an added recursion here, in that confidence assessment is carried out partly via PLN inference, which in itself relies upon these same synergies for its effective operation.

To illustrate this point further, consider one of the synergetic aspects described in ?? below: the role cognitive synergy plays in deductive inference. Deductive inference is a hard problem in general - but what is hard about it is not carrying out inference steps, but rather "inference control" (i.e., choosing which inference steps to carry out). Specifically, what must happen for deduction to succeed in CogPrime is:

1. the system must recognize when its deductive inference process is "stuck", i.e.
when the PLN inference control mechanism carrying out deduction has no clear idea regarding which inference step(s) to take next, even after considering all the domain knowledge at its disposal
2. in this case, the system must defer to another learning mechanism to gather more information about the different choices available - and the other learning mechanism chosen must, a reasonable percentage of the time, actually provide useful information that helps PLN to get "unstuck" and continue the deductive process. (A toy sketch of this two-step control pattern follows below.)
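The following Python fragment sketches this two-step control pattern in deliberately simplified form: a controller that returns the best next step when one is sufficiently confident, and otherwise defers to an ordered list of helper processes. None of these names correspond to actual OpenCog interfaces, and the threshold is an arbitrary assumption.

    CONFIDENCE_THRESHOLD = 0.5   # assumed cutoff for "a clear idea"

    def choose_next_step(candidates, helpers):
        """candidates: list of (inference_step, confidence) pairs.
        helpers: ordered fallback processes (standing in for attention
        allocation, MOSES, etc.) that may re-rank or extend candidates."""
        best = max(candidates, key=lambda sc: sc[1], default=(None, 0.0))
        if best[1] >= CONFIDENCE_THRESHOLD:
            return best[0]
        for helper in helpers:               # deduction is "stuck": defer
            candidates = helper(candidates)
            best = max(candidates, key=lambda sc: sc[1], default=best)
            if best[1] >= CONFIDENCE_THRESHOLD:
                return best[0]
        return best[0]                       # fall back on the best guess

    boost = lambda cands: [(s, min(1.0, c + 0.3)) for s, c in cands]
    print(choose_next_step([("modus_ponens", 0.3), ("deduction", 0.4)],
                           [boost]))   # -> "deduction" after the boost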
For instance, deduction might defer to the "attentional knowledge" subsystem, and make a judgment as to which of the many possible next deductive steps are most associated with the goal of inference and the inference steps taken so far, according to the HebbianLinks constructed by the attention allocation subsystem, based on observed associations. Or, if this fails, deduction might ask MOSES (running in supervised categorization mode) to learn predicates characterizing some of the terms involved in the possible next inference steps. Once MOSES provides these new predicates, deduction can then attempt to incorporate these into its inference process, hopefully (though not necessarily) arriving at a higher-confidence next step.

8.4 Some Critical Synergies

Referring back to Figure ??, and summarizing many of the ideas in the previous section, Table ?? enumerates a number of specific ways in which the cognitive processes mentioned in the Figure may synergize with one another, potentially achieving dramatically greater efficiency than would be possible on their own. Of course, realizing these synergies on the practical algorithmic level requires significant inventiveness and may be approached in many different ways. The specifics of how CogPrime manifests these synergies are discussed in many following chapters.

Fig. 8.3: This table, and the following ones, show some of the synergies between the primary cognitive processes explicitly used in CogPrime.
[Fig. 8.3, continued: further cells of the synergy matrix, sketching interactions among inference, procedure learning, attention allocation, concept creation, map formation and goal refinement.]
[Fig. 8.3, continued: final cells of the synergy matrix, covering map formation, the goal system and sensorimotor pattern recognition.]

8.5 The Cognitive Schematic

Now we return to the "cognitive schematic" notion, according to which various cognitive processes involved in intelligence may be understood to work together via the implication

Context ∧ Procedure → Goal <p>

(summarized C ∧ P → G). Semi-formally, this implication may be interpreted to mean: "If the context C appears to hold currently, then if I enact the procedure P, I can expect to achieve the goal G with certainty p."

The cognitive schematic leads to a conceptualization of the internal action of an intelligent system as involving two key categories of learning:

• Analysis: estimating the probability p of a posited C ∧ P → G relationship
• Synthesis: filling in one or two of the variables in the cognitive schematic, given assumptions regarding the remaining variables, and directed by the goal of maximizing the probability of the cognitive schematic

More specifically, where synthesis is concerned, some key examples are:

• The MOSES probabilistic evolutionary program learning algorithm is applied to find P, given fixed C and G. Internal simulation is also used, for the purpose of creating a simulation embodying C and seeing which P lead to the simulated achievement of G.
  — Example: A virtual dog learns a procedure P to please its owner (the goal G) in the context C where there is a ball or stick present and the owner is saying "fetch".
• PLN inference, acting on declarative knowledge, is used for choosing C, given fixed P and G (also incorporating sensory and episodic knowledge as appropriate). Simulation may also be used for this purpose.
  — Example: A virtual dog wants to achieve the goal G of getting food, and it knows that the procedure P of begging has been successful at this before, so it seeks a context C where begging can be expected to get it food. Probably this will be a context involving a friendly person.
• PLN-based goal refinement is used to create new subgoals G to sit on the right hand side of instances of the cognitive schematic.
  — Example: Given that a virtual dog has a goal of finding food, it may learn a subgoal of following other dogs, due to observing that other dogs are often heading toward their food.
• Concept formation heuristics are used for choosing G and for fueling goal refinement, but especially for choosing C (via providing new candidates for C). They are also used for choosing P, via a process called "predicate schematization" that turns logical predicates (declarative knowledge) into procedures.
  — Example: At first a virtual dog may have a hard time predicting which other dogs are going to be mean to it. But it may eventually observe common features among a number of mean dogs, and thus form its own concept of "pit bull," without anyone ever teaching it this concept explicitly.

Where analysis is concerned:

• PLN inference, acting on declarative knowledge, is used for estimating the probability of the implication in the cognitive schematic, given fixed C, P and G. Episodic knowledge is also used in this regard, via enabling estimation of the probability via simple similarity matching against past experience. Simulation is also used: multiple simulations may be run, and statistics may be captured therefrom.
  — Example: To estimate the degree to which asking Bob for food (the procedure P is "asking for food", the context C is "being with Bob") will achieve the goal G of getting food, the virtual dog may study its memory to see what happened on previous occasions where it or other dogs asked Bob for food or other things, and then integrate the evidence from these occasions.
• Procedural knowledge, mapped into declarative knowledge and then acted on by PLN inference, can be useful for estimating the probability of the implication C ∧ P → G, in cases where the probability of C ∧ P1 → G is known for some P1 related to P.
  — Example: Knowledge of the internal similarity between the procedure of asking for food and the procedure of asking for toys allows the virtual dog to reason that if asking Bob for toys has been successful, maybe asking Bob for food will be successful too.
• Inference, acting on declarative or sensory knowledge, can be useful for estimating the probability of the implication C ∧ P → G, in cases where the probability of C1 ∧ P → G is known for some C1 related to C.
  — Example: If Bob and Jim have a lot of features in common, and Bob often responds positively when asked for food, then maybe Jim will too.
• Inference can be used similarly for estimating the probability of the implication C ∧ P → G, in cases where the probability of C ∧ P → G1 is known for some G1 related to G. Concept
creation can be useful indirectly in calculating these probability estimates, via providing new concepts that can be used to make useful inference trails more compact and hence easier to construct.
  — Example: The dog may reason that because Jack likes to play, and Jack and Jill are both children, maybe Jill likes to play too. It can carry out this reasoning only if its concept creation process has invented the concept of "child" via analysis of observed data.

In these examples we have focused on cases where two terms in the cognitive schematic are fixed and the third must be filled in; but just as often, the situation is that only one of the terms is fixed. For instance, if we fix G, sometimes the best approach will be to collectively learn C and P. This requires either a procedure learning method that works interactively with a declarative-knowledge-focused concept learning or reasoning method, or a declarative learning method that works interactively with a procedure learning method. That is, it requires the sort of cognitive synergy built into the CogPrime design.

8.6 Cognitive Synergy for Procedural and Declarative Learning

We now present a little more algorithmic detail regarding the operation and synergetic interaction of CogPrime's two most sophisticated components: the MOSES procedure learning algorithm (see Chapter 33), and the PLN uncertain inference framework (see Chapter 34). The treatment is necessarily quite compact, since we have not yet reviewed the details of either MOSES or PLN; but as well as illustrating the notion of cognitive synergy more concretely, perhaps the high-level discussion here will make clearer how MOSES and PLN fit into the big picture of CogPrime.

8.6.1 Cognitive Synergy in MOSES

MOSES, CogPrime's primary algorithm for learning procedural knowledge, has been tested on a variety of application problems including standard GP test problems, virtual agent control, biological data analysis and text classification [Loo06]. It represents procedures internally as program trees. Each node in a MOSES program tree is supplied with a "knob," comprising a set of values that may potentially be chosen to replace the data item or operator at that node. So for instance a node containing the number 7 may be supplied with a knob that can take on any integer value. A node containing a while loop may be supplied with a knob that can take on various possible control flow operators, including conditionals or the identity. A node containing a procedure representing a particular robot movement may be supplied with a knob that can take on values corresponding to multiple possible movements. Following a metaphor suggested by Douglas Hofstadter [Hof96], MOSES learning covers both "knob twiddling" (setting the values of knobs) and "knob creation."

MOSES is invoked within CogPrime in a number of ways, but most commonly for finding a procedure P satisfying a probabilistic implication C ∧ P → G as described above, where C is an observed context and G is a system goal. In this case the probability value of the implication provides the "scoring function" that MOSES uses to assess the quality of candidate procedures.
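The flavor of knob-based search can be conveyed by a small Python toy: a procedure is reduced to a vector of knob settings, and a score function (standing in for the probability of C ∧ P → G) ranks the settings. For brevity this sketch enumerates the knob space exhaustively, whereas MOSES of course uses competent, model-guided optimization instead; all knobs and scores here are invented.

    import itertools

    knobs = {  # hypothetical knobs for a movement procedure
        "speed": [1, 2, 3],
        "turn":  ["left", "right", "none"],
        "loop":  ["while", "if", "identity"],
    }

    def score(settings):
        """Stand-in for the probability of C & P -> G; here just a
        made-up preference for fast, loop-free forward movement."""
        s = settings["speed"] / 3.0
        s += 0.5 if settings["turn"] == "none" else 0.0
        s += 0.3 if settings["loop"] == "identity" else 0.0
        return s

    def exhaustive_knob_twiddle(knobs):
        names = list(knobs)
        best = max(itertools.product(*(knobs[n] for n in names)),
                   key=lambda vals: score(dict(zip(names, vals))))
        return dict(zip(names, best))

    print(exhaustive_knob_twiddle(knobs))
    # -> {'speed': 3, 'turn': 'none', 'loop': 'identity'}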
Fig. 8.4: High-Level Control Flow of MOSES Algorithm (the loop comprises representation-building, random sampling, and scoring & optimization)

For example, suppose a CogPrime-controlled robot is trying to learn to play the game of "tag" (i.e. a multi-agent game in which one agent is specially labeled "it", and runs after the other player agents, trying to touch them; once another agent is touched, it becomes the new "it" and the previous "it" becomes just another player agent). Then its context C is that others are trying to play a game they call "tag" with it; and we may assume its goals are to please them and itself, and that it has figured out that in order to achieve this goal it should learn some procedure to follow when interacting with others who have said they are playing "tag."

In this case a potential tag-playing procedure might contain nodes for physical actions like step_forward(speed s), as well as control flow nodes containing operators like ifelse (for instance, there would probably be a conditional telling the robot to do something different depending on whether someone seems to be chasing it). Each of these program tree nodes would have an appropriate knob assigned to it. And the scoring function would evaluate a procedure P in terms of how successfully the robot played tag when controlling its behaviors according to P (noting that it may also be using other control procedures concurrently with P). It's worth noting here that evaluating the scoring function in this case involves some inference already, because in order to tell if it is playing tag successfully, in a real-world context, the robot must watch and understand the behavior of the other players.

MOSES follows the high-level control flow depicted in Figure 8.4, which corresponds to the following process for evolving a metapopulation of "demes" of programs (each deme being a set of relatively similar programs, forming a sort of island in program space):

1. Construct an initial set of knobs based on some prior (e.g., based on an empty program; or more interestingly, using prior knowledge supplied by PLN inference based on the system's memory) and use it to generate an initial random sampling of programs. Add this deme to the metapopulation.
2. Select a deme from the metapopulation and update its sample, as follows:
a. Select some promising programs from the deme's existing sample to use for modeling, according to the scoring function.
b. Considering the promising programs as collections of knob settings, generate new collections of knob settings by applying some (competent) optimization algorithm. For best performance on difficult problems, it is important to use an optimization algorithm that makes use of the system's memory in its choices, consulting PLN inference to help estimate which collections of knob settings will work best.
c. Convert the new collections of knob settings into their corresponding programs, reduce the programs to normal form, evaluate their scores, and integrate them into the deme's sample, replacing less promising programs. In the case that scoring is expensive, score evaluation may be preceded by score estimation, which may use PLN inference, enaction of procedures in an internal simulation environment, and/or similarity matching against episodic memory.
3. For each new program that meets the criterion for creating a new deme, if any:
a. Construct a new set of knobs (a process called "representation-building") to define a region centered around the program (the deme's exemplar), and use it to generate a new random sampling of programs, producing a new deme.
b. Integrate the new deme into the metapopulation, possibly displacing less promising demes.
4. Repeat from step 2.

MOSES is a complex algorithm and each part plays its role; if any one part is removed, performance suffers significantly [Loo06]. However, the main point we want to highlight here is the role played by synergetic interactions between MOSES and other cognitive components such as PLN, simulation and episodic memory, as indicated in the above pseudocode. MOSES is a powerful procedure learning algorithm, but used on its own it runs into scalability problems like any other such algorithm; the reason we feel it has the potential to play a major role in a human-level AI system is its capacity for productive interoperation with other cognitive components.

Continuing the "tag" example, the power of MOSES's integration with other cognitive processes would come into play if, before learning to play tag, the robot had already played simpler games involving chasing. If the robot already has experience chasing and being chased by other agents, then its episodic and declarative memory will contain knowledge about how to pursue and avoid other agents in the context of running around an environment full of objects, and this knowledge will be deployable within the appropriate parts of MOSES's Steps 1 and 2. Cross-process and cross-memory-type integration make it tractable for MOSES to act as a "transfer learning" algorithm, not just a task-specific machine-learning algorithm.
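The deme-based loop above can be restated as a drastically simplified runnable sketch (our own simplification; moses_sketch, build_knobs and the toy "program = dict of knob settings" encoding are hypothetical stand-ins, and the points where PLN, simulation or episodic memory would assist are only marked in comments):

    import random

    def moses_sketch(score, build_knobs, generations=50, deme_size=20):
        """Sketch of Steps 1-4 above.  `score` is the scoring function (e.g.
        the probability of C & P -> G); `build_knobs` is representation-
        building: exemplar -> {knob name: list of candidate values} (must
        return at least one knob)."""
        def sample(knobs, n):
            return [{k: random.choice(vs) for k, vs in knobs.items()}
                    for _ in range(n)]

        # Step 1: initial deme from an empty exemplar (in the synergetic
        # case, the prior could instead be supplied by PLN inference).
        metapop = [{"knobs": build_knobs({}), "sample": []}]
        metapop[0]["sample"] = sample(metapop[0]["knobs"], deme_size)

        for _ in range(generations):
            deme = random.choice(metapop)                 # Step 2: pick a deme
            ranked = sorted(deme["sample"], key=score, reverse=True)
            promising = ranked[:max(1, deme_size // 4)]   # Step 2a
            new = []
            for p in promising:                           # Step 2b: placeholder
                k = random.choice(list(deme["knobs"]))    # for a competent,
                new.append({**p, k: random.choice(deme["knobs"][k])})  # PLN-informed optimizer
            # Step 2c: score (or cheaply estimate via PLN / simulation /
            # episodic memory) and integrate, displacing weak programs.
            deme["sample"] = sorted(ranked + new, key=score, reverse=True)[:deme_size]
            best = deme["sample"][0]                      # Step 3: maybe spawn a deme
            if len(metapop) < 5 and all(score(best) > score(d["sample"][0])
                                        for d in metapop if d is not deme):
                knobs = build_knobs(best)                 # representation-building
                metapop.append({"knobs": knobs, "sample": sample(knobs, deme_size)})
        return max((p for d in metapop for p in d["sample"]), key=score)

    # Toy usage: three integer knobs, score = sum of the chosen settings.
    best = moses_sketch(score=lambda p: sum(p.values()),
                        build_knobs=lambda ex: {k: range(10) for k in "abc"})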
8.6.2 Cognitive Synergy in PLN

While MOSES handles much of CogPrime's procedural learning, and OpenCogPrime's internal simulation engine handles most episodic knowledge, CogPrime's primary tool for handling declarative knowledge is an uncertain inference framework called Probabilistic Logic Networks (PLN). The complexities of PLN are the topic of a lengthy technical monograph [GMIT108], and here we will eschew most details and focus mainly on pointing out how PLN seeks to achieve efficient inference control via integration with other cognitive processes.

As a logic, PLN is broadly integrative: it combines certain term logic rules with more standard predicate logic rules, and utilizes both fuzzy truth values and a variant of imprecise probabilities called indefinite probabilities. PLN mathematics tells how these uncertain truth values propagate through its logic rules, so that uncertain premises give rise to conclusions with reasonably accurately estimated uncertainty values. This careful management of uncertainty is critical for the application of logical inference in the robotics context, where most knowledge is abstracted from experience and is hence highly uncertain.

PLN can be used in either forward or backward chaining mode; and, in the language introduced above, it can be used for either analysis or synthesis. As an example, we will consider backward chaining analysis, exemplified by the problem of a robot preschool student trying to determine whether a new playmate "Bob" is likely to be a regular visitor to its preschool or not (i.e., evaluating the truth value of the implication Bob → regular_visitor). The basic backward chaining process for PLN analysis looks like:

1. Given an implication L = A → B whose truth value must be estimated (for instance L = C & P → G as discussed above), create a list (A1, ..., An) of (inference rule, stored knowledge) pairs that might be used to produce L.
2. Using analogical reasoning to prior inferences, assign each Ai a probability of success.
   • If some of the Ai are estimated to have a reasonable probability of success at generating reasonably confident estimates of L's truth value, then invoke Step 1 with Ai in place of L (at this point the inference process becomes recursive).
   • If none of the Ai looks sufficiently likely to succeed, then inference has "gotten stuck" and another cognitive process should be invoked, e.g.:
     – Concept creation may be used to infer new concepts related to A and B, and then Step 1 may be revisited, in the hope of finding a new, more promising Ai involving one of the new concepts.
     – MOSES may be invoked with one of several special goals, e.g. the goal of finding a procedure P so that P(X) predicts whether X → B. If MOSES finds such a procedure P, then this can be converted to declarative knowledge understandable by PLN and Step 1 may be revisited.
     – Simulations may be run in CogPrime's internal simulation engine, so as to observe the truth value of A → B in the simulations; and then Step 1 may be revisited.

The combinatorial explosion of inference control is combatted by the capability to defer to other cognitive processes when the inference control procedure is unable to make a sufficiently confident choice of which inference steps to take next. Note that just as MOSES may rely on PLN to model its evolving populations of procedures, PLN may rely on MOSES to create complex knowledge about the terms in its logical implications. This is just one example of the multiple ways in which the different cognitive processes in CogPrime interact synergetically; a more thorough treatment of these interactions is given in Chapter 49.

In the "new playmate" example, the interesting case is where the robot initially seems not to know enough about Bob to make a solid inferential judgment (so that none of the Ai seem particularly promising).
For instance, it might carry out a number of possible inferences and not come to any reasonably confident conclusion, so that the reason none of the Ai seem promising is that all the decent-looking ones have been tried already. It might then have recourse to MOSES, simulation or concept creation; a minimal sketch of this deferral loop follows.
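The sketch below (ours, with hypothetical names and control parameters; PLN's actual truth-value arithmetic is elided) shows the shape of the backward-chaining control flow of Steps 1 and 2 above, including deferral to other cognitive processes when inference gets stuck:

    def backward_chain(target, lookup, find_pairs, p_success, fallbacks=(), depth=0):
        """target:     an implication A -> B whose truth value is wanted
        lookup:     returns a stored truth value for the target, or None
        find_pairs: Step 1 -- the (inference rule, stored knowledge) pairs
                    A1..An that might produce the target
        p_success:  Step 2 -- analogical estimate, from prior inferences,
                    of a pair's probability of yielding a confident answer
        fallbacks:  other cognitive processes (concept creation, MOSES,
                    internal simulation); each returns True if it added
                    usable new knowledge"""
        THRESHOLD, MAX_DEPTH = 0.5, 5   # hypothetical control parameters
        known = lookup(target)
        if known is not None or depth >= MAX_DEPTH:
            return known
        candidates = [a for a in find_pairs(target) if p_success(a) >= THRESHOLD]
        for a in sorted(candidates, key=p_success, reverse=True):
            # Recurse with Ai in place of the target (the truth-value
            # combination formulas themselves are elided in this sketch).
            result = backward_chain(a, lookup, find_pairs, p_success,
                                    fallbacks, depth + 1)
            if result is not None:
                return result
        # "Stuck": no pair looks promising enough, so defer to another
        # cognitive process, then revisit Step 1 with the new knowledge.
        for process in fallbacks:
            if process(target):
                return backward_chain(target, lookup, find_pairs, p_success,
                                      fallbacks, depth + 1)
        return None

The essential design choice is visible in the last loop: rather than letting the inference controller thrash, control passes to whichever other process (concept creation, MOSES, simulation) can change the knowledge base enough to make Step 1 worth revisiting.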
Returning to the example: the PLN controller could make a list of everyone who has been a regular visitor, and everyone who has not been, and pose MOSES the task of figuring out a procedure for distinguishing these two categories. This procedure could then be used directly to make the needed assessment, or else be translated into logical rules to be used within PLN inference. For example, perhaps MOSES would discover that older males wearing ties tend not to become regular visitors. If the new playmate is an older male wearing a tie, this is directly applicable. But if the current playmate is wearing a tuxedo, then PLN may be helpful, via reasoning that even though a tuxedo is not a tie, it is a similar form of fancy dress — so PLN may extend the MOSES-learned rule to the present case and infer that the new playmate is not likely to be a regular visitor.

8.7 Is Cognitive Synergy Tricky?¹

In this section we use the notion of cognitive synergy to explore a question that arises frequently in the AGI community: the well-known difficulty of measuring intermediate progress toward human-level AGI. We explore some potential reasons underlying this difficulty, by extending the notion of cognitive synergy to a more refined notion of "tricky cognitive synergy." These ideas are particularly relevant to the problem of creating a roadmap toward AGI, as we'll explore in Chapter 17 below.

8.7.1 The Puzzle: Why Is It So Hard to Measure Partial Progress Toward Human-Level AGI?

It's not entirely straightforward to create tests to measure the final achievement of human-level AGI, but there are some fairly obvious candidates here. There's the Turing Test (fooling judges into believing you're human, in a text chat), the video Turing Test, the Robot College Student test (passing university, via being judged exactly the same way a human student would), etc. There's certainly no agreement on which is the most meaningful such goal to strive for, but there's broad agreement that a number of goals of this nature basically make sense.

On the other hand, how does one measure whether one is, say, 50 percent of the way to human-level AGI? Or, say, 75 or 25 percent? It's possible to pose many "practical tests" of incremental progress toward human-level AGI, with the property that if a proto-AGI system passes the test using a certain sort of architecture and/or dynamics, then this implies a certain amount of progress toward human-level AGI based on particular theoretical assumptions about AGI. However, in each case of such a practical test, it seems intuitively likely to a significant percentage of AGI researchers that there is some way to "game" the test by designing a system specifically oriented toward passing that test, which doesn't constitute dramatic progress toward AGI. Some examples of practical tests of this nature would be:

¹ This section was co-authored with Jared Wigmore.
• The Wozniak "coffee test": go into an average American house and figure out how to make coffee, including identifying the coffee machine, figuring out what the buttons do, finding the coffee in the cabinet, etc.
• Story understanding: reading a story, or watching it on video, and then answering questions about what happened (including questions at various levels of abstraction)
• Graduating from (virtual-world or robotic) preschool
• Passing the elementary school reading curriculum (which involves reading and answering questions about some picture books as well as purely textual ones)
• Learning to play an arbitrary video game based on experience only, or based on experience plus reading instructions

One interesting point about tests like this is that each of them seems to some AGI researchers to encapsulate the crux of the AGI problem, and to be unsolvable by any system not far along the path to human-level AGI — yet seems to other AGI researchers, with different conceptual perspectives, to be something probably game-able by narrow-AI methods. And of course, given the current state of science, there's no way to tell which of these practical tests really can be solved via a narrow-AI approach, except by having a lot of people try really hard over a long period of time.

A question raised by these observations is whether there is some fundamental reason why it's hard to make an objective, theory-independent measure of intermediate progress toward advanced AGI. Is it just that we haven't been smart enough to figure out the right test, or is there some conceptual reason why the very notion of such a test is problematic? We don't claim to know for sure, but in the rest of this section we'll outline one possible reason why the latter might be the case.

8.7.2 A Possible Answer: Cognitive Synergy Is Tricky!

Why might a solid, objective empirical test for intermediate progress toward AGI be an infeasible notion? One possible reason, we suggest, is precisely cognitive synergy, as discussed above. The cognitive synergy hypothesis, in its simplest form, states that human-level AGI intrinsically depends on the synergetic interaction of multiple components (for instance, as in CogPrime, multiple memory systems each supplied with its own learning process). Under this hypothesis, it might be, for instance, that there are 10 critical components required for a human-level AGI system. Having all 10 of them in place results in human-level AGI; having only 8 of them in place results in a dramatically impaired system; and maybe having only 6 or 7 of them in place results in a system that can hardly do anything at all.

Of course, the reality is almost surely not as strict as this simplified example suggests. No AGI theorist has really posited a list of 10 crisply-defined subsystems and claimed them necessary and sufficient for AGI. We suspect there are many different routes to AGI, involving integration of different sorts of subsystems. However, if the cognitive synergy hypothesis is correct, then human-level AGI behaves roughly the way the simplistic example suggests. Perhaps instead of the 10 components, you could achieve human-level AGI with 7 components, but having only 5 of these 7 would yield drastically impaired functionality, and so forth. Or the point could be made without any decomposition into a finite set of components, using continuous probability distributions. To mathematically formalize the
cognitive synergy hypothesis becomes complex, but here we're only aiming for a qualitative argument; so, for illustrative purposes, we'll stick with the "10 components" example, for communicative simplicity.

Next, let's suppose that for any given task, there are ways to achieve that task using a system that is much simpler than any subset of size 6 drawn from the set of 10 components needed for human-level AGI, and that works much better for the task than this subset of 6 components (assuming the latter are used as a set of only 6 components, without the other 4). Note that this supposition is a good bit stronger than mere cognitive synergy. For lack of a better name, we'll call it tricky cognitive synergy. The tricky cognitive synergy hypothesis would be true if, for example, the following possibilities held:

• creating components to serve as parts of a synergetic AGI is harder than creating components intended to serve as parts of simpler AI systems without synergetic dynamics
• components capable of serving as parts of a synergetic AGI are necessarily more complicated than components intended to serve as parts of simpler AI systems

These certainly seem reasonable possibilities, since to serve as a component of a synergetic AGI system, a component must have the internal flexibility to usefully handle interactions with a lot of other components as well as to solve the problems that come its way. In a CogPrime context, these possibilities ring true, in the sense that tailoring an AI process for tight integration with other AI processes within CogPrime tends to require more work than preparing a conceptually similar AI process for use on its own or in a more task-specific narrow-AI system.

It seems fairly obvious that, if tricky cognitive synergy really holds up as a property of human-level general intelligence, the difficulty of formulating tests for intermediate progress toward human-level AGI follows as a consequence: according to the tricky cognitive synergy hypothesis, any such test is going to be more easily solved by some simpler narrow-AI process than by a partially complete human-level AGI system.

8.7.3 Conclusion

We haven't proved anything here, only made some qualitative arguments. However, these arguments do seem to give a plausible explanation for the empirical observation that positing tests for intermediate progress toward human-level AGI is a very difficult prospect. If the theoretical notions sketched here are correct, then this difficulty is not due to incompetence or lack of imagination on the part of the AGI community, nor to the primitive state of the AGI field, but is rather intrinsic to the subject matter. And if these notions are correct, then quite likely the future rigorous science of AGI will contain formal theorems echoing and improving the qualitative observations and conjectures we've made here.

If the ideas sketched here are true, then the practical consequence for AGI development is, very simply, that one shouldn't worry a lot about producing intermediary results that are compelling to skeptical observers. Just as 2/3 of a human brain may not be of much use, 2/3 of an AGI system may not be of much use. Lack of impressive intermediary results need not imply one is on a wrong development path; and comparison with narrow-AI systems on specific tasks may be badly misleading as a gauge of incremental progress toward human-level AGI.
Hopefully it's clear that the motivation behind the line of thinking presented here is a desire to understand the nature of general intelligence and its pursuit — not a desire to avoid testing our AGI software! Really, as AGI engineers, we would love to have a sensible, rigorous way to test our intermediary progress toward AGI, so as to be able to pose convincing arguments to skeptics, funding sources, potential collaborators and so forth. Our motivation here is not a desire to avoid having the intermediate progress of our efforts measured, but rather a desire to explain the frustrating (but by now rather well-established) difficulty of creating such intermediate goals for human-level AGI in a meaningful way. If we or someone else figures out a compelling way to measure partial progress toward AGI, we will celebrate the occasion. But it seems worth seriously considering the possibility that the difficulty in finding such a measure reflects fundamental properties of general intelligence.

From a practical CogPrime perspective, we are interested in a variety of evaluation and testing methods, including the "virtual preschool" approach mentioned briefly above and more extensively in later chapters. However, our focus will be on evaluation methods that give us meaningful information about CogPrime's progress, given our knowledge of how CogPrime works and our understanding of the underlying theory. We are unlikely to focus on the achievement of intermediate test results capable of convincing skeptics of the reality of our partial progress, because we have not yet seen any credible tests of this nature, and because we suspect the reasons for this lack may be rooted in deep properties of feasible general intelligence, such as tricky cognitive synergy.
Chapter 9
General Intelligence in the Everyday Human World

9.1 Introduction

Intelligence is not just about what happens inside a system, but also about what happens outside that system, and how the system interacts with its environment. Real-world general intelligence is about intelligence relative to some particular class of environments, and human-like general intelligence is about intelligence relative to the particular class of environments that humans evolved in (which in recent millennia has included environments humans have created using their intelligence). In Chapter 2 we reviewed some specific capabilities characterizing human-like general intelligence; to connect these with the general theory of general intelligence from the last few chapters, we need to explain what aspects of human-relevant environments correspond to these human-like intelligent capabilities.

We begin with aspects of the environment related to communication, which turn out to tie in closely with cognitive synergy. Then we turn to physical aspects of the environment, which we suspect also connect closely with various human cognitive capabilities. Finally we turn to physical aspects of the human body and their relevance to the human mind. In the following chapter we present a deeper, more abstract theoretical framework encompassing these ideas.

These ideas are of theoretical importance, and they're also of practical importance when one turns to the critical area of AGI environment design. If one is going to do anything besides release one's young AGI into the "wilds" of everyday human life, then one has to put some thought into what kind of environment it will be raised in. This may be a virtual world or it may be a robot preschool or some other kind of physical environment, but in any case some specific choices must be made about what to include. Specific choices must also be made about what kind of body to give one's AGI system — what sensors and actuators, and so forth. In Chapter 16 we will present some specific suggestions regarding choices of embodiment and environment that we find to be ideal for AGI development — virtual and robot preschools — but the material in this chapter is of more general import, beyond any such particularities. If one has an intuitive idea of what properties of body and world human intelligence is biased for, then one can make practical choices about embodiment and environment in a principled rather than purely ad hoc or opportunistic way.
9.2 Some Broad Properties of the Everyday World That Help Structure Intelligence

The properties of the everyday world that help structure intelligence are diverse and span multiple levels of abstraction. Most of this chapter will focus on fairly concrete patterns of this nature, such as those involved in inter-agent communication and naive physics; however, it's also worth noting the potential importance of more abstract patterns distinguishing the everyday world from arbitrary mathematical environments.

The propensity to search for hierarchical patterns is one huge potential example of an abstract everyday-world property. We strongly suspect the reason that searching for hierarchical patterns works so well, in so many everyday-world contexts, lies in the particular structure of the everyday world — it's not something that would be true across all possible environments (even if one weights the space of possible environments in some clever way, say using program length according to some standard computational model). However, this sort of assertion is of course highly "philosophical," and becomes complex to formulate and defend convincingly given the current state of science and mathematics.

Going one step further, we recall from Chapter 3 a structure called the "dual network", which consists of superposed hierarchical and heterarchical networks: basically a hierarchy in which the distance between two nodes in the hierarchy is correlated with the distance between the nodes in some metric space. Another high-level property of the everyday world may be that dual network structures are prevalent. This would imply that minds biased to represent the world in terms of dual network structure are likely to be intelligent with respect to the everyday world. In a different direction, the extreme commonality of symmetry groups in the (everyday and otherwise) physical world is another example: they occur so often that minds oriented toward recognizing patterns involving symmetry groups are likely to be intelligent with respect to the real world.

We suspect that the number of cognitively-relevant properties of the everyday world is huge, and that the essence of everyday-world intelligence lies in this list of varyingly abstract and concrete properties, which must be embedded implicitly or explicitly in the structure of a natural or artificial intelligence for that system to have everyday-world intelligence. Apart from these particular yet abstract properties of the everyday world, intelligence is just about "finding patterns in which actions tend to achieve which goals in which situations"; but the simple meta-algorithm needed to accomplish this universally is, we suggest, only a small percentage of what it takes to make a mind.

You might say that a sufficiently generally intelligent system should be able to infer the various cognitively-relevant properties of the environment from looking at data about the everyday world. We agree in principle, and in fact Ben Kuipers and his colleagues have done some interesting work in this direction, showing that learning algorithms can infer some basics about the structure of space and time from experience [MIX07]. But we suggest that doing this really thoroughly would require a massively greater amount of processing power than is needed by an AGI that embodies, and hence automatically utilizes, these principles.
It may be that the problem of inferring these properties is so hard as to require a wildly infeasible AIXI / Gödel Machine type system.
9.3 Embodied Communication

Next we turn to the potential cognitive implications of seeking to achieve goals in an environment in which multimodal communication with other agents plays a prominent role. Consider a community of embodied agents living in a shared world, and suppose that the agents can communicate with each other via a set of mechanisms including:

• Linguistic communication, in a language whose semantics is largely (not necessarily wholly) interpretable based on the mutually experienced world
• Indicative communication, in which e.g. one agent points to some part of the world or delimits some interval of time, and another agent is able to interpret the meaning
• Demonstrative communication, in which an agent carries out a set of actions in the world, and the other agent is able to imitate these actions, or instruct another agent as to how to imitate these actions
• Depictive communication, in which an agent creates some sort of (visual, auditory, etc.) construction to show another agent, with the goal of causing the other agent to experience phenomena similar to what they would experience upon encountering some particular entity in the shared environment
• Intentional communication, in which an agent explicitly communicates to another agent what its goal is in a certain situation¹

¹ In Appendix ?? we recount some interesting recent results showing that mirror neurons fire in response to some cases of intentional communication as thus defined.

It is clear that ordinary everyday communication between humans possesses all these aspects.

We define the Embodied Communication Prior (ECP) as the probability distribution in which the probability of an entity (e.g. a goal or environment) decreases with the difficulty of describing that entity, for a typical member of the community in question, using a particular set of communication mechanisms including the above five modes. We will sometimes refer to the prior probability of an entity under this distribution as its "simplicity" under the distribution.
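One way to make this definition slightly more precise is the following hedged formalization (the notation D, cost and the normalization below are our own, not the text's): writing D(x) for the difficulty, for a typical community member, of describing an entity x using the five modes above, the ECP can be taken as

    P_{\mathrm{ECP}}(x) \;=\; \frac{2^{-D(x)}}{\sum_{y} 2^{-D(y)}},
    \qquad
    D(x) \;=\; \min_{(m_1,x_1),\ldots,(m_k,x_k)} \;\sum_{i=1}^{k} \mathrm{cost}(m_i, x_i),

where the minimum ranges over ways of decomposing a description of x into parts x_i, each communicated via a mode m_i drawn from {linguistic, indicative, demonstrative, depictive, intentional}. Under this convention, an entity's "simplicity" (its prior probability) is highest when some mix of the five modes describes it cheaply, echoing the Occam prior P(x) ∝ 2^{-K(x)} of algorithmic information theory, with the communal description cost D in place of Kolmogorov complexity K.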
Next, to further specialize the Embodied Communication Prior, we will assume that for each of these modes of communication, there are some aspects of the world that are much more easily communicable using that mode than the other modes. For instance, in the human everyday world:

• Abstract (declarative) statements spanning large classes of situations are generally much easier to communicate linguistically
• Complex, multi-part procedures are much easier to communicate either demonstratively, or using a combination of demonstration with other modes
• Sensory or episodic data is often much easier to communicate depictively
• The current value of attending to some portion of the shared environment is often much easier to communicate indicatively
• Information about what goals to follow in a certain situation is often much easier to communicate intentionally, i.e. via explicitly indicating what one's own goal is

These simple observations have significant implications for the nature of the Embodied Communication Prior. For one thing, they let us define multiple forms of knowledge:

• Isolatedly declarative knowledge is that which is much more easily communicable linguistically
• Isolatedly procedural knowledge is that which is much more easily communicable demonstratively
• Isolatedly sensory knowledge is that which is much more easily communicable depictively
• Isolatedly attentive knowledge is that which is much more easily communicable indicatively
• Isolatedly intentional knowledge is that which is much more easily communicable intentionally

This categorization of knowledge types resembles many ideas from the cognitive theory of memory [TC05], although the distinctions drawn here are a little crisper than any classification currently derivable from available neurological or psychological data.

Of course there may be much knowledge, of relevance to systems seeking intelligence according to the ECP, that does not fall into any of these categories and constitutes "mixed knowledge." There are some very important specific subclasses of mixed knowledge. For instance, episodic knowledge (knowledge about specific real or hypothetical sets of events) will most easily be communicated via a combination of declarative, sensory and (in some cases) procedural communication. Scientific and mathematical knowledge are generally mixed knowledge, as is most everyday commonsense knowledge.

Some cases of mixed knowledge are reasonably well decomposable, in the sense that they decompose into knowledge items that individually fall into some specific knowledge type. For instance, an experimental chemistry procedure may be much more easily communicable procedurally, whereas an allied piece of knowledge from theoretical chemistry may be much more easily communicable declaratively; but in order to fully communicate either the experimental procedure or the abstract piece of knowledge, one may ultimately need to communicate both aspects.

Also, even when the best way to communicate something is mixed-mode, it may be possible to identify one mode that carries the most important part of the communication. An example would be a chemistry experiment that is best communicated via a practical demonstration together with a running narrative: the demonstration without the narrative may be vastly more valuable than the narrative without the demonstration. To cover such cases we may make less restrictive definitions such as:

• Interactively declarative knowledge is that which is much more easily communicable in a manner dominated by linguistic communication

and so forth. We call these "interactive knowledge categories," by contrast to the "isolated knowledge categories" introduced earlier.

9.3.0.1 Naturalness of Knowledge Categories

Next we introduce an assumption we call NKC, for Naturalness of Knowledge Categories. The NKC assumption states that the knowledge in each of the above isolated and interactive communication-modality-focused categories forms a "natural category," in the sense that for each of these categories, there are many different properties shared by a large percentage of the knowledge in the category, but not by a large percentage of the knowledge in the other categories. This means that, for instance, procedural knowledge systematically (and statistically) has different characteristics than the other kinds of knowledge.
The NKC assumption seems commonsensically to hold true for human everyday knowledge, and it has fairly dramatic implications for general intelligence. Suppose we conceive general intelligence as the ability to achieve goals in the environment shared by the communicating agents underlying the Embodied Communication Prior. Then NKC suggests that the best way to achieve general intelligence according to the Embodied Communication Prior is going to involve:

• specialized methods for handling declarative, procedural, sensory and attentional knowledge (due to the naturalness of the isolated knowledge categories)
• specialized methods for handling interactions between different types of knowledge, including methods focused on the case where one type of knowledge is primary and the others are supporting (the latter due to the naturalness of the interactive knowledge categories)

9.3.0.2 Cognitive Completeness

Suppose we conceive an AI system as consisting of a set of learning capabilities, each one characterized by three features:

• One or more knowledge types that it is competent to deal with, in the sense of the two key learning problems mentioned above
• At least one learning type: either analysis, or synthesis, or both
• At least one interaction type, for each (knowledge type, learning type) pair it handles: "isolated" (meaning it deals mainly with that knowledge type in isolation), "interactive" (meaning it focuses on that knowledge type but in a way that explicitly incorporates other knowledge types into its process), or "fully mixed" (meaning that when it deals with the knowledge type in question, no particular knowledge type tends to dominate the learning process)

Then, intuitively, it seems to follow from the ECP with NKC that systems with high efficient general intelligence should have the following properties, which collectively we'll call cognitive completeness:

• For each (knowledge type, learning type, interaction type) triple, there should be a learning capability corresponding to that triple.
• Furthermore, the capabilities corresponding to different (knowledge type, interaction type) pairs should have distinct characteristics (since according to the NKC the isolated knowledge corresponding to a knowledge type is a natural category, as is the dominant knowledge corresponding to a knowledge type).
• For each (knowledge type, learning type) pair (K, L), and each other knowledge type K1 distinct from K, there should be a distinctive capability with interaction type "interactive", dealing with knowledge that is interactively K but also includes aspects of K1.

Furthermore, it seems intuitively sensible that, according to the ECP with NKC, if the capabilities mentioned in the above points are reasonably able, then the system possessing them will display general intelligence relative to the ECP. (A concrete, if simplistic, enumeration of the required triples is sketched below.)
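As a bookkeeping aid, the first cognitive-completeness condition can be written out explicitly. The sketch below is our own illustration (the type inventories follow the text, but the coverage check itself is ours, and the finer requirement that each "interactive" capability pair a primary type K with each other type K1 is omitted for brevity):

    from itertools import product

    KNOWLEDGE   = ["declarative", "procedural", "sensory",
                   "attentional", "intentional"]
    LEARNING    = ["analysis", "synthesis"]
    INTERACTION = ["isolated", "interactive", "fully mixed"]

    def missing_capabilities(capabilities):
        """Return the (knowledge, learning, interaction) triples not yet
        covered by a system's learning capabilities; an empty result means
        the first condition of cognitive completeness is met."""
        return set(product(KNOWLEDGE, LEARNING, INTERACTION)) - set(capabilities)

    # Example: a system with only isolated declarative learning covers
    # 2 of the 5 * 2 * 3 = 30 required triples.
    caps = {("declarative", lt, "isolated") for lt in LEARNING}
    print(len(missing_capabilities(caps)), "triples still missing")   # 28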
Thus we arrive at the hypothesis that:

Under the assumption of the Embodied Communication Prior (with the Naturalness of Knowledge Categories assumption), the property above called "cognitive completeness" is necessary and sufficient for efficient general intelligence at the level of an intelligent adult human (e.g. at the Piagetian formal level [Pia53]).

Of course, the above considerations are very far from a rigorous mathematical proof (or even a precise formulation) of this hypothesis. But we are presenting this here as a conceptual hypothesis, in order to qualitatively guide our practical AGI R&D and also to motivate further, more rigorous theoretical work.

9.3.1 Generalizing the Embodied Communication Prior

One interesting direction for further research would be to broaden the scope of the inquiry, in a manner suggested above: instead of just looking at the ECP, look at simplicity measures in general, and attack the question of how a mind must be structured in order to display efficient general intelligence relative to a specified simplicity measure. This problem seems unapproachable in general, but some special cases may be more tractable. For instance, suppose one has:

• a simplicity measure that (like the ECP) is approximately decomposable into a set of fairly distinct components, plus their interactions
• an assumption similar to NKC, which states that the entities displaying simplicity according to each of the distinct components are roughly clustered together in entity-space

Then one should be able to say that, to achieve efficient general intelligence relative to this decomposable simplicity measure, a system should have:

• distinct capabilities corresponding to each of the components of the simplicity measure
• interactions between these capabilities, corresponding to the interaction terms in the simplicity measure

With copious additional work, these simple observations could potentially serve as the seed for a novel sort of theory of general intelligence — a theory of how the structure of a system depends on the structure of the simplicity measure with which it achieves efficient general intelligence. Cognitive Synergy Theory would then emerge as a special case of this more abstract theory.

9.4 Naive Physics

Multimodal communication is an important aspect of the environment for which human intelligence evolved — but not the only one. It seems likely that our human intelligence is also closely adapted to various aspects of our physical environment — a matter worth carefully attending to as we design environments for our robotically or virtually embodied AGI systems to operate in.

One interesting guide to the most cognitively relevant aspects of human environments is the subfield of AI known as "naive physics" [Hay85], a term that refers to the theories about the physical world that human beings implicitly develop and utilize during their lives. For instance,
when you figure out that you need to press the knife slightly harder when spreading peanut butter rather than jelly, you're not making this judgment using Newtonian physics or the Navier-Stokes equations of fluid dynamics; you're using heuristic patterns that you figured out through experience. Maybe you figured out these patterns through experience spreading peanut butter and jelly in particular. Or maybe you figured these heuristic patterns out before you ever tried to spread peanut butter or jelly specifically, just by touching peanut butter and jelly to see what they feel like, and then carrying out inference based on your experience manipulating similar tools in the context of similar substances.

Other examples of similar "naive physics" patterns are easy to come by, e.g.:

1. What goes up must come down.
2. A dropped object falls straight down.
3. A vacuum sucks things towards it.
4. Centrifugal force throws rotating things outwards.
5. An object is either at rest or moving, in an absolute sense.
6. Two events are simultaneous or they are not.
7. When running downhill, one must lift one's knees up high.
8. When looking at something that you just barely can't discern accurately, squint.

Attempts to axiomatically formulate naive physics have historically come up short, and we doubt this is a promising direction for AGI. However, we do think the naive physics literature does a good job of identifying the various phenomena that the human mind's naive physics deals with. So, from the point of view of AGI environment design, naive physics is a useful source of requirements. Ideally, we would like an AGI's environment to support all the fundamental phenomena that naive physics deals with.

We now describe some key aspects of naive physics in a more systematic manner. Naive physics has many different formulations; in this section we draw heavily on [SC94], who divide naive physics phenomena into five categories. Here we review these categories and identify a number of important things that humanlike intelligent agents must be able to do relative to each of them.

9.4.1 Objects, Natural Units and Natural Kinds

One key aspect of naive physics involves recognition of various aspects of objects, such as:

1. Recognition of objects amidst noisy perceptual data
2. Recognition of surfaces and interiors of objects
3. Recognition of objects as manipulable units
4. Recognition of objects as potential subjects of fragmentation (splitting, cutting) and of unification (gluing, bonding)
5. Recognition of the agent's body as an object, and of parts of the agent's body as objects
6. Division of the universe of perceived objects into "natural kinds", each containing typical and atypical instances (a minimal illustration of such kinds follows this list)
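The "typical and atypical instances" of item 6 can be given a simple computational reading via prototype-based categorization. The sketch below is our own illustration; the feature encoding and the BIRD prototype are hypothetical:

    # "Natural kinds" with typical and atypical instances, via a toy
    # prototype-based typicality score.
    def typicality(instance, prototype):
        """Fraction of prototype features an instance shares; 1.0 means
        fully typical, low values mean atypical member (or non-member)."""
        shared = sum(1 for f, v in prototype.items() if instance.get(f) == v)
        return shared / len(prototype)

    BIRD = {"flies": True, "feathered": True, "lays_eggs": True}

    robin   = {"flies": True,  "feathered": True, "lays_eggs": True}
    penguin = {"flies": False, "feathered": True, "lays_eggs": True}

    print(typicality(robin, BIRD))    # 1.0   -> typical instance
    print(typicality(penguin, BIRD))  # ~0.67 -> atypical instance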
9.4.2 Events, Processes and Causality

Specific aspects of naive physics related to temporality and causality are:

1. Distinguishing roughly-subjectively-instantaneous events from extended processes
2. Identifying beginnings, endings and crossings of processes
3. Identifying and distinguishing internal and external changes
4. Identifying and distinguishing internal and external changes relative to one's own body
5. Interrelating body-changes with changes in external entities

Notably, these aspects of naive physics involve different processes occurring on a variety of different time scales, intersecting in complex patterns, and involving processes inside the agent's body, outside the agent's body, and crossing the boundary of the agent's body.

9.4.3 Stuffs, States of Matter, Qualities

Regarding the various states of matter, some important aspects of naive physics are:

1. Perceiving gaps between objects: holes, media, and illusions like rainbows, mirages and holograms
2. Distinguishing the manners in which different sorts of entities (e.g. smells, sounds, light) fill space
3. Distinguishing properties such as smoothness, roughness, graininess, stickiness, runniness, etc.
4. Distinguishing degrees of elasticity and fragility
5. Assessing separability of aggregates

9.4.4 Surfaces, Limits, Boundaries, Media

Gibson [Gib77, Gib79] has argued that naive physics is not mainly about objects but rather mainly about surfaces. Surfaces have a variety of aspects and relationships that are important for naive physics, such as:

1. Perceiving and reasoning about surfaces as two-sided or one-sided interfaces
2. Inference of the various ecological laws of surfaces
3. Perception of various media in the world as separated by surfaces
4. Recognition of the textures of surfaces
5. Recognition of medium/surface layout relationships such as: ground, open environment, enclosure, detached object, attached object, hollow object, place, sheet, fissure, stick, fibre, dihedral, etc.

As a concrete, evocative "toy" example of naive everyday knowledge about surfaces and boundaries, consider Sloman's [Slo08a] example scenario, depicted in Figure 9.1 and drawn largely from [SS74] (see also related discussion in [Slo08b]), in which "a child can be given one
Fig. 9.1: One of Sloman's example test domains for real-world inference. Left: a number of pins and a rubber band to be stretched around them. Right: use of the pins and rubber band to make a letter T.

or more rubber bands and a pile of pins, and asked to use the pins to hold the band in place to form a particular shape ... For example, things to be learnt could include:"

1. There is an area inside the band and an area outside the band.
2. The possible effects of moving a pin that is inside the band towards or further away from other pins inside the band. (The effects can depend on whether the band is already stretched.)
3. The possible effects of moving a pin that is outside the band towards or further away from other pins inside the band.
4. The possible effects of adding a new pin, inside or outside the band, with or without pushing the band sideways with the pin first.
5. The possible effects of removing a pin, from a position inside or outside the band.
6. Patterns of motion/change that can occur and how they affect local and global shape (e.g. introducing a concavity or convexity, introducing or removing symmetry, increasing or decreasing the area enclosed).
7. The possibility of causing the band to cross over itself. (NB: Is an odd number of crossings possible?)
8. How adding a second or third band can enrich the space of structures, processes and effects of processes.

9.4.5 What Kind of Physics Is Needed to Foster Human-like Intelligence?

We stated above that we would like an AGI's environment to support all the fundamental phenomena that naive physics deals with; and we have now reviewed a number of these specific phenomena. But it's not entirely clear what the "fundamental" aspects underlying these phenomena are. One important question in the environment-design context is how closely an AGI environment needs to stick to the particulars of real-world naive physics. Is it important that a young AGI can play with the specific differences between spreading peanut butter versus jelly? Or is it enough that it can play with spreading and smearing various substances of different consistencies? How close does the analogy between an AGI environment's naive physics and
real-world naive physics need to be? This is a question to which we have no scientific answer at present. Our own working hypothesis is that the analogy does not need to be extremely close, and with this in mind, in Chapter 16 we propose a virtual environment, BlocksNBeadsWorld, that encompasses all the basic conceptual phenomena of real-world naive physics but does not attempt to emulate their details.

Framed in terms of human psychology rather than environment design, the question becomes: at what level of detail must one model the physical world in order to understand the ways in which human intelligence has adapted to the physical world? Our suspicion, which underlies our BlocksNBeadsWorld design, is that it's approximately enough to have:

• Newtonian physics, or some close approximation
• Matter in multiple phases and forms vaguely similar to the ones we see in the real world: solid, liquid, gas, paste, goo, etc.
• The ability to transform some instances of matter from one form to another
• The ability to flexibly manipulate matter in various forms with various solid tools
• The ability to combine instances of matter into new ones in a fairly rich way: e.g. glue or tie solids together, mix liquids together, etc.
• The ability to position instances of matter with respect to each other in a rich way: e.g. put liquid in a solid cavity, cover something with a lid or a piece of fabric, etc.

It seems to us that if the above are present in an environment, then an AGI seeking to achieve appropriate goals in that environment will be likely to form an appropriate "human-like physical-world intuition." We doubt that the specifics of the naive physics of different forms of matter are critical to human-like intelligence. But we suspect that a great amount of unconscious human metaphorical thinking is conditioned on the fact that humans evolved around matter that takes a variety of forms, can be changed from one form to another, and can be fairly easily arranged and composited to form new instances from prior ones. Without many diverse instances of matter transformation, arrangement and composition in its experience, an AGI is unlikely to form an internal "metaphor-base" even vaguely similar to the human one — so that, even if it's highly intelligent, its thinking will be radically non-human-like in character.

Naturally this is all somewhat speculative and must be explored via experimentation. Maybe an elaborate blocks-world with only solid objects will be sufficient to create human-level, roughly human-like AGI with rich spatiotemporal and manipulative intuition. Or maybe human intelligence is more closely adapted to the specifics of our physical world — with water and dirt and plants and hair and so forth — than we currently realize. One thing that is very clear is that, as we proceed with embodying, situating and educating our AGI systems, we need to pay careful attention to the way their intelligence is conditioned by their environment. (A sketch of a minimal environment specification along these lines follows.)
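The requirements just listed can be restated as a compact environment specification. The sketch below is our own hedged illustration: the names and values are ours, and this is not the actual BlocksNBeadsWorld configuration, just a way of seeing the requirements as a checklist an environment designer could work from:

    # A hypothetical specification for a minimal "naive physics complete"
    # AGI environment, following the requirements listed above.
    WORLD_SPEC = {
        "physics": "approximate Newtonian",
        "matter_forms": ["solid", "liquid", "gas", "paste", "goo"],
        # Some instances of matter can change form:
        "transformations": [("liquid", "solid"), ("solid", "paste"),
                            ("paste", "goo")],
        # Matter can be flexibly manipulated with solid tools:
        "tool_actions": ["grip", "scoop", "spread", "cut"],
        # Instances can be combined into new instances:
        "combinations": ["glue_solids", "tie_solids", "mix_liquids"],
        # And positioned richly with respect to each other:
        "placements": ["fill_cavity", "cover_with_lid", "drape_fabric"],
    }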
9.5 Folk Psychology

Related to naive physics is the notion of "naive psychology" or "folk psychology" [Rav04], which includes for instance the following aspects:

1. Mental simulation of other agents
2. Mental theory regarding other agents
3. Attribution of beliefs, desires and intentions (BDI) to other agents via theory or simulation
4. Recognition of emotions in other agents via their physical embodiment
5. Recognition of desires and intentions in other agents via their physical embodiment
6. Analogical and contextual inference between self and other, regarding BDI and other aspects
7. Attribution of causes and meanings to other agents' behaviors
8. Anthropomorphization of non-human entities, including inanimate objects

The main special requirement placed on an AGI's embodiment by the above aspects pertains to the ability of agents to express their emotions and intentions to each other. Humans do this via facial expressions, gestures and language.

9.5.1 Motivation, Requiredness, Value

Relatedly to folk psychology, Gestalt [Koh38] and ecological [Gib77, Gib79] psychology suggest that humans perceive the world substantially in terms of the affordances it provides them for goal-directed action. This suggests that, to support human-like intelligence, an AGI must be capable of:

1. Perception of entities in the world as differentially associated with goal-relevant value
2. Perception of entities in the world in terms of the potential actions they afford the agent, or other agents

The key point is that entities in the world need to provide a wide variety of ways for agents to interact with them, enabling richly complex perception of affordances.

9.6 Body and Mind

The above discussion has focused on the world external to the body of the AGI agent embodied and embedded in the world, but the issue of the AGI's body also merits consideration. There seems little doubt that a human's intelligence is highly conditioned by the particularities of the human body.

9.6.1 The Human Sensorium

Here the requirements seem fairly simple: while surely not strictly necessary, it would certainly be preferable to provide an AGI with fairly rich analogues of the human senses of touch, sight, sound, kinesthesia, taste and smell. Each of these senses provides a different sort of cognitive stimulation to the human mind; and while similar cognitive stimulation could doubtless be achieved without analogous senses, providing them seems the most straightforward approach. It's hard to know how much of human intelligence is specifically biased toward the sorts of outputs provided by human senses.

As vision is already accorded such a prominent role in the AI and cognitive science literature — and is discussed in moderate depth in Chapter 26 of Part 2 — we won't take time elaborating
on the importance of vision processing for humanlike cognition. The key thing an AGI requires to support humanlike "visual intelligence" is an environment containing a sufficiently robust collection of materials that object and event recognition and identification become interesting problems.

Audition is cognitively valuable for many reasons, one of which is that it gives a very rich and precise method of sensing the world that is different from vision. The fact that humans can display normal intelligence while totally blind or totally deaf indicates that, in a sense, vision and audition are redundant for understanding the everyday world. However, it may be important that the brain has evolved to account for both of these senses, because this forced it to accommodate two very rich and precise methods of sensing the world — which may have forced it to develop more abstract representation mechanisms than would have been necessary with only one such method.

Touch is a sense that is, in our view, generally badly underappreciated within the AI community. In particular, the cognitive robotics community seems to worry too little about the terribly impoverished sense of touch possessed by most current robots (though fortunately there are recent technologies that may help improve robots in this regard; see e.g. [Nan08]). Touch is how the human infant learns to distinguish self from other, and in this way it is the most essential sense for the establishment of an internal self-model. Touching others' bodies is a key method for developing a sense of the emotional reality and responsiveness of others, and is hence key to the development of theory of mind and social understanding in humans. For this reason, among others, human children lacking sufficient tactile stimulation will generally wind up badly impaired in multiple ways. A good-quality embodiment should supply an AI agent with a body that possesses skin with varying levels of sensitivity on different parts (so that it can effectively distinguish between reality and its perception thereof in a tactile context), and also varying types of touch sensors (e.g. temperature versus friction), so that it experiences textures as multidimensional entities. (A minimal sketch of such a sensor model appears at the end of this subsection.)

Related to touch, kinesthesia refers to direct sensation of phenomena happening inside the body. Rarely mentioned in AI, this sense seems quite critical to cognition, as it underpins many of the analogies between self and other that guide cognition. Again, it's not important that an AGI's virtual body have the same internal body parts as a human body. But it seems valuable to have the AGI's virtual body display some vaguely human-body-like properties, such as feeling internal strain of various sorts after exercise, feeling discomfort in certain places when running out of energy, feeling internally different when satisfied versus unsatisfied, etc.

Next, taste is a cognitively interesting sense in that it involves the interplay between the internal and external world; it involves the evaluation of which entities from the external world are worthy of placing inside the body. And smell is cognitively interesting in large part because of its relationship with taste. A smell is, among other things, a long-distance indicator of what a certain entity might taste like. So the combination of taste and smell provides means for conceptualizing relationships between self, world and distance.
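The multidimensional touch model suggested above can be sketched very simply. All names, channels and sensitivity values below are our own illustrative assumptions, not a specification from the text:

    from dataclasses import dataclass

    # Skin patches with varying sensitivity and multiple sensor channels,
    # so that a texture is experienced as a multidimensional entity.
    @dataclass
    class TouchSample:
        pressure: float
        temperature: float
        friction: float

    @dataclass
    class SkinPatch:
        location: str        # e.g. "fingertip", "forearm" (hypothetical)
        sensitivity: float   # varies across the body, as in human skin

        def read(self, raw: TouchSample) -> TouchSample:
            s = self.sensitivity
            return TouchSample(raw.pressure * s, raw.temperature * s,
                               raw.friction * s)

    fingertip = SkinPatch("fingertip", sensitivity=1.0)
    forearm = SkinPatch("forearm", sensitivity=0.3)
    raw = TouchSample(pressure=0.8, temperature=0.5, friction=0.6)
    # The same contact is experienced differently on patches of different
    # sensitivity, giving the agent a basis for tactile self/other and
    # texture distinctions:
    print(fingertip.read(raw), forearm.read(raw))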
9.6.2 The Human Body's Multiple Intelligences

The most unique aspects of human intelligence are rooted in what one might call the "cognitive cortex": the portions of the brain dealing with self-reflection and abstract thought. But the cognitive cortex does its work in close coordination with the body's various more specialized
intelligent subsystems, including those associated with the gut, the heart, the liver, the immune and endocrine systems, and the perceptual and motor cortices.

In the perspective underlying this book, the human cognitive cortex — or the core cognitive network of any roughly human-like AGI system — should be viewed as a highly flexible, self-organizing network. Such a cognitive network is modelable, e.g., as a recurrent neural net with general topology, or as a weighted labeled hypergraph, and is centrally concerned with recognizing patterns in its environment and itself, especially patterns regarding the achievement of the system's goals in various appropriate contexts. Here we augment this perspective, noting that the human brain's cognitive network is closely coupled with a variety of simpler and more specialized intelligent "body-system networks" which provide it with structural and dynamical inductive biasing. We then discuss the implications of this observation for practical AGI design.

One recalls Pascal's famous quote: "The heart has its reasons, of which reason knows not." As we now know, the intuitive sense that Pascal and so many others have expressed, that the heart and other body systems have their own reasons, is grounded in the fact that they actually do carry out simple forms of reasoning (i.e. intelligent, adaptive dynamics), in close, sometimes cognitively valuable, coordination with the central cognitive network.

9.6.2.1 Some of the Human Body's Specialized Intelligent Subsystems

The human body contains multiple specialized intelligences apart from the cognitive cortex. Here we review some of the most critical.

Hierarchies of Visual and Auditory Perception. The hierarchical structure of the visual and auditory cortices has been taken by some researchers [Kur12, HB06] as the generic structure of cognition. While we suspect this is overstated, we agree it is important that these cortices nudge large portions of the cognitive cortex to assume an approximately hierarchical structure.

Olfactory Attractors. The process of recognizing a familiar smell is grounded in a neural process similar to convergence to an attractor in a nonlinear dynamical system [Fre95]. There is evidence that the mammalian cognitive cortex evolved in close coordination with the olfactory cortex [Row11], and much of abstract cognition reflects a similar dynamic of gradually coming to a conclusion based on what initially "smells right."

Physical and Cognitive Action. The cerebellum, a specially structured brain subsystem which controls motor movements, has for some time been understood to also be involved in attention, executive control, language, working memory, learning, pain, emotion, and addiction [PSF09].
The Second Brain. The gastrointestinal neural net contains millions of neurons and is capable of operating independently of the brain. It modulates stress response and other aspects of emotion and motivation based on experience — resulting in so-called "gut feelings" [Ger99].

The Heart's Neural Network. The heart has its own neural network, which modulates stress response, energy level and relaxation/excitement (factors key to motivation and emotion) based on experience [Arm04].

Pattern Recognition and Memory in the Liver. The liver is a complex pattern recognition system, adapting via experience to better identify toxins [C06]. Like the heart, it seems to store some episodic memories as well, resulting in liver transplant recipients sometimes acquiring the tastes in music or sports of the donor [EMC12].

Immune Intelligence. The immune network is a highly complex, adaptive, self-organizing system, which ongoingly solves the learning problem of identifying antigens and distinguishing them from the body itself [FP86]. As immune function is highly energetically costly, stress response involves subtle modulation of the energy allocation to immune function, which involves communication between the neural and immune networks.

The Endocrine System: A Key Bridge Between Mind and Body. The endocrine (hormonal) system regulates (and is regulated by) emotion, thus guiding all aspects of intelligence (due to the close connection of emotion and motivation) [PH12].

Breathing Guides Thinking. As oxygenation of the brain plays a key role in the spread of neural activity, the flow of breath is a key driver of cognition. Forced alternate-nostril breathing has been shown to significantly affect cognition via balancing the activity of the two brain hemispheres [SKBB91].

Much remains unknown, and the totality of feedback loops between the human cognitive cortex and the various specialized intelligences operative throughout the human body has not yet been thoroughly charted.
9.6.2.2 Implications for AGI

What lesson should the AGI developer draw from all this? The particularities of the human mind/body should not be taken as general requirements for general intelligence. However, it is worth remembering just how difficult the computational problem is of learning, based on experiential feedback alone, the right way to achieve the complex goal of controlling a system with general intelligence at the human level or beyond. To solve this problem without some sort of strong inductive biasing may require massively more experience than young humans obtain.

Appropriate inductive bias may be embedded in an AGI system in many different ways. Some AGI designers have sought to embed it very explicitly, e.g. with hand-coded declarative knowledge as in Cyc, SOAR and other "GOFAI"-type systems. On the other hand, the human brain receives its inductive bias much more subtly and implicitly, both via the specifics of the initial structure of the cognitive cortex, and via ongoing coupling of the cognitive cortex with other systems possessing more focused types of intelligence and more specific structures and/or dynamics.

In building an AGI system, one has four choices, very broadly speaking:

1. Create a flexible mind-network, as unbiased as feasible, and attempt to have it learn how to achieve its goals via experience
2. Closely emulate key aspects of the human body along with the human mind
3. Imitate the human mind-body, conceptually if not in detail, and create a number of structurally and dynamically simpler intelligent systems closely and appropriately coupled to the abstract cognitive mind-network, to provide useful inductive bias
4. Find some other, creative way to guide and probabilistically constrain one's AGI system's mind-network, providing inductive bias appropriate to the tasks at hand, without emulating even conceptually the way the human mind-brain receives its inductive bias via coupling with simpler intelligent systems

Our suspicion is that the first option will not be viable. On the other hand, the second option would require more knowledge of the human body than biology currently possesses. This leaves the third and fourth options, both of which seem viable to us.

CogPrime incorporates a combination of the third and fourth options. CogPrime's generic dynamic knowledge store, the Atomspace, is coupled with specialized hierarchical networks (DeSTIN) for vision and audition, somewhat mirroring the human cortex. An artificial endocrine system for OpenCog is also under development, speculatively, as part of a project using OpenCog to control humanoid robots. On the other hand, OpenCog has no gastrointestinal or cardiological nervous system, and the stress-response-based guidance provided to the human brain by a combination of the heart, gut, immune system and other body systems is achieved in CogPrime in a more explicit way, using the OpenPsi model of motivated cognition and its integration with the system's attention allocation dynamics.

Likely there is no single correct way to incorporate the lessons of intelligent human body-system networks into AGI designs. But these are aspects of human cognition that all AGI researchers should be aware of.
9.7 The Extended Mind and Body

Finally, Hutchins [Hut95], Logan [Log07] and others have promoted a view of human intelligence that sees the human mind as extended beyond the individual body, incorporating social interactions and also interactions with inanimate objects, such as tools, plants and animals. This leads to a number of requirements for a humanlike AGI's environment:

1. The ability to create a variety of different tools for interacting with various aspects of the world in various different ways, including tools for making tools and, ultimately, machinery
2. The existence of other mobile, virtual life-forms in the world, including simpler and less intelligent ones, and ones that interact with each other and with the AGI
3. The existence of organic growing structures in the world, with which the AGI can interact in various ways, including halting their growth or modifying their growth pattern

How necessary these requirements are is hard to say — but it is clear that these things have played a major role in the evolution of human intelligence.

9.8 Conclusion

Happily, this diverse chapter supports a simple, albeit tentative, conclusion. Our suggestion is that, if an AGI is

• placed in an environment capable of roughly supporting multimodal communication and vaguely (but not necessarily precisely) real-world-ish naive physics
• surrounded with other intelligent agents of varying levels of complexity, and other complex, dynamic structures to interface with
• given a body that can perceive this environment through some forms of sight, sound and touch, and perceive itself via some form of kinesthesia
• given a motivational system that encourages it to make rich use of these aspects of its environment

then the AGI is likely to have an experience-base reinforcing the key inductive biases provided by the everyday world for the guidance of humanlike intelligence.
Chapter 10
A Mind-World Correspondence Principle

10.1 Introduction

Real-world minds are always adapted to certain classes of environments and goals. The ideas of the previous chapter, regarding the connection between a human-like intelligence's internals and its environment, result from exploring the implications of this adaptation in the context of the cognitive synergy concept. In this chapter we explore the mind-world connection in a broader and more abstract way — making a more ambitious attempt to move toward a "general theory of general intelligence."

One basic premise here, as in the preceding chapters, is: even a system of vast general intelligence, subject to real-world space and time constraints, will necessarily be more efficient at some kinds of learning than others. Thus, one approach to formulating a general theory of general intelligence is to look at the relationship between minds and worlds — where a "world" is conceived as an environment and a set of goals defined in terms of that environment. In this spirit, we here formulate a broad principle binding together worlds and the minds that are intelligent in these worlds. The ideas of the previous chapter constitute specific, concrete instantiations of this general principle.

A careful statement of the principle requires the introduction of a number of technical concepts, and will be given later in the chapter. A crude, informal version of the principle would be:

MIND-WORLD CORRESPONDENCE PRINCIPLE: For a mind to work intelligently toward certain goals in a certain world, there should be a nice mapping from goal-directed sequences of world-states into sequences of mind-states, where "nice" means that a world-state-sequence $W$ composed of two parts $W_1$ and $W_2$ gets mapped into a mind-state-sequence $M$ composed of two corresponding parts $M_1$ and $M_2$.

What's nice about this principle is that it relates the decomposition of the world into parts to the decomposition of the mind into parts.
10.2 What Might a General Theory of General Intelligence Look Like?

It's not clear, at this point, what a real "general theory of general intelligence" would look like — but one tantalizing possibility is that it might confront the two questions:

• How does one design a world to foster the development of a certain sort of mind?
• How does one design a mind to match the particular challenges posed by a certain sort of world?

One way to achieve this would be to create a theory that, given a description of an environment and some associated goals, would output a description of the structure and dynamics that a system should possess to be intelligent in that environment relative to those goals, using limited computational resources.

Such a theory would serve a different purpose from the mathematical theory of "universal intelligence" developed by Marcus Hutter [Hut05] and others. For all its beauty and theoretical power, that approach currently yields useful conclusions only about general intelligences with infinite or infeasibly massive computational resources. On the other hand, the approach suggested here is aimed toward the creation of a theory of real-world general intelligences utilizing realistic amounts of computational power, but still possessing general intelligence comparable to that of human beings or greater.

This reflects a vision of intelligence as largely concerned with adaptation to particular classes of environments and goals. This may seem contradictory to the notion of "general" intelligence, but I think it actually embodies a realistic understanding of general intelligence. Maximally general intelligence is not pragmatically feasible; it could only be achieved using infinite computational resources [Hut05]. Real-world systems are inevitably limited in the intelligence they can display in any real situation, because real situations involve finite resources, including finite amounts of time. One may say that, in principle, a certain system could solve any problem given enough resources and time; but, even when this is true, it's not necessarily the most interesting way to look at the system's intelligence. It may be more important to look at what a system can do given the resources at its disposal in reality. And this perspective leads one to ask questions like the ones posed above: which bounded-resources systems are well-disposed to display intelligence in which classes of situations?

As noted in Chapter 7 above, one can assess the generality of a system's intelligence by looking at the entropy of the class of situations across which it displays a high level of intelligence (where "high" is measured relative to its total level of intelligence across all situations). A system with a high generality of intelligence will tend to be roughly equally intelligent across a wide variety of situations; whereas a system with lower generality of intelligence will tend to be much more intelligent in a small subclass of situations than in any other. The definitions given in Chapter 7 embody this notion in a formal and quantitative way.

If one wishes to create a general theory of general intelligence according to this sort of perspective, the main question then becomes how to represent goals/environments and systems in such a way as to render transparent the natural correspondence between the specifics of the former and the latter, in the context of resource-bounded intelligence. This is the business of the next section.
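Before moving on, the entropy-based notion of generality just described is easy to illustrate. The following is a minimal sketch whose normalization is our own simplification, not Chapter 7's formal definition:

```python
# A minimal sketch of an entropy-based generality measure: how evenly is a
# system's intelligence spread across situations? (Simplified normalization.)
import math

def generality(scores):
    """scores: dict mapping situation -> nonnegative intelligence level.
    Returns entropy in bits; higher = more evenly spread = more general."""
    total = sum(scores.values())
    probs = [s / total for s in scores.values() if s > 0]
    return -sum(p * math.log2(p) for p in probs)

# A specialist is intelligent mainly in one situation; a generalist is spread out.
specialist = {"chess": 9.0, "cooking": 0.5, "navigation": 0.5}
generalist = {"chess": 3.0, "cooking": 3.5, "navigation": 3.5}
assert generality(generalist) > generality(specialist)
```

Note that both systems above have the same total intelligence across situations; only the distribution, and hence the generality, differs.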
10.3 Steps Toward A (Formal) General Theory of General Intelligence

Now begins the formalism. At this stage of development of the theory proposed in this chapter, mathematics is used mainly as a device to ensure clarity of expression. However, once the theory is further developed, it may become useful for purposes of calculation as well.

Suppose one has any system $S$ (which could be an AI system, or a human, or an environment that a human or AI is interacting with, or the combination of an environment and a human or AI's body, etc.). One may then construct an uncertain transition graph associated with that system $S$, in the following way:

• The nodes of the graph represent fuzzy sets of states of system $S$ (I'll call these state-sets from here on, leaving the fuzziness implicit)
• The (directed) links of the graph represent probabilistically weighted transitions between state-sets

Specifically, the weight of the link from $B$ to $A$ should be defined as

$$P(o(S, A, t(T)) \mid o(S, B, T))$$

where $o(S, A, T)$ denotes the presence of the system $S$ in the state-set $A$ during time-distribution $T$, and $t()$ is a temporal succession function defined so that $t(T)$ refers to a time-distribution conceived as "after" $T$. A time-distribution is a probability distribution over time-points. The interaction of fuzziness and probability here is fairly straightforward and may be handled in the manner of PLN, as outlined in subsequent chapters. Note that the definition of the link weights depends on the specific implementation of the temporal succession function, which includes an implicit time-scale.

Suppose one has a transition graph corresponding to an environment; then a goal relative to that environment may be defined as a particular node in the transition graph. The goals of a particular system acting in that environment may then be conceived as one or more nodes in the transition graph. The system's situation in the environment at any point in time may also be associated with one or more nodes in the transition graph; then, the system's movement toward goal-achievement may be associated with paths through the environment's transition graph leading from its current state to goal states. It may be useful for some purposes to filter the uncertain transition graph into a crisp transition graph by placing a threshold on the link weights and removing links with weights below the threshold.

The next concept to introduce is the world-mind transfer function, which maps world (environment) state-sets into organism (e.g. AI system) state-sets in a specific way. Given a world state-set $W$, the world-mind transfer function $M$ maps $W$ into various organism state-sets with various probabilities, so that we may say: $M(W)$ is the probability distribution of state-sets the organism tends to be in, when its environment is in state-set $W$. (Recall also that state-sets are fuzzy.)

Now one may look at the spaces of world-paths and mind-paths. A world-path is a path through the world's transition graph, and a mind-path is a path through the organism's transition graph.
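To fix ideas before composing paths, here is a minimal sketch of estimating an uncertain transition graph from observed behavior. It drops the fuzziness (crisp state-set membership) and hard-codes the temporal succession function $t()$ as a one-step lag, both simplifying assumptions relative to the setup above:

```python
# A minimal sketch of the uncertain transition graph: the weight of the link
# B -> A is estimated as the empirical probability that the system is in
# state-set A "after" (here: one step after) being in state-set B.
from collections import defaultdict

def uncertain_transition_graph(trajectory):
    """trajectory: list of state-set labels observed at successive times.
    Returns dict: (B, A) -> estimated P(in A at t(T) | in B at T)."""
    counts = defaultdict(int)
    outgoing = defaultdict(int)
    for b, a in zip(trajectory, trajectory[1:]):
        counts[(b, a)] += 1
        outgoing[b] += 1
    return {(b, a): n / outgoing[b] for (b, a), n in counts.items()}

def threshold(graph, theta):
    """Filter into a crisp transition graph by dropping low-weight links."""
    return {edge for edge, w in graph.items() if w >= theta}

graph = uncertain_transition_graph(["hungry", "foraging", "eating", "sated", "hungry"])
# e.g. graph[("hungry", "foraging")] == 1.0 in this toy trajectory
```

The `threshold` helper implements the filtering into a crisp transition graph mentioned above.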
Given two world-paths $P$ and $Q$, it's obvious how to define the composition $P * Q$: one follows $P$ and then, after that, follows $Q$, thus obtaining a longer path. Similarly for mind-paths. In category theory terms, we are constructing the free category associated with the graph: the objects of the category are the nodes, and the morphisms of the category are the paths. And category theory is the right way to be thinking here: we want to think about the relationship between the world category and the mind category.

The world-mind transfer function can be interpreted as a mapping from paths to subgraphs: given a world-path, it produces a set of mind state-sets, which have a number of links between them. One can then define a world-mind path transfer function $M(P)$ by taking the mind-graph $M(nodes(P))$ and looking at the highest-weight path spanning $M(nodes(P))$. (Here $nodes(P)$ means the set of nodes of the path $P$.)

A functor $F$ between the world category and the mind category is a mapping that preserves object identities and is such that

$$F(P * Q) = F(P) * F(Q)$$

We may also introduce the notion of an approximate functor, meaning a mapping $F$ such that the average of

$$d(F(P * Q), F(P) * F(Q))$$

is small. One can introduce a prior distribution into the average here. This could be the Levin universal distribution or some variant (the Levin distribution assigns higher probability to computationally simpler entities). Or it could be something more purpose-specific: for example, one can give a higher weight to paths leading toward a certain set of nodes (e.g. goal nodes). Or one can use a distribution that weights based on a combination of simplicity and directedness toward a certain set of nodes. The latter seems most interesting, and I will define a goal-weighted approximate functor as an approximate functor defined with averaging relative to a distribution that balances simplicity with directedness toward a certain set of goal nodes.

The move to approximate functors is simple conceptually, but mathematically it's a fairly big step, because it requires us to introduce a geometric structure on our categories. But there are plenty of natural metrics defined on paths in graphs (weighted or not), so there's no real problem here.

10.4 The Mind-World Correspondence Principle

Now we finally have the formalism set up to make a non-trivial statement about the relationship between minds and worlds. Namely, the hypothesis that:

MIND-WORLD CORRESPONDENCE PRINCIPLE: For an organism with a reasonably high level of intelligence in a certain world, relative to a certain set of goals, the mind-world path transfer function is a goal-weighted approximate functor.
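For readers who want a more symbolic rendering, here is one possible sketch of how the principle might be quantified using the definitions above. The tolerance $\varepsilon$ and the prior $\mu_g$ are our own illustrative notation, not part of the chapter's formal apparatus:

```latex
% A non-authoritative formalization sketch: \mu_g is a goal-weighted prior
% over composable world-path pairs, balancing simplicity with directedness
% toward the goal nodes; d is a metric on mind-paths.
\[
  \mathbb{E}_{(P,Q) \sim \mu_g}
    \left[ \, d\!\left( M(P * Q),\; M(P) * M(Q) \right) \right]
  \;\le\; \varepsilon
\]
% i.e. the mind-world path transfer function M is a goal-weighted approximate
% functor: its average functoriality defect is small on the paths that matter
% for the goals.
```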
Stated a little more loosely: the hypothesis is that, for intelligence to occur, there has to be a natural correspondence between the transition-sequences of world-states and the corresponding transition-sequences of mind-states, at least in the case of transition-sequences leading to relevant goals.

We suspect that a variant of the above proposition can be formally proved, using the definition of general intelligence presented in Chapter 7. The proof of a theorem corresponding to the above would certainly constitute an interesting start toward a general formal theory of general intelligence. Note that proving anything of this nature would require some attention to the time-scale-dependence of the link weights in the transition graphs involved. A formally proved variant of the above proposition would be, in short, a "MIND-WORLD CORRESPONDENCE THEOREM."

Recall that at the start of the chapter, we expressed the same idea as:

MIND-WORLD CORRESPONDENCE PRINCIPLE: For a mind to work intelligently toward certain goals in a certain world, there should be a nice mapping from goal-directed sequences of world-states into sequences of mind-states, where "nice" means that a world-state-sequence $W$ composed of two parts $W_1$ and $W_2$ gets mapped into a mind-state-sequence $M$ composed of two corresponding parts $M_1$ and $M_2$.

That is a reasonable gloss of the principle, but it's clunkier and less accurate than the statement in terms of functors and path transfer functions, because it tries to use only common-language vocabulary, which doesn't really contain all the needed concepts.

10.5 How Might the Mind-World Correspondence Principle Be Useful?

Suppose one believes the Mind-World Correspondence Principle as laid out above: so what? Our hope, obviously, is that the principle could be useful in actually figuring out how to architect intelligent systems biased toward particular sorts of environments. And of course, this is said with the understanding that any finite intelligence must be biased toward some sorts of environment. Relatedly, given a specific AGI design (such as CogPrime), one could use the principle to figure out which environments it would be best suited for. Or one could figure out how to adjust the particulars of the design, to maximize the system's intelligence in the environments of interest.

One next step in developing this network of ideas, aside from (and potentially building on) full formalization of the principle, would be an exploration of real-world environments in terms of transition graphs. What properties do the transition graphs induced from the real world have?

One such property, we suggest, is successive refinement. Often the path toward a goal involves first gaining an approximate understanding of a situation, then a slightly more accurate understanding, and so forth — until finally one has achieved a detailed enough understanding to actually achieve the goal. This would be represented by a world-path whose nodes are state-sets involving the gathering of progressively more detailed information.
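As a toy illustration of such a successive-refinement world-path (our own example, not from the text), consider an agent localizing a hidden value by repeatedly halving an interval; each node in the path carries strictly more detail than the last:

```python
# A toy successive-refinement world-path: each node is a state-set in which
# the agent's knowledge of a hidden quantity is an interval, and each step
# halves the interval until the goal node (known to within tolerance) is hit.
def refine(start=(0.0, 16.0), target=11.3, tol=1.0):
    lo, hi = start
    path = [(lo, hi)]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep whichever half still contains the target value.
        lo, hi = (mid, hi) if target >= mid else (lo, mid)
        path.append((lo, hi))
    return path  # a world-path whose nodes carry progressively more detail

print(refine())
# [(0.0, 16.0), (8.0, 16.0), (8.0, 12.0), (10.0, 12.0), (11.0, 12.0)]
```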
Via pursuing the mind-world correspondence property in this context, I believe we will find that world-paths reflecting successive refinement correspond to mind-paths embodying successive refinement. This will be found to relate to the hierarchical structures found so frequently in both the physical world and the human mind-brain. Hierarchical structures allow many relevant goals to be approached via successive refinement, which I believe is the ultimate reason why hierarchical structures are so common in the human mind-brain.

Another next step would be exploring what mind-world correspondence means for the structure and dynamics of a limited-resources intelligence. If an organism $O$ has limited resources and, to be intelligent, needs to make

$$P(o(O, M(A), t(T)) \mid o(O, M(B), T))$$

high for particular world state-sets $A$ and $B$, then what's the organism's best approach? Arguably, it should represent $M(A)$ and $M(B)$ internally in such a way that very little computational effort is required for it to transition between $M(A)$ and $M(B)$. For instance, this could be done by coding its knowledge in such a way that $M(A)$ and $M(B)$ share many common bits; or it could be done in other more complicated ways. If, for instance, $A$ is a subset of $B$, then it may prove beneficial for the organism to represent $M(A)$ physically as a subset of its representation of $M(B)$.

Pursuing this line of thinking, one could likely derive specific properties of an intelligent organism's internal information-flow from properties of the environment and goals with respect to which it's supposed to be intelligent. This would allow us to achieve the holy grail of intelligence theory as I understand it: given a description of an environment and goals, to be able to derive an architectural description for an organism that will display a high level of intelligence relative to those goals, given limited computational resources. While this "holy grail" is obviously a far way off, what we've tried to do here is to outline a clear mathematical and conceptual direction for moving toward it.

10.6 Conclusion

The Mind-World Correspondence Principle presented here — if in the vicinity of correctness — constitutes a non-trivial step toward fleshing out the concept of a general theory of general intelligence. But obviously the theory is still rather abstract, and also not completely rigorous. There's a lot more work to be done.

The Mind-World Correspondence Principle as articulated above is not quite a formal mathematical statement. It would take a little work to put in all the needed quantifiers to formulate it as one, and it's not clear what the best way to do so is; the details would perhaps become clear in the course of trying to prove a version of it rigorously. One could interpret the ideas presented in this chapter as a philosophical theory that hopes to be turned into a mathematical theory, and to play a key role in a scientific theory. For the time being, the main role to be served by these ideas is qualitative: to help us think about concrete AGI designs like CogPrime in a sensible way.

It's important to understand what the goal of a real-world AGI system needs to be: to achieve the ability to broadly learn and generalize, yes, but not with infinite capability, rather with biases and patterns that are implicitly and/or explicitly tuned to certain broad classes of goals and environments. The Mind-World































