Ben Goertzel with Cassio Pennachin & Nil Geisweiller & the OpenCog Team Engineering General Intelligence, Part 2: The CogPrime Architecture for Integrative, Embodied AGI September 19, 2013
This book is dedicated by Ben Goertzel to his beloved, departed grandfather, Leo Zwell - an amazingly warm-hearted, giving human being who was also a deep thinker and excellent scientist, who got Ben started on the path of science. As a careful experimentalist, Leo would have been properly skeptical of the big hypotheses made here - but he would have been eager to see them put to the test!
Preface

Welcome to the second volume of Engineering General Intelligence! This is the second half of a two-part technical treatise aimed at outlining a practical approach to engineering software systems with general intelligence at the human level and ultimately beyond. Our goal here is an ambitious one, not a modest one: machines with flexible problem-solving ability, open-ended learning capability, creativity and, eventually, their own kind of genius.

Part 1 set the stage, dealing with a variety of general conceptual issues related to the engineering of advanced AGI, as well as presenting a brief overview of the CogPrime design for Artificial General Intelligence. Now, here in Part 2, we plunge deep into the nitty-gritty, and describe the multiple aspects of the CogPrime design with a fairly high degree of detail. First we describe the CogPrime software architecture and knowledge representation in detail; then we review the "cognitive cycle" via which CogPrime perceives and acts in the world and reflects on itself. We then turn to various forms of learning: procedural, declarative (e.g. inference), simulative and integrative. Methods of enabling natural language functionality in CogPrime are then discussed; and the volume concludes with a chapter summarizing the argument that CogPrime can lead to human-level (and eventually perhaps greater) AGI, and a chapter giving a "thought experiment" describing the internal dynamics via which a completed CogPrime system might solve the problem of obeying the request "Build me something with blocks that I haven't seen before."

Reading this book before Engineering General Intelligence, Part 1 is not especially recommended, since the prequel not only provides context for this one, but also defines a number of specific terms and concepts that are used here without explanation (for example, Part 1 has an extensive Glossary).
However, the impatient reader who has not mastered Part 1, or the reader who has finished Part 1 but is tempted to hop through Part 2 nonlinearly, might wish to first skim the final two chapters, and then return to reading in linear order.

While the majority of the text here was written by the lead author Ben Goertzel, the overall work and underlying ideas have been very much a team effort, with major input from the secondary authors Cassio Pennachin and Nil Geisweiller, and large contributions from various other contributors as well. Many chapters have specifically indicated coauthors; but the contributions from various collaborating researchers and engineers go far beyond these. The creation of the AGI approach and design presented here is a process that has occurred over a long period of time among a community of people; and this book is in fact a quite partial view of the existing vast body of knowledge and intuition regarding CogPrime. For example, beyond the ideas presented here, there is a body of work on the OpenCog wiki site, and then the OpenCog codebase itself.

More extensive introductory remarks may be found in the Preface of Part 1, including a brief history of the book and acknowledgements to some of those who helped inspire it. Also, one brief comment from the Preface of Part 1 bears repeating: at several places in this volume, as in its predecessor, we will refer to the "current" CogPrime implementation (in the OpenCog framework); in all cases this refers to the OpenCog software system as of late 2013.

We fully realize that this book is not "easy reading", and that the level and nature of exposition varies somewhat from chapter to chapter. We have done our best to present these very complex ideas as clearly as we could, given our own time constraints, and the lack of commonly understood vocabularies for discussing many of the concepts and systems involved. Our hope is that the length of the book, and the conceptual difficulty of some portions, will be considered as compensated by the interest of the ideas we present. For, make no mistake: for all their technicality and subtlety, we find the ideas presented here incredibly exciting. We are talking about no less than the creation of machines with intelligence, creativity and genius equaling and ultimately exceeding that of human beings.

This is, in the end, the kind of book that we (the authors) all hoped to find when we first entered the AI field: a reasonably detailed description of how to go about creating thinking machines. The fact that so few treatises of this nature, and so few projects explicitly aimed at the creation of advanced AGI, exist, is something that has perplexed us since we entered the field.
Rather than just complain about it, we have taken matters into our own hands, and worked to create a design and a codebase that we believe capable of leading to human-level AGI and beyond. We feel tremendously fortunate to live in times when this sort of pursuit can be discussed in a serious, scientific way.

Online Appendices

Just one more thing before getting started! This book originally had even more chapters than the ones currently presented in Parts 1 and 2. In order to decrease length and increase focus, however, a number of chapters dealing with peripheral - yet still relevant and interesting - matters were moved to online appendices. These may be downloaded in a single PDF file at http://goertzel.org/engineering_general_intelligence_appendices_B-H.pdf. The titles of these appendices are:

• Appendix A: Possible Worlds Semantics and Experiential Semantics
• Appendix B: Steps Toward a Formal Theory of Cognitive Structure and Dynamics
• Appendix C: Emergent Reflexive Mental Structures
• Appendix D: GOLEM: Toward an AGI Meta-Architecture Enabling Both Goal Preservation and Radical Self-Improvement
• Appendix E: Lojban++: A Novel Linguistic Mechanism for Teaching AGI Systems
• Appendix F: PLN and the Brain
• Appendix G: Possible Worlds Semantics and Experiential Semantics
• Appendix H: Propositions About Environments in Which CogPrime Components are Useful
None of these are critical to understanding the key ideas in the book, which is why they were relegated to online appendices. However, reading them will deepen your understanding of the conceptual and formal perspectives underlying the CogPrime design.

September 2013
Ben Goertzel
Contents

Section I: Architectural and Representational Mechanisms

19 The OpenCog Framework
   19.1 Introduction
      19.1.1 Layers of Abstraction in Describing Artificial Minds
      19.1.2 The OpenCog Framework
   19.2 The OpenCog Architecture
      19.2.1 OpenCog and Hardware Models
      19.2.2 The Key Components of the OpenCog Framework
   19.3 The AtomSpace
      19.3.1 The Knowledge Unit: Atoms
      19.3.2 AtomSpace Requirements and Properties
      19.3.3 Accessing the Atomspace
      19.3.4 Persistence
      19.3.5 Specialized Knowledge Stores
   19.4 MindAgents: Cognitive Processes
      19.4.1 A Conceptual View of CogPrime Cognitive Processes
      19.4.2 Implementation of MindAgents
      19.4.3 Tasks
      19.4.4 Scheduling of MindAgents and Tasks in a Unit
      19.4.5 The Cognitive Cycle
   19.5 Distributed AtomSpace and Cognitive Dynamics
      19.5.1 Distributing the AtomSpace
      19.5.2 Distributed Processing

20 Knowledge Representation Using the Atomspace
   20.1 Introduction
   20.2 Denoting Atoms
      20.2.1 Meta-Language
      20.2.2 Denoting Atoms
   20.3 Representing Functions and Predicates
      20.3.1 Execution Links
      20.3.2 Denoting Schema and Predicate Variables
      20.3.3 Variable and Combinator Notation
      20.3.4 Inheritance Between Higher-Order Types
      20.3.5 Advanced Schema Manipulation

21 Representing Procedural Knowledge
   21.1 Introduction
   21.2 Representing Programs
   21.3 Representational Challenges
   21.4 What Makes a Representation Tractable?
   21.5 The Combo Language
   21.6 Normal Forms Postulated to Provide Tractable Representations
      21.6.1 A Simple Type System
      21.6.2 Boolean Normal Form
      21.6.3 Number Normal Form
      21.6.4 List Normal Form
      21.6.5 Tuple Normal Form
      21.6.6 Enum Normal Form
      21.6.7 Function Normal Form
      21.6.8 Action Result Normal Form
   21.7 Program Transformations
      21.7.1 Reductions
      21.7.2 Neutral Transformations
      21.7.3 Non-Neutral Transformations
   21.8 Interfacing Between Procedural and Declarative Knowledge
      21.8.1 Programs Manipulating Atoms
   21.9 Declarative Representation of Procedures

Section II: The Cognitive Cycle

22 Emotion, Motivation, Attention and Control
   22.1 Introduction
   22.2 A Quick Look at Action Selection
   22.3 Psi in CogPrime
   22.4 Implementing Emotion Rules atop Psi's Emotional Dynamics
      22.4.1 Grounding the Logical Structure of Emotions in the Psi Model
   22.5 Goals and Contexts
      22.5.1 Goal Atoms
   22.6 Context Atoms
   22.7 Ubergoal Dynamics
      22.7.1 Implicit Ubergoal Pool Modification
      22.7.2 Explicit Ubergoal Pool Modification
   22.8 Goal Formation
   22.9 Goal Fulfillment and Predicate Schematization
   22.10 Context Formation
   22.11 Execution Management
   22.12 Goals and Time
23 Attention Allocation
   23.1 Introduction
   23.2 Semantics of Short and Long Term Importance
      23.2.1 The Precise Semantics of STI and LTI
      23.2.2 STI, STIFund, and Juju
      23.2.3 Formalizing LTI
      23.2.4 Applications of LTIburst versus LTIcont
   23.3 Defining Burst LTI in Terms of STI
   23.4 Valuing LTI and STI in terms of a Single Currency
   23.5 Economic Attention Networks
      23.5.1 Semantics of Hebbian Links
      23.5.2 Explicit and Implicit Hebbian Relations
   23.6 Dynamics of STI and LTI Propagation
      23.6.1 ECAN Update Equations
      23.6.2 ECAN as Associative Memory
   23.7 Glocal Economic Attention Networks
      23.7.1 Experimental Explorations
   23.8 Long-Term Importance and Forgetting
   23.9 Attention Allocation via Data Mining on the System Activity Table
   23.10 Schema Credit Assignment
   23.11 Interaction between ECANs and other CogPrime Components
      23.11.1 Use of PLN and Procedure Learning to Help ECAN
      23.11.2 Use of ECAN to Help Other Cognitive Processes
   23.12 MindAgent Importance and Scheduling
   23.13 Information Geometry for Attention Allocation
      23.13.1 Brief Review of Information Geometry
      23.13.2 Information-Geometric Learning for Recurrent Networks: Extending the ANGL Algorithm
      23.13.3 Information Geometry for Economic Attention Allocation: A Detailed Example

24 Economic Goal and Action Selection
   24.1 Introduction
   24.2 Transfer of STI "Requests for Services" Between Goals
   24.3 Feasibility Structures
   24.4 Goal Based Schema Selection
      24.4.1 A Game-Theoretic Approach to Action Selection
   24.5 SchemaActivation
   24.6 GoalBasedSchemaLearning

25 Integrative Procedure Evaluation
   25.1 Introduction
   25.2 Procedure Evaluators
      25.2.1 Simple Procedure Evaluation
      25.2.2 Effort Based Procedure Evaluation
      25.2.3 Procedure Evaluation with Adaptive Evaluation Order
   25.3 The Procedure Evaluation Process
      25.3.1 Truth Value Evaluation
      25.3.2 Schema Execution

Section III: Perception and Action

26 Perceptual and Motor Hierarchies
   26.1 Introduction
   26.2 The Generic Perception Process
      26.2.1 The ExperienceDB
   26.3 Interfacing CogPrime with a Virtual Agent
      26.3.1 Perceiving the Virtual World
      26.3.2 Acting in the Virtual World
   26.4 Perceptual Pattern Mining
      26.4.1 Input Data
      26.4.2 Transaction Graphs
      26.4.3 Spatiotemporal Conjunctions
      26.4.4 The Mining Task
   26.5 The Perceptual-Motor Hierarchy
   26.6 Object Recognition from Polygonal Meshes
      26.6.1 Algorithm Overview
      26.6.2 Recognizing PersistentPolygonNodes (PPNodes) from PolygonNodes
      26.6.3 Creating Adjacency Graphs from PPNodes
      26.6.4 Clustering in the Adjacency Graph
      26.6.5 Discussion
   26.7 Interfacing the Atomspace with a Deep Learning Based Perception-Action Hierarchy
      26.7.1 Hierarchical Perception Action Networks
      26.7.2 Declarative Memory
      26.7.3 Sensory Memory
      26.7.4 Procedural Memory
      26.7.5 Episodic Memory
      26.7.6 Action Selection and Attention Allocation
   26.8 Multiple Interaction Channels

27 Integrating CogPrime with a Compositional Spatiotemporal Deep Learning Network
   27.1 Introduction
   27.2 Integrating CSDLNs with Other AI Frameworks
   27.3 Semantic CSDLN for Perception Processing
   27.4 Semantic CSDLN for Motor and Sensorimotor Processing
   27.5 Connecting the Perceptual and Motoric Hierarchies with a Goal Hierarchy

28 Making DeSTIN Representationally Transparent
   28.1 Introduction
   28.2 Review of DeSTIN Architecture and Dynamics
      28.2.1 Beyond Gray-Scale Vision
   28.3 Uniform DeSTIN
      28.3.1 Translation-Invariant DeSTIN
      28.3.2 Mapping States of Translation-Invariant DeSTIN into the Atomspace
      28.3.3 Scale-Invariant DeSTIN
      28.3.4 Rotation Invariant DeSTIN
      28.3.5 Temporal Perception
   28.4 Interpretation of DeSTIN's Activity
      28.4.1 DeSTIN's Assumption of Hierarchical Decomposability
      28.4.2 Distance and Utility
   28.5 Benefits and Costs of Uniform DeSTIN
   28.6 Imprecise Probability as a Tool for Linking CogPrime and DeSTIN
      28.6.1 Visual Attention Focusing
      28.6.2 Using Imprecise Probabilities to Guide Visual Attention Focusing
      28.6.3 Sketch of Application to DeSTIN

29 Bridging the Symbolic/Subsymbolic Gap
   29.1 Introduction
   29.2 Simplified OpenCog Workflow
   29.3 Integrating DeSTIN and OpenCog
      29.3.1 Mining Patterns from DeSTIN States
      29.3.2 Probabilistic Inference on Mined Hypergraphs
      29.3.3 Insertion of OpenCog-Learned Predicates into DeSTIN's Pattern Library
   29.4 Multisensory Integration, and Perception-Action Integration
      29.4.1 Perception-Action Integration
      29.4.2 Thought-Experiment: Eye-Hand Coordination
   29.5 A Practical Example: Using Subtree Mining to Bridge the Gap Between DeSTIN and PLN
      29.5.1 The Importance of Semantic Feedback
   29.6 Some Simple Experiments with Letters
      29.6.1 Mining Subtrees from DeSTIN States Induced via Observing Letterforms
      29.6.2 Mining Subtrees from DeSTIN States Induced via Observing Letterforms
   29.7 Conclusion

Section IV: Procedure Learning

30 Procedure Learning as Program Learning
   30.1 Introduction
      30.1.1 Program Learning
   30.2 Representation-Building
   30.3 Specification Based Procedure Learning

31 Learning Procedures via Imitation, Reinforcement and Correction
   31.1 Introduction
   31.2 IRC Learning
      31.2.1 A Simple Example of Imitation/Reinforcement Learning
      31.2.2 A Simple Example of Corrective Learning
   31.3 IRC Learning in the PetBrain
      31.3.1 Introducing Corrective Learning
   31.4 Applying A Similar IRC Methodology to Spontaneous Learning
32 Procedure Learning via Adaptively Biased Hillclimbing
   32.1 Introduction
   32.2 Hillclimbing
   32.3 Entity and Perception Filters
      32.3.1 Entity Filter
      32.3.2 Entropy Perception Filter
   32.4 Using Action Sequences as Building Blocks
   32.5 Automatically Parametrizing the Program Size Penalty
      32.5.1 Definition of the Complexity Penalty
      32.5.2 Parameterizing the Complexity Penalty
      32.5.3 Definition of the Optimization Problem
   32.6 Some Simple Experimental Results
   32.7 Conclusion

33 Probabilistic Evolutionary Procedure Learning
   33.1 Introduction
      33.1.1 Explicit versus Implicit Evolution in CogPrime
   33.2 Estimation of Distribution Algorithms
   33.3 Competent Program Evolution via MOSES
      33.3.1 Statics
      33.3.2 Dynamics
      33.3.3 Architecture
      33.3.4 Example: Artificial Ant Problem
      33.3.5 Discussion
      33.3.6 Conclusion
   33.4 Integrating Feature Selection Into the Learning Process
      33.4.1 Machine Learning, Feature Selection and AGI
      33.4.2 Data- and Feature-Focusable Learning Problems
      33.4.3 Integrating Feature Selection Into Learning
      33.4.4 Integrating Feature Selection into MOSES Learning
      33.4.5 Application to Genomic Data Classification
   33.5 Supplying Evolutionary Learning with Long-Term Memory
   33.6 Hierarchical Program Learning
      33.6.1 Hierarchical Modeling of Composite Procedures in the AtomSpace
      33.6.2 Identifying Hierarchical Structure In Combo Trees via MetaNodes and Dimensional Embedding
   33.7 Fitness Function Estimation via Integrative Intelligence

Section V: Declarative Learning

34 Probabilistic Logic Networks
   34.1 Introduction
   34.2 A Simple Overview of PLN
      34.2.1 Forward and Backward Chaining
   34.3 First Order Probabilistic Logic Networks
      34.3.1 Core FOPLN Relationships
      34.3.2 PLN Truth Values
      34.3.3 Auxiliary FOPLN Relationships
      34.3.4 PLN Rules and Formulas
      34.3.5 Inference Trails
   34.4 Higher-Order PLN
      34.4.1 Reducing HOPLN to FOPLN
   34.5 Predictive Implication and Attraction
   34.6 Confidence Decay
      34.6.1 An Example
   34.7 Why is PLN a Good Idea?

35 Spatiotemporal Inference
   35.1 Introduction
   35.2 Related Work on Spatio-temporal Calculi
   35.3 Uncertainty with Distributional Fuzzy Values
   35.4 Spatio-temporal Inference in PLN
   35.5 Examples
      35.5.1 Spatiotemporal Rules
      35.5.2 The Laptop is Safe from the Rain
      35.5.3 Fetching the Toy Inside the Upper Cupboard
   35.6 An Integrative Approach to Planning

36 Adaptive, Integrative Inference Control
   36.1 Introduction
   36.2 High-Level Control Mechanisms
      36.2.1 The Need for Adaptive Inference Control
   36.3 Inference Control in PLN
      36.3.1 Representing PLN Rules as GroundedSchemaNodes
      36.3.2 Recording Executed PLN Inferences in the Atomspace
      36.3.3 Anatomy of a Single Inference Step
      36.3.4 Basic Forward and Backward Inference Steps
      36.3.5 Interaction of Forward and Backward Inference
      36.3.6 Coordinating Variable Bindings
      36.3.7 An Example of Problem Decomposition
      36.3.8 Example of Casting a Variable Assignment Problem as an Optimization Problem
      36.3.9 Backward Chaining via Nested Optimization
   36.4 Combining Backward and Forward Inference Steps with Attention Allocation to Achieve the Same Effect as Backward Chaining (and Even Smarter Inference Dynamics)
      36.4.1 Breakdown into MindAgents
   36.5 Hebbian Inference Control
   36.6 Inference Pattern Mining
   36.7 Evolution As an Inference Control Scheme
   36.8 Incorporating Other Cognitive Processes into Inference
   36.9 PLN and Bayes Nets
37 Pattern Mining
   37.1 Introduction
   37.2 Finding Interesting Patterns via Program Learning
   37.3 Pattern Mining via Frequent/Surprising Subgraph Mining
   37.4 Fishgram
      37.4.1 Example Patterns
      37.4.2 The Fishgram Algorithm
      37.4.3 Preprocessing
      37.4.4 Search Process
      37.4.5 Comparison to Other Algorithms

38 Speculative Concept Formation
   38.1 Introduction
   38.2 Evolutionary Concept Formation
   38.3 Conceptual Blending
      38.3.1 Outline of a CogPrime Blending Algorithm
      38.3.2 Another Example of Blending
   38.4 Clustering
   38.5 Concept Formation via Formal Concept Analysis
      38.5.1 Calculating Membership Degrees of New Concepts
      38.5.2 Forming New Attributes
      38.5.3 Iterating the Fuzzy Concept Formation Process

Section VI: Integrative Learning

39 Dimensional Embedding
   39.1 Introduction
   39.2 Link Based Dimensional Embedding
   39.3 Harel and Koren's Dimensional Embedding Algorithm
      39.3.1 Step 1: Choosing Pivot Points
      39.3.2 Step 2: Similarity Estimation
      39.3.3 Step 3: Embedding
   39.4 Embedding Based Inference Control
   39.5 Dimensional Embedding and InheritanceLinks

40 Mental Simulation and Episodic Memory
   40.1 Introduction
   40.2 Internal Simulations
   40.3 Episodic Memory

41 Integrative Procedure Learning
   41.1 Introduction
      41.1.1 The Diverse Technicalities of Procedure Learning in CogPrime
   41.2 Preliminary Comments on Procedure Map Encapsulation and Expansion
   41.3 Predicate Schematization
      41.3.1 A Concrete Example
   41.4 Concept-Driven Schema and Predicate Creation
      41.4.1 Concept-Driven Predicate Creation
      41.4.2 Concept-Driven Schema Creation
   41.5 Inference-Guided Evolution of Pattern-Embodying Predicates
      41.5.1 Rewarding Surprising Predicates
      41.5.2 A More Formal Treatment
   41.6 PredicateNode Mining
   41.7 Learning Schema Maps
      41.7.1 Goal-Directed Schema Evolution
   41.8 Occam's Razor

42 Map Formation
   42.1 Introduction
   42.2 Map Encapsulation
   42.3 Atom and Predicate Activity Tables
   42.4 Mining the AtomSpace for Maps
      42.4.1 Frequent Itemset Mining for Map Mining
      42.4.2 Evolutionary Map Detection
   42.5 Map Dynamics
   42.6 Procedure Encapsulation and Expansion
      42.6.1 Procedure Encapsulation in More Detail
      42.6.2 Procedure Encapsulation in the Human Brain
   42.7 Maps and Focused Attention
   42.8 Recognizing and Creating Self-Referential Structures
      42.8.1 Encouraging the Recognition of Self-Referential Structures in the AtomSpace

Section VII: Communication Between Human and Artificial Minds

43 Communication Between Artificial Minds
   43.1 Introduction
   43.2 A Simple Example Using a PsyneseVocabulary Server
      43.2.1 The Psynese Match Schema
   43.3 Psynese as a Language
   43.4 Psynese Mindplexes
      43.4.1 AGI Mindplexes
   43.5 Psynese and Natural Language Processing
      43.5.1 Collective Language Learning

44 Natural Language Comprehension
   44.1 Introduction
   44.2 Linguistic Atom Types
   44.3 The Comprehension and Generation Pipelines
   44.4 Parsing with Link Grammar
      44.4.1 Link Grammar vs. Phrase Structure Grammar
   44.5 The RelEx Framework for Natural Language Comprehension
      44.5.1 RelEx2Frame: Mapping Syntactico-Semantic Relationships into FrameNet Based Logical Relationships
      44.5.2 A Priori Probabilities For Rules
      44.5.3 Exclusions Between Rules
      44.5.4 Handling Multiple Prepositional Relationships
      44.5.5 Comparatives and Phantom Nodes
   44.6 Frame2Atom
      44.6.1 Examples of Frame2Atom
      44.6.2 Issues Involving Disambiguation
   44.7 Syn2Sem: A Semi-Supervised Alternative to RelEx and RelEx2Frame
   44.8 Mapping Link Parses into Atom Structures
      44.8.1 Example Training Pair
   44.9 Making a Training Corpus
      44.9.1 Leveraging RelEx to Create a Training Corpus
      44.9.2 Making an Experience Based Training Corpus
      44.9.3 Unsupervised, Experience Based Corpus Creation
   44.10 Limiting the Degree of Disambiguation Attempted
   44.11 Rule Format
      44.11.1 Example Rule
   44.12 Rule Learning
   44.13 Creating a Cyc-Like Database via Text Mining
   44.14 PROWL Grammar
      44.14.1 Brief Review of Word Grammar
      44.14.2 Word Grammar's Logical Network Model
      44.14.3 Link Grammar Parsing vs Word Grammar Parsing
      44.14.4 Contextually Guided Greedy Parsing and Generation Using Word Link Grammar
   44.15 Aspects of Language Learning
      44.15.1 Word Sense Creation
      44.15.2 Feature Structure Learning
      44.15.3 Transformation and Semantic Mapping Rule Learning
   44.16 Experiential Language Learning
   44.17 Which Path(s) Forward?

45 Language Learning via Unsupervised Corpus Analysis
   45.1 Introduction
   45.2 Assumed Linguistic Infrastructure
   45.3 Linguistic Content To Be Learned
      45.3.1 Deeper Aspects of Comprehension
   45.4 A Methodology for Unsupervised Language Learning from a Large Corpus
      45.4.1 A High Level Perspective on Language Learning
      45.4.2 Learning Syntax
      45.4.3 Learning Semantics
   45.5 The Importance of Incremental Learning
   45.6 Integrating Language Learned via Corpus Analysis into CogPrime's Experiential Learning

46 Natural Language Generation
   46.1 Introduction
   46.2 SegSim for Sentence Generation
      46.2.1 NLGen: Example Results
   46.3 Experiential Learning of Language Generation
   46.4 Sem2Syn
   46.5 Conclusion

47 Embodied Language Processing
   47.1 Introduction
   47.2 Semiosis
   47.3 Teaching Gestural Communication
   47.4 Simple Experiments with Embodiment and Anaphor Resolution
   47.5 Simple Experiments with Embodiment and Question Answering
      47.5.1 Preparing/Matching Frames
      47.5.2 Frames2RelEx
      47.5.3 Example of the Question Answering Pipeline
      47.5.4 Example of the PetBrain Language Generation Pipeline
   47.6 The Prospect of Massively Multiplayer Language Teaching

48 Natural Language Dialogue
   48.1 Introduction
      48.1.1 Two Phases of Dialogue System Development
   48.2 Speech Act Theory and its Elaboration
   48.3 Speech Act Schemata and Triggers
      48.3.1 Notes Toward Example SpeechActSchema
   48.4 Probabilistic Mining of Trigger Contexts
   48.5 Conclusion

Section VIII: From Here to AGI

49 Summary of Argument for the CogPrime Approach
   49.1 Introduction
   49.2 Multi-Memory Systems
   49.3 Perception, Action and Environment
   49.4 Developmental Pathways
   49.5 Knowledge Representation
   49.6 Cognitive Processes
      49.6.1 Uncertain Logic for Declarative Knowledge
      49.6.2 Program Learning for Procedural Knowledge
      49.6.3 Attention Allocation
      49.6.4 Internal Simulation and Episodic Knowledge
      49.6.5 Low-Level Perception and Action
      49.6.6 Goals
   49.7 Fulfilling the "Cognitive Equation"
   49.8 Occam's Razor
      49.8.1 Mind Geometry
   49.9 Cognitive Synergy
      49.9.1 Synergies that Help Inference
   49.10 Synergies that Help MOSES
      49.10.1 Synergies that Help Attention Allocation
      49.10.2 Further Synergies Related to Pattern Mining
      49.10.3 Synergies Related to Map Formation
   49.11 Emergent Structures and Dynamics
   49.12 Ethical AGI
   49.13 Toward Superhuman General Intelligence
      49.13.1 Conclusion

50 Build Me Something I Haven't Seen: A CogPrime Thought Experiment
   50.1 Introduction
   50.2 Roles of Selected Cognitive Processes
   50.3 A Semi-Narrative Treatment
   50.4 Conclusion

A Glossary
   A.1 List of Specialized Acronyms
   A.2 Glossary of Specialized Terms

References
Section I Architectural and Representational Mechanisms
Chapter 19
The OpenCog Framework

19.1 Introduction

The primary burden of this book is to explain the CogPrime architecture for AGI - the broad outline of the design, the main dynamics it's intended to display once complete, and the reasons why we believe it will be capable of leading to general intelligence at the human level and beyond. The crux of CogPrime lies in its learning algorithms and how they are intended to interact together synergetically, making use of CogPrime's knowledge representation and other tools. Before we can get to this, however, we need to elaborate some of the "plumbing" within which these learning dynamics occur. We will start out with a brief description of the OpenCog framework in which implementation of CogPrime has been, gradually and incrementally, occurring for the last few years.

19.1.1 Layers of Abstraction in Describing Artificial Minds

There are multiple layers intervening between a conceptual theory of mind and a body of source code. How many layers to explicitly discuss is a somewhat arbitrary decision, but one way to picture it is exemplified in Table 19.1. In Part 1 of this work we have concerned ourselves mainly with levels 5 and 6 in the table: mathematical/conceptual modeling of cognition and philosophy of mind (with occasional forays into levels 3 and 4). Most of Part 2, on the other hand, deals with level 4 (mathematical/conceptual AI design), verging into level 3 (high-level software design). This chapter however will focus on somewhat lower-level material, mostly level 3 with some dips into level 2. We will describe the basic architecture of CogPrime as a software system, implemented as "OpenCogPrime" within the OpenCog Framework (OCF). The reader may want to glance back at Chapter 6 of Part 1 before proceeding through this one, to get a memory-refresh on basic CogPrime terminology.
Also, OpenCog and OpenCogPrime are open-source, so the reader who wishes to dig into the source code (mostly C++, some Python and Scheme) is welcome to; directions to find the code are on the opencog.org website.
Table 19.1: Levels of abstraction in CogPrime's implementation and design

Level 1: Source code
Level 2: Detailed software design
Level 3: Software architecture - largely programming-language-independent, but not hardware-architecture-independent; much of the material in this chapter, for example, and most of the OpenCog Framework
Level 4: Mathematical and conceptual AI design - e.g., the sort of characterization of CogPrime given in most of this Part of this book
Level 5: Abstract mathematical modeling of cognition - e.g. the SRAM model discussed in Chapter 7 of Part 1, which could be used to inspire or describe many different AI systems
Level 6: Philosophy of mind - e.g. Patternism, the Mind-World Correspondence Principle

19.1.2 The OpenCog Framework

The OpenCog Framework forms a bridge between the mathematical structures and dynamics of CogPrime's concretely implemented mind, and the nitty-gritty realities of modern computer technology. While CogPrime could in principle be implemented in a quite different infrastructure, in practice the CogPrime design has been developed closely in conjunction with OpenCog, so that a qualitative understanding of the nature of the OCF is fairly necessary for an understanding of how CogPrime is intended to function, and a detailed understanding of the OCF is necessary for doing concrete implementation work on CogPrime.

Marvin Minsky, in a personal conversation with one of the authors (Goertzel), once expressed the opinion that a human-level general intelligence could probably be implemented on a 486 PC, if we just knew the algorithm. We doubt this is the case - at least not unless the 486 PC were supplied with masses of external memory and allowed to proceed much, much slower than any human being - and it is certainly not the case for CogPrime. By current computing hardware standards, a CogPrime system is a considerable resource hog.
And it will remain so for a number of years, even considering technology progress. It is one of the jobs of the OCF to manage the system's gluttonous behavior. It is the software layer that abstracts the real-world efficiency compromises from the rest of the system; this is why we call it a "Mind OS": it provides services, rules, and protection to the Atoms and cognitive processes (see Section 19.4) that live on top of it, which are then allowed to ignore the software architecture they live on.

And so, the nature of the OCF is strongly influenced by the quantitative requirements imposed on the system, as well as the general nature of the structure and dynamics that it must support. The large number and great diversity of Atoms needed to create a significantly intelligent CogPrime demand that we pay careful attention to such issues as concurrent, distributed processing, and scalability in general. The number of Nodes and Links that we will need in order to create a reasonably complete CogPrime is still largely unknown. But our experiments with learning, natural language processing, and cognition over the past few years have given us an intuition for the question. We currently believe that we are likely to need billions - but probably not trillions, and almost surely not quadrillions - of Atoms in order to achieve a high degree of general intelligence. Hundreds of millions strikes us as possible but overly optimistic.
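Since Atoms, Nodes and Links are central to everything that follows, a drastically simplified, purely illustrative sketch may help fix ideas. This is not the actual OpenCog API (which is written in C++ and is far richer); the class names and fields below are our own simplifications, showing only the bare hypergraph idea: typed, named Nodes; typed Links whose "outgoing sets" connect other Atoms; probabilistic truth values on everything; and a container that can answer incoming-set queries.

```python
# Illustrative sketch only: a stand-in for the real OpenCog AtomSpace,
# reduced to the bare hypergraph idea described in the text.

from dataclasses import dataclass


@dataclass(frozen=True)
class TruthValue:
    strength: float = 1.0     # estimated probability of the Atom's relationship
    confidence: float = 0.0   # weight of evidence behind that estimate


class Atom:
    def __init__(self, atom_type, tv=None):
        self.atom_type = atom_type
        self.tv = tv or TruthValue()


class Node(Atom):
    """A typed, named vertex, e.g. a ConceptNode."""
    def __init__(self, atom_type, name, tv=None):
        super().__init__(atom_type, tv)
        self.name = name


class Link(Atom):
    """A typed hyperedge holding an ordered outgoing set of other Atoms."""
    def __init__(self, atom_type, outgoing, tv=None):
        super().__init__(atom_type, tv)
        self.outgoing = tuple(outgoing)


class AtomSpace:
    """A weighted, labeled hypergraph container for Atoms."""
    def __init__(self):
        self._atoms = []

    def add(self, atom):
        self._atoms.append(atom)
        return atom

    def incoming(self, atom):
        """All Links whose outgoing set contains the given Atom."""
        return [a for a in self._atoms
                if isinstance(a, Link) and atom in a.outgoing]


# Usage: "cat inherits from animal", with strength 0.9 and confidence 0.8.
space = AtomSpace()
cat = space.add(Node("ConceptNode", "cat"))
animal = space.add(Node("ConceptNode", "animal"))
inh = space.add(Link("InheritanceLink", [cat, animal], TruthValue(0.9, 0.8)))
```

A real AtomSpace adds much more on top of this skeleton: a type hierarchy, attention values, indexing for fast retrieval, and the persistence and distribution machinery discussed later in this chapter.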
In fact we have already run CogPrime systems utilizing hundreds of millions of Atoms, though in a simplified dynamical regime with only a couple of very simple processes acting on most of them.

The operational infrastructure of the OCF is an area where pragmatism must reign over idealism. What we describe here is not the ultimate possible "mind operating system" to underlie a CogPrime system, but rather a workable practical solution given the hardware, networking and software infrastructure readily available today at reasonable prices. Along these lines, it must be emphasized that the ideas presented in this chapter are the result of over a decade of practical experimentation by the authors and their colleagues with implementations of related software systems. The journey began in earnest in 1997 with the design and implementation of the Webmind AI Engine at Intelligenesis Corp., which itself went through a few major design revisions; and then in 2001-2002 the Novamente Cognition Engine was architected and implemented, and evolved progressively until 2008, when a subset of it was adapted for open-sourcing as OpenCog. Innumerable mistakes were made, and lessons learned, along this path. The OCF as described here is significantly different, and better, than these previous architectures, thanks to these lessons, as well as to the changing landscape of concurrent, distributed computing over the past few years. The design presented here reflects a mix of realism and idealism, and we haven't seen fit here to describe all the alternatives that were pursued on the route to what we present. We don't claim the approach we've chosen is ideal, but it's in use now within the OpenCog system, and it seems both workable in practice and capable of effectively supporting the entire CogPrime design.
No doubt it will evolve in some respects as implementation progresses; one of the principles kept in mind during the design and development of OpenCog was modularity, enabling substantial modifications to particular parts of the framework to occur without requiring wholesale changes throughout the codebase.

19.2 The OpenCog Architecture

19.2.1 OpenCog and Hardware Models

The job of the OCF is closely related to the nature of the hardware on which it runs. The ideal hardware platform for CogPrime would be a massively parallel hardware architecture, in which each Atom was given its own processor and memory. The closest thing would have been the Connection Machine [Hil89]: a CM5 was once built with 64000 processors and local RAM for each processor. But even 64000 processors wouldn't be enough for a highly intelligent CogPrime to run in a fully parallelized manner, since we're sure we need more than 64000 Atoms. Connection Machine style hardware seems to have perished in favor of more standard SMP (Symmetric Multi-Processing) machines. It is true that each year we see SMP machines with more and more processors on the market, and more and more cores per processor. However, the state of the art is still in the hundreds of cores range, many orders of magnitude from what would be necessary for a one-Atom-per-processor CogPrime implementation.

So, at the present time, technological and financial reasons have pushed us to implement the OpenCog system using a relatively mundane and standard hardware architecture. If the CogPrime project is successful in the relatively near term, the first human-level OpenCogPrime system will most likely live on a network of high-end commodity SMP machines. These are
machines with dozens of gigabytes of RAM and several processor cores, perhaps dozens but not thousands. A highly intelligent CogPrime would require a cluster of dozens and possibly hundreds or thousands of such machines. We think it's unlikely that tens of thousands will be required, and extremely unlikely that hundreds of thousands will be.

Given this sort of architecture, we need effective ways to swap Atoms back and forth between disk and RAM, and carefully manage the allocation of processor time among the various cognitive processes that demand it. The use of a widely-distributed network of weaker machines for peripheral processing is a serious possibility, and we have some detailed software designs addressing this option; but for the near future we believe that this can best be used as augmentation to core CogPrime processing, which must remain on a dedicated cluster.

Of course, the use of specialized hardware is also a viable possibility, and we have considered a host of possibilities such as

• True supercomputers like those created by IBM or Cray (which these days are distributed systems, but with specialized, particularly efficient interconnection frameworks and overall control mechanisms)
• GPU supercomputers such as the Nvidia Tesla (which are currently being used for vision processing systems considered for hybridization with OCP, such as DeSTIN and Hugo de Garis's Parcone)
• Custom chips designed to implement the various CogPrime algorithms and data structures in hardware
• More speculatively, it might be possible to use evolutionary quantum computing or adiabatic quantum computing a la D-Wave (http://dwave.com) to accelerate CogPrime procedure learning.

All these possibilities and many more are exciting to envision, but the CogPrime architecture does not require any of them in order to be successful.
19.2.2 The Key Components of the OpenCog Framework

Given the realities of implementing CogPrime on clustered commodity servers, as we have seen above, the three key questions that have to be answered in the OCF design are:

1. How do we store CogPrime's knowledge?
2. How do we enable cognitive processes to act on that knowledge, refining and improving it?
3. How do we enable scalable, distributed knowledge storage and cognitive processing of that knowledge?

The remaining sections of this chapter are dedicated to answering each of these questions in more detail.

While the basic landscape of concurrent, distributed processing is largely the same as it was a decade ago - we're still dealing with distributed networks of multiprocessor von Neumann machines - we can draw on advancements in both computer architecture and software. The former is materialized in the increasing availability of multiple real and virtual cores in commodity processors. The latter reflects the emergence of a number of tools and architectural patterns, largely thanks to the rise of "big data" problems and businesses. Companies and projects dealing
with massive datasets face challenges that aren't entirely like those of building CogPrime, but which share many useful similarities.

These advances are apparent mostly in the architecture of the AtomSpace, a distributed knowledge store for efficient storage of hypergraphs and their use by CogPrime's cognitive dynamics. The AtomSpace, like many NoSQL datastores, is heavily distributed, utilizing local caches for read and write operations, and a special-purpose design for eventual consistency guarantees.

We also attempt to minimize the complexities of multi-threading in the scheduling of cognitive dynamics, by allowing those to be deployed either as agents sharing a single OS process, or, preferably, as processes of their own. Cognitive dynamics communicate through message queues, which are provided by a sub-system that hides the deployment decision, so the messages exchanged are the same whether delivered within a process, to another process in the same machine, or to a process in another machine in the cluster.

19.3 The AtomSpace

As alluded to above and in Chapter 13, and discussed more fully in Chapter 20 below, the foundation of CogPrime's knowledge representation is the Atom, an object that can be either a Node or a Link. CogPrime's hypergraph is implemented as the AtomSpace, a specialized datastore that comes along with an API designed specifically for CogPrime's requirements.

19.3.1 The Knowledge Unit: Atoms

Atoms are used to represent every kind of knowledge in the system's memory in one way or another. The particulars of Atoms and how they represent knowledge will be discussed in later chapters; here we present only a minimal description in order to motivate the design of the AtomSpace. From that perspective, the most important properties of Atoms are:

• Every Atom has an AtomHandle, which is a universal ID across a CogPrime deployment (possibly involving thousands of networked machines).
The AtomHandles are the keys for accessing Atoms in the AtomSpace, and once a handle is assigned to an Atom it can't be changed or reused.
• Atoms have TruthValue and AttentionValue entities associated with them, each of which are small collections of numbers; there are multiple versions of truth values, with varying degrees of detail. TruthValues are context-dependent, and useful Atoms will typically have multiple TruthValues, indexed by context.
• Some Atoms are Nodes, and may have names.
• Atoms that are Links will have a list of targets, of variable size (as in CogPrime's hypergraph links may connect more than two nodes).

Some Atom attributes are immutable, such as Node names and, most importantly, Link targets, called outgoing sets in AtomSpace lingo. One can remove a Link, but not change its targets. This enables faster implementation of some neighborhood searches, as well as indexing. Truth and attention values, on the other hand, are mutable, an essential requirement for CogPrime.

For performance reasons, some types of knowledge have alternative representations. These alternative representations are necessary for space or speed reasons, but knowledge stored that way can always be translated back into Atoms in the AtomSpace as needed. So, for instance, procedures are represented as program trees in a ProcedureRepository, which allows for faster execution, but the trees can be expanded into a set of Nodes and Links if one wants to do reasoning on a specific program.

19.3.2 AtomSpace Requirements and Properties

The major high-level requirements for the AtomSpace are the following ones:

• Store Atoms indexed by their immutable AtomHandles as compactly as possible, while still enabling very efficient modification of the mutable properties of an Atom (TruthValues and AttentionValues).
• Perform queries as fast as possible.
• Keep the working set of all Atoms currently being used by CogPrime's cognitive dynamics in RAM.
• Save and restore hypergraphs to disk, a more traditional SQL or non-SQL database, or other structure such as binary files, XML, etc.
• Hold hypergraphs consisting of billions or trillions of Atoms, scaling up to petabytes of data.
• Be transparently distributable across a cluster of machines.

The design trade-offs in the AtomSpace implementation are driven by the needs of CogPrime. The datastore is implemented in a way that maximizes the performance of the cognitive dynamics running on top of it. From this perspective, the AtomSpace differs from most datastores, as the key decisions aren't made in terms of flexibility, consistency, reliability and other common criteria for databases. It is a very specialized database. Among the factors that motivate the AtomSpace's design, we can highlight a few:

1.
Atoms tend to be small objects, with very few exceptions (Links with many targets or Atoms with many different context-derived TruthValues).
2. Atom creation and deletion are common events, and occur according to complex patterns that may vary a lot over time, even for a particular CogPrime instance.
3. Atoms involved in CogPrime's cognitive dynamics at any given time need to live in RAM. However, the system still needs the ability to save sets of Atoms to disk in order to preserve RAM, and then retrieve those later when they become contextually relevant.
4. Some Atoms will remain around for a really long time, others will be ephemeral and get removed shortly after they're created. Removal may be to disk, as outlined above, or plain deletion.

Besides storing Atoms, the AtomSpace also contains a number of indices for fast Atom retrieval according to several criteria. It can quickly search for Atoms given their type, importance, truth value, arity, targets (for Links), name (for Nodes), and any combination of the above. These are built-in indexes. The AtomSpace also allows cognitive processes to create their own
indexes, based on the evaluation of a Procedure over the universe of Atoms, or a subset of that universe specified by the process responsible for the index.

The AtomSpace also allows pattern matching queries for a given Atom structure template, which allows for fast search for small subgraphs displaying some desirable properties. In addition to pattern matching, it provides neighborhood searches. Although it doesn't implement any graph-traversal primitives, it's easy for cognitive processes to do so on top of the pattern matching and neighborhood primitives.

Note that, since CogPrime's hypergraph is quite different from a regular graph, using a graph database without modification would probably be inadequate. While it's possible to automatically translate a hypergraph into a regular graph, that process is expensive for large knowledge bases, and leads to higher space requirements, reducing the overall system's scalability.

In terms of database taxonomy, the AtomSpace lies somewhere between a key-value store and a document store, as there is some structure in the contents of each value (an Atom's properties are well defined, and listed above), but no built-in flexibility to add more contents to an existing Atom.

We will now discuss the above requirements in more detail, starting with querying the AtomSpace, followed by persistence to disk, and then handling of specific forms of knowledge that are best handled by specialized stores.

19.3.3 Accessing the AtomSpace

The AtomSpace provides an API, which allows the basic operations of creating new Atoms, updating their mutable properties, searching for Atoms and removing Atoms. More specifically, the API supports the following operations:

• Create and store a new Atom. There are special methods for Nodes and Links, in the latter case with multiple convenience versions depending on the number of targets and other properties of the Link.
• Remove an Atom.
This requires the validation that no Links currently point to that Atom, otherwise they'd be left dangling.
• Look up one or more Atoms. This includes several variants, such as:
- Look up an Atom by AtomHandle;
- Look up a Node by name;
- Find Links with an Atom as target;
- Pattern matching, i.e., find Atoms satisfying some predicate, which is designated as a "search criterion" by some cognitive process, and results in the creation of a specific index for that predicate;
- Neighborhood search, i.e., find Atoms that are within some radius of a given centroid Atom;
- Find Atoms by type (this can be combined with the previous queries, resulting in type-specific versions);
- Find Atoms by some AttentionValue criteria, such as the top N most important Atoms, or those with importance above some threshold (can also be combined with previous queries);
- Find Atoms by some TruthValue criteria, similar to the previous one (can also be combined with other queries);
- Find Atoms based on some temporal or spatial association, a query that relies on the specialized knowledge stores mentioned below.

Queries can be combined, and the Atom type, AttentionValue and TruthValue criteria are often used as filters for other queries, preventing the result set size from exploding.
• Manipulate an Atom, retrieving or modifying its AttentionValue and TruthValue. In the modification case, this causes the respective indexes to be updated.

19.3.4 Persistence

In many planned CogPrime deployment scenarios, the amount of knowledge that needs to be stored is too vast to fit in RAM, even if one considers a large cluster of machines hosting the AtomSpace and the cognitive processes. The AtomSpace must then be able to persist subsets of that knowledge to disk, and reload them later when necessary.

The decision of whether to keep an Atom in RAM or remove it is made based on its AttentionValue, through the process of economic attention allocation that is the topic of Chapter 23. AttentionValue determines how important an Atom is to the system, and there are multiple levels of importance. For the persistence decisions, the ones that matter are Long Term Importance (LTI) and Very Long Term Importance (VLTI).

LTI is used to estimate the probability that the Atom will be necessary or useful in the not too distant future. If this value is low, below a threshold i1, then it is safe to remove the Atom from RAM, a process called forgetting. When the decision to forget an Atom has been made, VLTI enters the picture. VLTI is used to estimate the probability that the Atom will be useful eventually, at some distant point in the future. If VLTI is high enough, the forgotten Atom is persisted to disk so it can be reloaded. Otherwise, the Atom is permanently forgotten.
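The forgetting decision just described can be sketched as follows. All names and threshold values here are hypothetical illustrations of the logic, not the actual OpenCog code:

```python
# Sketch of the LTI/VLTI forgetting decision (illustrative names and values).
from dataclasses import dataclass

I1 = 0.2          # forgetting threshold on LTI (assumed value)
VLTI_MIN = 0.5    # persistence threshold on VLTI (assumed value)

@dataclass
class Atom:
    handle: int
    lti: float    # Long Term Importance
    vlti: float   # Very Long Term Importance

def forget_if_unimportant(atom, ram, disk):
    """Remove low-LTI Atoms from RAM; persist to disk those with high VLTI."""
    if atom.lti >= I1:
        return "kept"                # still useful in the near term
    del ram[atom.handle]             # forgetting: drop from RAM
    if atom.vlti >= VLTI_MIN:
        disk[atom.handle] = atom     # may be useful eventually: freeze to disk
        return "frozen"
    return "deleted"                 # permanently forgotten

ram = {1: Atom(1, 0.9, 0.1), 2: Atom(2, 0.05, 0.8), 3: Atom(3, 0.05, 0.1)}
disk = {}
results = {h: forget_if_unimportant(a, ram, disk) for h, a in list(ram.items())}
print(results)  # Atom 1 kept, Atom 2 frozen to disk, Atom 3 deleted
```

The same shape of logic, with the proxy mechanism and hysteresis described below layered on top, governs the real freezing and defrosting process.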
When an Atom has been forgotten, a proxy is kept in its place. The proxy is more compact than the original Atom, preserving only a crude measure of its LTI. When the proxy's LTI increases above a second threshold i2, the system understands that the Atom has become relevant again, and loads it from disk. Eventually, it may happen that the proxy doesn't become important enough over a very long period of time. In this case, the system should remove even the proxy, if its LTI is below a third threshold i3.

Other actions, usually taken by the system administrator, can cause the removal of Atoms and their proxies from RAM. For instance, in a CogPrime system managing information about a number of users of some information system, the deletion of a user from the system would cause all that user's specific Atoms to be removed. When Atoms are saved to disk and have no proxies in RAM, they can only be reloaded by the system administrator. When reloaded, they will be disconnected from the rest of the AtomSpace, and should be given special attention in order to pursue the creation of new Links with the other Atoms in the system.

It's important that the values of i1, i2, and i3 be set correctly. Otherwise, one or more of the following problems may arise:

• If i1 and i2 are too close, the system may spend a lot of resources on saving and loading Atoms.
• If i2 is set too high, important Atoms will be excluded from the system's dynamics, decreasing its intelligence.
• If i3 is set too high, the system will forget very quickly and will have to spend resources re-creating necessary but no longer available evidence.
• If either i1 or i3 is set too low, the system will consume significantly more resources than it needs to on knowledge storage, sacrificing cognitive processes.

Generally, we want to enforce a degree of hysteresis for the freezing and defrosting process. What we mean is that:

i2 - i1 > ε1 > 0
i1 - i3 > ε2 > 0

This ensures that when Atoms are reloaded, their importance is still above the threshold for saving, so they will have a chance to be part of cognitive dynamics and become more important, and won't be removed again too quickly. It also ensures that saved Atoms stay in the system for a period of time before their proxies are removed and they're definitively forgotten.

Another important consideration is that forgetting individual Atoms makes little sense, because, as pointed out above, Atoms are relatively small objects. So the forgetting process should prioritize the removal of clusters of highly interconnected Atoms whenever possible. In that case, it's possible that a large subset of those Atoms will only have relations within the cluster, so their proxies aren't needed and the memory savings are maximized.

19.3.5 Specialized Knowledge Stores

Some specific kinds of knowledge are best stored in specialized data structures, which allow big savings in space, query time, or both. The information provided by these specialized stores isn't as flexible as it would be if the knowledge were stored in full-fledged Node and Link form, but most of the time CogPrime doesn't need the fully flexible format. Translation between the specialized formats and Nodes and Links is always possible, when necessary.
We note that the ideal set of specialized knowledge stores is application-domain specific. The stores we have deemed necessary reflect the preschool-based roadmap towards AGI, and are likely sufficient to get us through most of that roadmap, but not sufficient nor particularly adequate for an architecture where self-modification plays a key role. These specialized stores are a pragmatic compromise between performance and formalism, and their existence and design would need to be revised once CogPrime is mostly functional.

19.3.5.1 Procedure Repository

Procedural knowledge, meaning knowledge that can be used both for the selection and execution of actions, has a specialized requirement - this knowledge needs to be executable by the system. While it will be possible, and conceptually straightforward, to execute a procedure that is stored as a set of Atoms in the AtomSpace, it is much simpler, faster, and safer to rely on a specialized repository.
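The shape of such a repository can be sketched as a table of executable entries keyed by Atom handle. The class and method names below are hypothetical, and plain Python callables stand in for the stored procedures purely for illustration:

```python
# Illustrative sketch of a procedure repository keyed by Atom handle.
# In CogPrime the stored objects would be executable procedure trees;
# here, ordinary Python callables stand in for them.

class ProcedureRepository:
    def __init__(self):
        # handle -> (executable, input type spec, output type spec)
        self._procs = {}

    def register(self, handle, proc, input_types, output_type):
        """Store the executable form of a procedure under an Atom handle."""
        self._procs[handle] = (proc, input_types, output_type)

    def execute(self, handle, *args):
        """Run the stored procedure, checking the declared input arity."""
        proc, input_types, _ = self._procs[handle]
        if len(args) != len(input_types):
            raise ValueError("wrong number of inputs for procedure")
        return proc(*args)

repo = ProcedureRepository()
# Register a trivial procedure under a made-up handle:
repo.register(42, lambda x, y: x + y, ("Number", "Number"), "Number")
print(repo.execute(42, 2, 3))
```

The point of the design is the indirection: cognition refers to the procedure only through its handle, while execution bypasses the AtomSpace entirely.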
Procedural knowledge in CogPrime is stored as programs in a special-purpose LISP-like programming language called Combo. The motivation and details of this language are the subject of Chapter 21. Each Combo program is associated with a Node (a GroundedProcedureNode, to be more precise), and the AtomHandle of that Node is used to index the procedure repository, where the executable version of the program is kept, along with specifications of the necessary inputs for its evaluation and what kind of output to expect.

Combo programs can also be saved to disk and loaded, like regular Atoms. There is a text representation of Combo for this purpose.

Program execution can be very fast, or, in cognitive dynamics terms, very slow, if it involves interacting with the external world. Therefore, the procedure repository should also facilitate the storage of program states during the execution of procedures. Concurrent execution of many procedures is possible with no significant overhead.

19.3.5.2 3D Space Map

In the AGI Preschool setting, CogPrime is embodied in a three-dimensional world (either a real one, in which it controls a robot, or a virtual one, in which it controls an avatar). This requires the efficient storage and querying of vast amounts of spatial data, including very specialized queries about the spatial interrelationship between entities. This spatial data is a key form of knowledge for CogPrime's world perception, and it also needs to be accessible during learning, action selection, and action execution.

All spatial knowledge is stored in a 3D Space Map, which allows for fast queries about specific regions of the world, and for queries about the proximity and relative placement of objects and entities. It can be used to provide a coarse-grained object-level perception for the AtomSpace, or it can be instrumental in supporting a lower-level vision layer in which pixels or polygons are used as the units of perception. In both cases,
the knowledge stored in the 3D Space Map can be translated into full-fledged Atoms and Links through the AtomHandles.

One characteristic feature of spatial perception is that vast amounts of data are generated constantly, but most of it is very quickly forgotten. The mind abstracts the perceptual data into the relevant concepts, which are linked with other Atoms, and most of the underlying information can then be discarded. The process is repeated at a high frequency as long as something novel is being perceived in the world. The 3D Space Map is thus optimized for quick inserts and deletes.

19.3.5.3 Time Server

Similarly to spatial information, temporal information poses challenges for a hypergraph-based storage. It can be much more compactly stored in specific data structures, which also allow for very fast querying. The Time Server is the specialized structure for storing and querying temporal data in CogPrime.

Temporal information can be stored by any cognitive process, based on its own criteria for determining that some event should be remembered in a specific temporal context in the future. This can include the perception of specific events, or the agent's participation in them, such as the first time it meets a new human teacher. It can also include a collection of concepts describing specific contexts in which a set of actions has been particularly useful. The possibilities are
numerous, but from the Time Server perspective, all equivalent. They add up to associating a time point or time interval with a set of Atoms.

The Time Server is a bi-directional storage, as AtomHandles can be used as keys, but also as objects indexed by time points or time intervals. In the former case, the Time Server tells us when an Atom was associated with temporal data. In the latter case, it tells us, for a given time point or interval, which Atoms have been marked as relevant.

Temporal indexing can be based on time points or time intervals. A time point can be at any granularity: from years to sub-seconds could be useful. A time interval is simply a set of two points, the second necessarily coming after the first, but their granularities not necessarily the same. The temporal indexing inside the Time Server is hierarchical, so one can query for time points or intervals in granularities other than the ones originally used when the knowledge was first stored.

19.3.5.4 System Activity Table Set

The last relevant specialized store is the System Activity Table Set, which is described in more detail in Chapter 23. This set of tables records, with fine-grained temporal associations, the most important activities that take place inside CogPrime. There are different tables for recording cognitive process activity (at the level of MindAgents, to be described in the next section), for maintaining a history of the level of achievement of each important goal in the system, and for recording other important aspects of the system state, such as the most important Atoms and contexts.

19.4 MindAgents: Cognitive Processes

The AtomSpace holds the system's knowledge, but those Atoms are inert. How is that knowledge used and useful? That is the province of cognitive dynamics. These dynamics, in a CogPrime system, can be considered on two levels.

First, we have the cognitive processes explicitly programmed into CogPrime's source code.
These are what we call Concretely-Implemented Mind Dynamics, or CIM-Dynamics. Their implementation in software happens through objects called MindAgents. We use the term CIM-Dynamic to discuss a conceptual cognitive process, and the term MindAgent for its actual implementation and execution dynamics.

The second level corresponds to the dynamics that emerge through the system's self-organizing dynamics, based on the cooperative activity of the CIM-Dynamics on the shared AtomSpace.

Most of the material in the following chapters is concerned with particular CIM-Dynamics in the CogPrime system. In this section we will simply give some generalities about the CIM-Dynamics as abstract processes and as software processes, which are largely independent of the actual AI contents of the CIM-Dynamics. In practice the CIM-Dynamics involved in a CogPrime system are fairly stereotyped in form, although diverse in the actual dynamics they induce.
19.4.1 A Conceptual View of CogPrime Cognitive Processes

We return now to the conceptual trichotomy of cognitive processes presented in Chapter 3 of Part 1, according to which CogPrime cognitive processes may be divided into:

• Control processes;
• Global cognitive processes;
• Focused cognitive processes.

In practical terms, these may be considered as three categories of CIM-Dynamic.

Control Process CIM-Dynamics are hard to stereotype. Examples are the process of homeostatic adaptation of the parameters associated with the various other CIM-Dynamics, and the CIM-Dynamics concerned with the execution of procedures, especially those whose execution is made lengthy by the interactions with the external world. Control Processes tend to focus on a limited and specialized subset of Atoms or other entities, and carry out specialized mechanical operations on them (e.g. adjusting parameters, interpreting procedures). To an extent, this may be considered a "grab bag" category containing CIM-Dynamics that are not global or focused cognitive processes according to the definitions of the latter two categories. However, it is a nontrivial observation about the CogPrime system that the CIM-Dynamics that are not global or focused cognitive processes are all explicitly concerned with system control in some way or another, so this grouping makes sense.

Global and Focused Cognitive Process CIM-Dynamics all have a common aspect to their structure. Then, there are aspects in which Global versus Focused CIM-Dynamics diverge from each other in stereotyped ways.

In most cases, the process undertaken by a Global or Focused CIM-Dynamic involves two parts: a selection process and an actuation process. Schematically, such a CIM-Dynamic typically looks something like this:

1. Fetch a set of Atoms that it is judged will be useful to process, according to some selection process.
2.
Operate on these Atoms, possibly together with previously selected ones (this is what we sometimes call the actuation process of the CIM-Dynamic).
3. Go back to step 1.

The major difference between Global and Focused cognitive processes lies in the selection process. In the case of a Global process, the selection process is very broad, sometimes yielding the whole AtomSpace, or a significant subset of it. This means that the actuation process must be very simple, or the activation of this CIM-Dynamic must be very infrequent. On the other hand, in the case of a Focused process, the selection process is very narrow, yielding only a small number of Atoms, which can then be processed more intensively and expensively, on a per-Atom basis.

Common selection processes for Focused cognitive processes are fitness-oriented selectors, which pick one or a set of Atoms from the AtomSpace with a probability based on some numerical quantity associated with the Atom, such as properties of TruthValue or AttentionValue. There are also more specific selection processes, which choose for example Atoms obeying some particular combination of relationships in relation to some other Atoms; say choosing only Atoms that inherit from some given Atom already being processed. There is a notion, described in the PLN book, of an Atom Structure Template; this is basically just a predicate that applies to Atoms, such as
P(X).tv equals ((InheritanceLink X cat) AND (EvaluationLink eats (X, cheese))).tv

which is a template that matches everything that inherits from cat and eats cheese. Templates like this allow a much more refined selection than the above fitness-oriented selection process. Selection processes can be created by composing a fitness-oriented process with further restrictions, such as templates, or simpler type-based restrictions.

19.4.2 Implementation of MindAgents

MindAgents follow a very simple design. They need to provide a single method through which they can be enacted, and they should execute their actions in atomic, incremental steps, where each step should be relatively quick. This design enables collaborative scheduling of MindAgents, at the cost of allowing "opportunistic" agents to have more than their fair share of resources. We rely on CogPrime developers to respect the above guidelines, instead of trying to enforce exact resource allocations on the software level.

Each MindAgent can have a set of system parameters that guide its behavior. For instance, a MindAgent dedicated to inference can provide drastically different conclusions if its parameters tell it to select a small set of Atoms for processing each time, but to spend significant time on each Atom, rather than selecting many Atoms and doing shallow inferences on each one. It's expected that multiple copies of the same MindAgent will exist in the cluster, but delivering different dynamics thanks to those parameters.

In addition to their main action method, MindAgents can also communicate with other MindAgents through message queues. CogPrime has, in its runtime configuration, a list of available MindAgents and their locations in the cluster. Communications between MindAgents typically take the form of specific, one-time requests, which we call Tasks.
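The single-entry-point MindAgent design and the fitness-oriented selection/actuation loop described above can be sketched together as follows. This is an illustrative skeleton only: the class names, the dict-based stand-in for the AtomSpace, and the parameter values are all assumptions, not the OpenCog class hierarchy:

```python
# Illustrative MindAgent skeleton: one entry point, small incremental steps.
import random

class MindAgent:
    def run_step(self, atomspace):
        """One quick, atomic increment of this agent's cognitive dynamic."""
        raise NotImplementedError

class FocusedAgent(MindAgent):
    """Fitness-oriented selection: pick a few Atoms with probability
    proportional to a scalar importance value, then act on each one."""

    def __init__(self, batch_size=3):
        self.batch_size = batch_size   # a system parameter guiding behavior

    def run_step(self, atomspace):
        atoms = list(atomspace)
        weights = [a["importance"] for a in atoms]   # the selection process
        chosen = random.choices(atoms, weights=weights, k=self.batch_size)
        for atom in chosen:
            self.operate(atom)                       # the actuation process

    def operate(self, atom):
        # Stand-in for real work (inference, revision, etc.): count visits.
        atom["visits"] = atom.get("visits", 0) + 1

# Toy "AtomSpace": two Atoms with very different importance values.
space = [{"importance": 0.9, "name": "cat"}, {"importance": 0.1, "name": "dog"}]
agent = FocusedAgent()
for _ in range(100):       # a scheduler would interleave many agents' steps
    agent.run_step(space)
```

After these steps, the high-importance Atom will have received the large majority of the agent's attention, which is exactly the behavior the fitness-oriented selector is meant to produce.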
The default action of MindAgents and the processing of Tasks constitute the cognitive dynamics of CogPrime. Nearly everything that takes place within a CogPrime deployment is done by either a MindAgent (including the control processes), a Task, or specialized code handling AtomSpace internals or communications with the external world. We now discuss how those dynamics are scheduled. MindAgents live inside a process called a CogPrime Unit. One machine in a CogPrime cluster can contain one or more Units, and one Unit can contain one or more MindAgents. In practice, given the way the AtomSpace is distributed, which requires a control process in each machine, it typically makes more sense to have a single Unit per machine, as this enables all MindAgents in that machine to make direct function calls to the AtomSpace, instead of using more expensive inter-process communication. There are exceptions to the above guideline, to accommodate various situations: 1. Very specific MindAgents may not need to communicate with other agents, or may only do so very rarely, so it makes sense to give them their own process. 
2. MindAgents whose implementation is a poor fit for the collaborative processing in small increments design described above should also be given their own process, so they don't interfere with the overall dynamics in that machine. 3. MindAgents whose priority is either much higher or much lower than that of other agents in the same machine should be given their own process, so operating system-level scheduling can be relied upon to reflect those very different priority levels. 19.4.3 Tasks It is not convenient for CogPrime to do all its work directly via the action of MindAgent objects embodying CIM-Dynamics. This is especially true for MindAgents embodying focused cognitive processes. These have their selection algorithms, which are ideally suited to guarantee that, over the long run, the right Atoms get selected and processed. This, however, doesn't address the issue that, on many occasions, it may be necessary to quickly process a specific set of Atoms in order to execute an action or rapidly respond to some demand. These actions tend to be one-time, rather than the recurring patterns of mind dynamics. While it would be possible to design MindAgents so that they could both cover their long term processing needs and rapidly respond to urgent demands, we found it much simpler to augment the MindAgent framework with an additional scheduling mechanism that we call the Task framework. In essence, this is a ticketing system, designed to handle cases where MindAgents or Schema spawn one-off tasks to be executed - things that need to be done only once, rather than repeatedly and iteratively as with the things embodied in MindAgents. For instance, "grab the most important Atoms from the AtomSpace and do shallow PLN reasoning to derive immediate conclusions from them" is a natural job for a MindAgent. But "do a search to find entities that satisfy this particular predicate P" is a natural job for a Task. 
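The Task "ticketing" idea can be sketched as a simple priority queue of one-off jobs. This is an illustrative Python sketch with hypothetical names, not the OpenCog implementation:

```python
import heapq
import itertools

class TaskQueue:
    """One-off 'ticket' jobs, ordered by an AttentionValue-like
    priority (highest first; FIFO among equal priorities)."""
    def __init__(self):
        self._heap = []
        self._tie = itertools.count()   # tiebreaker for equal priorities

    def submit(self, priority, fn):
        # heapq is a min-heap, so negate priority for highest-first order
        heapq.heappush(self._heap, (-priority, next(self._tie), fn))

    def run_some(self, budget):
        """Execute up to `budget` of the most important Tasks.
        Unlike a MindAgent, each Task runs exactly once."""
        done = 0
        while self._heap and done < budget:
            _, _, fn = heapq.heappop(self._heap)
            fn()
            done += 1
        return done
```

The `budget` argument corresponds to the fixed share of processor time a Unit grants the Task queue each cycle, as discussed in the next subsection.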
Tasks have AttentionValues and target MindAgents. When a Task is created it is submitted to the appropriate Unit and then put in a priority queue. The Unit will schedule some resources for processing the more important Tasks, as we'll see next. 19.4.4 Scheduling of MindAgents and Tasks in a Unit Within each Unit we have one or more MindAgents, a Task queue and, optionally, a subset of the distributed AtomSpace. If that subset isn't held in the Unit, it's held in another process running on the same machine. If there is more than one Unit per machine, their relative priorities are handled by the operating system's scheduler. In addition to the Units, CogPrime has an extra maintenance process per machine, whose job is to handle changes in those priorities as well as reconfigurations caused by MindAgent migration, and machines joining or leaving the CogPrime cluster. So, at the Unit level, attention allocation in CogPrime has two aspects: how MindAgents and Tasks receive attention from CogPrime, and how Atoms receive attention from different MindAgents and Tasks. The topic of this Section is the former. The latter is dealt with elsewhere, in two ways: 
• in Chapter 23, which discusses the dynamic updating of the AttentionValue structures associated with Atoms, and how these determine how much attention various focused cognitive processes (MindAgents) pay to them. • in the discussion of various specific CIM-Dynamics, each of which may make choices of which Atoms to focus on in its own way (though generally making use of AttentionValue and TruthValue in doing so). The attention allocation subsystem is also pertinent to MindAgent scheduling, because it discusses dynamics that update ShortTermImportance (STI) values associated with MindAgents, based on the usefulness of MindAgents for achieving system goals. In this chapter, we will not enter into such cognitive matters, but will merely discuss the mechanics by which these STI values are used to control processor allocation to MindAgents. Each instance of a MindAgent has its own AttentionValue, which is used to schedule processor time within the Unit. That scheduling is done by a Scheduler object which controls a collection of worker threads, whose size is a system parameter. The Scheduler aims to allocate worker threads to the MindAgents in a way that's roughly proportional to their STI, but it needs to account for starvation, as well as the need to process the Tasks in the task queue. This is an area in which we can safely borrow from reasonably mature computer science research. The requirements of cognitive dynamics scheduling are far from unique, so this is not a topic where new ideas need to be invented for OpenCog; rather, designs need to be crafted meeting CogPrime's specific requirements based on state-of-the-art knowledge and experience. One example scheduler design has two important inputs: the STI associated with each MindAgent, and a parameter determining how much of the resources should go to the MindAgents vs. the Task queue. 
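As an illustrative sketch of such a design (hypothetical names and default values; the per-agent floor is one simple anti-starvation device), the split between the Task queue and STI-proportional MindAgent scheduling might look like:

```python
def allocate_steps(agent_stis, total_steps, task_fraction=0.2, floor=1):
    """Split one cycle's worker steps between the Task queue and the
    MindAgents, roughly proportionally to STI, with a small per-agent
    floor to prevent starvation of low-importance agents."""
    task_steps = int(total_steps * task_fraction)
    budget = total_steps - task_steps
    total_sti = sum(agent_stis.values()) or 1.0
    shares = {name: max(floor, int(budget * sti / total_sti))
              for name, sti in agent_stis.items()}
    return task_steps, shares
```

For example, with 100 steps, a 0.2 task fraction, and agents at STI 30 and 10, the Task queue receives 20 steps and the agents 60 and 20 respectively.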
In the CogPrime implementation, the Scheduler maps the MindAgent STIs to a set of priority queues, and each queue is run a number of times per cycle. Ideally one wants to keep the number of queues small, and rely on multiple Units and the OS-level scheduler to handle widely different priority levels. When the importance of a MindAgent changes, one just has to reassign it to a new queue, which is a cheap operation that can be done between cycles. MindAgent insertions and removals are handled similarly. Finally, Task execution is currently handled via allocating a certain fixed percentage of processor time, each cycle, to executing the top Tasks on the queue. Adaptation of this percentage may be valuable in the long term but has not yet been implemented. Control processes are also implemented as MindAgents, and processed in the same way as the other kinds of CIM-Dynamics, although they tend to have fairly low importance. 19.4.5 The Cognitive Cycle We have mentioned the concept of a "cycle" in the discussion about scheduling, without explaining what we mean. Let's address that now. All the Units in a CogPrime cluster are kept in sync by a global cognitive cycle, whose purpose is described in Section II. We mentioned above that each machine in the CogPrime cluster has a housekeeping process. One of its tasks is to keep track of the cognitive cycle, broadcasting when the machine has finished its cycle, and listening to similar broadcasts from its counterparts in the cluster. When all the machines have completed a cycle, a global counter is updated, and each machine is then free to begin the next cycle. 
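The cycle synchronization just described behaves like a barrier. A toy illustration in Python, with threads standing in for machines (all names hypothetical; the real mechanism uses broadcast messages between housekeeping processes rather than shared memory):

```python
import threading

class CognitiveCycle:
    """Every machine's housekeeping process waits at a barrier;
    when all have finished, a shared counter advances and the
    next cycle begins."""
    def __init__(self, n_machines):
        self.cycle = 0
        self._barrier = threading.Barrier(n_machines, action=self._advance)

    def _advance(self):
        # runs exactly once per cycle, when the last machine arrives
        self.cycle += 1

    def finish_cycle(self):
        """Called by each machine when its local work is done."""
        self._barrier.wait()
```

The idle-machine heuristics discussed next amount to giving a machine useful local work to do while it sits at this barrier.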
One potential annoyance with this global cognitive cycle is that some machines may complete their cycle much faster than others, and then sit idly while the stragglers finish their jobs. CogPrime addresses this issue in two ways: • Over the long run, a load balancing process will assign MindAgents from overburdened machines to underutilized ones. The MindAgent migration process is described in the next section. • In a shorter time horizon, during which a machine's configuration is fixed, there are two heuristics to minimize the waste of processor time without breaking the overall cognitive cycle coordination: - The Task queue in each of the machine's Units can be processed more extensively than it would be by default; in extreme cases, the machine can go through the whole queue. - Background process MindAgents can be given extra activations, as their activity is unlikely to throw the system out of sync, unlike with more focused and goal-oriented processes. Both heuristics are implemented by the scheduler inside each Unit, which has one boolean trigger for each heuristic. The triggers are set by the housekeeping process when it observes that the machine has been frequently idle over the recent past, and then reset if the situation changes. 19.5 Distributed AtomSpace and Cognitive Dynamics As hinted above, realistic CogPrime deployments will be spread across reasonably large clusters of co-located machines. This section describes how this distributed deployment scenario is planned for in the design of the AtomSpace and the MindAgents, and how the cognitive dynamics take place in such a scenario. We won't review the standard principles of distributed computing here, but will focus on specific issues that arise when CogPrime is spread across a relatively large number of machines. The two key issues that need to be handled are: • How to distribute knowledge (i.e., the AtomSpace) in a way that doesn't impose a large performance penalty? 
• How to allocate resources (i.e., machines) to the different cognitive processes (MindAgents) in a way that's flexible and dynamic? 19.5.1 Distributing the AtomSpace The design of a distributed AtomSpace was guided by the following high level requirements: 1. Scale up, transparently, to clusters of dozens to hundreds of machines, without requiring a single central master server. 2. The ability to store portions of an Atom repository on a number of machines in a cluster, where each machine also runs some MindAgents. The distribution of Atoms across the machines should benefit from the fact that the cognitive processes on one machine are likely to access local Atoms more often than remote ones. 3. Provide transparent access to all Atoms in RAM to all machines in the cluster, even if at different latency and performance levels. 4. For local access to Atoms in the same machine, performance should be as close as possible to what one would have in a similar, but non-distributed AtomSpace. 5. Allow multiple copies of the same Atom to exist in different machines of the cluster, but only one copy per machine. 6. As Atoms are updated fairly often by cognitive dynamics, provide a mechanism for eventual consistency. This mechanism needs not only to propagate changes to the Atoms, but sometimes to reconcile incompatible changes, such as when two cognitive processes update an Atom's TruthValue in opposite ways. Consistency is less important than efficiency, but should be guaranteed eventually. 7. Resolution of inconsistencies should be guided by the importance of the Atoms involved, so the more important ones are more quickly resolved. 8. System configuration can explicitly order the placement of some Atoms on specific machines, and mark a subset of those Atoms as immovable, which should ensure that local copies are always kept. 9. Atom placement across machines, aside from the immovable Atoms, should be dynamic, rebalancing based on frequency of access to the Atom by the different machines. The first requirement follows obviously from our estimates of how many machines CogPrime will require to display advanced intelligence. The second requirement above means that we don't have two kinds of machines in the cluster, where some are processing servers and some are database servers. Rather, we prefer each machine to store some knowledge and host some processes acting on that knowledge. 
This design assumes that there are simple heuristic ways to partition the knowledge across the machines, resulting in allocations that, most of the time, give the MindAgents local access to the Atoms they need most often. Alas, there will always be some cases in which a MindAgent needs an Atom that isn't available locally. In order to keep the design of the MindAgents simple, this leads to the third requirement, transparency, and to the fourth one, performance. This partition design, on the other hand, means that there must be some replication of knowledge, as there will always be some Atoms that are needed often by MindAgents on different machines. This leads to requirement five (allow redundant copies of an Atom). However, as MindAgents frequently update the mutable components of Atoms, requirements six and seven are needed, to minimize the impact of conflicts on system performance while striving to guarantee that conflicts are eventually resolved, and with priority proportional to the importance of the impacted Atoms. 19.5.1.1 Mechanisms of Managing Distributed Atomspaces When one digs into the details of distributed AtomSpaces, a number of subtleties emerge. Going into these in full detail here would not be appropriate, but we will make a few comments, just to give a flavor of the sorts of issues involved. To discuss these issues clearly, some special terminology is useful. In this context, it is useful to reserve the word "Atom" for its pure, theoretical definition, viz: "a Node is uniquely determined 
by its name. A Link is uniquely determined by its outgoing set". Atoms sitting in RAM may then be called "Realized Atoms". Thus, given a single, pure "abstract/theoretical" Atom, there might be two different Realized Atoms, on two different servers, having the same name/outgoing-set. It's OK to think of a Realized Atom as a clone of the pure, abstract Atom, and to talk about it that way. Analogously, we might call Atoms living on disk, or flying on a wire, "Serialized Atoms"; and, when need be, use specialized terms like "ZMQ-serialized Atoms", or "BerkeleyDB-serialized Atoms", etc. An important and obvious coherency requirement is: "If a MindAgent asks for the Handle of an Atom at time A, and then asks, later on, for the Handle of the same Atom, it should receive the same Handle." By the "AtomSpace", in general, we mean the container(s) that are used to store the set of Atoms used in an OpenCog system, both in RAM and on disk. In the case of an AtomSpace that is distributed across multiple machines or other data stores, we will call each of these an "AtomSpace portion". Atoms and Handles Each OpenCog Atom is associated with a Handle object, which is used to identify the Atom uniquely. The Handle is a sort of "key" used, at the infrastructure level, to compactly identify the Atom. In a single-machine, non-distributed AtomSpace, one can effectively just use long ints as Handles, and assign successive ints as Handles to successively created new Atoms. In a distributed AtomSpace, it's a little subtler. Perhaps the cleanest approach in this case is to use a hash of the serialized Atom data as the Handle for an Atom. That way, if an Atom is created in any portion, it will inherently have the same Handle as any of its clones. The issue of Handle collisions then occurs - it is possible, though it will be rare, that two different Atoms will be assigned the same Handle via the hashing function. 
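The likelihood of such collisions follows standard birthday-problem arithmetic, which can be sketched numerically (the 1.25 constant is the standard approximation to the expected number of draws before the first collision, not a figure from any specific OpenCog code):

```python
import math

def expected_first_collision(handle_space):
    """Birthday estimate: with H equally likely Handles, the first
    collision is expected after about 1.25 * sqrt(H) Atoms."""
    return 1.25 * math.sqrt(handle_space)

def handle_bits_for(n_atoms):
    """Keeping the expected collision count near one for N Atoms
    needs a Handle space on the order of N^2, i.e. twice as many
    bits as it takes to encode N."""
    return 2 * math.ceil(math.log2(n_atoms))
```

So, for instance, accommodating 2^32 possible Atoms with roughly one expected collision calls for 64-bit global Handles.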
This situation can be identified via checking, when an Atom is imported into a portion, whether there is already some Atom in that portion with the same Handle but different fundamental aspects. In the rare occasion where this situation does occur, one of the Atoms must then have its Handle changed. Changing an Atom's Handle everywhere it's referenced in RAM is not a big deal, so long as it only happens occasionally. However, some sort of global record of Handle changes should be kept, to avoid confusion in the process of loading saved Atoms from disk. If a loaded AtomSpace contains Atoms that have changed Handle since the file was saved, the Atom loading process needs to know about this. The standard mathematics of hash function collisions shows that if one has a space of H possible Handles, one will get two Atoms with the same Handle after about 1.25 × √H tries, on average. Rearranging this, it means we'd need a space of around N² Handles for N possible Atoms, in order for only one collision to occur on average. So to have an expectation of one collision, for N possible Atoms, one would have to use a Handle range up to N². The number of bits needed to encode N² is twice as many as the number needed to encode N. So, if one wants to minimize collisions, one may need to make Handles twice as long, thus taking up more memory. However, this memory cost can be palliated via introducing "local Handles" separate from the global, system-wide Handles. The local Handles are used internally within each local AtomSpace, and then each local AtomSpace contains a translation table going back and forth between local 
and global Handles. Local Handles may be long ints, allocated sequentially to each new Atom entered into a portion. Persistence to disk would always use the global Handles. To understand the memory tradeoffs involved in these solutions, assume that the global Handles are k times as long as the local Handles, and suppose that the average Handle occurs r times in the local AtomSpace. Then the memory inflation ratio of the local/global solution, as opposed to a solution using only the shorter local Handles, would be (1 + k + r)/r = 1 + (k + 1)/r. If k = 2 and r = 10 (each Handle is used 10 times on average, which is realistic based on current real-world OpenCog Atomspaces), then the ratio is just 1.3 - suggesting that using hash codes for global Handles, and local Handles to save memory in each local AtomSpace, is acceptable memory-wise. 19.5.1.2 Distribution of Atoms Given the goal of maximizing the probability that an Atom will be local to the machines of the MindAgents that need it, the two big decisions are how to allocate Atoms to machines, and then how to reconcile the results of MindAgents actuating on those Atoms. The initial allocation of Atoms to machines may be done via explicit system configuration, for Atoms known to have different levels of importance to specific MindAgents. That is, after all, how MindAgents are initially allocated to machines as well. One may, for instance, create a CogPrime cluster where one machine (or group) focuses on visual perception, one focuses on language processing, one focuses on abstract reasoning, etc. In that case one can hard-wire the location of Atoms. What if one wants to have three abstract-reasoning machines in one's cluster? Then one can define an abstract-reasoning zone consisting of three Atom repository portions. 
One can hard-wire that Atoms created by MindAgents in the zone must always remain in that zone - but they can potentially be moved among different portions within that zone, as well as replicated across two or all of the machines, if need be. By default they would still initially be placed in the same portion as the MindAgent that created them. However Atoms are initially placed in portions, sometimes it will be appropriate to move them. And sometimes it will be appropriate to clone an Atom, so there's a copy of it in a different portion from where it already exists. Various algorithms could work for this, but the following is one simple mechanism: • When an Atom A in machine M1 is requested by a MindAgent in machine M2, then a clone of A is temporarily created in M2. • When an Atom is forgotten (due to low LTI), then a check is made as to whether it has any clones, and any links to it are changed into links to its clones. • The LTI of an Atom may get a boost if that Atom has no clones (the amount of this boost is a parameter that may be adjusted). 
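The clone-on-remote-request rule in the first bullet can be sketched as follows (a hypothetical Python illustration; a real portion would also track LTI, clone lists, and link rewiring on forgetting):

```python
class Portion:
    """One machine's AtomSpace portion, with the simple
    clone-on-remote-request rule described above."""
    def __init__(self, name):
        self.name = name
        self.atoms = {}   # handle -> atom data

    def request(self, handle, cluster):
        """Return a local Atom, cloning it from a peer if needed."""
        if handle in self.atoms:
            return self.atoms[handle]
        for peer in cluster:
            if peer is not self and handle in peer.atoms:
                # temporary clone, created on first remote access
                self.atoms[handle] = dict(peer.atoms[handle])
                return self.atoms[handle]
        return None   # no copy anywhere in the cluster
```

After a remote request, both portions hold a copy, which is exactly the situation the reconciliation mechanism of Section 19.5.1.4 must later keep consistent.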
19.5.1.3 MindAgents and the Distributed AtomSpace In the context of a distributed AtomSpace, the interactions between MindAgents and the knowledge store become subtler, as we'll now discuss. When a MindAgent wants to create an Atom, it will make this request of the local AtomSpace process, which hosts a subset of the whole AtomSpace. It can, on Atom creation, specify whether the Atom is immovable or not. In the former case, it will initially only be accessible by the MindAgents in the local machine. The process of assigning the new Atom an AtomHandle needs to be taken care of, in a way that doesn't introduce a central master. One way to achieve that is to make Handles hierarchical, so the higher order bits indicate the machine. This, however, means that AtomHandles are no longer immutable. A better idea is to automatically allocate a subset of the AtomHandle universe to each machine. The initial use of those AtomHandles is the privilege of that machine but, as Atoms migrate or are cloned, the Handles can move through the cluster. When a MindAgent wants to retrieve one or more Atoms, it will perform a query on the local AtomSpace subset, just as it would with a single machine repository. Along with the regular query parameters, it may specify whether the request should be processed locally only, or globally. Local queries will be fast, but may fail to retrieve the desired Atoms, while global queries may take a while to return. In the approach outlined above for MindAgent dynamics and scheduling, this would just cause the MindAgent to wait until results are available. Queries designed to always return a set of Atoms can have a third mode, which is "prioritize local Atoms". In this case the AtomSpace, when processing a query that looks for Atoms matching a certain pattern, would try to find all local responses before asking other machines. 
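The three query scopes can be illustrated schematically (a Python sketch with hypothetical names; the query `pattern` is reduced here to a simple predicate over Atoms):

```python
LOCAL, GLOBAL, PREFER_LOCAL = "local", "global", "prefer_local"

def query(pattern, local_portion, remote_portions, mode=PREFER_LOCAL):
    """Sketch of the three query scopes described above."""
    hits = [a for a in local_portion if pattern(a)]
    if mode == LOCAL:
        return hits                       # fast, but may miss Atoms
    if mode == PREFER_LOCAL and hits:
        return hits                       # local answers suffice
    for portion in remote_portions:       # slower, possibly blocking
        hits.extend(a for a in portion if pattern(a))
    return hits
```

In the real system the remote branch would be an asynchronous distributed request, with the calling MindAgent simply descheduled until results arrive.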
19.5.1.4 Conflict Resolution A key design decision when implementing a distributed AtomSpace is the trade-off between consistency and efficiency. There is no universal answer to this conflict, but the usage scenarios for CogPrime, current and planned, tend to fall into the same broad category as far as consistency goes. CogPrime's cognitive processes are relatively indifferent to conflicts and capable of working well with outdated data, especially if the conflicts are temporary. For applications such as the AGI Preschool, it is unlikely that outdated properties of single Atoms will have a large, noticeable impact on the system's behavior; even if that were to happen on rare occasions, this kind of inconsistency is often present in human behavior as well. On the other hand, CogPrime assumes fairly fast access to Atoms by the cognitive processes, so efficiency shouldn't be too heavily penalized. The robustness against mistakes and the need for performance mean that a distributed AtomSpace should follow the principle of "eventual consistency". This means that conflicts are allowed to arise, and even to persist for a while, but a mechanism is needed to reconcile them. Before describing conflict resolution, which in CogPrime is a bit more complicated than in most applications, we note that there are two kinds of conflicts. The simple one happens when an Atom that exists in multiple machines is modified in one machine, and that change isn't immediately propagated. The less obvious one happens when some process creates a new Atom in its local AtomSpace repository, but that Atom conceptually "already exists" elsewhere in the system. Both scenarios are handled in the same way, and can become complicated when, instead of a single change or creation, one needs to reconcile multiple operations. 
The way to handle conflicts is to have a special purpose control process, a reconciliation MindAgent, with one copy running on each machine in the cluster. This MindAgent keeps track of all recent write operations in that machine (Atom creations or changes). Each time the reconciliation MindAgent is called, it processes a certain number of Atoms in the recent writes list. It chooses the Atoms to process based on a combination of their STI, LTI and recency of creation/change. Highest priority is given to Atoms with higher STI and LTI that have been around longer. Lowest priority is given to Atoms with low STI or LTI that have been very recently changed - both because they may change again in the very near future, and because they may be forgotten before it's worth resolving any conflicts. This will be the case with most perceptual Atoms, for instance. By tuning how many Atoms this reconciliation MindAgent processes each time it's activated, we can tweak the consistency vs. efficiency trade-off. When the AtomReconciliation agent processes an Atom, what it does is: • Searches all the machines in the cluster to see if there are other equivalent Atoms (for Nodes, these are Atoms with the same name and type; for Links, these are Atoms with the same type and targets). • If it finds equivalent Atoms, and there are conflicts to be reconciled, such as different TruthValues or AttentionValues, the decision of how to handle the conflicts is made by a special probabilistic reasoning rule, called the Rule of Choice (see Chapter 34). Basically, this means: - It decides whether to merge the conflicting Atoms. We always merge Links, but some Nodes may have different semantics, such as Nodes representing different procedures that have been given the same name. - In the case that the two Atoms A and B should be merged, it creates a new Atom C that has all the same immutable properties as A and B. 
It merges their TruthValues according to the probabilistic revision rule (see Chapter 34). The AttentionValues are merged by prioritizing the higher importances. - In the case that two Nodes should be allowed to remain separate, it allocates one of them (say, B) a new name. Optionally, it also evaluates whether a SimilarityLink should be created between the two different Nodes. Another use for the reconciliation MindAgent is maintaining approximate consistency between clones, which can be created by the AtomSpace itself, as described above in Subsection 19.5.1.2. When the system knows about the multiple clones of an Atom, it keeps note of these versions in a list, which is processed periodically by a conflict resolution MindAgent, in order to prevent the clones from drifting too far apart by the actions of local cognitive processes in each machine. 19.5.2 Distributed Processing The OCF infrastructure as described above already contains a lot of distributed processing implicit in it. However, it doesn't tell you how to make the complex cognitive processes that are part of the CogPrime design distributed unto themselves - say, how to make PLN or MOSES themselves distributed. This turns out to be quite possible, but becomes quite intricate and 
specific depending on the particular algorithms involved. For instance, the current MOSES implementation is now highly amenable to distributed and multiprocessor implementation, but in a way that depends subtly on the specifics of MOSES and has little to do with the role of MOSES in CogPrime as a whole. So we will not delve into these topics here. Another possibility worth mentioning is broadly distributed processing, in which CogPrime intelligence is spread across thousands or millions of relatively weak machines networked via the Internet. Even if none of these machines is exclusively devoted to CogPrime, the total processing power may be massive, and massively valuable. The use of this kind of broadly distributed computing resource to help CogPrime is quite possible, but involves numerous additional control problems which we will not address here. A simple case is massive global distribution of MOSES fitness evaluation. In the case where fitness evaluation is isolated and depends only on local data, this is extremely straightforward. In the more general case where fitness evaluation depends on knowledge stored in a large AtomSpace, it requires a subtler design, wherein each globally distributed MOSES subpopulation contains a pool of largely similar genotypes and a cache of relevant parts of the AtomSpace, which is continually refreshed during the fitness evaluation process. This can work so long as each globally distributed lobe has a reasonably reliable high bandwidth, low latency connection to a machine containing a large AtomSpace. On the more mundane topic of distributed processing within the main CogPrime cluster, three points are worth discussing: • Distributed communication and coordination between MindAgents. • Allocation of machines to functional groups, and MindAgent migration. • Machines entering and leaving the cluster. 
19.5.2.1 Distributed Communication and Coordination Communications between MindAgents, Units and other CogPrime components are handled by a message queue subsystem. This subsystem provides a unified API, so the agents involved are unaware of the location of their partners: distributed messages, inter-process messages in the same machine, and intra-process messages in the same Unit are sent through the same API, and delivered to the same target queues. This design enables transparent distribution of MindAgents and other components. In the simplest case, of MindAgents within the same Unit, messages are delivered almost immediately, and will be available for processing by the target agent the next time it's enacted by the scheduler. In the case of messages sent to other Units or other machines, they're delivered to the messaging subsystem component of that Unit, which has a dedicated thread for message delivery. That subsystem is scheduled for processing just like any other control process, although it tends to have a reasonably high importance, to ensure speedy delivery. The same messaging API and subsystem is used for control-level communications, such as the coordination of the global cognitive cycle. The cognitive cycle completion message can be used for other housekeeping purposes as well. 
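A toy version of such a location-transparent messaging API (hypothetical names; a real implementation would route remote targets through the per-machine delivery thread, while intra-Unit delivery stays in memory):

```python
from collections import defaultdict, deque

class MessageBus:
    """Senders address a queue by agent name and never see whether
    delivery is intra-process, inter-process, or across the network."""
    def __init__(self):
        self._queues = defaultdict(deque)

    def send(self, target, message):
        # routing decisions (local vs. remote) would be hidden here
        self._queues[target].append(message)

    def receive(self, target):
        """Drain the target agent's queue, called when the agent
        is next enacted by the scheduler."""
        q = self._queues[target]
        msgs = list(q)
        q.clear()
        return msgs
```

The same interface then carries both cognitive-level Task requests and control-level traffic such as cycle-completion broadcasts.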
19.5.2.2 Functional Groups and MindAgent Migration A CogPrime cluster is composed of groups of machines dedicated to various high-level cognitive tasks: perception processing, language processing, background reasoning, procedure learning, action selection and execution, goal achievement planning, etc. Each of these high-level tasks will probably require a number of machines, which we call functional groups. Most of the support needed for functional groups is provided transparently by the mechanisms for distributing the AtomSpace and by the communications layer. The main issue is how many resources (i.e., how many machines) to allocate to each functional group. The initial allocation is determined by human administrators via the system configuration - each machine in the cluster has a local configuration file which tells it exactly which processes to start, along with the collection of MindAgents to be loaded onto each process and their initial AttentionValues. Over time, however, it may be necessary to modify this allocation, adding machines to overworked or highly important functional groups. For instance, one may add more machines to the natural language and perception processing groups during periods of heavy interaction with humans in the preschool environment, while repurposing those machines to procedure learning and background inference during periods in which the agent controlled by CogPrime is resting or 'sleeping'. This allocation of machines is driven by attention allocation in much the same way that processor time is allocated to MindAgents. Functional groups can be represented by Atoms, and their importance levels are updated according to the importance of the system's top level goals, and the usefulness of each functional group to their achievement. 
Thus, once the agent is engaged by humans, the goals of pleasing them and better understanding them would become highly important, and would thus drive up the STI of the language understanding and language generation functional groups. Once there is an imbalance between a functional group's STI and its share of the machines in the cluster, a control-process CIM-Dynamic is triggered to decide how to reconfigure the cluster. This CIM-Dynamic works approximately as follows:

• First, it decides how many extra machines to allocate to each under-represented functional group.
• Then, it ranks the machines not already allocated to those groups based on a combination of their workload and the aggregate STI of their MindAgents and Units. The goal is to identify machines that are both relatively unimportant and working under capacity.
• It will then migrate the MindAgents of those machines to other machines in the same functional group (or just remove them if clones exist), freeing them up.
• Finally, it will decide how best to allocate the new machines to each functional group. This decision is heavily dependent on the nature of the work done by the MindAgents in that group, so in CogPrime these decisions will be somewhat hardcoded, as is the set of functional groups. For instance, background reasoning can be scaled just by adding extra inference MindAgents to the new machines without too much trouble, but communicating with humans requires MindAgents responsible for dialog management, and it doesn't make sense to clone those, so it's better to just give more resources to each MindAgent without increasing their numbers.

The migration of MindAgents becomes, indirectly, a key driver of Atom migration. As MindAgents move or are cloned to new machines, the AtomSpace repository in the source machine
should send clones of the Atoms most recently used by these MindAgents to the target machine(s), anticipating a very likely distributed request that would create those clones in the near future anyway. If the MindAgents are moved but not cloned, the local copies of those Atoms in the source machine can then be (locally) forgotten.

19.5.2.3 Adding and Removing Machines

Given the support for MindAgent migration and cloning outlined above, the issue of adding new machines to the cluster becomes a specific application of the heuristics just described. When a new machine is added to the cluster, CogPrime initially decides on a functional group for it, based both on the importance of each functional group and on their current performance - if a functional group consistently delays the completion of the cognitive cycle, it should get more machines, for instance. When the machine is added to a functional group, it is then populated with the most important or resource-starved MindAgents in that group, a decision that is taken by economic attention allocation.

Removal of a machine follows a similar process. First the system checks if the machine can be safely removed from its current functional group, without greatly impacting its performance. If that's the case, the non-cloned MindAgents in that machine are distributed among the remaining machines in the group, following the heuristic described above for migration. Any local-only Atoms in that machine's AtomSpace container are migrated as well, provided their LTI is high enough. In the situation in which removing a machine M1 would have an intolerable impact on the functional group's performance, a control process selects another functional group to lose a machine M2. Then, the MindAgents and Atoms in M1 are migrated to M2, which goes through the regular removal process first.
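The machine-selection step of these heuristics — ranking machines by a combination of low workload and low aggregate STI — can be sketched as follows. This is an illustrative toy; the scoring function and the equal weighting of the two factors are our assumptions.

```python
def pick_machines_to_repurpose(machines, n, w=0.5):
    # machines: list of (name, workload, aggregate_sti), each factor in [0, 1].
    # Lower score = more suitable for repurposing: relatively unimportant
    # machines that are also working under capacity.
    scored = sorted(machines, key=lambda m: w * m[1] + (1 - w) * m[2])
    return [name for name, _, _ in scored[:n]]

machines = [("m1", 0.9, 0.8), ("m2", 0.2, 0.1), ("m3", 0.5, 0.4)]
print(pick_machines_to_repurpose(machines, 2))  # ['m2', 'm3']
```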
In principle, one might use the insertion or removal of machines to perform a global optimization of resource allocation within the system, but that process tends to be much more expensive than the simpler heuristics we just described. We believe these heuristics can give us most of the benefits of global re-allocation at a fraction of the disturbance to the system's overall dynamics during their execution.
Chapter 20 Knowledge Representation Using the Atomspace

20.1 Introduction

CogPrime's knowledge representation must be considered on two levels: implicit and explicit. This chapter considers mainly explicit knowledge representation, with a focus on representation of declarative knowledge. We will describe the Atom knowledge representation, a generalized hypergraph formalism which comprises a specific vocabulary of Node and Link types, used to represent declarative knowledge but also, to a lesser extent, other types of knowledge as well. Other mechanisms of representing procedural, episodic, attentional, and intentional knowledge will be handled in later chapters, as will the subtleties of implicit knowledge representation.

The AtomSpace Node and Link formalism is the most obviously distinctive aspect of the OpenCog architecture, from the point of view of a software developer building AI processes in the OpenCog framework. And yet, the features of CogPrime that are most important, in terms of our theoretical reasons for estimating it likely to succeed as an advanced AGI system, are not really dependent on the particulars of the AtomSpace representation. What's important about the AtomSpace knowledge representation is mainly that it provides a flexible means for compactly representing multiple forms of knowledge, in a way that allows them to interoperate - where by "interoperate" we mean that e.g. a fragment of a chunk of declarative knowledge can link to a fragment of a chunk of attentional or procedural knowledge; or a chunk of knowledge in one category can overlap with a chunk of knowledge in another category (as when the same link has both a (declarative) truth value and an (attentional) importance value).
In short, any representational infrastructure sufficiently flexible to support

• compact representation of all the key categories of knowledge playing dominant roles in human memory
• the flexible creation of specialized sub-representations for various particular subtypes of knowledge in all these categories, enabling compact and rapidly manipulable expression of knowledge of these subtypes
• the overlap and interlinkage of knowledge of various types, including that represented using specialized sub-representations

will probably be acceptable for CogPrime's purposes. However, precisely formulating these general requirements is tricky, and is significantly more difficult than simply articulating a single acceptable representational scheme, like the current OpenCog Atom formalism. The Atom
formalism satisfies the relevant general requirements and has proved workable from a practical software perspective.

In terms of the Mind-World Correspondence Principle introduced in Chapter 10, the important point regarding the Atom representation is that it must be flexible enough to allow the compact and rapidly manipulable representation of knowledge that has aspects spanning the multiple common human knowledge categories, in a manner that allows easy implementation of cognitive processes that will manifest the Mind-World Correspondence Principle in everyday human-like situations. The actual manifestation of mind-world correspondence is the job of the cognitive processes acting on the AtomSpace - the job of the AtomSpace is to be an efficient and flexible enough representation that these cognitive processes can manifest mind-world correspondence in everyday human contexts given highly limited computational resources.

20.2 Denoting Atoms

First we describe the textual notation we'll use to denote various sorts of Atoms throughout the following chapters. The discussion will also serve to give some particular examples of cognitively meaningful Atom constructs.

20.2.1 Meta-Language

As always occurs when discussing (even partially) logic-based systems, when discussing CogPrime there is some potential for confusion between logical relationships inside the system, and logical relationships being used to describe parts of the system. For instance, we can state as observers that two Atoms inside CogPrime are equivalent, and this is different from stating that CogPrime itself contains an Equivalence relation between these two Atoms. Our formal notation needs to reflect this difference. Since we will not be doing any fancy mathematical analyses of CogPrime structures or dynamics here, there is no need to formally specify the logic being used for the metalanguage. Standard predicate logic may be assumed.
So, for example, we will say things like

(IntensionalInheritanceLink Ben monster).TruthValue.strength = .5

This is a metalanguage statement, which means that the strength field of the TruthValue object associated with the link (IntensionalInheritance Ben monster) is equal to .5. This is different than saying

EquivalenceLink
    ExOutLink GetStrength
        ExOutLink GetTruthValue
            IntensionalInheritanceLink Ben monster
    NumberNode 0.5
which refers to an equivalence relation represented inside CogPrime. The former refers to an equals relationship observed by the authors of the book, but perhaps never represented explicitly inside CogPrime.

In the first example above we have used the C++ convention structure_variable_name.field_name for denoting elements of composite structures; this convention will be stated formally below. In the second example we have used schema corresponding to TruthValue and Strength; these schema extract the appropriate fields from the Atoms they're applied to, so that e.g.

ExOutLink GetTruthValue A

returns the number A.TruthValue.

Following a convention from mathematical logic, we will also sometimes use the special symbol |- to mean "implies in the metalanguage." For example, the first-order PLN deductive inference strength rule may be written

InheritanceLink A B <sAB>
InheritanceLink B C <sBC>
|-
InheritanceLink A C <sAC>

where

sAC = sAB sBC + (1 - sAB) (sC - sB sBC) / (1 - sB)

This is different from saying

ForAll $A, $B, $C, $sAB, $sBC, $sAC
    ExtensionalImplicationLink_HOJ
        AND
            InheritanceLink $A $B <$sAB>
            InheritanceLink $B $C <$sBC>
        AND
            InheritanceLink $A $C <$sAC>
            $sAC = $sAB $sBC + (1 - $sAB) ($sC - $sB $sBC) / (1 - $sB)

which is the most natural representation of the independence-based PLN deduction rule (for strength-only truth values) as a logical statement within CogPrime. In the latter expression the variables $A, $sAB, and so forth represent actual Variable Atoms within CogPrime. In the former expression the variables represent concrete, non-Variable Atoms within CogPrime, which however are being considered as variables within the metalanguage.

(As explained in the PLN book, a link labeled with "HOJ" refers to a "higher order judgment", meaning a relationship that interprets its relations as entities with particular truth values. For instance,

ImplicationLink_HOJ
    Inh $X stupid <.9>
    Inh $X rich <.9>
means that if (Inh $X stupid) has a strength of .9, then (Inh $X rich) has a strength of .9.)

20.2.2 Denoting Atoms

Atoms are the basic objects making up CogPrime knowledge. They come in various types, and are associated with various dynamics, which are embodied in MindAgents. Generally speaking Atoms are endowed with TruthValue and AttentionValue objects. They also sometimes have names, and other associated Values as previously discussed. In the following subsections we will explain how these are notated, and then discuss specific notations for Links and Nodes, the two types of Atoms in the system.

20.2.2.1 Names

In order to denote an Atom in discussion, we have to call it something. Relatedly but separately, Atoms may also have names within the CogPrime system. (As a matter of implementation, in the current OpenCog version, no Links have names; whereas, all Nodes have names, but some Nodes have a null name, which is conceptually the same as not having a name.)

(name, type) pairs must be considered as unique within each Unit within an OpenCog system, otherwise they can't be used effectively to reference Atoms. It's OK if two different OpenCog Units both have SchemaNodes named "+", but not if one OpenCog Unit has two SchemaNodes both named "+" - this latter situation is disallowed on the software level, and is assumed in discussions not to occur.

Some Atoms have natural names. For instance, the SchemaNode corresponding to the elementary schema function + may quite naturally be named "+". The NumberNode corresponding to the number .5 may naturally be named ".5", and the CharacterNode corresponding to the character c may naturally be named "c". These cases are the minority, however. For instance, a SpecificEntityNode representing a particular instance of + has no natural name, nor does a SpecificEntityNode representing a particular instance of c.

Names should not be confused with Handles.
Atoms have Handles, which are unique identifiers (in practice, numbers) assigned to them by the OpenCog core system; and these Handles are how Atoms are referenced internally, within OpenCog, nearly all the time. Accessing of Atoms by name is a special case - not all Atoms have names, but all Atoms have Handles. An example of accessing an Atom by name is looking up the CharacterNode representing the letter "c" by its name "c". There would then be two possible representations for the word "cat":

1. this word might be associated with a ListLink - and the ListLink corresponding to "cat" would be a list of the Handles of the Atoms of the nodes named "c", "a", and "t".
2. for expedience, the word might be associated with a WordNode named "cat."

In the case where an Atom has multiple versions - this may happen for instance if the Atom is considered in a different context (via a ContextLink) - each version has a VersionHandle, so that accessing an AtomVersion requires specifying an AtomHandle plus a VersionHandle. See Chapter 19 for more information on Handles.
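The Handle/name distinction can be sketched as follows - a hypothetical in-memory table, not the actual OpenCog implementation - with Handles as the primary keys and (name, type) pairs as an optional secondary index that is unique per Unit:

```python
class AtomTable:
    # Illustrative sketch: every Atom gets a numeric Handle; only some Atoms
    # additionally get a (name, type) index entry.
    def __init__(self):
        self.atoms = {}        # handle -> (type, name-or-None)
        self.by_name = {}      # (name, type) -> handle, unique per Unit
        self.next_handle = 1

    def add(self, atom_type, name=None):
        h = self.next_handle
        self.next_handle += 1
        self.atoms[h] = (atom_type, name)
        if name is not None:
            self.by_name[(name, atom_type)] = h
        return h

    def lookup(self, name, atom_type):
        # Name lookup is the special case; Handle lookup is the common case.
        return self.by_name.get((name, atom_type))

t = AtomTable()
h = t.add("CharacterNode", "c")
print(t.lookup("c", "CharacterNode") == h)  # True
```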
OpenCog never assigns Atoms names on its own; in fact, Atom names are assigned only in the two sorts of cases just mentioned:

1. Via preprocessing of perceptual inputs (e.g. the names of NumberNodes, CharacterNodes)
2. Via hard-wiring of names for SchemaNodes and PredicateNodes corresponding to built-in elementary schema (e.g. +, AND, Say)

If an Atom A has a name n in the system, we may write

A.name = n

On the other hand, if we want to assign an Atom an external name, we may make a metalanguage assertion such as

L1 := (InheritanceLink Ben animal)

indicating that we decided to name that link L1 for our discussions, even though inside OpenCog it has no name. In denoting (nameless) Atoms we may use arbitrary names like L1. This is more convenient than using a Handle-based notation in which Atoms would be referred to as 1, 3433322, etc.; but sometimes we will use the Handle notation as well.

Some ConceptNodes and conceptual PredicateNodes or SchemaNodes may correspond with human-language words or phrases like cat, bite, and so forth. This will be the minority case; more such nodes will correspond to parts of human-language concepts or fuzzy collections of human-language concepts. In discussions in this book, however, we will often invoke the unusual case in which Atoms correspond to individual human-language concepts. This is because such examples are the easiest ones to write about and discuss intuitively. The preponderance of named Atoms in the examples in the book implies no similar preponderance of named Atoms in the real OpenCog system. It is merely easier to talk about a hypothetical Atom named "cat" than it is about a hypothetical Atom with Handle 434. It is not impossible that an OpenCog system represents "cat" as a single ConceptNode, but it is just as likely that it will represent "cat" as a map composed of many different nodes without any of these having natural names.
Each OpenCog system works out for itself, implicitly, which concepts to represent as single Atoms and which in distributed fashion.

For another example,

ListLink
    CharacterNode "c"
    CharacterNode "a"
    CharacterNode "t"

corresponds to the character string ("c", "a", "t") and would naturally be named using the string cat. In the system itself, however, this ListLink need not have any name.

20.2.2.2 Types

Atoms also have types. When it is necessary to explicitly indicate the type of an atom, we will use the keyword Type, as in
A.Type = InheritanceLink
N_345.Type = ConceptNode

On the other hand, there is also a built-in schema HasType which lets us say

EvaluationLink HasType A InheritanceLink
EvaluationLink HasType N_345 ConceptNode

This covers the case in which type evaluation occurs explicitly in the system, which is useful if the system is analyzing its own emergent structures and dynamics. Another option currently implemented in OpenCog is to explicitly restrict the type of a variable using TypedVariableLink, as follows:

TypedVariableLink
    VariableNode $X
    VariableTypeNode "ConceptNode"

Note also that we will frequently remove the suffix Link or Node from a type name, writing e.g.

Inheritance
    Concept A
    Concept B

instead of

InheritanceLink
    ConceptNode A
    ConceptNode B

20.2.2.3 Truth Values

The truth value of an atom is a bundle of information describing how true the Atom is, in one of several different senses depending on the Atom type. It is encased in a TruthValue object associated with the Atom. Most of the time, we will denote the truth value of an atom in <>'s following the expression denoting the atom. This very handy notation may be used in several different ways.

A complication is that some Atoms may have CompositeTruthValues, which consist of different estimates of their truth value made by different sources, which for whatever reason have not been reconciled (maybe no process has gotten around to reconciling them, maybe they correspond to different truth values in different contexts and thus logically need to remain separate, maybe their reconciliation is being delayed pending accumulation of more evidence, etc.). In this case we can still assume that an Atom has a default truth value, which corresponds to the highest-confidence truth value that it has, in the Universal Context.

Most frequently, the notation is used with a single number in the brackets, e.g.
A <.4>

to indicate that the atom A has truth value .4; or

IntensionalInheritanceLink Ben monster <.5>
to indicate that the IntensionalInheritance relation between Ben and monster has truth value strength .5. In this case, <tv> indicates (roughly speaking) that the truth value of the atom in question involves a probability distribution with a mean of tv. The precise semantics of the strength values associated with OpenCog Atoms is described in Probabilistic Logic Networks (see Chapter 34). Please note, though: this notation does not imply that the only data retained in the system about the distribution is the single number .5.

If we want to refer to the truth value of an Atom A in the context C, we can use the construct

ContextLink <truth value>
    C
    A

Sometimes, Atoms in OpenCog are labeled with two truth value components as defined by PLN: strength and weight-of-evidence. To denote these two components, we might write

IntensionalInheritanceLink Ben scary <.9,.1>

indicating that there is a relatively small amount of evidence in favor of the proposition that Ben is very scary. We may also put the TruthValue indicator in a different place, e.g. using indent notation,

IntensionalInheritanceLink <.9,.1>
    Ben
    scary

This is mostly useful when dealing with long and complicated constructions.

If we want to denote a composite truth value (whose components correspond to different "versions" of the Atom), we can use a list notation, e.g.

IntensionalInheritance (<.9,.1>, <.5,.9>[h,123], <.6,.7>[c,655]) Ben scary

where e.g.

<.5,.9>[h,123]

denotes the TruthValue version of the Atom indexed by Handle 123. The h denotes that the AtomVersion indicated by the VersionHandle h,123 is a Hypothetical Atom, in the sense described in the PLN book. Some versions may not have any index Handles. The semantics of composite TruthValues are described in the PLN book, but roughly they are as follows. Any version not indexed by a VersionHandle is a "primary TruthValue" that gives the truth value of the Atom based on some body of evidence.
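To make the version bookkeeping concrete, here is a toy sketch of selecting an Atom's default truth value as its highest-confidence primary version. The data layout is invented for illustration and is not OpenCog's actual TruthValue structure.

```python
def default_truth_value(versions):
    # versions: list of (strength, confidence, version_handle) triples,
    # where version_handle is None for primary (un-indexed) versions.
    # The default value is the highest-confidence primary version.
    primaries = [v for v in versions if v[2] is None]
    return max(primaries, key=lambda v: v[1])

versions = [
    (0.9, 0.1, None),         # primary, low confidence
    (0.5, 0.9, ("h", 123)),   # hypothetical version, ignored for the default
    (0.7, 0.6, None),         # primary, higher confidence
]
print(default_truth_value(versions))  # (0.7, 0.6, None)
```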
A version indexed by a VersionHandle is either contextual or hypothetical, as indicated notationally by the c or h in its VersionHandle. So, for instance, if a TruthValue version for Atom A has VersionHandle h,123 that means it denotes the truth value of Atom A under the hypothetical context represented by the Atom with Handle 123. If a TruthValue version for Atom A has VersionHandle c,655 this means it denotes the truth value of Atom A in the context represented by the Atom with Handle 655.

Alternately, truth values may sometimes be expressed in <L,U,b> or <L,U,b,N> format, defined in terms of indefinite probability theory as defined in the PLN book and recalled in Chapter 34. For instance,

IntensionalInheritanceLink Ben scary <.7,.9,.8,20>
has the semantics that there is an estimated 80% chance that after 20 more observations have been made, the estimated strength of the link will be in the interval (.7,.9).

The notation may also be used to specify a TruthValue probability distribution, e.g.

A <g(5,7,12)>

would indicate that the truth value of A is given by distribution g with parameters (5,7,12), or

A <M>

where M is a table of numbers, would indicate that the truth value of A is approximated by the table M.

The <> notation for truth value is an unabashedly incomplete and ambiguous notation, but it is very convenient. If we want to specify, say, that the truth value strength of IntensionalInheritanceLink Ben monster is in fact the number .5, and no other truth value information is retained in the system, then we need to say

(IntensionalInheritance Ben monster).TruthValue = {(strength, .5)}

(where a hashtable form is assumed for TruthValue objects, i.e. a list of name-value pairs). But this kind of issue will rarely arise here and the <> notation will serve us well.

20.2.2.4 Attention Values

The AttentionValue object associated with an Atom does not need to be notated nearly as often as truth value. When it does, however, we can use similar notational methods. AttentionValues may have several components, but the two critical ones are called short-term importance (STI) and long-term importance (LTI). Furthermore, multiple STI values are retained: for each (Atom, MindAgent) pair there may be a MindAgent-specific STI value for that Atom. The pragmatic import of these values will become clear in a later chapter when we discuss attention allocation. Roughly speaking, the long-term importance is used to control memory usage: when memory gets scarce, the atoms with the lowest LTI values are removed.
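The role of LTI just described - freeing memory by removing the least long-term-important Atoms - can be sketched as follows. This is an illustrative toy, not the actual forgetting agent; the data layout is our assumption.

```python
import heapq

def forget_lowest_lti(atoms, n):
    # atoms: dict mapping atom name -> (STI, LTI).
    # Under memory pressure, remove the n atoms with the lowest LTI
    # and return their names; STI plays no role in forgetting.
    doomed = heapq.nsmallest(n, atoms, key=lambda a: atoms[a][1])
    for a in doomed:
        del atoms[a]
    return doomed

atoms = {"cat": (0.1, 0.8), "noise_37": (0.05, 0.01), "dog": (0.4, 0.5)}
print(forget_lowest_lti(atoms, 1))  # ['noise_37']
print(sorted(atoms))                # ['cat', 'dog']
```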
On the other hand, the short-term importance is used to control processor time allocation: MindAgents, when they decide which Atoms to act on, will generally, but not only, choose the ones that have proved most useful to them in the recent past, and additionally those that have been useful for other MindAgents in the recent past.

We will use the double brackets <<>> to denote attention value (in the rare cases where such denotation is necessary). So, for instance,

Cow_7 <<.5>>

will mean the node Cow_7 has an importance of .5; whereas,

Cow_7 <<STI=.1, LTI=.8>>

or simply

Cow_7 <<.1, .8>>

will mean the node Cow_7 has short-term importance = .1 and long-term importance = .8. Of course, we can also use the style
(IntensionalInheritanceLink Ben monster).AttentionValue = {(STI, .1), (LTI, .8)}

where appropriate.

20.2.2.5 Links

Links are represented using a simple notation that has already occurred many times in this book. For instance,

Inheritance A B
Similarity A B

Note that here the symmetry or otherwise of the link is not implicit in the notation. SimilarityLinks are symmetrical, InheritanceLinks are not. When this distinction is necessary, it will be explicitly made.

20.3 Representing Functions and Predicates

SchemaNodes and PredicateNodes contain functions internally; and Links may also usefully be considered as functions. We now briefly discuss the representations and notations we will use to indicate functions in various contexts.

Firstly, we will make some use of the currying notation drawn from combinatory logic, in which adjacency indicates function application. So, for instance, using currying, f x means the function f evaluated at the argument x; and (f x y) means (f(x))(y). If we want to specify explicitly that a block of notation is being specified using currying, we will use the notation @[expression]; for instance, @[f x y z] means ((f(x))(y))(z).

We will also frequently use conventional notation to refer to functions, such as f(x,y). Of course, this is consistent with the currying convention if (x,y) is interpreted as a list and f is then a function that acts on 2-element lists. We will have many other occasions than this to use list notation. Also, we will sometimes use a non-curried notation, most commonly with Links, so that e.g. InheritanceLink x y does not mean a curried evaluation but rather means InheritanceLink(x,y).
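The currying convention can be illustrated with a small Python sketch (our own illustration; Combo, not Python, is CogPrime's actual procedure language):

```python
def curry2(f):
    # Turn a two-argument function into a chain of one-argument functions,
    # so that @[f x y] corresponds to curry2(f)(x)(y).
    return lambda x: lambda y: f(x, y)

add = curry2(lambda a, b: a + b)
inc = add(1)      # partial application: @[add 1] is itself a function
print(add(2)(3))  # 5
print(inc(10))    # 11
```

This also shows why currying makes multi-argument schemata strictly unnecessary: any n-argument function can be expressed as a chain of one-argument functions, at the cost of some notational awkwardness.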
20.3.0.6 Execution Output Links

In the case where f refers to a schema, the occurrence of the combination f x in the system is represented by

ExOutLink f x

or graphically, by a diagram with the ExOutLink pointing to f and x. [Figure omitted.]

Note that, just as when we write f (g x) we mean to apply f to the result of applying g to x, similarly when we write

ExOutLink f (ExOutLink g x)

we mean the same thing. So for instance

EvaluationLink (ExOutLink g x) y <.8>

means that the result of applying g to x is a predicate r, so that r(y) evaluates to True with strength .8.

This approach, in its purest incarnation, does not allow multi-argument schemata. Now, multi-argument schemata are never actually necessary, because one can use argument currying to simulate multiple arguments. However, this is often awkward, and things become simpler if one introduces an explicit tupling operator, which we call ListLink. Simply enough,

ListLink A1 ... An

denotes an ordered list (A1, ..., An).

20.3.1 Execution Links

ExecutionLinks give the system an easy way to record acts of schema execution. These are ternary links of the form:

SchemaNode: S
Atoms: A, B
ExecutionLink S A B

In words, this says the procedure represented by SchemaNode S has taken input A and produced output B. There may also be schemata that produce no output, or take no input. But these are treated as PredicateNodes, to be discussed below; their activity is recorded by EvaluationLinks, not ExecutionLinks.

The TruthValue of an ExecutionLink records how frequently the result encoded in the ExecutionLink occurs. Specifically,
• the TruthValue of (ExecutionLink S A B) tells you the probability of getting B as output, given that you have run schema S on input A
• the TruthValue of (ExecutionLink S A) tells you the probability that if S is run, it is run on input A

Often it is useful to record the time at which a given act of schema execution was carried out; in that case one uses the atTime link, writing e.g.

atTimeLink
    T
    ExecutionLink S A B

where T is a TimeNode; or else one uses an implicit method such as storing the time-stamp of the ExecutionLink in a core-level data structure called the TimeServer. The implicit method is logically equivalent to explicitly using atTime, and is treated the same way by PLN inference, but provides significant advantages in terms of memory usage and lookup speed.

For purposes of logically reasoning about schema, it is useful to create binary links representing ExecutionLinks with some of their arguments fixed. We name these as follows:

ExecutionLink1 A B means: X so that ExecutionLink X A B
ExecutionLink2 A B means: X so that ExecutionLink A X B
ExecutionLink3 A B means: X so that ExecutionLink A B X

Finally, a SchemaNode may be associated with a structure called a Graph. Where S is a SchemaNode,

Graph(S) = { (x,y) : ExecutionLink S x y }

Sometimes, the graph of a SchemaNode may be explicitly embodied as a ConceptNode; other times, it may be constructed implicitly by a MindAgent in analyzing the SchemaNode (e.g. the inference MindAgent).

Note that the set of ExecutionLinks describing a SchemaNode may not define that SchemaNode exactly, because some of them may be derived by inference. This means that the model of a SchemaNode contained in its ExecutionLinks may not actually be a mathematical function, in the sense of assigning only one output to each input. One may have

ExecutionLink S X A <.5>
ExecutionLink S X B <.5>

meaning that the system does not know whether S(X) evaluates to A or to B.
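The frequency semantics just given for ExecutionLink truth values can be sketched as a toy estimator over a log of execution events. The data layout is our invention for illustration.

```python
def exec_tv(records, schema, inp, out):
    # Estimates P(output = out | schema run on input inp), matching the
    # semantics of (ExecutionLink S A B).
    runs = [r for r in records if r[0] == schema and r[1] == inp]
    return sum(1 for r in runs if r[2] == out) / len(runs) if runs else 0.0

def exec_input_tv(records, schema, inp):
    # Estimates P(input = inp | schema was run), matching (ExecutionLink S A).
    runs = [r for r in records if r[0] == schema]
    return sum(1 for r in runs if r[1] == inp) / len(runs) if runs else 0.0

# Each record is a (schema, input, output) execution event.
log = [("S", "X", "A"), ("S", "X", "B"), ("S", "X", "A"), ("S", "Y", "A")]
print(exec_tv(log, "S", "X", "A"))   # 0.6666666666666666
print(exec_input_tv(log, "S", "X"))  # 0.75
```

Note that the non-function case in the text corresponds to two outputs each having frequency .5 for the same (schema, input) pair.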
So the set of ExecutionLinks modeling a SchemaNode may constitute a non-function relation, even if the schema inside the SchemaNode is a function.

Finally, what of the case where f x represents the action of a built-in system function f on an argument x? This is an awkward case that would not be necessary if the CogPrime system were revised so that all cognitive functions were carried out using SchemaNodes. However, in the current CogPrime version, where most cognitive functions are carried out using C++ MindAgent objects, if we want CogPrime to study its own cognitive behavior in a statistical way, we need BuiltInSchemaNodes that refer to MindAgents rather than to ComboTrees (or else, we need to represent MindAgents using ComboTrees, which will become practicable once we have a sufficiently efficient Combo interpreter). The semantics here is thus basically the same as where f refers to a schema. For instance we might have
ExecutionLink FirstOrderInferenceMindAgent (L1, L2) L3

where L1, L2 and L3 are links related by L1, L2 |- L3 according to the first-order PLN deduction rules.

20.3.1.1 Predicates

Predicates are related but not identical to schema, both conceptually and notationally. PredicateNodes involve predicate schema which output TruthValue objects. But there is a difference between a SchemaNode embodying a predicate schema and a PredicateNode, which is that a PredicateNode doesn't output a TruthValue; it adjusts its own TruthValue as a result of the output of its own internal predicate schema. The record of the activity of a PredicateNode is given not by an ExecutionLink but rather by an

EvaluationLink P A <tv>

where P is a PredicateNode, A is its input, and <tv> is the truth value assumed by the EvaluationLink corresponding to the PredicateNode being fed the input A. There is also the variant

EvaluationLink P <tv>

for the case where the PredicateNode P embodies a schema that takes no inputs¹.

A simple example of a PredicateNode is the predicate GreaterThan. In this case we have, for instance

EvaluationLink GreaterThan 5 6 <0>
EvaluationLink GreaterThan 5 3 <1>

and we also have:

EquivalenceLink
    GreaterThan
    ExOutLink
        And
        ListLink
            ExOutLink Not LessThan
            ExOutLink Not EqualTo

Note how the variables have been stripped out of the expression; see the PLN book for more explanation about that. We will also encounter many commonsense-semantics predicates such as isMale, with e.g.

¹ Actually, if P does take some inputs, EvaluationLink P <tv> is defined too, and tv corresponds to the average of P(X) over all inputs X; this is explained in more depth in the PLN book.
EvaluationLink isMale Ben_Goertzel <1>

Schemata that return no outputs are treated as predicates, and handled using EvaluationLinks. The truth value of such a predicate, as a default, is considered as True if execution is successful, and False otherwise.

And, analogously to the Graph operator for SchemaNodes, we have for PredicateNodes the SatisfyingSet operator, defined so that the SatisfyingSet of a predicate is the set whose members are the elements that satisfy the predicate. Formally, that is,

S = SatisfyingSet P

means

TruthValue(MemberLink X S) equals TruthValue(EvaluationLink P X)

This operator allows the system to carry out advanced logical operations like higher-order inference and unification.

20.3.2 Denoting Schema and Predicate Variables

CogPrime sometimes uses variables to represent the expressions inside schemata and predicates, and sometimes uses variable-free, combinatory-logic-based representations. It is important to distinguish between two sorts of variables that may exist in CogPrime, either of which may exist either inside compound schema or predicates, or else in the AtomSpace as VariableNodes:

• Variable Atoms, which may be quantified (bound to existential or universal quantifiers) or unquantified
• Variables that are used solely as function-arguments or local variables inside the "Combo tree" structures used inside some ProcedureNodes (PredicateNodes or SchemaNodes) (to be described below), but are not related to Variable Atoms

Examples of quantified variables represented by Variable Atoms are $X and $Y in:

ForAll $X <.0001>
    ExtensionalImplicationLink
        ExtensionalInheritanceLink $X human
        ThereExists $Y
            AND
                ExtensionalInheritanceLink $Y human
                EvaluationLink parent_of ($X, $Y)

An example of an unquantified Variable Atom is $X in

ExtensionalImplicationLink <.3>
    ExtensionalInheritanceLink $X human
    ThereExists $Y
        AND
            ExtensionalInheritanceLink $Y
                human
            EvaluationLink parent_of ($X, $Y)
This ImplicationLink says that 30% of humans are parents: a more useful statement than the ForAll Link given above, which says that it is very, very unlikely to be true that all humans are parents.

We may also say, for instance,

SatisfyingSet( EvaluationLink eats (cat, $X) )

to refer to the set of X so that eats(cat, X).

On the other hand, suppose we have the implication

Implication
    Evaluation f $X
    Evaluation f
        ExOut reverse $X

where f is a PredicateNode embodying a mathematical operator acting on pairs of NumberNodes, and reverse is an operator that reverses a list. So, this implication says that the f predicate is commutative. Now, suppose that f is grounded by the formula

f(a, b) = (a > b - 1)

embodied in a Combo Tree object (which is not commutative, but that is not the point), stored in the ProcedureRepository and linked to the PredicateNode for f. These f-internal variables, which are expressed here using the letters a and b, are not VariableNodes in the CogPrime AtomTable. The notation we use for these within the textual Combo language, that goes with the Combo Tree formalism, is to replace a and b in this example with #1 and #2, so the above grounding would be denoted

f -> (#1 > #2 - 1)

In the current version, it is assumed that type restrictions are always crisp, not probabilistically truth-valued. This assumption may be revisited in a later version of the system.

20.3.2.1 Links as Predicates

It is conceptually important to recognize that CogPrime link types may be interpreted as predicates. For instance, when one says

InheritanceLink cat animal <.8>

indicating an Inheritance relation between cat and animal with a strength .8, effectively one is declaring that one has a predicate giving an output of .8. Depending on the interpretation of InheritanceLink as a predicate, one has either the predicate

InheritanceLink cat $X

acting on the input animal, or the predicate

InheritanceLink $X animal

acting on the input
cat

or the predicate

InheritanceLink $X $Y

acting on the list input (cat, animal).

This means that, if we wanted to, we could do away with all Link types except OrderedLink and UnorderedLink, and represent all other Link types as PredicateNodes embodying appropriate predicate schema. This is not the approach taken in the current codebase. However, the situation is somewhat similar to that with CIM-Dynamics:

• In future we will likely create a revision of CogPrime that regularly revises its own vocabulary of Link types, in which case an explicit representation of link types as predicate schema will be appropriate.
• In the shorter term, it can be useful to treat link types as virtual predicates, meaning that one lets the system create SchemaNodes corresponding to them, and hence do some meta-level reasoning about its own link types.

20.3.3 Variable and Combinator Notation

One of the most important aspects of combinatory logic, from a CogPrime perspective, is that it allows one to represent arbitrarily complex procedures and patterns without using variables in any direct sense. In CogPrime, variables are optional, and the choice of whether or how to use them may be made (by CogPrime itself) on a contextual basis.

This section deals with the representation of variable expressions in a variable-free way, in a CogPrime context. The general theory underlying this is well known, and is usually expressed in terms of the elimination of variables from lambda calculus expressions (lambda lifting). Here we will not present this theory but will restrict ourselves to presenting a simple, hopefully illustrative example, and then discussing some conceptual implications.

20.3.3.1 Why Eliminating Variables is So Useful

Before launching into the specifics, a few words about the general utility of variable-free expression may be worthwhile.
Some expressions look simpler to the trained human eye with variables, and some look simpler without them. However, the main reason why eliminating all variables from an expression is sometimes very useful is that there are automated program-manipulation techniques that work much more nicely on programs (schemata, in CogPrime lingo) without any variables in them.

As will be discussed later (e.g. Chapter 33 on evolutionary learning, although the same process is also useful for supporting probabilistic reasoning on procedures), in order to mine patterns among multiple schema that all try to do the same (or related) things, we want to put schema into a kind of "hierarchical normal form." The normal form we wish to use generalizes
Holman's Elegant Normal Form (which is discussed in Moshe Looks' PhD thesis) to program trees rather than just Boolean trees.

But putting computer programs into a useful, nicely-hierarchically-structured normal form is a hard problem - it requires one to have a pretty nice and comprehensive set of program transformations. But the only general, robust, systematic program transformation methods that exist in the computer science literature require one to remove the variables from one's programs, so that one can use the theory of functional programming (which ties in with the theory of monads in category theory, and a lot of beautiful related math).

In large part, we want to remove variables so we can use functional programming tools to normalize programs into a standard and pretty hierarchical form, in order to mine patterns among them effectively. However, we don't always want to be rid of variables, because sometimes, from a logical reasoning perspective, theorem-proving is easier with the variables in there. (Sometimes not.) So, we want to have the option to use variables, or not.

20.3.3.2 An Example of Variable Elimination

Consider the PredicateNode

AND
    InheritanceLink X cat
    eats X mice

Here we have used a syntactically sugared representation involving the variable X. How can we get rid of the X? Recall the C combinator (from combinatory logic), defined by

C f x y = f y x

Using this tool,

InheritanceLink X cat

becomes

C InheritanceLink cat X

and

eats X mice

becomes

C eats mice X

so that overall we have

AND
    C InheritanceLink cat
    C eats mice

where the C combinators essentially give instructions as to where the virtual argument X should go. In this case the variable-free representation is basically just as simple as the variable-based representation, so there is nothing to lose and a lot to gain by getting rid of the variables. This
won't always be the case - sometimes execution efficiency will be significantly enhanced by use of variables.

20.3.4 Inheritance Between Higher-Order Types

Next, this section deals with the somewhat subtle matter of inheritance between higher-order types. This is needed, for example, when one wants to cross over or mutate two complex schemata, in an evolutionary learning context. One encounters questions like: When mutation replaces a schema that takes integer input, can it replace it with one that takes general numerical input? How about vice versa? These questions get more complex when the inputs and outputs of schema may themselves be schema with complex higher-order types. However, they can be dealt with elegantly using some basic mathematical rules.

Denote the type of a mapping from type T to type S as T -> S. Use the shorthand inh to mean inherits from. Then the basic rule we use is that

T1 -> S1 inh T2 -> S2 iff T2 inh T1 and S1 inh S2

In other words, we assume higher-order type inheritance is contravariant. The reason is that, if R1 = T1 -> S1 is to be a special case of R2 = T2 -> S2, then one has to be able to use the former everywhere one uses the latter. This means that any input R2 takes has to also be taken by R1 (hence T2 inherits from T1). And it means that the outputs R1 gives must be able to be accepted by any function that accepts outputs of R2 (hence S1 inherits from S2).

This type of issue comes up in programming language design fairly frequently, and there are a number of research papers debating the pros and cons of contravariance versus covariance for complex type inheritance. However, for the purpose of schema type inheritance in CogPrime, the greater logical consistency of the contravariance approach holds sway.
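The contravariant rule is mechanical enough to sketch in a few lines of Python. The elementary inheritance table and the tuple encoding of function types below are toy choices of ours, not CogPrime's actual type machinery; the recursive rule is exactly the one stated above.

```python
# Illustrative sketch of the contravariant rule for function types:
# (T1 -> S1) inh (T2 -> S2)  iff  T2 inh T1  and  S1 inh S2.
# The elementary "inh" relation here is a toy table, not the system's real one.

ELEMENTARY_INH = {("INT", "NUMBER"), ("FLOAT", "NUMBER"), ("CHAR", "STRING")}

def fn(t, s):
    return ("->", t, s)          # encode the function type t -> s

def inh(a, b):
    # does type a inherit from type b?
    if a == b:
        return True
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == "->" and b[0] == "->":
        _, t1, s1 = a
        _, t2, s2 = b
        return inh(t2, t1) and inh(s1, s2)   # contravariant in the input type
    return (a, b) in ELEMENTARY_INH

print(inh(fn("NUMBER", "INT"), fn("INT", "INT")))  # True: NO -> INT inh INT -> INT
print(inh(fn("INT", "INT"), fn("NUMBER", "INT")))  # False: the reverse fails
```

This reproduces the rounding example in the text: every NUMBER -> INT function is usable as an INT -> INT function, but not vice versa.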
For instance, in this approach, INT -> INT is not a subtype of NO -> INT (where NO denotes NUMBER), because NO -> INT is the type that includes all functions which take a real and return an int, and an INT -> INT does not take a real. Rather, the containment is the other way around: every NO -> INT function is an example of an INT -> INT function. For example, consider the NO -> INT function that takes every real number and rounds it up to the nearest integer. Considered as an INT -> INT function, this is simply the identity function: it is the function that takes an integer and rounds it up to the nearest integer.

Of course, the typing of tuples is different; it's covariant. If one has an ordered pair whose elements are of different types, say (T1, T2), then we have

(T1, S1) inh (T2, S2) iff T1 inh T2 and S1 inh S2

As a mnemonic formula, we may say
(general -> specific) inherits from (specific -> general)
(specific, specific) inherits from (general, general)

In schema learning, we will also have use for abstract type constructions, such as

(T1, T2) where T1 inherits from T2

Notationally, we will refer to variable types as Xv1, Xv2, etc., and then denote the inheritance relationships by using numerical indices, e.g. using [1 inh 2] to denote that Xv1 inh Xv2. So for example,

( INT, VOID ) inh ( Xv1, Xv2 )

is true, because there are no restrictions on the variable types, and we can just assign Xv1 = INT, Xv2 = VOID. On the other hand,

( INT, VOID ) inh [ ( Xv1, Xv2 ), 1 inh 2 ]

is false, because the restriction Xv1 inh Xv2 is imposed, but it's not true that INT inh VOID.

The following list gives some examples of type inheritance, using the elementary types INT, FLOAT (FL), NUMBER (NO), CHAR and STRING (STR), with the elementary type inheritance relationships

• INT inh NUMBER
• FLOAT inh NUMBER
• CHAR inh STRING
• ( NO -> FL ) inh ( INT -> FL )
• ( FL -> INT ) inh ( FL -> NO )
• ( ( INT -> FL ) -> ( FL -> INT ) ) inh ( ( NO -> FL ) -> ( FL -> NO ) )

20.3.5 Advanced Schema Manipulation

Now we describe some special schema for manipulating schema, which seem to be very useful in certain contexts.

20.3.5.1 Listification

First, there are two ways to represent n-ary relations in CogPrime's Atom-level knowledge representation language: using lists, as in

f_list (x1, ..., xn)

or using currying, as in
f_curry x1 ... xn

To make conversion between list and curried forms easier, we have chosen to introduce special schema (combinators) just for this purpose:

listify f = f_list

so that

f_list (x1, ..., xn) = f x1 ... xn
unlistify (listify f) = f

For instance

kick_curry Ben Ken

denotes

(kick_curry Ben) Ken

which means that kick is applied to the argument Ben to yield a predicate schema applied to Ken. This is the curried style. The list style is

kick_list (Ben, Ken)

where kick is viewed as taking as an argument the list (Ben, Ken). The conversion between the two is done by

listify kick_curry = kick_list
unlistify kick_list = kick_curry

As a more detailed example of unlistification, let us utilize a simple mathematical example, the function (X - 1)^2. If we use the notations - and pow to denote SchemaNodes embodying the corresponding operations, then this formula may be written in variable-free node-and-link form as

ExOutLink
    pow
    ListLink
        ExOutLink
            -
            ListLink X 1
        2

But to get rid of the nasty variable X, we need to first unlistify the functions pow and -, and then apply the C and B combinators a couple times to move the variable X to the front. The B combinator (see Combinatory Logic REF) is recalled below:

B f g h = f (g h)

This is accomplished as follows (using the standard convention of left-associativity for the application operator, as in the tree representation given in Section 20.3.0.6):

pow(-(x, 1), 2)
unlistify pow (-(x, 1)) 2
C (unlistify pow) 2 (-(x, 1))
C (unlistify pow) 2 ((unlistify -) x 1)
C (unlistify pow) 2 (C (unlistify -) 1 x)
B (C (unlistify pow) 2) (C (unlistify -) 1) x
yielding the final schema

B (C (unlistify pow) 2) (C (unlistify -) 1)

By the way, a variable-free representation of this schema in CogPrime would look like

ExOutLink
    ExOutLink
        B
        ExOutLink
            ExOutLink
                C
                ExOutLink unlistify pow
            2
    ExOutLink
        ExOutLink
            C
            ExOutLink unlistify -
        1

The main thing to be observed is that the introduction of these extra schema lets us remove the variable X. The size of the schema is increased slightly in this case, but only slightly - an increase that is well justified by the elimination of the many difficulties that explicit variables would bring to the system. Furthermore, there is a shorter rendition, which looks like

ExOutLink
    ExOutLink
        B
        ExOutLink
            ExOutLink
                C
                pow_curried
            2
    ExOutLink
        ExOutLink
            C
            -_curried
        1

This rendition uses alternate variants of the - and pow schema, labeled -_curried and pow_curried, which do not act on lists but are curried in the manner of combinatory logic and Haskell. It is 13 lines, whereas the variable-bearing version is 9 lines - a minor increase in length that brings a lot of operational simplification.

20.3.5.2 Argument Permutation

In dealing with list relationships, there will sometimes be use for an argument-permutation operator; let us call it P, defined as follows:

(P p f) (v1, ..., vn) = f (p (v1, ..., vn))

where p is a permutation on n letters. This deals with the case where we want to say, for instance, that
Equivalence parent(x,y) child(y,x)

Instead of positing variable names x and y that span the two relations parent(x,y) and child(y,x), what we can instead say in this example is

Equivalence parent (P (2,1) child)

For the case of two-argument functions, argument permutation is basically doing on the list level what the C combinator does in the curried function domain. On the other hand, in the case of n-argument functions with n > 2, argument permutation doesn't correspond to any of the standard combinators.

Finally, let's conclude with a similar example in a more standard predicate logic notation, involving both combinators and the argument permutation operator introduced above. We will translate the variable-laden predicate

likes(y,x) AND likes(x,y)

into the equivalent combinatory logic tree. Let us first recall the combinator S, whose function is to distribute an argument over two terms:

S f g x = (f x) (g x)

Assume that the two inputs are going to be given to us as a list. Now, the combinatory logic representation of this is

S (B AND (B (P (2,1) likes))) likes

We now show how this would be evaluated to produce the correct expression:

S (B AND (B (P (2,1) likes))) likes (x,y)

S gets evaluated first, to produce

(B AND (B (P (2,1) likes)) (x,y)) (likes (x,y))

now the first B

AND ((B (P (2,1) likes)) (x,y)) (likes (x,y))

now the second one

AND ((P (2,1) likes) (x,y)) (likes (x,y))

now P

AND (likes (y,x)) (likes (x,y))

which is what we wanted.
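The combinators used in this section can be checked mechanically. Below is a sketch using curried Python lambdas; we verify a slightly simplified variant of the expression above, S (B AND (P (2,1) likes)) likes, which reduces to the same target AND (likes (y,x)) (likes (x,y)). The toy "likes" relation is invented here purely for illustration.

```python
# Runnable sketch of the B, C, S and P operators as curried Python lambdas.
# The "likes" relation and the names below are toy choices, not system code.

B = lambda f: lambda g: lambda x: f(g(x))       # B f g x = f (g x)
C = lambda f: lambda x: lambda y: f(y)(x)       # C f x y = f y x
S = lambda f: lambda g: lambda x: f(x)(g(x))    # S f g x = (f x) (g x)
P = lambda perm: lambda f: lambda args: f(tuple(args[i - 1] for i in perm))

AND = lambda a: lambda b: a and b
LIKES = {("ben", "cassio"), ("cassio", "ben"), ("nil", "ben")}
likes = lambda pair: pair in LIKES              # list-style binary predicate

# lambda (x,y). likes(y,x) AND likes(x,y), with no explicit variables:
mutual = S(B(AND)(P((2, 1))(likes)))(likes)

print(mutual(("ben", "cassio")))   # True: the liking is mutual
print(mutual(("nil", "ben")))      # False: ben does not like nil back
```

Tracing one application: mutual((x,y)) expands, via S, to AND(likes((y,x)))(likes((x,y))), exactly the target expression derived in the text.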
Chapter 21
Representing Procedural Knowledge

21.1 Introduction

We now turn to CogPrime's representation and manipulation of procedural knowledge. In a sense this is the most fundamental kind of knowledge - since intelligence is most directly about action selection, and it is procedures which generate actions.

CogPrime involves multiple representations for procedures, including procedure maps and (for sensorimotor procedures) neural nets or similar structures. Its most basic procedural knowledge representation, however, is the program. The choice to use programs to represent procedures was made after considerable reflection - they are not of course the only choice, as other representations such as recurrent neural networks possess identical representational power, and are preferable in some regards (e.g. resilience with respect to damage). Ultimately, however, we chose programs due to their consilience with the software and hardware underlying CogPrime (and every other current AI program). CogPrime is a program; current computers and operating systems are optimized for executing and manipulating programs; and we humans now have many tools for formally and informally analyzing and reasoning about programs. The human brain probably doesn't represent most procedures as programs in any simple sense, but CogPrime is not intended to be an emulation of the human brain. So, the representation of procedures as programs is one major case where CogPrime deviates from the human cognitive architecture, in the interest of more effectively exploiting its own hardware and software infrastructure.

CogPrime represents procedures as programs in an internal programming language called "Combo."
While Combo has a textual representation, described online at the OpenCog wiki, this isn't one of its more important aspects (and may be redesigned slightly or wholly without affecting system intelligence or architecture); the essence of Combo programs lies in their tree representation, not their text representation. One could fairly consider Combo as a dialect of LISP, although it's not equivalent to any standard dialect, and it hasn't particularly been developed with this in mind.

In this chapter we discuss the key concepts underlying the Combo approach to program representation, seeking to make clear at each step the motivations for doing things in the manner proposed.

In terms of the overall CogPrime architecture diagram given in Chapter 6 of Part 1, this chapter is about the box labeled "Procedure Repository." The latter, in OpenCog, is a specialized component connected to the AtomSpace, storing Combo tree representations of programs;
each program in the repository is linked to a SchemaNode in the AtomSpace, ensuring full connectivity between procedural and declarative knowledge.

21.2 Representing Programs

What is a "program" anyway? What distinguishes a program from an arbitrary representation of a procedure? The essence of programmatic representations is that they are well-specified, compact, combinatorial, and hierarchical:

• Well-specified: unlike sentences in natural language, programs are unambiguous; two distinct programs can be precisely equivalent.
• Compact: programs allow us to compress data on the basis of their regularities. Accordingly, for the purposes of this chapter, we do not consider overly constrained representations such as the well-known conjunctive and disjunctive normal forms for Boolean formulae to be programmatic. Although they can express any Boolean function (data), they dramatically limit the range of data that can be expressed compactly, compared to unrestricted Boolean formulae.
• Combinatorial: programs access the results of running other programs (e.g. via function application), as well as delete, duplicate, and rearrange these results (e.g., via variables or combinators).
• Hierarchical: programs have intrinsic hierarchical organization, and may be decomposed into subprograms.

Eric Baum has advanced a theory "under which one understands a problem when one has mental programs that can solve it and many naturally occurring variations" [Bau06]. In this perspective - which we find an agreeable way to think about procedural knowledge, though perhaps an overly limited perspective on mind as a whole - one of the primary goals of artificial general intelligence is systems that can represent, learn, and reason about such programs [Bau06, Bau04]. Furthermore, integrative AGI systems such as CogPrime may contain subsystems operating on programmatic representations.
Would-be AGI systems with no direct support for programmatic representation will clearly need to represent procedures and procedural abstractions somehow. Alternatives such as recurrent neural networks have serious downsides, including opacity and inefficiency, but also have their advantages (e.g. recurrent neural nets can be robust with regard to damage, and learnable via biologically plausible algorithms).

Note that the problem of how to represent programs for an AGI system dissolves in the unrealistic case of unbounded computational resources. The solution is algorithmic information theory [Cha08], extended recently to the case of sequential decision theory [Hut05a]. The latter work defines the universal algorithmic agent AIXI, which in effect simulates all possible programs that are in agreement with the agent's set of observations. While AIXI is uncomputable, the related agent AIXItl may be computed, and is superior to any other agent bounded by time t and space l [Hut05a]. The choice of a representational language for programs¹ is of no consequence, as it will merely introduce a bias that will disappear within a constant number of time steps.²

¹ As well as a language for proofs in the case of AIXItl.
² The universal distribution converges quickly.
Our goal in this chapter is to provide practical techniques for approximating the ideal provided by algorithmic probability, based on what Pei Wang has termed the assumption of insufficient knowledge and resources [Wan06], and assuming an AGI architecture that's at least vaguely humanlike in nature, and operates largely in everyday human environments, but uses programs to represent many procedures. Given these assumptions, how programs are represented is of paramount importance, as we shall see in the next two sections, where we give a conceptual formulation of what we mean by tractable program representations, and introduce tools for formalizing such representations. The fourth section delves into effective techniques for representing programs. A key concept throughout is syntactic-semantic correlation, meaning that programs which are similar on the syntactic level, within certain constraints, will tend to also be similar in terms of their behavior (i.e. on the semantic level). Lastly, the fifth section changes direction a bit and discusses the translation of programmatic structure into declarative form for the purposes of logical inference.

In the future, we will experimentally validate that these normal forms and heuristic transformations do in fact increase the syntactic-semantic correlation in program spaces, as has been shown so far only in the Boolean case. We would also like to explore the extent to which even stronger correlation, and additional tractability properties, can be observed when realistic probabilistic constraints on "natural" environment and task spaces are imposed.

The importance of a good programmatic representation of procedural knowledge becomes quite clear when one thinks about it in terms of the Mind-World Correspondence Principle introduced in Chapter 10. That principle states, roughly, that transition paths between world-states should map naturally onto transition paths between mind-states.
This suggests that there should be a natural, smooth mapping between real-world action series and the corresponding series of internal states. Where internal states are driven by explicitly given programs, this means that the transitions between internal program states should nicely mirror transitions between the states of the real world as it interacts with the system controlled by the program. The extent to which this is true will depend on the specifics of the programming language - and it will be true to a much greater extent, on the whole, if the programming language displays high syntactic-semantic correlation for behaviors that commonly occur when the program is used to control the system in the real world. So, the various technical issues mentioned above and considered below, regarding the qualities desired in a programmatic representation, are merely the manifestation of the general Mind-World Correspondence Principle in the context of procedural knowledge, under the assumption that procedures are represented as programs. The material in this chapter may be viewed as an approach to ensuring the validity of the Mind-World Correspondence Principle for programmatically-represented procedural knowledge, for CogPrime systems concerned with achieving humanly meaningful goals in everyday human environments.

21.3 Representational Challenges

Despite the advantages outlined in the previous section, there are a number of challenges in working with programmatic representations:

• Open-endedness - in contrast to some other knowledge representations current in machine learning, programs vary in size and "shape", and there is no obvious problem-independent
upper bound on program size. This makes it difficult to represent programs as points in a fixed-dimensional space, or to learn programs with algorithms that assume such a space.
• Over-representation - often, syntactically distinct programs will be semantically identical (i.e. represent the same underlying behavior or functional mapping). Lacking prior knowledge, many algorithms will inefficiently sample semantically identical programs repeatedly [Loo07a].
• Chaotic Execution - programs that are very similar, syntactically, may be very different, semantically. This presents difficulties for many heuristic search algorithms, which require syntactic and semantic distance to be correlated [Loo07b, TVCC05].
• High resource-variance - programs in the same space vary greatly in the space and time they require to execute.

It's easy to see how the latter two issues may present a challenge for mind-world correspondence! Chaotic execution makes it hard to predict whether a program will indeed manifest state-sequences mapping nicely to corresponding world-sequences; and high resource-variance makes it hard to predict whether, for a given program, this sort of mapping can be achieved for relevant goals given available resources.

Based on these concerns, it is no surprise that search over program spaces quickly succumbs to combinatorial explosion, and that heuristic search methods are sometimes no better than random sampling [LP02]. However, alternative representations of procedures also have their difficulties, and so far we feel the thornier aspects of programmatic representation are generally an acceptable price to pay in light of the advantages. For some special cases in CogPrime we have made a different choice - e.g. when we use DeSTIN for sensory perception (see Chapter 28), we utilize a more specialized representation comprising a hierarchical network of more specialized elements.
DeSTIN doesn't have problems with resource variance or chaotic execution, though it does suffer from over-representation. It is not very open-ended, which helps increase its efficiency in the perceptual processing domain, but may limit its applicability to more abstract cognition. In short we feel that, for general representation of cognitive procedures, the benefits of programmatic representation outweigh the costs; but for some special cases such as low-level perception and motor procedures, this may not be true, and one may do better to opt for a more specialized, more rigid but less problematic representation.

It would be possible to modify CogPrime to use, say, recurrent neural nets for procedure representation, rather than programs in an explicit language. However, this would rate as a rather major change in the architecture, and would cause multiple problems in other aspects of the system. For example, programs are reasonably straightforward to reason about using PLN inference, whereas reasoning about the internals of recurrent neural nets is drastically more problematic, though not impossible. The choice of a procedure representation approach for CogPrime has been made considering not only procedural knowledge in itself, but the interaction of procedural knowledge with other sorts of knowledge. This reflects the general synergetic nature of the CogPrime design.

There are also various computation-theoretic issues regarding programs; however, we suspect these are not particularly relevant to the task of creating human-level AGI, though they may rear their heads when one gets into the domain of super-human, profoundly self-modifying AGI systems. For instance, in the context of the difficulties caused by over-representation and high resource-variance, one might observe that determinations of e.g. programmatic equivalence for the former, and e.g. halting behavior for the latter, are uncomputable.
But we feel that, given the assumption of insufficient knowledge and resources, these concerns dissolve into the EFTA00624199
21.4 What Makes a Representation Tractable? 53 larger issue of computational intractability and the need for efficient heuristics. Determining the equivalence of two Boolean formulae over 500 variables by computing and comparing their truth tables is trivial from a computability standpoint, but, in the words of Leonid Levin, "only math nerds would call 2500 finite" ILev9-1]. Similarly, a program that never terminates is a special case of a program that runs too slowly to be of interest to us. One of the key ideas underlying our treatment of programmatic knowledge is that, in order to tractably learn and reason about programs, an Al system must have prior knowledge of programming language semantics. That is, in the approach we advocate, the mechanism whereby programs are executed is assumed known a priori, and assumed to remain constant across many problems. One may then craft AI methods that make specific use of the programming language semantics, in various ways. Of course in the long run a sufficiently powerful AGI system could modify these aspects of its procedural knowledge representation; but in that case, according to our approach, it would also need to modify various aspects of its procedure learning and reasoning code accordingly. Specifically, we propose to exploit prior knowledge about program structure via enforcing programs to be represented in normal forms that preserve their hierarchical structure, and to be heuristically simplified based on reduction rules. Accordingly, one formally equivalent pro- gramming language may be preferred over another by virtue of making these reductions and transformations more explicit and concise to describe and to implement. 
The current OpenCog- Prime system uses a simple LISP-like language called Combo (which takes both tree form and textual form) to represent procedures, but this is not critical; the main point is using some language or language variant that is "tractable" in the sense of providing a context in which the semantically useful reductions and transformations we've identified are naturally expressible and easily usable. 21.4 What Makes a Representation Tractable? Creating a comprehensive formalization of the notion of a tractable program representation would constitute a significant achievement; and we will not answer that summons here. We will, however, take a step in that direction by enunciating a set of positive principles for tractable program representations, corresponding closely to the list of representational challenges above. While the discussion in this section is essentially conceptual rather than formal, we will use a bit of notation to ensure clarity of expression; S to denote a space of programmatic functions of the same type (e.g. all pure Lisp A-expressions mapping from lists to numbers), and B to denote a metric space of behaviors. In the case of a deterministic, side-effect-free program, execution maps from programs in $ to points in B, which will have separate dimensions for the function's output across various inputs of interest, as well as dimensions corresponding to the time and space costs of executing the program. In the case of a program that interacts with an external environment, or is intrinsically nondeterministic, execution will map from $ to probability distributions over points in B, which will contain additional dimensions for any side-effects of interest that programs in $ might have. Note the distinction between syntactic distance, measured as e.g. tree-edit distance between programs in 5, and semantic distance, measured between program's corresponding points in or probability distributions over B. 
We assume that semantic distance accurately quantifies our
preferences in terms of a weighting on the dimensions of B; i.e., if variation along some axis is of great interest, our metric for semantic distance should reflect this. Let P be a probability distribution over B that describes our knowledge of what sorts of problems we expect to encounter, and let R(n) ⊆ S be all the programs in our representation with (syntactic) size no greater than n. We will say that R(n) d-covers the pair (B, P) to extent p if the probability that, for a random behavior b ∈ B chosen according to P, there is some program in R whose behavior is within semantic distance d of b, is greater than or equal to p. Then, some among the various properties of tractability that seem important based on the above discussion are as follows:

• for fixed d, p quickly goes to 1 as n increases,
• for fixed p, d quickly goes to 0 as n increases,
• for fixed d and p, the minimal n needed for R(n) to d-cover (B, P) to extent p should be as small as possible,
• ceteris paribus, syntactic and semantic distance (measured according to P) are highly correlated.

This is closely related to the Mind-Brain Correspondence Principle articulated in Chapter 10, and to the geometric formulation of cognitive synergy posited in Appendix ??. Syntactic distance has to do with distance along paths in mind-space related to formal program structures, and semantic distance has to do with distance along paths in mind-space and world-space corresponding to the record of the program's actual behavior. If syntax-semantics correlation failed, then there would be paths through mind-space (related to formal program structures) that were poorly matched to their closest corresponding paths through the rest of mind-space and world-space, hence causing a failure (or significant diminution) of cognitive synergy and mind-world correspondence.
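The syntactic/semantic distance distinction can be illustrated for tiny Boolean programs, taking truth tables as points in B and a crude size-based tree comparison as syntactic distance. Both measures below are illustrative stand-ins, not the book's formal definitions:

```python
import itertools

# Behavior space B for a Boolean program over variables x0, x1: its truth table.
def behavior(expr):
    def ev(e, env):
        if isinstance(e, str):
            return env[e]
        op, *args = e
        vals = [ev(a, env) for a in args]
        return {'and': all, 'or': any}[op](vals) if op != 'not' else not vals[0]
    return tuple(ev(expr, {'x0': a, 'x1': b})
                 for a, b in itertools.product([False, True], repeat=2))

def semantic_distance(e1, e2):
    """Hamming distance between the two programs' points in B."""
    return sum(a != b for a, b in zip(behavior(e1), behavior(e2)))

def size(e):
    return 1 if isinstance(e, str) else 1 + sum(size(a) for a in e[1:])

def syntactic_distance(e1, e2):
    """A crude tree-edit proxy: total size of the non-shared structure."""
    if e1 == e2:
        return 0
    if isinstance(e1, str) or isinstance(e2, str) or e1[0] != e2[0] or len(e1) != len(e2):
        return size(e1) + size(e2)
    return sum(syntactic_distance(a, b) for a, b in zip(e1[1:], e2[1:]))

a = ('and', 'x0', 'x1')
b = ('and', 'x0', ('not', 'x1'))   # syntactically close to a
c = ('or', 'x0', 'x1')             # different root operator
print(syntactic_distance(a, b), semantic_distance(a, b))
print(syntactic_distance(a, c), semantic_distance(a, c))
```

A tractable representation would make the two columns correlate well across the space; measuring that correlation over sampled program pairs is the empirical version of the fourth property above.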
Since execution time and memory usage considerations may be incorporated into the definition of program behavior, minimizing chaotic execution and managing resource variance emerge conceptually here as subcases of maximizing correlation between syntactic and semantic distance. Minimizing over-representation follows from the desire for small program size: roughly speaking, the less over-representation there is, the smaller average program size can be achieved. In some cases one can achieve fairly strong results about tractability of representations without any special assumptions about P: for example, in prior work we have shown that adoption of an appropriate hierarchical normal form can generically increase correlation between syntactic and semantic distance in the space of Boolean functions [?, Loo06b]. In this case we may say that we have a generically tractable representation. However, to achieve tractable representation of more complex programs, some fairly strong assumptions about P will be necessary. This should not be philosophically disturbing, since it's clear that human intelligence has evolved in a manner strongly conditioned by certain classes of environments; and similarly, what we need to do to create a viable program representation system for pragmatic AGI usage is to achieve tractability relative to the distribution P corresponding to the actual problems the AGI is going to need to solve. Formalizing the distributions P of real-world interest is a difficult problem, and one we will not address here (recall the related, informal discussions of Chapter 9, where we considered the various important peculiarities of the human everyday world). However, we hypothesize that the representations presented in the following section may be tractable to a significant extent irrespective of P,³ and even more powerfully tractable with

³ Specifically, with only weak biases that prefer smaller and faster programs with hierarchical decompositions.
respect to this as-yet unformalized distribution. As weak evidence in favor of this hypothesis, we note that many of the representations presented have proved useful so far in various narrow problem-solving situations.

21.5 The Combo Language

The current version of OpenCogPrime uses a simple language called Combo, which is an example of a language in which the transformations we consider important for AGI-focused program representation are relatively simple and natural. Here we illustrate the Combo language by example, referring the reader to the OpenCog wiki site for a formal presentation. The main use of the Combo language in OpenCog is behind-the-scenes, i.e. using tree representations of Combo programs; but there is also a human-readable syntax, and an interpreter that allows humans to write Combo programs when needed. The main use of Combo, however, is not for human-coded programs, but rather for programs that are learned via various AI methods. In Combo all expressions are in prefix form like LISP, but the left parenthesis is placed after the operator instead of before. #n designates the n-th input. For example:

• +(4 5) is a 0-ary expression that returns 4 + 5.
• and(#1 0<(#2)) is a binary expression of type bool × float → bool that returns true if and only if the first input is true and the second input positive.
• fact(#1) := if(0<(#1) *(#1 fact(+(#1 -1))) 1) is a recursive definition of factorial.
• and_seq(goto(stick) grab(stick) goto(owner) drop) is a 0-ary expression with side effects; it evaluates a sequence of actions until completion or failure of one of them. Each action is executed in the environment the agent is connected to and returns action_success upon success or action_failure otherwise. The action sequence returns action_success if it completes, or action_failure if it does not.
• if(near(owner self) lick(owner) and_seq(goto(owner) wag)) is a 0-ary expression with side effects; it means that if, at the time of its evaluation, the agent referred to as self (here a virtual pet) is near its owner, then lick him/her; otherwise go to the owner and wag the tail.

21.6 Normal Forms Postulated to Provide Tractable Representations

We now present a series of normal forms for programs, postulated to provide tractable representations in the contexts relevant to human-level, roughly human-like general intelligence.
21.6.1 A Simple Type System

We use a simple type system to distinguish between the various normal forms introduced below. This is necessary to convey the minimal information needed to correctly apply the basic functions in our canonical forms. Various systems and applications may of course augment these with additional type information, up to and including the satisfaction of arbitrary predicates (e.g. a type for prime numbers). This can be overlaid on top of our minimalist system to convey additional bias in selecting which transformations to apply, and introducing constraints as necessary. For instance, a call to a function expecting a prime number, called with a potentially composite argument, may be wrapped in a conditional testing the argument's primality. A similar technique is used in the normal form for functions to deal with list arguments that may be empty. Normal forms are provided for Boolean and number primitive types, and the following parametrized types:

• list types, list_T, where T is any type,
• tuple types, tuple_{T1,T2,...,TN}, where all Ti are types, and N is a positive natural number,
• enum types, {s1, s2, ... sN}, where N is a positive number and all si are unique identifiers,
• function types T1, T2, ... TN → O, where O and all Ti are types,
• action result types.

A list of type list_T is an ordered sequence of any number of elements, all of which must have type T. A tuple of type tuple_{T1,T2,...,TN} is an ordered sequence of exactly N elements, where every ith element is of type Ti. An enum of type {s1, s2, ... sN} is some element si from the set. Action result types concern side-effectful interaction with some world external to the system (but perhaps simulated, of course), and will be described in detail in their subsection below.
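The type list above can be transcribed as a minimal sketch in Python (hypothetical class names of our own choosing; frozen dataclasses give the structural equality one expects of types):

```python
from dataclasses import dataclass
from typing import Tuple

# Primitive types.
@dataclass(frozen=True)
class Bool: pass

@dataclass(frozen=True)
class Num: pass

# Parametrized types.
@dataclass(frozen=True)
class ListT:
    elem: object                      # list_T: any element type T

@dataclass(frozen=True)
class TupleT:
    elems: Tuple[object, ...]         # tuple_{T1,...,TN}

@dataclass(frozen=True)
class EnumT:
    symbols: Tuple[str, ...]          # {s1, ..., sN}, unique identifiers

@dataclass(frozen=True)
class FuncT:
    args: Tuple[object, ...]          # T1, ..., TN
    out: object                       # -> O

@dataclass(frozen=True)
class ActionResult:
    world: str                        # the associated world type

# e.g. the type of a perception near: world x object x object -> bool
near_type = FuncT(args=('World', 'Object', 'Object'), out=Bool())
print(near_type.out == Bool())        # structural equality via frozen dataclass
```

Richer constraints (such as a primality predicate) would be layered on top of these classes rather than added to them, matching the minimalist intent of the type system.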
Other types may certainly be added at a later date, but we believe that those listed above provide sufficient expressive power to conveniently encompass a wide range of programs, and serve as a compelling proof of concept. The normal form for a type T is a set of elementary functions with codomain T, a set of constants of type T, and a tree grammar. Internal nodes for expressions described by the grammar are elementary functions, and leaves are either U_var or U_constant, where U is some type (often U = T). Sentences in a normal form grammar may be transformed into normal form expressions. The set of expressions that may be generated is a function of a set of bound variables and a set of external functions that must be provided (both bound variables and external functions are typed). The transformation is as follows:

• T_constant leaves are replaced with constants of type T,
• T_var leaves are replaced with either bound variables matching type T, or expressions of the form f(expr1, expr2, ... exprM), where f is an external function of type T1, T2, ... TM → T, and each expri is a normal form expression of type Ti (given the available bound variables and external functions).
21.6.2 Boolean Normal Form

The elementary functions are and, or, and not. The constants are {true, false}. The grammar is:

bool_root → or_form | and_form | literal | bool_constant
literal → bool_var | not( bool_var )
or_form → or( {and_form | literal}{2,} )
and_form → and( {or_form | literal}{2,} )

The construct foo{x,} refers to x or more matches of foo (e.g. {x | y}{2,} is two or more items in sequence, where each item is either an x or a y).

21.6.3 Number Normal Form

The elementary functions are * (times) and + (plus). The constants are some subset of the rationals (e.g. those with IEEE single-precision floating-point representations). The grammar is:

num_root → times_form | plus_form | num_constant | num_var
times_form → *( {num_constant | plus_form} plus_form{1,} ) | num_var
plus_form → +( {num_constant | times_form} times_form{1,} ) | num_var

21.6.4 List Normal Form

For list types list_T, the elementary functions are list (an n-ary list constructor) and append. The only constant is the empty list (nil). The grammar is:

list_T_root → append_form | list_form | list_T_var | list_T_constant
append_form → append( {list_form | list_T_var}{2,} )
list_form → list( T_root{1,} )

21.6.5 Tuple Normal Form

For tuple types tuple_{T1,T2,...,TN}, the only elementary function is the tuple constructor (tuple). The constants are T1_constant × T2_constant × ... × TN_constant. The normal form is either a constant, a var, or

tuple( T1_root T2_root ... TN_root )
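The grammar-style normal forms above can be checked mechanically. A minimal sketch for the Boolean case, representing expressions as nested tuples and enforcing that not applies only to variables, and that and/or levels alternate with at least two children each:

```python
# Literals per the grammar: bool_var, bool_constant, or not( bool_var ).
def is_literal(e):
    if isinstance(e, str):
        return True                       # bool_var or bool_constant
    return e[0] == 'not' and len(e) == 2 and isinstance(e[1], str)

def is_nf(e, parent=None):
    """True iff e matches the Boolean normal form grammar."""
    if is_literal(e):
        return True
    if not isinstance(e, tuple) or e[0] not in ('and', 'or'):
        return False
    op, *args = e
    if op == parent or len(args) < 2:     # no and-under-and / or-under-or
        return False
    return all(is_nf(a, parent=op) for a in args)

print(is_nf(('and', 'x', ('or', 'y', ('not', 'z')))))   # True
print(is_nf(('and', 'x', ('and', 'y', 'z'))))           # False: and under and
print(is_nf(('not', ('and', 'x', 'y'))))                # False: not on non-variable
```

The alternation check is what forces nested conjunctions to be flattened, so that each Boolean function has an essentially unique hierarchical shape in normal form.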
21.6.6 Enum Normal Form

Enums are atomic tokens with no internal structure; accordingly, there are no elementary functions. The constants for the enum {s1, s2, ... sN} are the si. The normal form is either a constant or a variable.

21.6.7 Function Normal Form

For T1, T2, ... TN → O, the normal form is a lambda-expression of arity N whose body is of type O. The list of variable names for the lambda-expression is not a "proper" argument; it does not have a normal form of its own. Assuming that none of the Ti is a list type, the body of the lambda-expression is simply in the normal form for type O (with the possibility of the lambda-expression's arguments appearing with their appropriate types). If one or more Ti are list types, then the body is a call to the split function with all arguments in normal form. Split is a family of functions with type signatures

(T1, list_T1, T2, list_T2, ... Tk, list_Tk → O), tuple_{list_T1, O}, tuple_{list_T2, O}, ... tuple_{list_Tk, O} → O.

To evaluate split( f tuple(l1, o1) tuple(l2, o2) ... tuple(lk, ok) ), the list arguments l1, l2, ... lk are examined sequentially. If some li is found that is empty, then the result is the corresponding value oi. If all li are nonempty, we deconstruct each of them into xi : xsi, where xi is the first element of the list and xsi is the rest. The result is then f(x1, xs1, x2, xs2, ... xk, xsk). The split function thus acts as an implicit case statement, to deconstruct lists only if they are nonempty.

21.6.8 Action Result Normal Form

An action result type act corresponds to the result of taking an action in some world. Every action result type has a corresponding world type, world. Associated with action results and worlds are two special sorts of functions:

• Perceptions - functions that take a world as their first argument and regular (non-world and non-action-result) types as their remaining arguments, and return regular types.
Unlike other function types, the result of evaluating a perception call may be different at different times, because the world will have different configurations at different times.
• Actions - functions that take a world as their first argument and regular types as their remaining arguments, and return action results (of the type associated with the type of their world argument). As with perceptions, the result of evaluating an action call may be different at different times. Furthermore, actions may have side effects in the associated world that they are called in. Thus, unlike any other sort of function, actions must be evaluated, even if their return values are ignored. Other sorts of functions acting on worlds (e.g. ones that take multiple worlds as arguments) are disallowed.
Note that an action result expression cannot appear nested inside an expression of any other type. Consequently, there is no way to convert e.g. an action result to a Boolean, although conversion in the opposite direction is permitted. This is required because mathematical operations in our language have classical mathematical semantics; x and y must equal y and x, which will not generally be the case if x or y can have side-effects. Instead, there are special sequential versions of the logical functions. The elementary functions for action result types are andseq (sequential and, equivalent to C's short-circuiting &&), orseq (sequential or, equivalent to C's short-circuiting ||), and fails (negates success to failure and vice versa). The constants may vary from type to type but must at least contain success and failure, indicating absolute success/failure in execution.⁴ The normal form is as follows:

act_root → orseq_form | andseq_form | seqlit
seqlit → act | fails( act )
act → act_constant | act_var
orseq_form → orseq( {andseq_form | seqlit}{2,} )
andseq_form → andseq( {orseq_form | seqlit}{2,} )

21.7 Program Transformations

A program transformation is any type-preserving mapping from expressions to expressions. Transformations may be guaranteed to preserve semantics. When doing program evolution there is an intermediate category of fitness preserving transformations that may alter semantics, but not fitness. In general, the only way that fitness preserving transformations will be uncovered is by scoring programs that have had their semantics potentially transformed to determine their fitness, which is what most fitness functions do. On the other hand, if the fitness function is encompassed in the program itself, so that a candidate directly outputs its own fitness, then only semantics preserving transformations are needed.
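For small side-effect-free Boolean programs, whether a candidate rewrite is semantics preserving can be verified directly, by exhaustively comparing behaviors over all variable assignments. A minimal sketch:

```python
import itertools

def ev(e, env):
    """Evaluate a side-effect-free Boolean expression tree."""
    if isinstance(e, str):
        return env[e]
    op, *args = e
    if op == 'not':
        return not ev(args[0], env)
    vals = [ev(a, env) for a in args]
    return all(vals) if op == 'and' else any(vals)

def variables(e):
    return {e} if isinstance(e, str) else set().union(*map(variables, e[1:]))

def semantics_preserving(before, after):
    """Exhaustively compare behaviors over all assignments."""
    vs = sorted(variables(before) | variables(after))
    return all(ev(before, dict(zip(vs, bits))) == ev(after, dict(zip(vs, bits)))
               for bits in itertools.product([False, True], repeat=len(vs)))

# The reduction and(x, x, y) -> and(x, y) preserves semantics;
# rewriting and(x, y) to or(x, y) does not.
print(semantics_preserving(('and', 'x', 'x', 'y'), ('and', 'x', 'y')))  # True
print(semantics_preserving(('and', 'x', 'y'), ('or', 'x', 'y')))        # False
```

For larger programs, or those with side effects, such exhaustive checking is infeasible, which is why reductions are instead proved semantics preserving once, as rules.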
21.7.1 Reductions

These are semantics preserving transformations that do not increase some size measure (typically number of symbols), and are idempotent. For example, and(x, x, y) → and(x, y) is a reduction for Boolean expressions. A set of canonical reductions is defined for every type that has a normal form. For numerical functions, the simplifier in a computer algebra system may be used. The full list of reductions is omitted for brevity. An expression is reduced if it maps to itself under all canonical reductions for its type, and all of its children are reduced. Another important set of reductions are the compressive abstractions, which reduce or keep constant the size of expressions by introducing new functions. Consider

list(*(+(a p q) r) *(+(b p q) r) *(+(c p q) r))

⁴ A do(arg1, arg2, ... argN) statement (known as progn in Lisp), which evaluates its arguments sequentially regardless of success or failure, is equivalent to andseq(orseq(arg1, success), orseq(arg2, success), ... orseq(argN, success)).
which contains 19 symbols. Transforming this to

f(x) := *(+(x p q) r)
list(f(a) f(b) f(c))

reduces the total number of symbols to 15. One can generalize this notion to consider compressive abstractions across a set of programs. Compressive abstractions appear to be rather expensive to uncover, although not prohibitively so; the computation may easily be parallelized, and may rely heavily on subtree mining [TODO REF].

21.7.1.1 A Simple Example of Reduction

We now give a simple example of how CogPrime's reduction engine can transform a program into a semantically equivalent but shorter one. Consider the following program and the chain of reductions:

1. We start with the expression

if(P and_seq(if(P A B) B) and_seq(A B))

2. A reduction rule permits reducing the conditional if(P A B) to if(true A B). Indeed, if P is true, then the first branch is evaluated and P must still be true.

if(P and_seq(if(true A B) B) and_seq(A B))

3. Then a rule can reduce if(true A B) to A.

if(P and_seq(A B) and_seq(A B))

4. And finally another rule replaces the conditional by one of its branches, since they are identical:

and_seq(A B)

Note that the reduced program is not only smaller (3 symbols instead of 11) but a bit faster too. Of course it is not generally true that smaller programs are faster, but in the restricted context of our experiments it has often been the case.

21.7.2 Neutral Transformations

Semantics preserving transformations that are not reductions are not useful on their own - they can only have value when followed by transformations from some other class. They are thus more speculative than reductions, and more costly to consider. We will refer to these as neutral transformations [Ols95].

• Abstraction - given an expression E containing non-overlapping subexpressions E1, E2, ... EN, let E' be E with all Ei replaced by the unbound variables v1, v2, ... vN. Define the function f(v1, v2, ... vN) = E', and replace E with f(E1, E2, ... EN).
Abstraction is distinct from compressive abstraction because only a single call to the new function f is introduced.⁵
• Inverse abstraction - replace a call to a user-defined function with the body of the function, with arguments instantiated (note that this can also be used to partially invert a compressive abstraction).

⁵ In compressive abstraction there must be at least two calls, in order to avoid increasing the number of symbols.
• Distribution - let E be a call to some function f, and let E' be a subexpression of E's ith argument that is a call to some function g, such that f is distributive over g's arguments, or a subset thereof. We shall refer to the actual arguments to g in these positions in E' as x1, x2, ... xn. Now, let D(F) be the function that is obtained by evaluating E with its ith argument (the one containing E') replaced with the expression F. Distribution is replacing E with E', and then replacing each xj (1 ≤ j ≤ n) with D(xj). For example, consider

+(x *(y if(cond a b)))

Since both + and * are distributive over the result branches of if, there are two possible distribution transformations, giving the expressions

if(cond +(x *(y a)) +(x *(y b)))
+(x if(cond *(y a) *(y b)))

• Inverse distribution (factorization) - the opposite of distribution. This is nearly a reduction; the exceptions are expressions such as f(g(x)), where f and g are mutually distributive.
• Arity broadening - given a function f, modify it to take an additional argument of some type. All calls to f must be correspondingly broadened to pass it an additional argument of the appropriate type.
• List broadening⁶ - given a function f with some ith argument x of type T, modify f to instead take an argument y of type list_T, which gets split into x : xs. All calls to f with ith argument x' must be replaced by corresponding calls with ith argument list(x').
• Conditional insertion - an expression x is replaced by if(true, x, y), where y is some expression of the same type as x.

As a technical note, action result expressions (which may cause side-effects) complicate neutral transformations. Specifically, abstractions and compressive abstractions must take their arguments lazily (i.e. not evaluate them before the function call itself is evaluated), in order to be neutral. Furthermore, distribution and inverse distribution may only be applied when f has no side-effects that will vary (e.g.
be duplicated or halved) in the new expression, or affect the nested computation (e.g. change the result of a condition within a conditional). Another way to think about this issue is to consider the action result type as a lazy domain-specific language embedded within a pure functional language (where evaluation order is unspecified). Spector has performed an empirical study of the tradeoffs in lazy vs. eager function abstraction for program evolution [Spe96]. The number of neutral transformations applicable to any given program grows quickly with program size.⁷ Furthermore, synthesis of complex programs and abstractions does not seem to be possible without them. Thus, a key hypothesis of any approach to AGI requiring significant program synthesis, without assuming the currently infeasible computational capacities required to brute-force the problem, is that the inductive bias to select promising neutral transformations can be learned and/or programmed. Referring back to the initial discussion of what constitutes a tractable representation, we speculate that perhaps, whereas well-chosen reductions are valuable for generically increasing program representation tractability, well-chosen neutral transformations will be valuable for increasing program representation tractability relative to distributions P to which the transformations have some (possibly subtle) relationship.

⁶ Analogous tuple-broadening transformations may be defined as well, but are omitted for brevity.
⁷ Exact calculations are given by Olsson [Ols95].
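The distribution transformation for the +/* over if case above can be sketched as follows (an illustrative tuple representation, not CogPrime's actual engine; side-effect checks are omitted, so this applies only to pure expressions):

```python
# One-step distribution of an expression's head operator over any `if`
# appearing as a direct argument:
#   *(y, if(cond, a, b)) -> if(cond, *(y, a), *(y, b))
def distribute_over_if(e):
    """Return every expression obtainable by one such distribution."""
    results = []
    op, *args = e
    for i, arg in enumerate(args):
        if isinstance(arg, tuple) and arg[0] == 'if':
            cond, x1, x2 = arg[1], arg[2], arg[3]
            def subst(v, i=i):
                # Rebuild the call with the if-branch v in position i.
                new = list(args)
                new[i] = v
                return (op, *new)
            results.append(('if', cond, subst(x1), subst(x2)))
    return results

e = ('*', 'y', ('if', 'cond', 'a', 'b'))
print(distribute_over_if(e))
# [('if', 'cond', ('*', 'y', 'a'), ('*', 'y', 'b'))]
```

Applying the same step once more at the enclosing + node yields the second of the two transformed expressions given above.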
21.7.3 Non-Neutral Transformations

Non-neutral transformations are the general class defined by removal, replacement, and insertion of subexpressions, acting on expressions in normal form, and preserving the normal form property. Clearly these transformations are sufficient to convert any normal form expression into any other. What is desired is a subclass of the non-neutral transformations that is combinatorially complete, where each individual transformation is nonetheless a semantically small step. The full set of transformations for Boolean expressions is given in [Loo06]. For numerical expressions, the transcendental functions sin, log, and exp are used to construct transformations. These obviate the need for division (a/b = e^(log(a) − log(b))) and subtraction (a − b = a + −1 * b). For lists, transformations are based on insertion of new leaves (e.g. to append function calls), and "deepening" of the normal form by insertion of subclauses (see [Loo06] for details). For tuples, we take the union of the transformations of all the subtypes. For other mixed-type expressions the union of the non-neutral transformations for all types must be considered as well. For enum types the only transformation is replacing one symbol with another. For function types, the transformations are based on function composition. For action result types, actions are inserted/removed/altered, akin to the treatment of Boolean literals for the Boolean type. We propose an additional class of non-neutral transformations based on the marvelous fold function:

fold(f, v, l) = if(empty(l), v, f(first(l), fold(f, v, rest(l))))

With fold we can express a wide variety of iterative constructs, with guaranteed termination and a bias towards low computational complexity.
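The fold definition above transcribes directly into Python; sum, product and list length all arise as instances, and termination is guaranteed because each recursive call receives a strictly shorter list:

```python
# fold(f, v, l) = if(empty(l), v, f(first(l), fold(f, v, rest(l))))
def fold(f, v, l):
    if not l:                            # empty(l)
        return v
    return f(l[0], fold(f, v, l[1:]))    # f(first(l), fold(f, v, rest(l)))

print(fold(lambda x, acc: x + acc, 0, [1, 2, 3, 4]))      # 10: sum
print(fold(lambda x, acc: x * acc, 1, [1, 2, 3, 4]))      # 24: product
print(fold(lambda x, acc: 1 + acc, 0, ['a', 'b', 'c']))   # 3: length
```

Because the recursion is driven purely by list structure, a program built from fold cannot loop forever, which is exactly the termination bias claimed above.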
In fact, fold allows us to represent exactly the primitive recursive functions [?]. Even considering only this reduced space of possible transformations, in many cases there are still too many possible programs "nearby" some target to effectively consider all of them. For example, many probabilistic model-building algorithms, such as learning the structure of a Bayesian network from data, can require time cubic in the number of variables (in this context each independent non-neutral transformation can correspond to a variable). Especially as the size of the programs we wish to learn grows, and as the number of typologically matching functions increases, there will be simply too many variables to consider each one intensively, let alone apply a quadratic-time algorithm. To alleviate this scaling difficulty, we propose three techniques. The first is to consider each potential variable (i.e. independent non-neutral transformation) to heuristically determine its usefulness in expressing constructive semantic variation. For example, a Boolean transformation that collapses the overall expression into a tautology is assumed to be useless.⁸ The second is heuristic coupling rules that allow us to calculate, for a pair of transformations, the expected utility of applying them in conjunction. Finally, while fold is powerful, it may need to be augmented by other methods in order to provide tractable representation of complex programs that would normally be written using numerous variables with diverse scopes. One approach that we have explored involves application of Sinot's ideas about director strings as combinators [?]. In Sinot's approach, special program

⁸ This is heuristic because such a transformation might be useful together with other transformations.
21.8 Interfacing Between Procedural and Declarative Knowledge

Finally, another critical aspect of procedural knowledge is its interfacing with declarative knowledge. We now discuss the referencing of declarative knowledge within procedures, and the referencing of the details of procedural knowledge within CogPrime's declarative knowledge store.

21.8.1 Programs Manipulating Atoms

Above we have used Combo syntax implicitly, referring to Appendix ?? for the formal definitions. Now we introduce one additional, critical element of Combo syntax: the capability to explicitly reference declarative knowledge within procedures. For this purpose Combo must contain the following types:

Atom, Node, Link, TruthValue, AtomType, AtomTable

Atom is the union of Node and Link. So a type Node within a Combo program refers to a Node in CogPrime's AtomTable. The mechanisms used to evaluate these entities during program evaluation are discussed in Chapter 25. For example, suppose one wishes to write a Combo program that creates Atoms embodying the predicate-argument relationship eats(cat, fish), represented

Evaluation eats (cat, fish)

aka

Evaluation
    eats
    List
        cat
        fish

To do this, one could say for instance:

new-link(EvaluationLink
    new-node(PredicateNode "eats")
    new-link(ListLink
        new-node(ConceptNode "cat")
        new-node(ConceptNode "fish"))
    new-stv(.99 .99))
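The new-node/new-link calls above can be mimicked with a toy AtomTable sketch in Python (hypothetical helper names; this is an illustration of the structure being built, not the actual OpenCog API):

```python
# A toy AtomTable: atoms are plain tuples, links reference their targets.
class AtomTable:
    def __init__(self):
        self.atoms = []

    def new_node(self, node_type, name):
        atom = ('Node', node_type, name)
        self.atoms.append(atom)
        return atom

    def new_link(self, link_type, targets, stv=None):
        atom = ('Link', link_type, tuple(targets), stv)
        self.atoms.append(atom)
        return atom

# Build EvaluationLink eats (List cat fish) with simple truth value (.99, .99).
table = AtomTable()
eats = table.new_node('PredicateNode', 'eats')
args = table.new_link('ListLink', [table.new_node('ConceptNode', 'cat'),
                                   table.new_node('ConceptNode', 'fish')])
evaluation = table.new_link('EvaluationLink', [eats, args], stv=(0.99, 0.99))
print(len(table.atoms))   # 5 atoms: 3 nodes + 2 links
```

The point is simply that a procedure, while executing, can create and reference pieces of the declarative hypergraph as ordinary values.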
21.9 Declarative Representation of Procedures

Next, we consider the representation of program tree internals using declarative data structures. This is important if we want OCP to inferentially understand what goes on inside programs. In itself, it is more of a "bookkeeping" issue than a deep conceptual issue, however. First, note that each of the entities that can live at an internal node of a program can also live in its own Atom. For example, a number in a program tree corresponds to a NumberNode; an argument in a Combo program already corresponds to some Atom; and an operator in a program can be wrapped up in a SchemaNode all its own, and considered as a one-leaf program tree. Thus, one can build a kind of virtual, distributed program tree by linking a number of ProcedureNodes (i.e. PredicateNodes or SchemaNodes) together. All one needs in order to achieve this is an analogue of the @ symbol (as defined in Section 20.3 of Chapter 20) for relating ProcedureNodes. This is provided by the ExecutionLink type, where (ExecutionLink f g) essentially means the same as f g in curried notation, or

  @
 / \
f   g

The same generalized evaluation rules used inside program trees may be thought of in terms of ExecutionLinks; formally, they are crisp ExtensionalImplicationLinks among ExecutionLinks. Note that we are here using ExecutionLink as a curried function; that is, we are looking at (ExecutionLink f g) as a function that takes an argument x, where the truth value of (ExecutionLink f g) x represents the probability that executing f, on input g, will give output x. One may then construct combinator expressions linking multiple ExecutionLinks together; these are the analogues of program trees. For example, using ExecutionLinks, one equivalent of y = x + x^2 is:

Hypothetical SequentialAND
    ExecutionLink pow
        List v1 2
        v2
    ExecutionLink +
        List v1 v2
        v3

Here v1, v2, v3 are variables, which may be internally represented via combinators.
This AND is sequential in case the evaluation order inside the program interpreter makes a difference. As a practical matter, it seems there is no purpose to explicitly storing program trees in conjunction-of-ExecutionLinks form; the information in the ExecutionLink conjunct is already there in the program tree. However, the PLN reasoning system, when reasoning on program trees, may carry out this kind of expansion internally as part of its analytical process.
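The reading of (ExecutionLink f g) x as a truth value, and the SequentialAND for y = x + x^2, can be sketched as follows (a crisp, illustrative evaluation; real ExecutionLinks carry probabilistic truth values):

```python
# Truth value of (ExecutionLink f (List args) out):
# "executing f on these inputs yields out".
def execution_link(f, args, out):
    return f(*args) == out

# The SequentialAND above: pow(v1, 2) = v2, then +(v1, v2) = v3.
def y_equals_x_plus_x_squared(v1, v2, v3):
    return (execution_link(pow, (v1, 2), v2)
            and execution_link(lambda a, b: a + b, (v1, v2), v3))

print(y_equals_x_plus_x_squared(3, 9, 12))   # True:  3 + 3^2 = 12
print(y_equals_x_plus_x_squared(3, 9, 11))   # False
```

The conjunction of crisp ExecutionLinks here plays exactly the role of the program tree for y = x + x^2, which is why explicitly storing both forms would be redundant.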
Section II: The Cognitive Cycle
Chapter 22: Emotion, Motivation, Attention and Control

Co-authored with Zhenhua Cai

22.1 Introduction

This chapter begins the heart of the book: the part that explains how the CogPrime design aims to implement roughly human-like general intelligence, at the human level and ultimately beyond. First, here in Section II we explain how CogPrime can be used to implement a simplistic animal-like agent without much learning: an agent that perceives, acts and remembers, and chooses actions that it thinks will achieve its goals, but doesn't do any sophisticated learning or reasoning or pattern recognition to help it better perceive, act, remember or figure out how to achieve its goals. We're not claiming CogPrime is the best way to implement such an animal-like agent, though we suggest it's not a bad way; and depending on the complexity and nature of the desired behaviors, it could be the best way. We have simply chosen to split off the parts of CogPrime needed for animal-like behavior and present them first, prior to presenting the various "knowledge creation" (learning, reasoning and pattern recognition) methods that constitute the more innovative and interesting part of the design. In Stan Franklin's terms, what we explain here in Section II is how a basic cognitive cycle may be achieved within CogPrime. In that sense, the portion of CogPrime explained in this Section is somewhat similar to the parts of Franklin's LIDA architecture that have currently been worked out in detail. However, while LIDA has not yet been extended in detail (in theory or implementation) to handle advanced learning, cognition and language, those aspects of CogPrime have been developed and in fact constitute the largest portion of this book.
Looking back to the integrative diagram from Chapter 5, the cognitive cycle is mainly about integrating vaguely LIDA-like structures and mechanisms with heavily Psi-like structures and mechanisms - but doing so in a way that naturally links in with perception and action mechanisms "below," and with more abstract and advanced learning mechanisms "above." In terms of the general theory of general intelligence, the basic CogPrime cognitive cycle has a foundational importance in biasing the CogPrime system toward the problem of controlling an agent in an environment requiring a variety of real-time and near-real-time responses based on a variety of kinds of knowledge. Due to its basis in human and animal cognition, the CogPrime cognitive cycle likely incorporates many useful biases in ways that are not immediately obvious, but that would become apparent if one compared intelligent agents controlled by such a cycle with intelligent agents controlled via other means. The cognitive cycle also provides a framework in which other cognitive processes, relating to various aspects of the goals and environments relevant to human-level general intelligence,
may conveniently and dynamically interoperate. The "Mind OS" aspect of the CogPrime architecture provides general mechanisms by which various cognitive processes may interoperate on a common knowledge store; the cognitive cycle goes further and provides a specific dynamical pattern in which multiple cognitive processes may intersect. Its effective operation places strong demands on the cognitive synergy between the various cognitive processes involved, but it also provides a framework that encourages this cognitive synergy to develop and persist.

Finally, it should be stressed that the cognitive cycle is neither all-powerful nor wholly pervasive in CogPrime's dynamics. It's critical for the real-time interaction of a CogPrime-controlled agent with a virtual or physical world, but there may be many processes within CogPrime that most naturally operate outside such a cycle. For instance, humans habitually do deep intellectual thinking (even something as abstract as mathematical theorem proving) within a cognitive cycle somewhat similar to the one they use for practical interaction with the external world. But there's no reason that CogPrime systems need to be constrained in this way. Deviating from a cognitive-cycle-based dynamic may cause a CogPrime system to deviate further from human-likeness in its intelligence, but may also help it to perform better than humans on some tasks, e.g. tasks like scientific data analysis or mathematical theorem proving that benefit from styles of information processing that humans aren't particularly good at.

22.2 A Quick Look at Action Selection

We will begin our exposition of CogPrime's cognitive cycle with a quick look at action selection. As Stan Franklin likes to point out, the essence of an intelligent agent is that it does things; it takes actions.
The particular mechanisms of action selection in CogPrime are a bit involved and will be given in Chapter 24; in this chapter we will give the basic idea of the action selection mechanism, and then explain how a variant of the Psi model (described in Chapter 4 of Part 1) is used to handle motivation (emotions, drives, goals, etc.) in CogPrime, including the guidance of action selection. The crux of CogPrime's action selection mechanism is as follows:

• The action selector chooses procedures that seem likely to help achieve important goals in the current context.
  - Example: If the goal is to create a block structure that will surprise Bob, and there is plenty of time, one procedure worth choosing might be a memory search procedure for remembering situations involving Bob and physical structures. Alternately, if there isn't much time, one procedure worth choosing might be a procedure for building the base of a large structure - as this will give something to use as part of whatever structure is eventually created. Another procedure worth choosing might be one that greedily assembles structures from blocks without any particular design in mind.
• To support the action selector, the system builds implications of the form Context & Procedure → Goal, where Context is a predicate evaluated based on the agent's situation.
  - Example: If Bob has asked the agent to do something, and it knows that Bob is very insistent on being obeyed, then implications such as

    "Bob instructed to do X" and "do X" → "please Bob" < .9, .9 >

    will be utilized
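To give a concrete flavor of this, here is a minimal Python sketch of context-weighted procedure scoring using implications of the form just shown. All names, data structures and numbers below are our illustrative assumptions, not CogPrime's actual API; the real mechanism is given in Chapter 24.

```python
# Sketch: score candidate procedures by how strongly known
# "Context & Procedure -> Goal" implications, weighted by how well
# each context currently holds, suggest they will achieve the goal.

from dataclasses import dataclass

@dataclass
class Implication:
    context: str       # predicate evaluated against the current situation
    procedure: str
    goal: str
    strength: float    # truth value strength, e.g. the .9 above
    confidence: float  # truth value confidence

def score_procedures(implications, context_truth, goal, goal_importance):
    """Return {procedure: score} for one goal."""
    scores = {}
    for imp in implications:
        if imp.goal != goal:
            continue
        ctx = context_truth.get(imp.context, 0.0)  # how well the context holds now
        s = ctx * imp.strength * imp.confidence * goal_importance
        scores[imp.procedure] = scores.get(imp.procedure, 0.0) + s
    return scores

implications = [
    Implication("Bob instructed to do X", "do X", "please Bob", 0.9, 0.9),
    Implication("Bob instructed to do X", "build randomly", "please Bob", 0.3, 0.5),
]
context_truth = {"Bob instructed to do X": 1.0}
scores = score_procedures(implications, context_truth, "please Bob", 1.0)
best = max(scores, key=scores.get)  # the procedure the selector would favor
```

Here the "do X" procedure wins because its implication has both higher strength and higher confidence, given that its context fully holds.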
  - Example: If the agent wants to make a tower T taller, then implications such as

    "T is a blocks structure" and "place block atop T" → "make T taller" < .9, .9 >

    will be utilized
• The truth values of these implications are evaluated based on experience and inference.
  - Example: The above implication involving Bob could be evaluated based on experience, by assessing it against remembered episodes involving Bob giving instructions.
  - Example: The same implication could be evaluated based on inference, using analogy to experiences with instructions from other individuals similar to Bob; or using things Bob has explicitly said, combined with knowledge that Bob's self-descriptions tend to be reasonably accurate.
• Importance values are propagated between goals using economic attention allocation (and inference is used to learn subgoals from existing goals).
  - Example: If Bob has told the agent to do X, and the agent has then derived (from the goal of pleasing Bob) the goal of doing X, then the "please Bob" goal will direct some of its currency to the "do X" goal (which the latter goal can then pass to its subgoals, or spend on executing procedures).

These various processes are carried out in a manner orchestrated by Dörner's Psi model as refined by Joscha Bach (as reviewed in Chapter 4 above), which supplies (among other features):

• a specific theory regarding what "demands" should be used to spawn the top-level goals
• a set of (four) interrelated system parameters governing overall system state in a useful manner reminiscent of human and animal psychology
• a systematic theory of how various emotions (wholly or partially) emerge from more fundamental underlying phenomena

22.3 Psi in CogPrime

The basic concepts of the Psi approach to motivation, as reviewed in Chapter 4 of Part 1, are incorporated in CogPrime as follows (note that the following list includes many concepts that will be elaborated in more detail in later chapters):

• Demands are
GroundedPredicateNodes (GPNs), i.e. Nodes that have their truth value computed at each time by some internal C++ code or some Combo procedure in the ProcedureRepository.
  - Examples: Alertness, perceived novelty, internal novelty, reward from teachers, social stimulus
  - Humans and other animals have familiar demands such as hunger, thirst and excretion; to create an AGI closely emulating a human or (say) a dog, one may wish to simulate these in one's AGI system as well
• Urges are also GPNs, with their truth values defined in terms of the truth values of the Nodes for the corresponding Demands. However, in CogPrime we have chosen the term "Ubergoal" instead of Urge, as this is more evocative of the role that these entities play in the system's dynamics (they are the top-level goals).
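The Demand-to-Urge truth-value relationship might be sketched like this. This is a minimal illustration assuming each Demand reports a level in [0, 1] with a target range, and a linear urgency rule; the function name and representation are our assumptions, not CogPrime's actual code.

```python
# Sketch: an Urge/Ubergoal as a grounded predicate whose truth value
# is the urgency of its underlying Demand: zero while the demand sits
# inside its target range, growing with the deviation outside it.
# The linear rule and all names here are illustrative assumptions.

def urge_truth_value(demand_level, target_low, target_high):
    """0.0 inside [target_low, target_high]; distance outside it."""
    if demand_level < target_low:
        return target_low - demand_level
    if demand_level > target_high:
        return demand_level - target_high
    return 0.0

# Novelty-rich environment: the demand is satisfied, so the
# corresponding Ubergoal is quiescent.
quiet = urge_truth_value(0.7, target_low=0.5, target_high=0.9)

# Monotonous environment: the demand falls below its range and the
# Ubergoal's urgency rises accordingly.
urgent = urge_truth_value(0.2, target_low=0.5, target_high=0.9)
```

Under this sketch, an agent in a sufficiently stimulating world spends nothing on novelty-seeking, while a bored agent's novelty Ubergoal becomes increasingly urgent.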
• Each system comes with a fixed set of Ubergoals (and only very advanced CogPrime systems will be able to modify their Ubergoals).
  - Example: Stay alert and alive now and in the future; experience and learn new things now and in the future; get reward from the teachers now and in the future; enjoy rich social interactions with other minds now and in the future
  - A more advanced CogPrime system could have abstract (but experientially grounded) ethical principles among its Ubergoals, e.g. an Ubergoal to promote joy, an Ubergoal to promote growth and an Ubergoal to promote choice, in accordance with the ethics described in [Goe10]
• The ShortTermImportance of an Ubergoal indicates the urgency of the goal, so if the Demand corresponding to an Ubergoal is within its target range, then the Ubergoal will have zero STI. But all Ubergoals can be given maximal LTI, to guarantee they don't get deleted.
  - Example: If the system is in an environment continually providing an adequate level of novelty (according to its Ubergoal), then the Ubergoal corresponding to external novelty will have low STI but high LTI. The system won't expend resources seeking novelty. But then, if the environment becomes more monotonous, the urgency of the external novelty goal will increase, its STI will increase correspondingly, and resources will begin getting allocated toward improving the novelty of the stimuli received by the agent.
• Pleasure is a GPN, and its internal truth-value-computing program compares the satisfaction of Ubergoals to their expected satisfaction.
  - Of course, there are various mathematical functions (e.g. p'th power averages¹ for different p) that one can use to average the satisfaction of multiple Ubergoals; and the choices made here, i.e. different specific ways of calculating Pleasure, could lead to systems with different "personalities"
• Goals are Nodes or Links that are on the system's list of goals (the GoalPool).
Ubergoals are automatically Goals, but there will also be many other Goals as well.
  - Example: The Ubergoal of getting reward from teachers might spawn subgoals like "getting reward from Bob" (if Bob is a teacher), or "making teachers smile" or "create surprising new structures" (if the latter often garners teacher reward). The subgoal of "create surprising new structures" might, in the context of a new person entering the agent's environment with a bag of toys, lead to the creation of a subgoal of asking for a new toy of the sort that could be used to help create new structures. Etc.
• Psi's memory is CogPrime's AtomTable, with associated structures like the ProcedureRepository (explained in Chapter 19), the SpaceServer and TimeServer (explained in Chapter 26), etc.
  - Examples: The knowledge of what blocks look like, and the knowledge that tall structures often fall down, go in the AtomTable; specific procedures for picking up blocks of different shapes go in the ProcedureRepository; the layout of a room or a pile of blocks at a specific point in time goes in the SpaceServer; the series of events involved in the building-up of a tower is temporally indexed in the TimeServer.

¹ The p'th power average of $x_1, \ldots, x_n$ is defined as $\left( \frac{1}{n} \sum_{i=1}^{n} x_i^p \right)^{1/p}$.
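The p'th power average just defined can be illustrated concretely: different choices of p yield different "personalities" in how an overall Pleasure value reflects the satisfactions of multiple Ubergoals. The code and the numbers below are purely illustrative.

```python
# Sketch: aggregating Ubergoal satisfactions into a single Pleasure
# value via a p'th power average. Small p is dragged down by the
# least satisfied goal; large p is dominated by the most satisfied.

def power_mean(values, p):
    """The p'th power average (footnote above): ((sum x^p)/n)^(1/p)."""
    n = len(values)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)

satisfactions = [0.9, 0.9, 0.1]  # one badly unsatisfied Ubergoal

balanced  = power_mean(satisfactions, 1.0)   # ordinary arithmetic mean
pessimist = power_mean(satisfactions, 0.25)  # weights the worst goal heavily
optimist  = power_mean(satisfactions, 8.0)   # weights the best goal heavily
```

A "pessimistic" personality (small p) feels markedly less Pleasure than a balanced one whenever any single Ubergoal is badly unsatisfied, while an "optimistic" one (large p) barely notices.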
  - In Psi and MicroPsi, these same phenomena are stored in memory in a rather different way; yet the basic Psi motivational dynamics are independent of these representational choices.
• Psi's "motive selection" process is carried out in CogPrime by economic attention allocation, which allocates ShortTermImportance to Goal nodes.
  - Example: The flow of importance from "get reward from teachers" to "get reward from Bob" to "make an interesting structure with blocks" is an instance of what Psi calls "motive selection". No action is being taken yet, but choices are being made regarding which specific goals are going to be used to guide action selection.
• Psi's action selection plays the same role as CogPrime's action selection, with the clarification that in CogPrime this is a matter of selecting which procedures (i.e. schemata) to run, rather than which individual actions to execute. However, this notion exists in Psi as well, which accounts for "automatized behaviors" similar to CogPrime schemata; the only (minor) difference here is that in CogPrime automatized behaviors are the default case.
  - Example: If the goal "make an interesting structure with blocks" has a high STI, then it may be used to motivate the choice of a procedure to execute, e.g. a procedure that finds an interesting picture or object seen before and approximates it with blocks, or a procedure that randomly constructs something and then filters it based on interestingness. Once a blocks-structure-building procedure is chosen, it may invoke the execution of sub-procedures such as those involved with picking up and positioning particular blocks.
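The importance flow in the "motive selection" example above might be sketched as follows. The wallet representation and the proportional split rule are our illustrative assumptions; the actual RFS-based economic mechanism is described in Chapter 24.

```python
# Sketch: a goal passes part of its attentional currency to its
# subgoals, in proportion to how strongly each subgoal is believed
# to imply it. The fraction and split rule are illustrative only.

def propagate_currency(goal_wallets, subgoal_links, fraction=0.5):
    """subgoal_links: {goal: {subgoal: implication_strength}}.
    Each goal spends `fraction` of its wallet on its subgoals,
    divided proportionally to implication strength."""
    for goal, subs in subgoal_links.items():
        budget = goal_wallets.get(goal, 0.0) * fraction
        total = sum(subs.values())
        if total == 0:
            continue
        for subgoal, strength in subs.items():
            amount = budget * strength / total
            goal_wallets[goal] -= amount
            goal_wallets[subgoal] = goal_wallets.get(subgoal, 0.0) + amount
    return goal_wallets

wallets = {"get reward from teachers": 100.0}
links = {"get reward from teachers":
         {"get reward from Bob": 0.9, "make teachers smile": 0.1}}
propagate_currency(wallets, links)
```

After one round, "get reward from Bob" holds most of the spent currency and is correspondingly more likely to guide action selection - the economic analogue of Psi's motive selection.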
• Psi's planning is carried out via various learning processes in CogPrime, including PLN plus procedure learning methods like MOSES or hillclimbing.
  - Example: If the agent has decided to build a blocks structure emulating a pyramid (which it saw in a picture), and it knows how to manipulate and position individual blocks, then it must figure out a procedure for carrying out individual-block actions that will result in production of the pyramid. In this case, a very inexperienced agent might use MOSES or hillclimbing and "guidedly-randomly" fiddle with different construction procedures until it hit on something workable. A slightly more experienced agent would use reasoning based on prior structures it had built, to figure out a rational plan (like: "start with the base, then iteratively pile on layers, each one slightly smaller than the previous").
• The modulators are system parameters which may be represented by PredicateNodes, and which must be incorporated appropriately in the dynamics of various MindAgents, e.g.:
  - activation affects action selection. For instance this may be effected by a process that, each cycle, causes a certain amount of STICurrency to pass to schemata satisfying certain properties (those involving physical action, or terminating rapidly). The amount of currency passed in this way would be proportional to the activation.
  - resolution level affects perception schemata and MindAgents, causing them to expend less effort in processing perceptual data.
  - certainty affects inference, pattern mining and concept creation processes, causing them to place less emphasis on certainty in guiding their activities, i.e. to be more
accepting of uncertain conclusions. To give a single illustrative example: when backward chaining inference is being used to find values for variables, a "fitness target" of the form strength × confidence is sometimes used; this may be replaced with strength^p × confidence^(2-p), where the certainty parameter affects the exponent p, so that when p tends to 0 confidence is more important, when p tends to 2 strength is more important, and when p tends to 1 strength and confidence are equally important.
  - selection threshold may be used to effect a process that, each cycle, causes a certain amount of STICurrency (proportional to the selection threshold) to pass to the Goal Atoms that were wealthiest at the previous cycle.

Based on this run-down, Psi and CogPrime may seem very similar, but that's because we have focused here only on the motivation and emotion aspect. Psi uses a very different knowledge representation than CogPrime; and in the Psi architecture diagram, nearly all of CogPrime is pushed into the role of "background processes that operate in the memory box." According to the theoretical framework underlying CogPrime, the multiple synergetic processes operating in the memory box are actually the crux of general intelligence. But getting the motivation/emotion framework right is also very important, and Psi seems to do an admirable job of that.

22.4 Implementing Emotion Rules atop Psi's Emotional Dynamics

Human motivations are largely determined by human emotions, which are the result of humanity's evolutionary heritage and embodiment - both quite different from the heritage and embodiment of current AI systems. So, if we want to create AGI systems that lack humanlike bodies, and didn't evolve to adapt to the same environments as humans did, yet still have vaguely human-like emotional and motivational structures, the latter will need to be explicitly engineered or taught in some way.
For instance, if one wants to make a CogPrime agent display anger, something beyond Psi's model of emotion needs to be coded into the agent to enable this. After all, the rule that when angry the agent has some propensity to harm other beings is not implicit in Psi, and needs to be programmed in. However, making use of Psi's emotion model, anger could be characterized as an emotion consisting of high arousal, low resolution, strong motive dominance, few background checks, strong goal-orientedness (as the Psi model suggests) and a propensity to cause harm to agents or objects. This is much simpler than specifying a large set of detailed rules characterizing angry behavior.

The "anger" example brings up the point that the desirability of giving AGI systems closely humanlike emotional and motivational systems is questionable. After all, we humans cause ourselves a lot of problems with these aspects of our mind/brains, and we sometimes put our more ethical and intellectual sides at war with our emotional and motivational systems. Looking into the future, an AGI with greater power than humans yet a humanlike motivational and emotional system could be a very dangerous thing.

On the other hand, if an AGI's motivational and emotional system is too different from human nature, we might have trouble understanding it, and it understanding us. This problem shouldn't be overblown - it seems possible that an AGI with a more orderly and rational motivational system than the human one might be able to understand us intellectually very well, and that we might be able to understand it well using our analytical tools. However, if we want to have mutual empathy with an AGI system, then its motivational and emotional
framework had better have at least some reasonable overlap with our own. The value of empathy for ethical behavior was stressed extensively in Chapter 12 of Part 1.

This is an area where experimentation is going to be key. Our initial plan is to supply CogPrime with rough emulations of some but not all human emotions. We see no need to take explicit pains to simulate emotions like anger, jealousy and hatred. On the other hand, joy, curiosity, sadness, wonder, fear and a variety of other human emotions seem both natural in the context of a robotically or virtually embodied CogPrime system, and valuable in terms of allowing mutual human/CogPrime empathy.

22.4.1 Grounding the Logical Structure of Emotions in the Psi Model

To make this point in a systematic way, we point out that Ortony et al.'s [OCC90] "cognitive theory of emotions" can be grounded in CogPrime's version of Psi in a natural way. This theory captures a wide variety of human and animal emotions in a systematic logical framework, so grounding that framework in CogPrime's Psi goes a long way toward explaining how CogPrime's Psi accounts for a broad spectrum of human emotions.

The essential idea of the cognitive theory of emotions can be seen in Figure 22.1. What we see there is that common emotions can be defined in terms of a series of choices:

• Is it positive or negative?
• Is it a response to an agent, an event or an object?
• Is it focused on consequences for oneself, or for another?
  - If on another, is it good or bad for the other?
  - If on oneself, is it related to some event whose outcome is uncertain?
• If it's related to an uncertain outcome, did the expectation regarding the outcome get fulfilled or not?

Figure 22.1 shows how each set of answers to these questions leads to a different emotion. For instance: what is a negative emotion, responding to events, focusing on another, and undesirable to the other? Pity.
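The question sequence can be rendered as a small decision procedure. The fragment below is a simplified sketch covering only a few branches of the ontology in Figure 22.1; the function name and argument encoding are our own illustrative choices, not part of [OCC90] or CogPrime.

```python
# Sketch: a fragment of the OCC "cognitive theory of emotions" as a
# decision procedure over the questions listed above. Only a few
# branches are filled in; the full ontology in Fig. 22.1 is richer.

def classify_emotion(valence, target, focus=None, good_for_other=None,
                     outcome_uncertain=None, expectation_fulfilled=None):
    """valence: 'positive'/'negative'; target: 'agent'/'event'/'object'."""
    if target == "event" and focus == "other":
        if valence == "negative" and good_for_other is False:
            return "pity"
        if valence == "positive" and good_for_other is True:
            return "happy-for"
    if target == "event" and focus == "self" and outcome_uncertain:
        if valence == "negative":
            # an uncertain bad prospect is fear; once resolved, the
            # same branch yields fears-confirmed or relief
            if expectation_fulfilled is None:
                return "fear"
            return "fears-confirmed" if expectation_fulfilled else "relief"
        return "hope"
    return "unclassified"

# The worked example from the text: negative, event-directed,
# other-focused, undesirable for the other.
e = classify_emotion("negative", "event", focus="other", good_for_other=False)
```

Running the worked example from the text through this tree yields "pity", as expected.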
In the list of questions, we see that two of them - positive vs. negative, and expectation fulfillment vs. otherwise - are foundational in the Psi model. The other questions are evaluations that an intelligent agent would naturally make, but they aren't bound up with Psi's emotion/motivation infrastructure in such a deep way. Thus, the cognitive theory of emotion emerges as a combination of some basic Psi factors with some more abstract cognitive properties (good vs. bad for another; agents vs. events vs. objects).

22.5 Goals and Contexts

Now we dig deeper into the details of motivation in CogPrime. Just as we have both explicit (local) and implicit (global) memory in CogPrime, we also have both explicit and implicit goals. An explicit goal is formulated as a Goal Atom, and then MindAgents specifically orient the system's activity toward achievement of that goal. An implicit goal is something that the
system works toward, but in a more loosely organized way, and without necessarily explicitly representing the knowledge that it is working toward that goal. Here we will focus mainly on explicit motivation, beginning with a description of Goal Atoms, and the Contexts in which Goals are worked toward via executing Procedures. Figure 22.2 gives a rough depiction of the relationship between goals, procedures and context, in a simple example relevant to an OpenCogPrime-controlled virtual agent in a game world.

Fig. 22.1: Ontology of Emotions, from [OCC90]

22.5.1 Goal Atoms

A Goal Atom represents a target system state, and is true to the extent that the system satisfies the conditions it represents. A Context Atom represents an observed state of the world/mind, and is true to the extent that the state it defines is observed. Taken together, these two Atom types provide the infrastructure CogPrime needs to orient its actions in specific contexts toward specific goals. Not all of CogPrime's activity is guided by these Atoms; much of it is non-goal-directed and spontaneous - or ambient, as we sometimes call it. But it is important that some
of the system's activity - and in some cases, a substantial portion - is controlled explicitly via goals.

Fig. 22.2: Context, Procedures and Goals. Examples of the basic "goal/context/procedure" triad in a simple game-agent situation.

Specifically, a Goal Atom is simply an Atom (usually a PredicateNode, sometimes a Link, and potentially another type of Atom) that has been selected by the GoalRefinement MindAgent as one that represents a state of the atom space which the system finds important to achieve. The extent to which an Atom is considered a Goal Atom at a particular point in time is determined by how much of a certain kind of financial instrument, called an RFS (Request For Service), it possesses (as will be explained in Chapter 24).

A CogPrime instance must begin with some initial Ubergoals (aka top-level supergoals), but may then refine these goals in various ways using inference. Immature, "childlike" CogPrime systems cannot modify their Ubergoals, nor add nor delete Ubergoals. Advanced CogPrime systems may be allowed to modify, add or delete Ubergoals, but this is a critical and subtle aspect of system dynamics that must be treated with great care.
22.6 Context Atoms

Next, a Context is simply an Atom that is used as the source of a ContextLink, for instance:

Context
    quantum_computing
    Inheritance Ben amateur

or

Context
    game_of_fetch
    PredictiveAttraction
        Evaluation give (ball, teacher)
        Satisfaction

The former simply says that Ben is an amateur in the context of quantum computing. The latter says that in the context of the game of fetch, giving the ball to the teacher implies satisfaction. A more complex instance, pertinent to our running example, would be:

Context
    Evaluation
        Recently
        List
            Minute
            Evaluation
                Ask
                List
                    Bob
                    ThereExists $X
                        And
                            Evaluation Build List self $X
                            Evaluation surprise List $X Bob
    AverageQuantifier $Y
        PredictiveAttraction
            And
                Evaluation Build List self $Y
                Evaluation surprise List $Y Jim
            Satisfaction
which says that, if the context is that Bob has recently asked for something surprising to be built, then one strategy for getting satisfaction is to build something that seems likely to surprise Jim.

An implementation-level note: in the current OpenCogPrime implementation of CogPrime, ContextLinks are implicit rather than explicit entities. An Atom can contain a ComplexTruthValue, which in turn contains a number of VersionHandles. Each VersionHandle associates a Context or a Hypothetical with a TruthValue. This accomplishes the same thing as a formal ContextLink, but without the creation of a ContextLink object. However, we continue to use ContextLinks in this book and other documents about CogPrime; and it's quite possible that future CogPrime implementations might handle them differently.

22.7 Ubergoal Dynamics

In the early phases of a CogPrime system's cognitive development, the goal system dynamics will be quite simple. The Ubergoals are supplied by human programmers, and the system's adaptive cognition is used to derive subgoals. Attentional currency allocated to the Ubergoals is then passed along to the subgoals, as judged appropriate. As the system becomes more advanced, however, more interesting phenomena may arise regarding Ubergoals: implicit and explicit Ubergoal creation.

22.7.1 Implicit Ubergoal Pool Modification

First of all, implicit Ubergoal creation or destruction may occur. Implicit Ubergoal destruction may occur when there are multiple Ubergoals in the system, and some prove easier to achieve than others. The system may then decide not to bother achieving the more difficult Ubergoals. Appropriate parameter settings can mitigate this phenomenon, of course.

Implicit Ubergoal creation may occur if some Goal Node G arises that inherits as a subgoal from multiple Ubergoals. This Goal G may then come to act implicitly as an Ubergoal, in that it may get more attentional currency than any of the Ubergoals.
Also, implicit Ubergoal creation may occur via forgetting. Suppose that G becomes a goal via inferred inheritance from one or more Ubergoals. Then, suppose the system forgets why this inheritance exists, and that in fact the reason becomes obsolete, but the system doesn't realize this and keeps the inheritance there. Then G is an implicit Ubergoal in a strong sense: it gobbles up a lot of attentional currency, potentially more than any of the actual Ubergoals, but doesn't actually help achieve the Ubergoals, even though the system thinks it does. This kind of dynamic is obviously very bad and should be avoided - and it can be avoided with appropriate tuning of system parameters (so that the system pays a lot of attention to making sure that its subgoaling-related inferences are correct and are updated in a timely way).
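One simple guard against this "forgotten-reason" failure mode can be sketched as follows: decay the confidence of subgoal links that have not been re-verified recently, and cut off currency flow once the confidence falls below a threshold. The decay policy, half-life and all names here are illustrative assumptions, not CogPrime's actual tuning.

```python
# Sketch: prevent stale subgoal links from turning into implicit
# Ubergoals by decaying their confidence with time-since-verification
# and dropping links whose decayed confidence is too low to be
# trusted with attentional currency.

def refresh_subgoal_links(links, now, half_life=100.0, min_conf=0.3):
    """links: {(goal, subgoal): (confidence, last_verified_time)}.
    Returns {edge: decayed_confidence} for links still trusted."""
    trusted = {}
    for edge, (conf, last_verified) in links.items():
        age = now - last_verified
        decayed = conf * 0.5 ** (age / half_life)  # exponential decay
        if decayed >= min_conf:
            trusted[edge] = decayed
    return trusted

links = {
    ("please Bob", "do X"):     (0.9, 950.0),  # recently re-verified
    ("please Bob", "old plan"): (0.9, 100.0),  # reason long unchecked
}
trusted = refresh_subgoal_links(links, now=1000.0)
```

Under this policy, the long-unverified "old plan" link is pruned before it can soak up currency its parent Ubergoal no longer has reason to send it.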
22.7.2 Explicit Ubergoal Pool Modification

An advanced CogPrime system may be given the ability to explicitly modify its Ubergoal pool. This is a very interesting but very subtle type of dynamic, which is not currently well understood and which could potentially lead to dramatically unpredictable behaviors. However, modification, creation and deletion of goals is a key aspect of human psychology, and the granting of this capability to mature CogPrime systems must be seriously considered.

In the case that Ubergoal pool modification is allowed, one useful heuristic may be to make implicit Ubergoals into explicit Ubergoals. For instance: if an Atom is found to consistently receive a lot of RFSs, and has a long time-scale associated with it, then the system should consider making it an Ubergoal. But this heuristic is certainly not sufficient, and any advanced CogPrime system that is going to modify its own Ubergoals should definitely be tuned to put a lot of thought into the process! The science of Ubergoal pool dynamics basically does not exist at the moment, and one would like to have some nice mathematical models of the process prior to experimenting with it in any capable, intelligent CogPrime system. Although Schmidhuber's Gödel machine [Sch06] has the theoretical capability to modify its ubergoal (note that CogPrime is, in some ways, a Gödel machine), there is currently no mathematics allowing us to assess the time and space complexity of such a process in a realistic context, given a certain safety confidence target.

22.8 Goal Formation

Goal formation in CogPrime is done via PLN inference. In general, what PLN does for goal formation is to look for predicates that can be proved to probabilistically imply the existing goals.
These new predicates will then tend to receive RFS currency, according to the logic of RFS's to be outlined in Chapter 24, which (according to goal-driven attention allocation dynamics) will make the system more likely to enact procedures that lead to their satisfaction.

As an example of the goal formation process, consider the case where ExternalNovelty is an Ubergoal. The agent may then learn that whenever Bob gives it a picture to look at, its quest for external novelty is satisfied to a significant degree. That is, it learns:

Attraction
    Evaluation give (Bob, me, picture)
    ExternalNovelty

where Attraction A B measures how much more A, versus not-A, implies B (as explained in Chapter 34). This information allows the agent (the GoalFormation MindAgent) to nominate the Atom:

EvaluationLink give (Bob, me, picture)

as a goal (a subgoal of the original Ubergoal). This is an example of goal refinement, which is one among many ways that PLN can create new goals from existing ones.
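The Attraction measure used here can be illustrated numerically. The sketch below assumes the reading of Attraction as the difference between P(B|A) and P(B|not-A), estimated from remembered episodes; the episode encoding, threshold and names are our illustrative assumptions (the full treatment is in Chapter 34).

```python
# Sketch: estimate Attraction(A, B) from experienced episodes as
# P(B|A) - P(B|not A), and nominate A as a subgoal of Ubergoal B
# when the attraction clears a threshold.

def attraction(episodes, a, b):
    """episodes: list of dicts mapping predicate names to booleans."""
    with_a    = [e for e in episodes if e[a]]
    without_a = [e for e in episodes if not e[a]]
    p_b_given_a    = sum(e[b] for e in with_a) / len(with_a)
    p_b_given_nota = sum(e[b] for e in without_a) / len(without_a)
    return p_b_given_a - p_b_given_nota

def nominate_subgoals(episodes, candidates, ubergoal, threshold=0.5):
    return [a for a in candidates
            if attraction(episodes, a, ubergoal) >= threshold]

# Remembered episodes: novelty usually follows Bob giving a picture,
# and rarely occurs otherwise.
episodes = (
    [{"give(Bob, me, picture)": True,  "ExternalNovelty": True}] * 8 +
    [{"give(Bob, me, picture)": True,  "ExternalNovelty": False}] * 2 +
    [{"give(Bob, me, picture)": False, "ExternalNovelty": True}] * 2 +
    [{"give(Bob, me, picture)": False, "ExternalNovelty": False}] * 8
)
subgoals = nominate_subgoals(episodes, ["give(Bob, me, picture)"],
                             "ExternalNovelty")
```

With these counts, P(novelty | picture-giving) = 0.8 against a base of 0.2 without it, so the attraction of 0.6 clears the threshold and the predicate is nominated as a subgoal.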
22.9 Goal Fulfillment and Predicate Schematization

When there is a Goal Atom G important in the system (i.e. with a lot of RFS), the GoalFulfillment MindAgent seeks SchemaNodes S that it has reason to believe, if enacted, will cause G to become true (satisfied). It then adds these to the ActiveSchemaPool, an object to be discussed below. The dynamics by which the GoalFulfillment process works will be discussed in Chapter 24 below. For example, if a Context Node C has a high truth value at that time (because it is currently satisfied), and is involved in a relation:

Attraction
    C
    PredictiveAttraction
        S
        G

(for some SchemaNode S and Goal Node G), then this SchemaNode S is likely to be selected by the GoalFulfillment process for execution. This is the fully formalized version of the Context & Schema → Goal notion discussed frequently above. The process may also allow the importance of various schemata S to bias its choices of which schemata to execute. For instance, following up previous examples, we might have:

Attraction
    Evaluation
        near
        List self Bob
    PredictiveAttraction
        Evaluation
            ask
            List Bob "Show me a picture"
        ExternalNovelty

Of course this is a very simplistic relationship, but it's similar to a behavior a young child might display. A more advanced agent would utilize a more abstract relationship that distinguishes various situations in which Bob is nearby, and also involves expressing a concept rather than a particular sentence.

The formation of these schema-context-goal triads may occur according to generic inference mechanisms. However, a specially-focused PredicateSchematization MindAgent is very useful here as a mechanism of inference control, increasing the number of such relations that will exist in the system.

22.10 Context Formation

New contexts are formed by a combination of processes:

• The MapEncapsulation MindAgent, which creates Context Nodes embodying repeated patterns in the perceived world.
This process encompasses:

- Maps creating ContextNodes involving Atoms that have high STI at the same time.
  • Example: A large number of Atoms related to towers could be joined into a single map, which would then be a ConceptNode pointing to "tower-related ideas, procedures and experiences".
- Maps creating ContextNodes that are involved in a temporal activation pattern that recurs at multiple points in the system's experience.
  • Example: There may be a common sequence of processes involved in creating a building out of blocks: first build the base, then the walls, then the roof. This could be encapsulated as a temporal map embodying the overall nature of the process. In this case, the map contains information of the form: first do things related to this, then do things related to this, then do things related to this...

• A set of concept creation MindAgents (see Chapter 38), which fuse and split ContextNodes to create new ones.

- The concept of a building and the concept of a person can be merged to create the concept of a BuildingMan.
- The concept of a truck built with Legos can be subdivided into trucks you can actually carry Lego blocks with, versus trucks that are "just for show" and can't really be loaded with objects and then carry them around.

22.11 Execution Management

The GoalFulfillment MindAgent chooses schemata that are found likely to achieve current goals, but it doesn't actually execute these schemata. Rather, it takes these schemata and places them in a container called the ActiveSchemaPool.

The ActiveSchemaPool contains a set of schemata that have been determined to be reasonably likely, if enacted, to significantly help with achieving the current goal-set.
I.e., everything in the active schema pool should be a schema S for which it has been concluded that

Attraction
    C
    PredictiveAttraction S G

- where C is a currently applicable context and G is one of the goals in the current goal pool - has a high truth value compared to what could be obtained from other known schemata S, or from other schemata S that could reasonably be expected to be found via reasoning.

The decision of which schemata in the ActiveSchemaPool to enact is made by an object called the ExecutionManager, which is invoked each time the SchemaActivation MindAgent is executed. The ExecutionManager selects which schemata to execute, based on reasoning and on consulting memory regarding which active schemata can usefully be executed simultaneously without causing destructive interference (and, hopefully, while causing constructive interference). This process will also sometimes (indirectly) cause new schemata to be created, and/or other schemata from the AtomTable to be made active. This process is described more fully in Chapter 24 on action selection.

For instance, if the agent is involved in building a blocks structure intended to surprise or please Bob, then it might simultaneously carry out some blocks-manipulation schema, and also a schema involving looking at Bob to garner his approval. If it can do the blocks manipulation without constantly looking at the blocks, this should be unproblematic for the agent.
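The interference-avoiding selection performed by the ExecutionManager can be sketched as a greedy resource-conflict check. This is a simplified illustrative model, not the actual CogPrime implementation: the class names, the `expected_goal_payoff` field, and the idea of modeling interference as contention for named resources (hands, gaze, etc.) are all assumptions made for exposition.

```python
# Illustrative sketch: pick a maximal set of non-interfering schemata
# from the ActiveSchemaPool, preferring those with higher expected
# goal payoff. Names and structure are hypothetical.

from dataclasses import dataclass, field


@dataclass
class ActiveSchema:
    name: str
    expected_goal_payoff: float  # truth value of its schema-context-goal triad
    resources: frozenset = field(default_factory=frozenset)  # e.g. {"hands"}


class ExecutionManager:
    """Greedy stand-in for interference reasoning: two schemata are
    assumed to interfere destructively iff they share a resource."""

    def select(self, pool):
        chosen, used = [], set()
        for schema in sorted(pool, key=lambda s: s.expected_goal_payoff,
                             reverse=True):
            if used.isdisjoint(schema.resources):
                chosen.append(schema)
                used |= schema.resources
        return chosen
```

Under this toy model, a blocks-manipulation schema (occupying the hands) and a glance-at-Bob schema (occupying the gaze) would be enacted together, while a second hand-occupying schema would be deferred - echoing the blocks-and-Bob example above.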






