Engineering General Intelligence, Part 1: A Path to Advanced AGI via Embodied Learning and Cognitive Synergy

Ben Goertzel, with Cassio Pennachin & Nil Geisweiller & the OpenCog Team

September 19, 2013
This book is dedicated by Ben Goertzel to his beloved, departed grandfather, Leo Zwell — an amazingly warm-hearted, giving human being who was also a deep thinker and excellent scientist, who got Ben started on the path of science. As a careful experimentalist, Leo would have been properly skeptical of the big hypotheses made here — but he would have been eager to see them put to the test!
Preface

This is a large, two-part book with an even larger goal: to outline a practical approach to engineering software systems with general intelligence at the human level and ultimately beyond. Machines with flexible problem-solving ability, open-ended learning capability, creativity and, eventually, their own kind of genius.

Part 1, this volume, reviews various critical conceptual issues related to the nature of intelligence and mind. It then sketches the broad outlines of a novel, integrative architecture for Artificial General Intelligence (AGI) called CogPrime ... and describes an approach for giving a young AGI system (CogPrime or otherwise) appropriate experience, so that it can develop its own smarts, creativity and wisdom through its own experience. Along the way a formal theory of general intelligence is sketched, along with a broad roadmap leading from here to human-level artificial intelligence. Hints are also given regarding how to eventually, potentially create machines advancing beyond human level — including some frankly futuristic speculations about strongly self-modifying AGI architectures with flexibility far exceeding that of the human brain.

Part 2 then digs far deeper into the details of CogPrime's multiple structures, processes and functions, culminating in a general argument as to why we believe CogPrime will be able to achieve general intelligence at the level of the smartest humans (and potentially greater), and a detailed discussion of how a CogPrime-powered virtual agent or robot would handle some simple practical tasks such as social play with blocks in a preschool context. It first describes the CogPrime software architecture and knowledge representation in detail; then reviews the cognitive cycle via which CogPrime perceives and acts in the world and reflects on itself; and next turns to various forms of learning: procedural, declarative (e.g. inference), simulative and integrative. Methods of enabling natural language functionality in CogPrime are then discussed; and then the volume concludes with a chapter summarizing the argument that CogPrime can lead to human-level (and eventually perhaps greater) AGI, and a chapter giving a thought experiment describing the internal dynamics via which a completed CogPrime system might solve the problem of obeying the request “Build me something with blocks that I haven’t seen before.”

The chapters here are written to be read in linear order — and if consumed thus, they tell a coherent story about how to get from here to advanced AGI. However, the impatient reader may be forgiven for proceeding a bit nonlinearly. An alternate reading path would be to start with the first few chapters of Part 1, then skim the final two chapters of Part 2, and then return to reading in linear order. The final two chapters of Part 2 give a broad overview of why we think the CogPrime design will work, in a way that depends on the technical
details of the previous chapters, but (we believe) not so sensitively as to be incomprehensible without them.

This is admittedly an unusual sort of book, mixing demonstrated conclusions with unproved conjectures in a complex way, all oriented toward an extraordinarily ambitious goal. Further, the chapters vary somewhat in their levels of detail — some very nitty-gritty, some more high-level, with much of the variation due to how much concrete work has been done on the topic of the chapter at time of writing. However, it is important to understand that the ideas presented here are not mere armchair speculation — they are currently being used as the basis for an open-source software project called OpenCog, which is being worked on by software developers around the world. Right now OpenCog embodies only a percentage of the overall CogPrime design as described here. But if OpenCog continues to attract sufficient funding or volunteer interest, then the ideas presented in these volumes will be validated or refuted via practice. (As a related note: here and there in this book, we will refer to the "current" CogPrime implementation (in the OpenCog framework); in all cases this refers to OpenCog as of late 2013.)

To state one believes one knows a workable path to creating a human-level (and potentially greater) general intelligence is to make a dramatic statement, given the conventional way of thinking about the topic in the contemporary scientific community. However, we feel that once a little more time has passed, the topic will lose its drama (if not its interest and importance), and it will be widely accepted that there are many ways to create intelligent machines — some simpler and some more complicated; some more brain-like or human-like and some less so; some more efficient and some more wasteful of resources; etc. We have little doubt that, from the perspective of AGI science 50 or 100 years hence (and probably even 10-20 years hence), the specific designs presented here will seem awkward, messy, inefficient and circuitous in various respects. But that is how science and engineering progress. Given the current state of knowledge and understanding, having any concrete, comprehensive design and plan for creating AGI is a significant step forward; and it is in this spirit that we present here our thinking about the CogPrime architecture and the nature of general intelligence. In the words of Sir Edmund Hillary, the first to scale Everest: “Nothing Venture, Nothing Win.”

Prehistory of the Book

The writing of this book began in earnest in 2001, at which point it was informally referred to as “The Novamente Book.” The original “Novamente Book” manuscript ultimately got too big for its own britches, and subdivided into a number of different works — The Hidden Pattern [Goe06a], a philosophy of mind book published in 2006; Probabilistic Logic Networks [GIGH08], a more technical work published in 2008; Real World Reasoning [GGC+11], a sequel to Probabilistic Logic Networks published in 2011; and the two parts of this book.

The ideas described in this book have been the collaborative creation of multiple overlapping communities of people over a long period of time. The vast bulk of the writing here was done by Ben Goertzel; but Cassio Pennachin and Nil Geisweiller made sufficient writing, thinking and editing contributions over the years to more than merit their inclusion as co-authors.
Further, many of the chapters here have co-authors beyond the three main co-authors of the book; and
the set of chapter co-authors does not exhaust the set of significant contributors to the ideas presented. The core concepts of the CogPrime design and the underlying theory were conceived by Ben Goertzel in the period 1995-1996 when he was a Research Fellow at the University of Western Australia; but those early ideas have been elaborated and improved by many more people than can be listed here (as well as by Ben's ongoing thinking and research). The collaborative design process ultimately resulting in CogPrime started in 1997 when Intelligenesis Corp. was formed — the Webmind AI Engine created in Intelligenesis's research group during 1997-2001 was the predecessor to the Novamente Cognition Engine created at Novamente LLC during 2001-2008, which was the predecessor to CogPrime.

Acknowledgements

For the sake of simplicity, this acknowledgements section is presented from the perspective of the primary author, Ben Goertzel. Ben will thus begin by expressing his thanks to his primary co-authors, Cassio Pennachin (collaborator since 1998) and Nil Geisweiller (collaborator since 2005). Without outstandingly insightful, deep-thinking colleagues like you, the ideas presented here — let alone the book itself — would not have developed nearly as effectively as they have. Similar thanks also go to the other OpenCog collaborators who have co-authored various chapters of the book.

Beyond the co-authors, huge gratitude must also be extended to everyone who has been involved with the OpenCog project, and/or was involved in Novamente LLC and Webmind Inc. before that. We are grateful to all of you for your collaboration and intellectual companionship! Building a thinking machine is a huge project, too big for any one human; it will take a team and I'm happy to be part of a great one. It is through the genius of human collectives, going beyond any individual human mind, that genius machines are going to be created. A tiny, incomplete sample from the long list of those others deserving thanks is:

• Ken Silverman and Gwendalin Qi Aranya (formerly Gwen Goertzel), both of whom listened to me talk at inordinate length about many of the ideas presented here a long, long time before anyone else was interested in listening. Ken and I schemed some AGI designs at Simon's Rock College in 1983, years before we worked together on the Webmind AI Engine.
• Allan Combs, who got me thinking about consciousness in various different ways, at a very early point in my career. I'm very pleased to still count Allan as a friend and sometime collaborator!
• Fred Abraham as well, for introducing me to the intersection of chaos theory and cognition, with a wonderful flair.
• George Christos, a deep AI/math/physics thinker from Perth, for re-awakening my interest in attractor neural nets and their cognitive implications, in the mid-1990s.
• All of the 130 staff of Webmind Inc. during 1998-2001 while that remarkable, ambitious, peculiar AGI-oriented firm existed. Special shout-outs to the "Voice of Reason" Pei Wang and the "Siberian Madmind" Anton Kolonin, Mike Ross, Cate Hartley, Karin Verspoor and the tragically prematurely deceased Jeff Pressing (compared to whom we are all mental midgets), who all made serious conceptual contributions to my thinking about AGI. Lisa Pazer and Andy Siciliano who made Webmind happen on the business side. And of course Cassio Pennachin, a co-author of this book; and Ken Silverman, who co-architected the whole Webmind system and vision with me from the start.
• The Webmind Diehards, who helped begin the Novamente project that succeeded Webmind beginning in 2001: Cassio Pennachin, Stephan Vladimir Bugaj, Takuo Henmi, Matthew Iklé, Thiago Maia, Andre Senna, Guilherme Lamacie and Saulo Pinto.
• Those who helped get the Novamente project off the ground and keep it progressing over the years, including some of the Webmind Diehards and also Moshe Looks, Bruce Klein, Izabela Lyon Freire, Chris Poulin, Murilo Queiroz, Predrag Janicic, David Hart, Ari Heljakka, Hugo Pinto, Deborah Duong, Paul Prueitt, Glenn Tarbox, Nil Geisweiller and Cassio Pennachin (the co-authors of this book), Sibley Verbeck, Jeff Reed, Pejman Makhfi, Welter Silva, Lukasz Kaiser and more.
• All those who have helped with the OpenCog system, including Linas Vepstas, Joel Pitt, Jared Wigmore / Jade O'Neill, Zhenhua Cai, Deheng Huang, Shujing Ke, Lake Watkins, Alex van der Peet, Samir Araujo, Fabricio Silva, Yang Ye, Shuo Chen, Michel Drenthe, Ted Sanders, Gustavo Gama and of course Nil and Cassio again.
• Tyler Emerson and Eliezer Yudkowsky, for choosing to have the Singularity Institute for AI (now MIRI) provide seed funding for OpenCog.
• The numerous members of the AGI community who have tossed around AGI ideas with me since the first AGI conference in 2006, including but definitely not limited to: Stan Franklin, Juergen Schmidhuber, Marcus Hutter, Kai-Uwe Kuehnberger, Stephen Reed, Blerim Emruli, Kristinn Thorisson, Joscha Bach, Abram Demski, Itamar Arel, Mark Waser, Randal Koene, Paul Rosenbloom, Zhongzhi Shi, Steve Omohundro, Bill Hibbard, Eray Ozkural, Brandon Rohrer, Ben Johnston, John Laird, Shane Legg, Selmer Bringsjord, Anders Sandberg, Alexei Samsonovich, Wlodek Duch, and more.
• The inimitable "Artilect Warrior" Hugo de Garis, who (when he was working at Xiamen University) got me started working on AGI in the Orient (and introduced me to my wife Ruiting in the process). And Changle Zhou, who brought Hugo to Xiamen and generously shared his brilliant research students with Hugo and me. And Min Jiang, collaborator of Hugo and Changle, a deep AGI thinker who is helping with OpenCog theory and practice at time of writing.
• Gino Yu, who got me started working on AGI here in Hong Kong, where I am living at time of writing. As of 2013 the bulk of OpenCog work is occurring in Hong Kong via a research grant that Gino and I obtained together.
• Dan Stoicescu, whose funding helped Novamente through some tough times.
• Jeffrey Epstein, whose visionary funding of my AGI research has helped me through a number of tight spots over the years. At time of writing, Jeffrey is helping support the OpenCog Hong Kong project.
• Zeger Karssen, founder of Atlantis Press, who conceived the Thinking Machines book series in which this book appears, and who has been a strong supporter of the AGI conference series from the beginning.
• My wonderful wife Ruiting Lian, a source of fantastic amounts of positive energy for me since we became involved four years ago. Ruiting has listened to me discuss the ideas contained here time and time again, often with judicious and insightful feedback (as she is an excellent AI researcher in her own right); and has been wonderfully tolerant of me diverting numerous evenings and weekends to getting this book finished (as well as to other AGI-related pursuits).
• And my parents Ted and Carol and kids Zar, Zeb and Zade, who have also indulged me in discussions on many of the themes discussed here on countless occasions!
• And my dear, departed grandfather Leo Zwell, for getting me started in science.
• Crunchkin and Pumpkin, for regularly getting me away from the desk to stroll around the village where we live; many of my best ideas about AGI and other topics have emerged while walking with my furry four-legged family members.

September 2013
Ben Goertzel
Contents

1 Introduction
  1.1 AI Returns to Its Roots
  1.2 AGI versus Narrow AI
  1.3 CogPrime
  1.4 The Secret Sauce
  1.5 Extraordinary Proof?
  1.6 Potential Approaches to AGI
    1.6.1 Build AGI from Narrow AI
    1.6.2 Enhancing Chatbots
    1.6.3 Emulating the Brain
    1.6.4 Evolve an AGI
    1.6.5 Derive an AGI design mathematically
    1.6.6 Use heuristic computer science methods
    1.6.7 Integrative Cognitive Architecture
    1.6.8 Can Digital Computers Really Be Intelligent?
  1.7 Five Key Words
    1.7.1 Memory and Cognition in CogPrime
  1.8 Virtually and Robotically Embodied AI
  1.9 Language Learning
  1.10 AGI Ethics
  1.11 Structure of the Book
  1.12 Key Claims of the Book

Section I  Artificial and Natural General Intelligence

2 What Is Human-Like General Intelligence?
  2.1 Introduction
    2.1.1 What Is General Intelligence?
    2.1.2 What Is Human-like General Intelligence?
  2.2 Commonly Recognized Aspects of Human-like Intelligence
  2.3 Further Characterizations of Humanlike Intelligence
    2.3.1 Competencies Characterizing Human-like Intelligence
    2.3.2 Gardner's Theory of Multiple Intelligences
    2.3.3 Newell's Criteria for a Human Cognitive Architecture
    2.3.4 Intelligence and Creativity
  2.4 Preschool as a View into Human-like General Intelligence
    2.4.1 Design for an AGI Preschool
  2.5 Integrative and Synergetic Approaches to Artificial General Intelligence
    2.5.1 Achieving Humanlike Intelligence via Cognitive Synergy

3 A Patternist Philosophy of Mind
  3.1 Introduction
  3.2 Some Patternist Principles
  3.3 Cognitive Synergy
  3.4 The General Structure of Cognitive Dynamics: Analysis and Synthesis
    3.4.1 Component-Systems and Self-Generating Systems
    3.4.2 Analysis and Synthesis
    3.4.3 The Dynamic of Iterative Analysis and Synthesis
    3.4.4 Self and Focused Attention as Approximate Attractors of the Dynamic of Iterated Forward-Analysis
    3.4.5 Conclusion
  3.5 Perspectives on Machine Consciousness
  3.6 Postscript: Formalizing Pattern

4 Brief Survey of Cognitive Architectures
  4.1 Introduction
  4.2 Symbolic Cognitive Architectures
    4.2.1 SOAR
    4.2.2 ACT-R
    4.2.3 Cyc and Texai
    4.2.4 NARS
    4.2.5 GLAIR and SNePS
  4.3 Emergentist Cognitive Architectures
    4.3.1 DeSTIN: A Deep Reinforcement Learning Approach to AGI
    4.3.2 Developmental Robotics Architectures
  4.4 Hybrid Cognitive Architectures
    4.4.1 Neural versus Symbolic; Global versus Local
  4.5 Globalist versus Localist Representations
    4.5.1 CLARION
    4.5.2 The Society of Mind and the Emotion Machine
    4.5.3 DUAL
    4.5.4 4D/RCS
    4.5.5 PolyScheme
    4.5.6 Joshua Blue
    4.5.7 Shruti
    4.5.8 The Global Workspace
    4.5.9 The LIDA Cognitive Cycle
    4.5.10 Psi and MicroPsi
    4.5.11 The Emergence of Emotion in the Psi Model
    4.5.12 Knowledge Representation, Action Selection and Planning in Psi
    4.5.13 Psi versus CogPrime

5 A Generic Architecture of Human-Like Cognition
  5.1 Introduction
  5.2 Key Ingredients of the Integrative Human-Like Cognitive Architecture Diagram
  5.3 An Architecture Diagram for Human-Like General Intelligence
  5.4 Interpretation and Application of the Integrative Diagram

6 A Brief Overview of CogPrime
  6.1 Introduction
  6.2 High-Level Architecture of CogPrime
  6.3 Current and Prior Applications of OpenCog
    6.3.1 Transitioning from Virtual Agents to a Physical Robot
  6.4 Memory Types and Associated Cognitive Processes in CogPrime
    6.4.1 Cognitive Synergy in PLN
  6.5 Goal-Oriented Dynamics in CogPrime
  6.6 Analysis and Synthesis Processes in CogPrime
  6.7 Conclusion

Section II  Toward a General Theory of General Intelligence

7 A Formal Model of Intelligent Agents
  7.1 Introduction
  7.2 A Simple Formal Agents Model (SRAM)
    7.2.1 Goals
    7.2.2 Memory Stores
    7.2.3 The Cognitive Schematic
  7.3 Toward a Formal Characterization of Real-World General Intelligence
    7.3.1 Biased Universal Intelligence
    7.3.2 Connecting Legg and Hutter's Model of Intelligent Agents to the Real World
    7.3.3 Pragmatic General Intelligence
    7.3.4 Incorporating Computational Cost
    7.3.5 Assessing the Intelligence of Real-World Agents
  7.4 Intellectual Breadth: Quantifying the Generality of an Agent's Intelligence
  7.5 Conclusion

8 Cognitive Synergy
  8.1 Cognitive Synergy
  8.2 Cognitive Synergy
  8.3 Cognitive Synergy in CogPrime
    8.3.1 Cognitive Processes in CogPrime
  8.4 Some Critical Synergies
  8.5 The Cognitive Schematic
  8.6 Cognitive Synergy for Procedural and Declarative Learning
    8.6.1 Cognitive Synergy in MOSES
    8.6.2 Cognitive Synergy in PLN
  8.7 Is Cognitive Synergy Tricky?
    8.7.1 The Puzzle: Why Is It So Hard to Measure Partial Progress Toward Human-Level AGI?
    8.7.2 A Possible Answer: Cognitive Synergy Is Tricky!
    8.7.3 Conclusion

9 General Intelligence in the Everyday Human World
  9.1 Introduction
  9.2 Some Broad Properties of the Everyday World That Help Structure Intelligence
  9.3 Embodied Communication
    9.3.1 Generalizing the Embodied Communication Prior
  9.4 Naive Physics
    9.4.1 Objects, Natural Units and Natural Kinds
    9.4.2 Events, Processes and Causality
    9.4.3 Stuffs, States of Matter, Qualities
    9.4.4 Surfaces, Limits, Boundaries, Media
    9.4.5 What Kind of Physics Is Needed to Foster Human-like Intelligence?
  9.5 Folk Psychology
    9.5.1 Motivation, Requiredness, Value
  9.6 Body and Mind
    9.6.1 The Human Sensorium
    9.6.2 The Human Body's Multiple Intelligences
  9.7 The Extended Mind and Body
  9.8 Conclusion

10 A Mind-World Correspondence Principle
  10.1 Introduction
  10.2 What Might a General Theory of General Intelligence Look Like?
  10.3 Steps Toward a (Formal) General Theory of General Intelligence
  10.4 The Mind-World Correspondence Principle
  10.5 How Might the Mind-World Correspondence Principle Be Useful?
  10.6 Conclusion

Section III  Cognitive and Ethical Development

11 Stages of Cognitive Development
  11.1 Introduction
  11.2 Piagetan Stages in the Context of a General Systems Theory of Development
  11.3 Piaget's Theory of Cognitive Development
    11.3.1 Perry's Stages
    11.3.2 Keeping Continuity in Mind
  11.4 Piaget's Stages in the Context of Uncertain Inference
    11.4.1 The Infantile Stage
    11.4.2 The Concrete Stage
    11.4.3 The Formal Stage
    11.4.4 The Reflexive Stage

12 The Engineering and Development of Ethics
  12.1 Introduction
  12.2 Review of Current Thinking on the Risks of AGI
  12.3 The Value of an Explicit Goal System
  12.4 Ethical Synergy
    12.4.1 Stages of Development of Declarative Ethics
    12.4.2 Stages of Development of Empathic Ethics
    12.4.3 An Integrative Approach to Ethical Development
    12.4.4 Integrative Ethics and Integrative AGI
  12.5 Clarifying the Ethics of Justice: Extending the Golden Rule into a Multifactorial Ethical Model
    12.5.1 The Golden Rule and the Stages of Ethical Development
    12.5.2 The Need for Context-Sensitivity and Adaptiveness in Deploying Ethical Principles
  12.6 The Ethical Treatment of AGIs
    12.6.1 Possible Consequences of Depriving AGIs of Freedom
    12.6.2 AGI Ethics as Boundaries Between Humans and AGIs Become Blurred
  12.7 Possible Benefits of Closely Linking AGIs to the Global Brain
    12.7.1 The Importance of Fostering Deep, Consensus-Building Interactions Between People with Divergent Views
  12.8 Possible Benefits of Creating Societies of AGIs
  12.9 AGI Ethics As Related to Various Future Scenarios
    12.9.1 Capped Intelligence Scenarios
    12.9.2 Superintelligent AI: Soft-Takeoff Scenarios
    12.9.3 Superintelligent AI: Hard-Takeoff Scenarios
    12.9.4 Global Brain Mindplex Scenarios
  12.10 Conclusion: Eight Ways to Bias AGI Toward Friendliness
    12.10.1 Encourage Measured Co-Advancement of AGI Software and AGI Ethics Theory
    12.10.2 Develop Advanced AGI Sooner Not Later

Section IV  Networks for Explicit and Implicit Knowledge Representation

13 Local, Global and Glocal Knowledge Representation
  13.1 Introduction
  13.2 Localized Knowledge Representation using Weighted, Labeled Hypergraphs
    13.2.1 Weighted, Labeled Hypergraphs
  13.3 Atoms: Their Types and Weights
    13.3.1 Some Basic Atom Types
    13.3.2 Variable Atoms
    13.3.3 Logical Links
    13.3.4 Temporal Links
    13.3.5 Associative Links
    13.3.6 Procedure Nodes
    13.3.7 Links for Special External Data Types
    13.3.8 Truth Values and Attention Values
  13.4 Knowledge Representation via Attractor Neural Networks
    13.4.1 The Hopfield Neural Net Model
    13.4.2 Knowledge Representation via Cell Assemblies
  13.5 Neural Foundations of Learning
    13.5.1 Hebbian Learning
    13.5.2 Virtual Synapses and Hebbian Learning Between Assemblies
    13.5.3 Neural Darwinism
  13.6 Glocal Memory
    13.6.1 A Semi-Formal Model of Glocal Memory
    13.6.2 Glocal Memory in the Brain
    13.6.3 Glocal Hopfield Networks
    13.6.4 Neural-Symbolic Glocality in CogPrime

14 Representing Implicit Knowledge via Hypergraphs
  14.1 Introduction
  14.2 Key Vertex and Edge Types
  14.3 Derived Hypergraphs
    14.3.1 SMEPH Vertices
    14.3.2 SMEPH Edges
  14.4 Implications of Patternist Philosophy for Derived Hypergraphs of Intelligent Systems
    14.4.1 SMEPH Principles in CogPrime

15 Emergent Networks of Intelligence
  15.1 Introduction
  15.2 Small World Networks
  15.3 Dual Network Structure
    15.3.1 Hierarchical Networks
    15.3.2 Associative, Heterarchical Networks
    15.3.3 Dual Networks

Section V  A Path to Human-Level AGI

16 AGI Preschool
  16.1 Introduction
    16.1.1 Contrast to Standard AI Evaluation Methodologies
  16.2 Elements of Preschool Design
  16.3 Elements of Preschool Curriculum
    16.3.1 Preschool in the Light of Intelligence Theory
  16.4 Task-Based Assessment in AGI Preschool
  16.5 Beyond Preschool
  16.6 Issues with Virtual Preschool Engineering
    16.6.1 Integrating Virtual Worlds with Robot Simulators
    16.6.2 BlocksNBeads World

17 A Preschool-Based Roadmap to Advanced AGI
  17.1 Introduction
  17.2 Measuring Incremental Progress Toward Human-Level AGI
  17.3 Conclusion

18 Advanced Self-Modification: A Possible Path to Superhuman AGI
  18.1 Introduction
  18.2 Cognitive Schema Learning
  18.3 Self-Modification via Supercompilation
    18.3.1 Three Aspects of Supercompilation
    18.3.2 Supercompilation for Goal-Directed Program Modification
  18.4 Self-Modification via Theorem-Proving

A Glossary
  A.1 List of Specialized Acronyms
  A.2 Glossary of Specialized Terms

References
Chapter 1
Introduction

1.1 AI Returns to Its Roots

Our goal in this book is straightforward, albeit ambitious: to present a conceptual and technical design for a thinking machine, a software program capable of the same qualitative sort of general intelligence as human beings. It’s not certain exactly how far the design outlined here will be able to take us, but it seems plausible that once fully implemented, tuned and tested, it will be able to achieve general intelligence at the human level and in some respects beyond. Our ultimate aim is Artificial General Intelligence construed in the broadest sense, including artificial creativity and artificial genius.

We feel it is important to emphasize the extremely broad potential of Artificial General Intelligence systems. The human brain is not built to be modified, except via the slow process of evolution. Engineered AGI systems, built according to designs like the one outlined here, will be much more susceptible to rapid improvement from their initial state. It seems reasonable to us to expect that, relatively shortly after achieving the first roughly human-level AGI system, AGI systems with various sorts of beyond-human-level capabilities will be achieved.

Though these long-term goals are core to our motivations, we will spend much of our time here explaining how we think we can make AGI systems do relatively simple things, like the things human children do in preschool. The penultimate chapter of (Part 2 of) the book describes a thought-experiment involving a robot playing with blocks, responding to the request "Build me something I haven’t seen before." We believe that preschool creativity contains the seeds of, and the core structures and dynamics underlying, adult human-level genius ... and new, as yet unforeseen forms of artificial innovation.

Much of the book focuses on a specific AGI architecture, which we call CogPrime, and which is currently in the midst of implementation using the OpenCog software framework. CogPrime is large and complex and embodies a host of specific decisions regarding the various aspects of intelligence. We don’t view CogPrime as the unique path to advanced AGI, nor as the ultimate end-all of AGI research. We feel confident there are multiple possible paths to advanced AGI, and that in following any of these paths, multiple theoretical and practical lessons will be learned, leading to modifications of the ideas held during the early stages of the path. But our goal here is to articulate one path that we believe makes sense to follow, one overall design that we believe can work.
1.2 AGI versus Narrow AI

An outsider to the AI field might think this sort of book commonplace in the research literature, but insiders know that’s far from the truth. The field of Artificial Intelligence (AI) was founded in the mid 1950s with the aim of constructing “thinking machines” - that is, computer systems with human-like general intelligence, including humanoid robots that not only look but act and think with intelligence equal to and ultimately greater than that of human beings. But in the intervening years, the field has drifted far from its ambitious roots, and this book represents part of a movement aimed at restoring the initial goals of the AI field, but in a manner powered by new tools and new ideas far beyond those available half a century ago.

After the first generation of AI researchers found the task of creating human-level AGI very difficult given the technology of their time, the AI field shifted focus toward what Ray Kurzweil has called "narrow AI" — the understanding of particular specialized aspects of intelligence; and the creation of AI systems displaying intelligence regarding specific tasks in relatively narrow domains. In recent years, however, the situation has been changing. More and more researchers have recognized the necessity — and feasibility — of returning to the original goals of the field.

In the decades since the 1950s, cognitive science and neuroscience have taught us a lot about what a cognitive architecture needs to look like to support roughly human-like general intelligence. Computer hardware has advanced to the point where we can build distributed systems containing large amounts of RAM and large numbers of processors, carrying out complex tasks in real time. The AI field has spawned a host of ingenious algorithms and data structures, which have been successfully deployed for a huge variety of purposes.

Due to all this progress, increasingly, there has been a call for a transition from the current focus on highly specialized “narrow AI” problem solving systems, back to confronting the more difficult issues of “human level intelligence” and more broadly “artificial general intelligence (AGI).” Recent years have seen a growing number of special sessions, workshops and conferences devoted specifically to AGI, including the annual BICA (Biologically Inspired Cognitive Architectures) AAAI Symposium, and the international AGI conference series (one in 2006, and annual since 2008). And, even more exciting, as reviewed in Chapter 4, there are a number of contemporary projects focused directly and explicitly on AGI (sometimes under the name "AGI", sometimes using related terms such as "Human Level Intelligence").

In spite of all this progress, however, we feel that no one has yet clearly articulated a detailed, systematic design for an AGI, with potential to yield general intelligence at the human level and ultimately beyond. In this spirit, our main goal in this lengthy two-part book is to outline a novel design for a thinking machine — an AGI design which we believe has the capability to produce software systems with intelligence at the human adult level and ultimately beyond. Many of the technical details of this design have been previously presented online in a wikibook [Goe10b]; and the basic ideas of the design have been presented briefly in a series of conference papers [GPSL03, GPPG06, Goe09c]. But the overall design has not been presented in a coherent and systematic way before this book.
In order to frame this design properly, we also present a considerable number of broader theoretical and conceptual ideas here, some more and some less technical in nature.
1.3 CogPrime

The AGI design presented here has not previously been granted a name independently of its particular software implementations, but for the purposes of this book it needs one, so we’ve christened it CogPrime. This fits with the name “OpenCogPrime” that has already been used to describe the software implementation of CogPrime within the open-source OpenCog AGI software framework. The OpenCogPrime software, right now, implements only a small fraction of the CogPrime design as described here. However, OpenCog was designed specifically to enable efficient, scalable implementation of the full CogPrime design (as well as to serve as a more general framework for AGI R&D); and work currently proceeds in this direction, though there is a lot of work still to be done and many challenges remain.¹

¹ This brings up a terminological note: At several places in this volume and the next we will refer to the current CogPrime or OpenCog implementation; in all cases this refers to OpenCog as of late 2013. We realize the risk of mentioning the state of our software system at time of writing: for future readers this may give the wrong impression, because if our project goes well, more and more of CogPrime will get implemented and tested as time goes on (e.g. within the OpenCog framework, under active development at time of writing). However, not mentioning the current implementation at all seems an even worse course to us, since we feel readers will be interested to know which of our ideas — at time of writing — have been honed via practice and which have not. Online resources such as http://opencog.org may be consulted by readers curious about the current state of the main OpenCog implementation; though in future, forks of the code may be created, or other systems may be built using some or all of the ideas in this book, etc.

The CogPrime design is more comprehensive and thorough than anything that has been presented in the literature previously, including the work of others reviewed in Chapter 4. It covers all the key aspects of human intelligence, and explains how they interoperate and how they can be implemented in digital computer software. Part 1 of this work outlines CogPrime at a high level, and makes a number of more general points about artificial general intelligence and the path thereto; then Part 2 digs deeply into the technical particulars of CogPrime. Even Part 2, however, doesn’t explain all the details of CogPrime that have been worked out so far, and it definitely doesn’t explain all the implementation details that have gone into designing and building OpenCogPrime. Creating a thinking machine is a large task, and even the intermediate level of detail takes up a lot of pages.

1.4 The Secret Sauce

There is no consensus on why all the related technological and scientific progress mentioned above has not yet yielded AI software systems with human-like general intelligence (or even greater levels of brilliance!). However, we hypothesize that the core reason boils down to the following three points:

• Intelligence depends on the emergence of certain high-level structures and dynamics across a system’s whole knowledge base;
• We have not discovered any one algorithm or approach capable of yielding the emergence of these structures;
• Achieving the emergence of these structures within a system formed by integrating a number of different AI algorithms and structures requires careful attention to the manner in which
these algorithms and structures are integrated; and so far the integration has not been done in the correct way.

The human brain appears to be an integration of an assemblage of diverse structures and dynamics, built using common components and arranged according to a sensible cognitive architecture. However, its algorithms and structures have been honed by evolution to work closely together — they are very tightly inter-adapted, in the same way that the different organs of the body are adapted to work together. Due to their close interoperation they give rise to the overall systemic behaviors that characterize human-like general intelligence. We believe that the main missing ingredient in AI so far is cognitive synergy: the fitting-together of different intelligent components into an appropriate cognitive architecture, in such a way that the components richly and dynamically support and assist each other, interrelating very closely in a similar manner to the components of the brain or body and thus giving rise to appropriate emergent structures and dynamics.

This leads us to one of the central hypotheses underlying the CogPrime approach to AGI: that the cognitive synergy ensuing from integrating multiple symbolic and subsymbolic learning and memory components in an appropriate cognitive architecture and environment, can yield robust intelligence at the human level and ultimately beyond.

The reason this sort of intimate integration has not yet been explored much is that it’s difficult on multiple levels, requiring the design of an architecture and its component algorithms with a view toward the structures and dynamics that will arise in the system once it is coupled with an appropriate environment. Typically, the AI algorithms and structures corresponding to different cognitive functions have been developed based on divergent theoretical principles, by disparate communities of researchers, and have been tuned for effective performance on different tasks in different environments. Making such diverse components work together in a truly synergetic and cooperative way is a tall order, yet we believe that this — rather than some particular algorithm, structure or architectural principle — is the “secret sauce” needed to create human-level AGI based on technologies available today.

1.5 Extraordinary Proof?

There is a saying that “extraordinary claims require extraordinary proof” and by that standard, if one believes that having a design for an advanced AGI is an extraordinary claim, this book must be rated a failure. We don’t offer extraordinary proof that CogPrime, once fully implemented and educated, will be capable of human-level general intelligence and more. It would be nice if we could offer mathematical proof that CogPrime has the potential we think it does, but at the current time mathematics is simply not up to the job. We’ll pursue this direction briefly in Chapter 7 and other chapters, where we’ll clarify exactly what kind of mathematical claim “CogPrime has the potential for human-level intelligence” turns out to be. Once this has been clarified, it will be clear that current mathematical knowledge does not yet let us evaluate, or even fully formalize, this kind of claim. Perhaps one day rigorous and detailed analyses of practical AGI designs will be feasible — and we look forward to that day — but it’s not here yet.

Also, it would of course be profoundly exciting if we could offer dramatic practical demonstrations of CogPrime’s capabilities.
We do have a partial software implementation, in the OpenCogPrime system, but currently the things OpenCogPrime does are too simple to really
serve as proofs of CogPrime’s power for advanced AGI. We have used some CogPrime ideas in the OpenCog framework to do things like natural language understanding and data mining, and to control virtual dogs in online virtual worlds; and this has been very useful work in multiple senses. It has taught us more about the CogPrime design; it has produced some useful software systems; and it constitutes fractional work building toward a full OpenCog-based implementation of CogPrime. However, to date, the things OpenCogPrime has done are all things that could have been done in different ways without the CogPrime architecture (though perhaps not as elegantly nor with as much room for interesting expansion).

The bottom line is that building an AGI is a big job. Software companies like Microsoft spend dozens to hundreds of man-years building software products like word processors and operating systems, so it should be no surprise that creating a digital intelligence is also a relatively large-scale software engineering project. As time advances and software tools improve, the number of man-hours required to develop advanced AGI gradually decreases — but right now, as we write these words, it’s still a rather big job. In the OpenCogPrime project we are making a serious attempt to create a CogPrime-based AGI using an open-source development methodology, with the open-source Linux operating system as one of our inspirations. But the open-source methodology doesn’t work magic either, and it remains a large project, currently at an early stage.

I emphasize this point so that readers lacking software engineering expertise don’t take the currently fairly limited capabilities of OpenCogPrime as somehow a damning indictment of the potential of the CogPrime design. The design is one thing, the implementation another — and the OpenCogPrime implementation currently encompasses perhaps one third to one half of the key ideas in this book. So we don’t have extraordinary proof to offer. What we aim to offer instead are clearly-constructed conceptual and technical arguments as to why we think the CogPrime design has dramatic AGI potential.

It is also possible to push back a bit on the common intuition that having a design for human-level AGI is such an “extraordinary claim.” It may be extraordinary relative to contemporary science and culture, but we have a strong feeling that the AGI problem is not difficult in the same ways that most people (including most AI researchers) think it is. We suspect that in hindsight, after human-level AGI has been achieved, people will look back in shock that it took humanity so long to come up with a workable AGI design. As you’ll understand once you’ve finished Part 1 of the book, we don’t think general intelligence is nearly as “extraordinary” and mysterious as it’s commonly made out to be. Yes, building a thinking machine is hard — but humanity has done a lot of other hard things before. It may seem difficult to believe that human-level general intelligence could be achieved by something as simple as a collection of algorithms linked together in an appropriate way and used to control an agent. But we suggest that, once the first powerful AGI systems are produced, it will become apparent that engineering human-level minds is not so profoundly different from engineering other complex systems.
All in all, we’ll consider the book successful if a significant percentage of open-minded, appropriately-educated readers come away from it scratching their chins and pondering: “Hmm. You know, that just might work.” and a small percentage come away thinking "Now that’s an initiative I'd really like to help with!".
1.6 Potential Approaches to AGI

In principle, there are a large number of approaches one might take to building an AGI, starting from the knowledge, software and machinery now available. This is not the place to review them in detail, but a brief list seems apropos, including commentary on why these are not the approaches we have chosen for our own research. Our intent here is not to insult or dismiss these other potential approaches, but merely to indicate why, as researchers with limited time and resources, we have made a different choice regarding where to focus our own energies.

1.6.1 Build AGI from Narrow AI

Most of the AI programs around today are “narrow AI” programs — they carry out one particular kind of task intelligently. One could try to make an advanced AGI by combining a bunch of enhanced narrow AI programs inside some kind of overall framework. However, we’re rather skeptical of this approach because none of these narrow AI programs have the ability to generalize across domains — and we don’t see how combining them or extending them is going to cause this ability to magically emerge.

1.6.2 Enhancing Chatbots

One could seek to make an advanced AGI by taking a chatbot, and trying to improve its code to make it actually understand what it’s talking about. We have some direct experience with this route, as in 2010 our AI consulting firm was contracted to improve Ray Kurzweil’s online chatbot "Ramona". Our new Ramona understands a lot more than the previous Ramona version or a typical chatbot, due to using Wikipedia and other online resources, but still it’s far from an AGI. A more ambitious attempt in this direction was Jason Hutchens’ a-i.com project, which sought to create a human-child-level AGI via development and teaching of a statistical-learning-based chatbot (rather than the typical rule-based kind). The difficulty with this approach, however, is that the architecture of a chatbot is fundamentally different from the architecture of a generally intelligent mind. Much of what’s important about the human mind is not directly observable in conversations, so if you start from conversation and try to work toward an AGI architecture from there, you’re likely to miss many critical aspects.

1.6.3 Emulating the Brain

One can approach AGI by trying to figure out how the brain works, using brain imaging and other tools from neuroscience, and then emulating the brain in hardware or software. One rather substantial problem with this approach is that we don’t really understand how the brain works yet, because our tools for measuring the brain are still relatively crude. There is no brain scanning method that combines high spatial and temporal accuracy, and none is
likely to come about for a decade or two. So to do brain-emulation AGI seriously, one needs to wait a while until brain scanning technology improves. Current AI methods like neural nets that are loosely based on the brain are really not brain-like enough to make a serious claim at emulating the brain’s approach to general intelligence. We don’t yet have any real understanding of how the brain represents abstract knowledge, for example, or how it does reasoning (though the authors, like many others, have made some speculations in this regard [GMIH08]).

Another problem with this approach is that once you’re done, what you get is something with a very humanlike mind, and we already have enough of those! However, this is perhaps not such a serious objection, because a digital-computer-based version of a human mind could be studied much more thoroughly than a biology-based human mind. We could observe its dynamics in real-time with perfect precision, and could then learn things that would allow us to build other sorts of digital minds.

1.6.4 Evolve an AGI

Another approach is to try to run an evolutionary process inside the computer, and wait for advanced AGI to evolve. One problem with this is that we don’t know how evolution works all that well. There’s a field of artificial life, but so far its results have been fairly disappointing. It’s not yet clear how much one can vary the chemical structures that underlie real biology, and still get powerful evolution like we see in real biology. If we need good artificial chemistry to get good artificial biology, then do we need good artificial physics to get good artificial chemistry? Another problem with this approach, of course, is that it might take a really long time. Evolution took billions of years on Earth, using a massive amount of computational power. To make the evolutionary approach to AGI effective, one would need some radical innovations to the evolutionary process (such as, perhaps, using probabilistic methods like BOA [Pel05] or MOSES [Loo06] in place of traditional evolution).

1.6.5 Derive an AGI design mathematically

One can try to use the mathematical theory of intelligence to figure out how to make advanced AGI. This interests us greatly, but there’s a huge gap between the rigorous math of intelligence as it exists today and anything of practical value. As we’ll discuss in Chapter 7, most of the rigorous math of intelligence right now is about how to make AI on computers with dramatically unrealistic amounts of memory or processing power. When one tries to create a theoretical understanding of real-world general intelligence, one arrives at quite different sorts of considerations, as we will roughly outline in Chapter 10. Ideally we would like to be able to study the CogPrime design using a rigorous mathematical theory of real-world general intelligence, but at the moment that’s not realistic. The best we can do is to conceptually analyze CogPrime and its various components in terms of relevant mathematical and theoretical ideas; and perform analysis of CogPrime’s individual structures and components at varying levels of rigor.
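To give a concrete flavor of the gap just mentioned, consider one of the best-known pieces of rigorous mathematics of intelligence, Legg and Hutter's universal intelligence measure (taken up again in Chapter 7 in connection with their agent model; the formula below is from their published work, not a result of this book). It scores an agent π by its expected performance over all computable environments μ, weighted by their simplicity:

    \Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^\pi

Here E is the class of computable environments, K(μ) is the Kolmogorov complexity of μ, and V_μ^π is the expected cumulative reward π obtains in μ. The definition is elegant, but K is uncomputable and the sum ranges over infinitely many environments, so agents that directly optimize Υ (such as Hutter's AIXI) require unrealistic computational resources, which is precisely the impracticality this subsection describes.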
1.6.6 Use heuristic computer science methods

The computer science field contains a number of abstract formalisms, algorithms and structures that have relevance beyond specific narrow AI applications, yet aren’t necessarily understood as thoroughly as would be required to integrate them into the rigorous mathematical theory of intelligence. Based on these formalisms, algorithms and structures, a number of "single formalism/algorithm focused" AGI approaches have been outlined, some of which will be reviewed in Chapter 4. For example, Pei Wang’s NARS ("Non-Axiomatic Reasoning System") approach is based on a specific logic which he argues to be the "logic of general intelligence" — so, while his system contains many other aspects than this logic, he considers this logic to be the crux of the system and the source of its potential power as an AGI system.

The basic intuition on the part of these "single formalism/algorithm focused" researchers seems to be that there is one key formalism or algorithm underlying intelligence, and if you achieve this key aspect in your AGI program, you’re going to get something that fundamentally thinks like a person, even if it has some differences due to its different implementation and embodiment. On the other hand, it’s also possible that this idea is philosophically incorrect: that there is no one key formalism, algorithm, structure or idea underlying general intelligence. The CogPrime approach is based on the intuition that to achieve human-level, roughly human-like general intelligence based on feasible computational resources, one needs an appropriate heterogeneous combination of algorithms and structures, each coping with different types of knowledge and different aspects of the problem of achieving goals in complex environments.

1.6.7 Integrative Cognitive Architecture

Finally, to create advanced AGI one can try to build some sort of integrative cognitive architecture: a software system with multiple components that each carry out some cognitive function, and that connect together in a specific way to try to yield overall intelligence. Cognitive science gives us some guidance about the overall architecture, and computer science and neuroscience give us a lot of ideas about what to put in the different components. But still this approach is very complex and there is a lot of need for creative invention. This is the approach we consider most “serious” at present (at least until neuroscience advances further). And, as will be discussed in depth in these pages, this is the approach we’ve chosen: CogPrime is an integrative AGI architecture.

1.6.8 Can Digital Computers Really Be Intelligent?

All the AGI approaches we’ve just mentioned assume that it’s possible to make AGI on digital computers. While we suspect this is correct, we must note that it isn’t proven. It might be that — as Penrose [Pen96], Hameroff [Ham87] and others have argued — we need quantum computers or quantum gravity computers to make AGI. However, there is no evidence of this at this stage. Of course the brain, like all matter, is described by quantum mechanics, but this doesn’t imply that the brain is a “macroscopic quantum system” in a strong sense (like, say, a Bose-Einstein condensate). And even if the brain does use quantum phenomena in
a dramatic way to carry out some of its cognitive processes (a hypothesis for which there is no current evidence), this doesn't imply that these quantum phenomena are necessary in order to carry out the given cognitive processes. For example, there is evidence that birds use quantum nonlocal phenomena to carry out navigation based on the Earth's magnetic fields [GRM+11]; yet scientists have built instruments that carry out the same functions without using any special quantum effects. The importance of quantum phenomena in biology (except via their obvious role in giving rise to biological phenomena describable via classical physics) remains a subject of debate [AGBD+08].

Quantum "magic" aside, it is also conceivable that building AGI is fundamentally impossible for some other reason we don't understand. Without getting religious about it, it is rationally quite possible that some aspects of the universe are beyond the scope of scientific methods. Science is fundamentally about recognizing patterns in finite sets of bits (e.g. finite sets of finite-precision observations), whereas mathematics recognizes many sets much larger than this. Selmer Bringsjord [BZ03], and other advocates of "hypercomputing" approaches to intelligence, argue that the human mind depends on massively large infinite sets and therefore can never be simulated on digital computers nor understood via finite sets of finite-precision measurements such as science deals with. But again, while this sort of possibility is interesting to speculate about, there's no real reason to believe it at this time.

Brain science and AI are both very young sciences, and the "working hypothesis" that digital computers can manifest advanced AGI has hardly been explored at all yet, relative to what will be possible in the next decades as computers get more and more powerful and our understanding of neuroscience and cognitive science gets more and more complete. The CogPrime AGI design presented here is based on this working hypothesis. Many of the ideas in the book are actually independent of the "mind can be implemented digitally" working hypothesis, and could apply to AGI systems built on analog, quantum or other non-digital frameworks — but we will not pursue these possibilities here. For the moment, outlining an AGI design for digital computers is hard enough! Regardless of speculations about quantum computing in the brain, it seems clear that AGI on quantum computers is part of our future and will be a powerful thing; but the description of a CogPrime analogue for quantum computers will be left for a later work.

1.7 Five Key Words

As noted, the CogPrime approach lies squarely in the integrative cognitive architecture camp. But it is not a haphazard or opportunistic combination of algorithms and data structures. At bottom it is motivated by the patternist philosophy of mind laid out in Ben Goertzel's book The Hidden Pattern [Goe06a], which was in large part a summary and reformulation of ideas presented in a series of books published earlier by the same author [Goe94], [Goe93a], [Goe93b], [Goe97], [Goe01]. A few of the core ideas of this philosophy are laid out in Chapter 3, though that chapter is by no means a thorough summary.

One way to summarize some of the most important yet commonsensical parts of the patternist philosophy of mind, in an AGI context, is to list five words: perception, memory, prediction, action, goals.
In a phrase: "A mind uses perception and memory to make predictions about which actions will help it achieve its goals."
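To make this five-word summary concrete, here is a deliberately tiny illustration (not CogPrime code; every name in it is invented for this example) of an agent that uses remembered perception-action-outcome patterns to predict which action best serves its goal:

def choose_action(percept, memory, actions, goal_value):
    """memory maps (percept, action) pairs to previously observed outcomes."""
    def predicted_value(action):
        outcome = memory.get((percept, action))   # predict by recalling patterns
        return goal_value(outcome) if outcome is not None else 0.0
    return max(actions, key=predicted_value)

# The agent previously observed that pressing the red button while hungry
# yields food, and its goal values food.
memory = {("hungry", "press_red"): "food", ("hungry", "press_blue"): "nothing"}
print(choose_action("hungry", memory, ["press_red", "press_blue"],
                    goal_value=lambda o: 1.0 if o == "food" else 0.0))
# -> press_red

Everything interesting in a real mind, of course, lies in how the memory is structured, how predictions generalize beyond literally remembered cases, and how goals are represented; those are exactly the topics the rest of the book takes up.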
This five-word picture ties in with the ideas of many other thinkers, including Jeff Hawkins' "memory/prediction" theory [B06], and it also speaks directly to the formal characterization of intelligence presented in Chapter 7: general intelligence as "the ability to achieve complex goals in complex environments." Naturally the goals involved in the above phrase may be explicit or implicit to the intelligent agent, and they may shift over time as the agent develops.

Perception is taken to mean pattern recognition: the recognition of (novel or familiar) patterns in the environment or in the system itself. Memory is the storage of already-recognized patterns, enabling recollection or regeneration of these patterns as needed. Action is the formation of patterns in the body and world. Prediction is the utilization of temporal patterns to guess what perceptions will be seen in the future, and what actions will achieve what effects in the future — in essence, prediction consists of temporal pattern recognition, plus the (implicit or explicit) assumption that the universe possesses a "habitual tendency" according to which previously observed patterns continue to apply.

1.7.1 Memory and Cognition in CogPrime

Each of these five concepts has a lot of depth to it, and we won't say too much about them in this brief introductory overview; but we will take a little time to say something about memory in particular.

As we'll see in Chapter 7, one of the things that the mathematical theory of general intelligence makes clear is that, if you assume your AI system has a huge amount of computational resources, then creating general intelligence is not a big trick. Given enough computing power, a very brief and simple program can achieve any computable goal in any computable environment, quite effectively. Marcus Hutter's AIXItl design [Hut05] gives one way of doing this, backed up by rigorous mathematics. Put informally, what this means is: the problem of AGI is really a problem of coping with inadequate compute resources, just as the problem of natural intelligence is really a problem of coping with inadequate energetic resources.

One of the key ideas underlying CogPrime is a principle called cognitive synergy, which explains how real-world minds achieve general intelligence using limited resources, by appropriately organizing and utilizing their memories. This principle says that there are many different kinds of memory in the mind: sensory, episodic, procedural, declarative, attentional, intentional. Each of them has certain learning processes associated with it; for example, reasoning is associated with declarative memory. Synergy arises here in the way the learning processes associated with each kind of memory have got to help each other out when they get stuck, rather than working at cross-purposes. Cognitive synergy is a fundamental principle of general intelligence; it doesn't tend to play a central role when you're building narrow-AI systems.

In the CogPrime approach all the different kinds of memory are linked together in a single meta-representation, a sort of combined semantic/neural network called the AtomSpace. It represents everything from perceptions and actions to abstract relationships and concepts, and even a system's model of itself and others. When specialized representations are used for other types of knowledge (e.g.
program trees for procedural knowledge, spatiotemporal hierarchies for perceptual knowledge) then the knowledge stored outside the AtomSpace is represented via
tokens (Atoms) in the AtomSpace, allowing it to be located by various cognitive processes, and associated with other memory items of any type.

So for instance an OpenCog AI system has an AtomSpace, plus some specialized knowledge stores linked into the AtomSpace; and it also has specific algorithms acting on the AtomSpace and the appropriate specialized stores corresponding to each type of memory. Each of these algorithms is complex and has its own story; for instance (an incomplete list; for more detail see the following section of this Introduction):

• Declarative knowledge is handled using Probabilistic Logic Networks, described in Chapter 34 and others;
• Procedural knowledge is handled using MOSES, a probabilistic evolutionary learning algorithm described in Chapter 21 and others;
• Attentional knowledge is handled by ECAN (economic attention allocation), described in Chapter 23 and others;
• OpenCog contains a language comprehension system called RelEx that takes English sentences and turns them into nodes and links in the AtomSpace. It's currently being extended to handle Chinese. RelEx handles mostly declarative knowledge, but also involves some procedural knowledge for linguistic phenomena like reference resolution and semantic disambiguation.

But the crux of the CogPrime cognitive architecture is not any particular cognitive process, but rather the way they all work together using cognitive synergy.

1.8 Virtually and Robotically Embodied AI

Another issue that will arise frequently in these pages is embodiment. There's a lot of debate in the AI community over whether embodiment is necessary for advanced AGI or not. Personally, we doubt it's necessary, but we think it's extremely convenient, and are thus considerably interested in both virtual-world and robotic embodiment. The CogPrime architecture itself is neutral on the issue of embodiment, and it could be used to build a mathematical theorem prover or an intelligent chat bot just as easily as an embodied AGI system. However, most of our attention has gone into figuring out how to use CogPrime to control embodied agents in virtual worlds, or else (to a lesser extent) physical robots. For instance, during 2011-2012 we have been involved in a Hong Kong government funded project using OpenCog to control video game agents in a simple game world modeled on the game Minecraft [GPC+11].

Current virtual world technology has significant limitations that make it far less than ideal from an AGI perspective, and in Chapter 16 we will discuss how these limitations can be remedied. However, for the medium-term future virtual worlds are not going to match the natural world in terms of richness and complexity — and so there's also something to be said for physical robots that interact with all the messiness of the real world.

With this in mind, in the Artificial Brain Lab at Xiamen University in 2009-2010 we conducted some experiments using OpenCog to control the Nao humanoid robot [GD09]. The goal of that work was to take the same code that controls the virtual dog and use it to control the physical robot. But it's harder, because in this context we need to do real vision processing and real motor control. A similar project is being undertaken in Hong Kong at time of writing, involving a collaboration between OpenCog AI developers and David Hanson's robotics
group. One of the key ideas involved in this project is explicit integration of subsymbolic and more symbolic subsystems. For instance, one can use a purely subsymbolic, hierarchical pattern recognition network for vision processing, and then link its internal structures into the nodes and links in the AtomSpace that represent concepts. In this way the subsymbolic and symbolic systems can work harmoniously and productively together, a notion we will review in more detail in Chapter 26.

1.9 Language Learning

One of the subtler aspects of our current approach to teaching CogPrime is language learning. Three relatively crisp and simple approaches to language learning would be:

• Build a language processing system using hand-coded grammatical rules, based on linguistic theory;
• Train a language processing system using supervised, unsupervised or semi-supervised learning, based on computational linguistics;
• Have an AI system learn language via experience, based on imitation, reinforcement and experimentation, without any built-in distinction between linguistic behaviors and other behaviors.

While the third approach is conceptually appealing, our current approach in CogPrime (described in a series of chapters in Part 2) is none of the above, but rather a combination of the above. OpenCog contains a natural language processing system built using a combination of the rule-based and statistical approaches, which has reasonably adequate functionality; and our plan is to use it as an initial condition for ongoing adaptive improvement based on embodied communicative experience.

1.10 AGI Ethics

When discussing AGI work with the general public, ethical concerns often arise. Science fiction films like the Terminator series have raised public awareness of the possible dangers of advanced AGI systems without correspondingly advanced ethics. Non-profit organizations like the Singularity Institute for AI (http://singinst.org) have arisen specifically to raise attention about, and foster research on, these potential dangers.

Our main focus here is on how to create AGI, not how to teach an AGI human ethical principles. However, we will address the latter issue explicitly in Chapter 12, and we do think it's important to emphasize that AGI ethics has been at the center of the design process throughout the conception and development of CogPrime and OpenCog.

Broadly speaking, there are (at least) two major threats related to advanced AGI. One is that people might use AGIs for bad ends; and the other is that, even if an AGI is made with the best intentions, it might reprogram itself in a way that causes it to do something terrible. If it's smarter than us, we might be watching it carefully while it does this, and have no idea what's going on.
The best way to deal with this second "bad AGI" problem is to build ethics into your AGI architecture — and we have done this with CogPrime, via creating a goal structure that explicitly supports ethics-directed behavior, and via creating an overall architecture that supports "ethical synergy" along with cognitive synergy. In short, the notion of ethical synergy is that there are different kinds of ethical thinking associated with the different kinds of memory, and you want to be sure your AGI has all of them, and that it uses them together effectively. In order to create AGI that is not only intelligent but beneficial to other sentient beings, ethics has got to be part of the design and the roadmap. As we teach our AGI systems, we need to lead them through a series of instructional and evaluative tasks that move from a primitive level to the mature human level — in intelligence, but also in ethical judgment.

1.11 Structure of the Book

The book is divided into two parts. The technical particulars of CogPrime are discussed in Part 2; what we deal with in Part 1 are important preliminary and related matters such as:

• The nature of real-world general intelligence, both conceptually and from the perspective of formal modeling (Section I).
• The nature of cognitive and ethical development for humans and AGIs (Section III).
• The high-level properties of CogPrime, including the overall architecture and the various sorts of memory involved (Section IV).
• What kind of path may viably lead us from here to AGI, with focus laid on preschool-type environments that easily foster humanlike cognitive development.
• Various advanced aspects of AGI systems, such as the network and algebraic structures that may emerge from them, the ways in which they may self-modify, and the degree to which their initial design may constrain or guide their future state even after long periods of radical self-improvement (Section V).

One point made repeatedly throughout Part 1, which is worth emphasizing here, is the current lack of a really rigorous and thorough general technical theory of general intelligence. Such a theory, if complete, would be incredibly helpful for understanding complex AGI architectures like CogPrime. Lacking such a theory, we must work on CogPrime and other such systems using a combination of theory, experiment and intuition. This is not a bad thing, but it will be very helpful if the theory and practice of AGI are able to grow collaboratively together.

1.12 Key Claims of the Book

We will wrap up this Introduction with a systematic list of some of the key claims to be argued for in these pages. Not all the terms and ideas in these claims have been mentioned in the preceding portions of this Introduction, but we hope they will be reasonably clear to the reader anyway, at least in a general sense. This list of claims will be revisited in Chapter 49 near the end of Part 2, where we will look back at the ideas and arguments that have been put forth in favor of them in the intervening chapters.
In essence this is a list of claims such that, if the reader accepts these claims, they should probably accept that the CogPrime approach to AGI is a viable one. On the other hand, if the reader rejects one or more of these claims, they may find one or more aspects of CogPrime unacceptable for some reason. Without further ado, now, the claims:

1. General intelligence (at the human level and ultimately beyond) can be achieved via creating a computational system that seeks to achieve its goals, via using perception and memory to predict which actions will achieve its goals in the contexts in which it finds itself.
2. To achieve general intelligence in the context of human-intelligence-friendly environments and goals using feasible computational resources, it's important that an AGI system can handle different kinds of memory (declarative, procedural, episodic, sensory, intentional, attentional) in customized but interoperable ways.
3. Cognitive synergy: It's important that the cognitive processes associated with different kinds of memory can appeal to each other for assistance in overcoming bottlenecks, in a manner that enables each cognitive process to act in a way that is sensitive to the particularities of each others' internal representations, and that doesn't impose unreasonable delays on the overall cognitive dynamics.
4. As a general principle, neither purely localized nor purely global memory is sufficient for general intelligence under feasible computational resources; "glocal" memory will be required.
5. To achieve human-like general intelligence, it's important for an intelligent agent to have sensory data and motoric affordances that roughly emulate those available to humans. We don't know exactly how close this emulation needs to be, which means that our AGI systems and platforms need to support fairly flexible experimentation with virtual-world and/or robotic infrastructures.
6. To work toward adult human-level, roughly human-like general intelligence, one fairly easily comprehensible path is to use environments and goals reminiscent of human childhood, and seek to advance one's AGI system along a path roughly comparable to that followed by human children.
7. It is most effective to teach an AGI system aimed at roughly human-like general intelligence via a mix of spontaneous learning and explicit instruction, and to instruct it via a combination of imitation, reinforcement and correction, and a combination of linguistic and nonlinguistic instruction.
8. One effective approach to teaching an AGI system human language is to supply it with some in-built linguistic facility, in the form of rule-based and statistical-linguistics-based NLP systems, and then allow it to improve and revise this facility based on experience.
9. An AGI system with adequate mechanisms for handling the key types of knowledge mentioned above, and the capability to explicitly recognize large-scale patterns in itself, should, upon sustained interaction with an appropriate environment in pursuit of appropriate goals, give rise to a variety of complex structures in its internal knowledge network, including, but not limited to:
• a hierarchical network, representing both a spatiotemporal hierarchy and an approximate "default inheritance" hierarchy, cross-linked
• a heterarchical network of associativity, roughly aligned with the hierarchical network
• a self network which is an approximate micro image of the whole network
• inter-reflecting networks modeling self and others, reflecting a "mirrorhouse" design pattern

10. Given the strengths and weaknesses of current and near-future digital computers,
a. A (loosely) neural-symbolic network is a good representation for directly storing many kinds of memory, and interfacing between those that it doesn't store directly;
b. Uncertain logic is a good way to handle declarative knowledge. To deal with the problems facing a human-level AGI, an uncertain logic must integrate imprecise probability and fuzziness with a broad scope of logical constructs. PLN is one good realization.
c. Programs are a good way to represent procedures (both cognitive and physical-action, but perhaps not including low-level motor-control procedures).
d. Evolutionary program learning is a good way to handle difficult program learning problems. Probabilistic learning on normalized programs is one effective approach to evolutionary program learning. MOSES is one good realization of this approach.
e. Multistart hill-climbing, with a strong Occam prior, is a good way to handle relatively straightforward program learning problems.
f. Activation spreading and Hebbian learning comprise a reasonable way to handle attentional knowledge (though other approaches, with greater overhead cost, may provide better accuracy and may be appropriate in some situations).
• Artificial economics is an effective approach to activation spreading and Hebbian learning in the context of neural-symbolic networks;
• ECAN is one good realization of artificial economics;
• A good trade-off between comprehensiveness and efficiency is to focus on two kinds of attention: processor attention (represented in CogPrime by ShortTermImportance) and memory attention (represented in CogPrime by LongTermImportance).
g. Simulation is a good way to handle episodic knowledge (remembered and imagined). Running an internal world simulation engine is an effective way to handle simulation.
h. Hybridization of one's integrative neural-symbolic system with a spatiotemporally hierarchical deep learning system is an effective way to handle representation and learning of low-level sensorimotor knowledge. DeSTIN is one example of a deep learning system of this nature that can be effective in this context.
i. One effective way to handle goals is to represent them declaratively, and allocate attention among them economically. CogPrime's PLN/ECAN based framework for handling intentional knowledge is one good realization.

11. It is important for an intelligent system to have some way of recognizing large-scale patterns in itself, and then embodying these patterns as new, localized knowledge items in its memory. Given the use of a neural-symbolic network for knowledge representation, a graph-mining based "map formation" heuristic is one good way to do this.
12. Occam's Razor: Intelligence is closely tied to the creation of procedures that achieve goals in environments in the simplest possible way. Each of an AGI system's cognitive algorithms should embody a simplicity bias in some explicit or implicit form.
13. An AGI system, if supplied with a commonsensically ethical goal system and an intentional component based on rigorous uncertain inference, should be able to reliably achieve a much higher level of commonsensically ethical behavior than any human being.
14. Once sufficiently advanced, an AGI system with a logic-based declarative knowledge approach and a program-learning-based procedural knowledge approach should be able to
radically self-improve via a variety of methods, including supercompilation and automated theorem-proving.
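Several of the claims above, in particular 10(a), 10(f) and 11, presuppose a weighted, labeled hypergraph knowledge store. To give them a bit of concreteness before Part 2 takes them up in earnest, the following is a heavily simplified sketch of such a store. It is not the actual OpenCog AtomSpace API; all class names and fields here are invented for illustration.

from dataclasses import dataclass

@dataclass
class Atom:
    kind: str                 # e.g. "ConceptNode", "InheritanceLink"
    name: str = ""            # used by nodes
    outgoing: tuple = ()      # used by links: the Atoms the link connects
    strength: float = 1.0     # a truth-value-like quantity
    importance: float = 0.0   # an attention-value-like quantity

class AtomSpace:
    """A single shared store over which many cognitive processes operate."""
    def __init__(self):
        self.atoms = []

    def add(self, atom):
        self.atoms.append(atom)
        return atom

    def incoming(self, atom):
        # All links pointing at `atom`: the hook for spreading activation,
        # inference, and cross-linking between memory types.
        return [a for a in self.atoms if atom in a.outgoing]

space = AtomSpace()
cat = space.add(Atom("ConceptNode", name="cat"))
animal = space.add(Atom("ConceptNode", name="animal"))
space.add(Atom("InheritanceLink", outgoing=(cat, animal), strength=0.95))
print([a.kind for a in space.incoming(cat)])   # ['InheritanceLink']

The point of the sketch is simply that each Atom carries both symbolic content (its type and connectivity) and graded quantities such as truth and importance, so that logical inference, attention allocation and other processes can all operate over one shared store.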
Section I: Artificial and Natural General Intelligence
Chapter 2
What Is Human-Like General Intelligence?

2.1 Introduction

CogPrime, the AGI architecture on which the bulk of this book focuses, is aimed at the creation of artificial general intelligence that is vaguely human-like in nature, and possesses capabilities at the human level and ultimately beyond. Obviously this description raises some foundational questions, such as, for starters: What is "general intelligence"? What is "human-like general intelligence"? What is "intelligence" at all?

Perhaps in the future there will exist a rigorous theory of general intelligence which applies usefully to real-world biological and digital intelligences. In later chapters we will give some ideas in this direction. But such a theory is currently nascent at best. So, given the present state of science, these questions about intelligence must be handled via a combination of formal and informal methods. This brief, informal chapter attempts to explain our view on the nature of intelligence in sufficient detail to place the discussion of CogPrime in appropriate context, without trying to resolve all the subtleties.

Psychologists sometimes define human general intelligence using IQ tests and related instruments — so one might wonder: why not just go with that? But these sorts of intelligence testing approaches have difficulty even extending to humans from diverse cultures [HHPO12] [Fis01]. So it's clear that to ground AGI approaches that are not based on precise modeling of human cognition, one requires a more fundamental understanding of the nature of general intelligence. On the other hand, if one conceives intelligence too broadly and mathematically, there's a risk of leaving the real human world too far behind. In this chapter (followed up in Chapters 9 and 7 with more rigor), we present a highly abstract understanding of intelligence-in-general, and then portray human-like general intelligence as a (particularly relevant) special case.

2.1.1 What Is General Intelligence?

Many attempts to characterize general intelligence have been made; Legg and Hutter [LH07a] review over 70! Our preferred abstract characterization of intelligence is: the capability of a system to choose actions maximizing its goal-achievement, based on its perceptions and memories, and making reasonably efficient use of its computational resources
20 2 What Is Human-Like General Intelligence? [Goel0c]. A general intelligence is then understood as one that can do this for a variety of complex goals in a variety of complex environments. However, apart from positing definitions, it is difficult to say anything nontrivial about gen- eral intelligence in general. Marcus Hutter [Hut05] has demonstrated, using a characterization of general intelligence similar to the one above, that a very simple algorithm called AIXI” can demonstrate arbitrarily high levels of general intelligence, if given sufficiently immense com- putational resources. This is interesting because it shows that (if we assume the universe can effectively be modeled as a computational system) general intelligence is basically a problem of computational efficiency. The particular structures and dynamics that characterize real-world general intelligences like humans arise because of the need to achieve reasonable levels of intel- ligence using modest space and time resources. The “patternist” theory of mind presented in [GoeQ6a] and briefly summarized in Chap- ter 3 below presents a number of emergent structures and dynamics that are hypothesized to characterize pragmatic general intelligence, including such things as system-wide hierarchical and heterarchical knowledge networks, and a dynamic and self-maintaining selfmodel. Much of the thinking underlying CogPrime has centered on how to make multiple learning components combine to give rise to these emergent structures and dynamics. 2.1.2 What Is Human-like General Intelligence? General principles like “complex goals in complex environments” and patternism are not suf- ficient to specify the nature of human-like general intelligence. Due to the harsh reality of computational resource restrictions, real-world general intelligences are necessarily biased to particular classes of environments. Human intelligence is biased toward the physical, social and linguistic environments in which humanity evolved, and if AI systems are to possess humanlike general intelligence they must to some extent share these biases. But what are these biases, specifically? This is a large and complex question, which we seek to answer in a theoretically grounded way in Chapter 9. However, before turning to abstract theory, one may also approach the question in a pragmatic way, by looking at the categories of things that humans do to manifest their particular variety of general intelligence. This is the task of the following section. 2.2 Commonly Recognized Aspects of Human-like Intelligence It would be nice if we could give some sort of “standard model of human intelligence” in this chapter, to set the context for our approach to artificial general intelligence — but the truth is that there isn’t any. What the cognitive science field has produced so far is better described as: a broad set of principles and platitudes, plus a long, loosely-organized list of ideas and results. Chapter 5 below constitutes an attempt to present an integrative architecture diagram for human-like general intelligence, synthesizing the ideas of a number of different AGI and cognitive theorists. However, though the diagram given there attempts to be inclusive, it nonetheless contains many features that are accepted by only a plurality of the research community. HOUSE_OVERSIGHT_012936
The following list of key aspects of human-like intelligence has a better claim at truly being generic and representing the consensus understanding of contemporary science. It was produced by a very simple method: starting with the Wikipedia page for cognitive psychology, and then adding a few items onto it based on scrutinizing the tables of contents of some top-ranked cognitive psychology textbooks. There is some redundancy among list items, and perhaps also some minor omissions (depending on how broadly one construes some of the items), but the point is to give a broad indication of human mental functions as standardly identified in the psychology field:

• Perception
— General perception
— Psychophysics
— Pattern recognition (the ability to correctly interpret ambiguous sensory information)
— Object and event recognition
— Time sensation (awareness and estimation of the passage of time)
• Motor Control
— Motor planning
— Motor execution
— Sensorimotor integration
• Categorization
— Category induction and acquisition
— Categorical judgement and classification
— Category representation and structure
— Similarity
• Memory
— Aging and memory
— Autobiographical memory
— Constructive memory
— Emotion and memory
— False memories
— Memory biases
— Long-term memory
— Episodic memory
— Semantic memory
— Procedural memory
— Short-term memory
— Sensory memory
— Working memory
• Knowledge representation
— Mental imagery
— Propositional encoding
— Imagery versus propositions as representational mechanisms
— Dual-coding theories
— Mental models
• Language
— Grammar and linguistics
— Phonetics and phonology
— Language acquisition
• Thinking
— Choice
— Concept formation
— Judgment and decision making
— Logic, formal and natural reasoning
— Problem solving
— Planning
— Numerical cognition
— Creativity
• Consciousness
— Attention and Filtering (the ability to focus mental effort on specific stimuli whilst excluding other stimuli from consideration)
— Access consciousness
— Phenomenal consciousness
• Social Intelligence
— Distributed Cognition
— Empathy

If there's nothing surprising to you in the above list, I'm not surprised! If you've read a bit in the modern cognitive science literature, the list may even seem trivial. But it's worth reflecting that 50 years ago, no such list could have been produced with the same level of broad acceptance. And less than 100 years ago, the Western world's scientific understanding of the mind was dominated by Freudian thinking; and not too long after that, by behaviorist thinking, which argued that theorizing about what went on inside the mind made no sense, and science should focus entirely on analyzing external behavior. The progress of cognitive science hasn't made as many headlines as contemporaneous progress in neuroscience or computing hardware and software, but it's certainly been dramatic. One of the reasons that AGI is more achievable now than in the 1950s and 60s, when the AI field began, is that now we understand the structures and processes characterizing human thinking a lot better.

In spite of all the theoretical and empirical progress in the cognitive science field, however, there is still no consensus among experts on how the various aspects of intelligence in the above "human intelligence feature list" are achieved and interrelated. In these pages, for the purpose of motivating CogPrime, we assume a broad integrative understanding roughly as follows:

• Perception: There is significant evidence that human visual perception occurs using a spatiotemporal hierarchy of pattern recognition modules, in which higher-level modules
deal with broader spacetime regions, roughly as in the DeSTIN AGI architecture discussed in Chapter 4. Further, there is evidence that each module carries out temporal predictive pattern recognition as well as static pattern recognition. Audition likely utilizes a similar hierarchy. Olfaction may use something more like a Hopfield attractor neural network, as described in Chapter 13. The networks corresponding to different sense modalities have multiple cross-linkages, more at the upper levels than the lower, and also link richly into the parts of the mind dealing with other functions.
• Motor Control: This appears to be handled by a spatiotemporal hierarchy as well, in which each level of the hierarchy corresponds to higher-level (in space and time) movements. The hierarchy is very tightly linked in with the perceptual hierarchies, allowing sensorimotor learning and coordination.
• Memory: There appear to be multiple distinct but tightly cross-linked memory systems, corresponding to different sorts of knowledge such as declarative (facts and beliefs), procedural, episodic, sensorimotor, attentional and intentional (goals).
• Knowledge Representation: There appear to be multiple base-level representational systems, at least one corresponding to each memory system, but perhaps more than that. Additionally there must be the capability to dynamically create new context-specific representational systems founded on the base representational system.
• Language: While there is surely some innate biasing in the human mind toward learning certain types of linguistic structure, it's also notable that language shares a great deal of structure with other aspects of intelligence like social roles [CB00] and the physical world [Cas07]. Language appears to be learned based on biases toward learning certain types of relational role systems; and language processing seems a complex mix of generic reasoning and pattern recognition processes with specialized acoustic and syntactic processing routines.
• Consciousness is pragmatically well understood using Baars' "global workspace" theory, in which a small subset of the mind's content is summoned at each time into a "working memory" (aka "workspace" or "attentional focus") where it is heavily processed and used to guide action selection.
• Thinking is a diverse combination of processes encompassing things like categorization, (crisp and uncertain) reasoning, concept creation, pattern recognition, and others; these processes must work well with all the different types of memory and must effectively integrate knowledge in the global workspace with knowledge in long-term memory.
• Social Intelligence seems closely tied with language and also with self-modeling; we model ourselves in large part using the same specialized biases we use to help us model others.

None of the points in the above bullet list is particularly controversial, but neither are any of them universally agreed upon by experts. However, in order to make any progress on AGI design one must make some commitments to particular cognition-theoretic understandings, at this level and ultimately at more precise levels as well. Further, general philosophical analyses like the patternist philosophy to be reviewed in the following chapter only provide limited guidance here.
Patternism provides a filter for theories about specific cognitive functions — it rules out assemblages of cognitive-function-specific theories that don't fit together to yield a mind that could act effectively as a pattern-recognizing, goal-achieving system with the right internal emergent structures. But it's not a precise enough filter to serve as a sole guide for cognitive theory, even at the high level.

The above list of points leads naturally into the integrative architecture diagram presented in Chapter 5. But that generic architecture diagram is fairly involved, and before presenting
it, we will go through some more background regarding human-like intelligence (in the rest of this chapter), philosophy of mind (in Chapter 3) and contemporary AGI architectures (in Chapter 4).

2.3 Further Characterizations of Humanlike Intelligence

We now present a few complementary approaches to characterizing the key aspects of human-like intelligence, drawn from different perspectives in the psychology and AI literature. These different approaches all overlap substantially, which is good, yet each gives a slightly different slant.

2.3.1 Competencies Characterizing Human-like Intelligence

First we give a list of key competencies characterizing human-level intelligence resulting from the AGI Roadmap Workshop held at the University of Tennessee, Knoxville in October 2008¹, which was organized by Ben Goertzel and Itamar Arel. In this list, each broad competency area is listed together with a number of specific competency sub-areas within its scope:

1. Perception: vision, hearing, touch, proprioception, crossmodal
2. Actuation: physical skills, navigation, tool use
3. Memory: episodic, declarative, behavioral
4. Learning: imitation, reinforcement, interactive verbal instruction, written media, experimentation
5. Reasoning: deductive, abductive, inductive, causal, physical, associational, categorization
6. Planning: strategic, tactical, physical, social
7. Attention: visual, social, behavioral
8. Motivation: subgoal creation, affect-based motivation, control of emotions
9. Emotion: expressing emotion, understanding emotion
10. Self: self-awareness, self-control, other-awareness
11. Social: empathy, appropriate social behavior, social communication, social inference, group play, theory of mind
12. Communication: gestural, pictorial, verbal, language acquisition, cross-modal
13. Quantitative: counting, grounded arithmetic, comparison, measurement
14. Building/Creation: concept formation, verbal invention, physical construction, social group formation

Clearly this list is getting at the same things as the textbook headings given in Section 2.2, but with a different emphasis due to its origin among AGI researchers rather than cognitive

¹ See http://www.ece.utk.edu/~itamar/AGI_Roadmap.html; participants included: Sam Adams, IBM Research; Ben Goertzel, Novamente LLC; Itamar Arel, University of Tennessee; Joscha Bach, Institute of Cognitive Science, University of Osnabrück, Germany; Robert Coop, University of Tennessee; Rod Furlan, Singularity Institute; Matthias Scheutz, Indiana University; J. Storrs Hall, Foresight Institute; Alexei Samsonovich, George Mason University; Matt Schlesinger, Southern Illinois University; John Sowa, Vivomind Intelligence, Inc.; Stuart C. Shapiro, University at Buffalo
psychologists. As part of the AGI Roadmap project, specific tasks were created corresponding to each of the sub-areas in the above list; we will describe some of these tasks in Chapter 17.

2.3.2 Gardner's Theory of Multiple Intelligences

The diverse list of human-level "competencies" given above is reminiscent of Gardner's [Gar99] multiple intelligences (MI) framework — a psychological approach to intelligence assessment based on the idea that different people have mental strengths in different high-level domains, so that intelligence tests should contain aspects that focus on each of these domains separately. MI does not contradict the "complex goals in complex environments" view of intelligence, but rather may be interpreted as making specific commitments regarding which complex tasks and which complex environments are most important for roughly human-like intelligence.

MI does not seek an extreme generality, in the sense that it explicitly focuses on domains in which humans have strong innate capability as well as general-intelligence capability; there could easily be non-human intelligences that would exceed humans according to both the commonsense human notion of "general intelligence" and the generic "complex goals in complex environments" or Hutter/Legg-style definitions, yet would not equal humans on the MI criteria. This strong anthropocentrism of MI is not a problem from an AGI perspective so long as one uses MI in an appropriate way, i.e. only for assessing the extent to which an AGI system displays specifically human-like general intelligence. This restrictiveness is the price one pays for having an easily articulable and relatively easily implementable evaluation framework. Table 2.1 summarizes the types of intelligence included in Gardner's MI theory.

Table 2.1: Types of Intelligence in Gardner's Multiple Intelligence Theory

Linguistic: Words and language, written and spoken; retention, interpretation and explanation of ideas and information via language; understands relationship between communication and meaning.

Logical-Mathematical: Logical thinking, detecting patterns, scientific reasoning and deduction; analyses problems, performs mathematical calculations; understands relationship between cause and effect towards a tangible outcome.

Musical: Musical ability, awareness, appreciation and use of sound; recognition of tonal and rhythmic patterns; understands relationship between sound and feeling.

Bodily-Kinesthetic: Body movement control, manual dexterity, physical agility and balance; eye and body coordination.

Spatial-Visual: Visual and spatial perception; interpretation and creation of images; pictorial imagination and expression; understands relationship between images and meanings, and between space and effect.

Interpersonal: Perception of other people's feelings; relates to others; interpretation of behaviour and communications; understands relationships between people and their situations.
2.3.3 Newell's Criteria for a Human Cognitive Architecture

Finally, another related perspective is given by Allen Newell's "functional criteria for a human cognitive architecture" [New90], which require that a humanlike AGI system should:

1. Behave as an (almost) arbitrary function of the environment
2. Operate in real time
3. Exhibit rational, i.e., effective adaptive behavior
4. Use vast amounts of knowledge about the environment
5. Behave robustly in the face of error, the unexpected, and the unknown
6. Integrate diverse knowledge
7. Use (natural) language
8. Exhibit self-awareness and a sense of self
9. Learn from its environment
10. Acquire capabilities through development
11. Arise through evolution
12. Be realizable within the brain

In our view, Newell's criterion 1 is poorly formulated, for while universal Turing computing power is easy to come by, any finite AI system must inevitably be heavily adapted to some particular class of environments, for straightforward mathematical reasons [Hut05, GPI+10]. On the other hand, his criteria 11 and 12 are not relevant to the CogPrime approach, as we are not doing biological modeling but rather AGI engineering. However, Newell's criteria 2-10 are essential in our view, and all will be covered in the following chapters.

2.3.4 Intelligence and Creativity

Creativity is a key aspect of intelligence. While sometimes associated especially with genius-level intelligence in science or the arts, creativity is actually pervasive throughout intelligence, at all levels. When a child makes a flying toy car by pasting paper bird wings on his toy car, and when a bird figures out how to use a curved stick to get a piece of food out of a difficult corner — this is creativity, just as much as the invention of a new physics theory or the design of a new fashion line. The very nature of intelligence — achieving complex goals in complex environments — requires creativity for its achievement, because the nature of complex environments and goals is that they are always unveiling new aspects, so that dealing with them involves inventing things beyond what worked for previously known aspects.

CogPrime contains a number of cognitive dynamics that are especially effective at creating new ideas, such as: concept creation (which synthesizes new concepts via combining aspects of previous ones), probabilistic evolutionary learning (which simulates evolution by natural selection, creating new procedures via mutation, combination and probabilistic modeling based on previous ones), and analogical inference (an aspect of the Probabilistic Logic Networks subsystem). But ultimately creativity is about how a system combines all the processes at its disposal to synthesize novel solutions to the problems posed by its goals in its environment.

There are times, of course, when the same goal can be achieved in multiple ways — some more creative than others. In CogPrime this relates to the existence of multiple top-level goals, one of which may be novelty. A system with novelty as one of its goals, alongside other more
specific goals, will have a tendency to solve other problems in creative ways, thus fulfilling its novelty goal along with its other goals. This can be seen at the level of childlike behaviors, and also at a much more advanced level. Salvador Dali wanted to depict his thoughts and feelings, but he also wanted to do so in a striking and unusual way; this combination of aspirations spurred him to produce his amazing art. A child who is asked to draw a house, but has a goal of novelty, may draw a tower with a swimming pool on the roof rather than a typical Colonial structure. A physicist motivated by novelty will seek a non-obvious solution to the equation at hand, rather than just applying tried and true methods, and perhaps discover some new phenomenon.

Novelty can be measured formally in terms of information-theoretic surprisingness based upon a given basis of knowledge and experience [Sch06] (a small illustration appears below); something that is novel and creative to a child may be familiar to the adult world, and a solution that seems novel and creative to a brilliant scientist today may seem like cliché elementary-school-level work 100 years from now.

Measuring creativity is even more difficult and subjective than measuring intelligence. Qualitatively, however, we humans can recognize it; and we suspect that the qualitative emergence of dramatic, multidisciplinary computational creativity will be one of the things that makes the human population feel emotionally that advanced AGI has finally arrived.

2.4 Preschool as a View into Human-like General Intelligence

One issue that arises when pursuing the grand goal of human-level general intelligence is how to measure partial progress. The classic Turing Test of imitating human conversation remains too difficult to usefully motivate immediate-term AI research (see [HF95] [Fre90] for arguments that it has been counterproductive for the AI field). The same holds true for comparable alternatives like the Robot College Test of creating a robot that can attend a semester of university and obtain passing grades. However, some researchers have suggested intermediary goals that constitute partial progress toward the grand goal and yet are qualitatively different from the highly specialized problems to which most current AI systems are applied. In this vein, Sam Adams and his team at IBM have outlined a so-called "Toddler Turing Test," in which one seeks to use AI to control a robot qualitatively displaying similar cognitive behaviors to a young human child (say, a 3 year old) [AABL02]. In fact this sort of idea has a long and venerable history in the AI field — Alan Turing's original 1950 paper on AI [Tur50], where he proposed the Turing Test, contains the suggestion that "Instead of trying to produce a programme to simulate the adult mind, why not rather try to produce one which simulates the child's?"

We find this childlike cognition based approach promising for many reasons, including its integrative nature: what a young child does involves a combination of perception, actuation, linguistic and pictorial communication, social interaction, conceptual problem solving and creative imagination. Specifically, inspired by these ideas, in Chapter 16 we will suggest the approach of teaching and testing early-stage AGI systems in environments that emulate the preschools used for teaching human children.
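As an aside, here is the small illustration, promised earlier, of measuring novelty as information-theoretic surprisingness: the surprisal of an observation under a predictive model is the negative base-2 logarithm of its probability, so what counts as novel depends entirely on the model's prior experience. (The probabilities below are invented purely for illustration.)

import math

def surprisal_bits(p):
    # Surprisal of an observation with model probability p, in bits.
    return -math.log2(p)

# A toy predictive model of "drawings of houses":
model = {"colonial house": 0.30, "tower with a pool on the roof": 0.001}
for drawing, p in model.items():
    print(f"{drawing}: {surprisal_bits(p):.1f} bits of surprise")
# colonial house: ~1.7 bits; tower with a pool on the roof: ~10.0 bits.

Retrain the same model on an art student's portfolio and the pool-topped tower may become unsurprising, which is exactly the sense in which novelty is observer-relative.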
Human intelligence evolved in response to the demands of richly interactive environments, and a preschool is specifically designed to be a richly interactive environment with the capability to stimulate diverse mental growth. So, we are currently exploring the use of CogPrime to control
virtual agents in preschool-like virtual world environments, as well as commercial humanoid robot platforms such as the Nao (see Figure 2.1) or Robokind (Figure 2.2) in physical preschool-like robot labs.

Another advantage of focusing on childlike cognition is that child psychologists have created a variety of instruments for measuring child intelligence. In Chapter 17, we will discuss an approach to evaluating the general intelligence of human childlike AGI systems via combining tests typically used to measure the intelligence of young human children with additional tests crafted based on cognitive science and the standard preschool curriculum.

To put it differently: while our long-term goal is the creation of genius machines with general intelligence at the human level and beyond, we believe that every young child has a certain genius; and by beginning with this childlike genius, we can build a platform capable of developing into a genius machine with far more dramatic capabilities.

2.4.1 Design for an AGI Preschool

More precisely, we don't suggest placing a CogPrime system in an environment that is an exact imitation of a human preschool — this would be inappropriate, since current robotic or virtual bodies are very differently abled than the body of a young human child. But we aim to place CogPrime in an environment emulating the basic diversity and educational character of a typical human preschool. We stress this now, at this early point in the book, because we will use running examples throughout the book drawn from the preschool context.

The key notion in modern preschool design is the "learning center," an area designed and outfitted with appropriate materials for teaching a specific skill. Learning centers are designed to encourage learning by doing, which greatly facilitates learning processes based on reinforcement, imitation and correction; and also to provide multiple techniques for teaching the same skills, to accommodate different learning styles and prevent overfitting and overspecialization in the learning of new skills.

Centers are also designed to cross-develop related skills. A "manipulatives center," for example, provides physical objects such as drawing implements, toys and puzzles, to facilitate development of motor manipulation, visual discrimination, and (through sequencing and classification games) basic logical reasoning. A "dramatics center" cross-trains interpersonal and empathetic skills along with bodily-kinesthetic, linguistic, and musical skills. Other centers, such as art, reading, writing, science and math centers, are also designed to train not just one area, but to center around a primary intelligence type while also cross-developing related areas. For specific examples of the learning centers associated with particular contemporary preschools, see [Nei98]. In many progressive, student-centered preschools, students are left largely to their own devices to move from one center to another throughout the preschool room. Generally, each center will be staffed by an instructor at some points in the day but not others, providing a variety of learning experiences.

To imitate the general character of a human preschool, we will create several centers in our robot lab.
The precise architecture will be adapted via experience, but initial centers will likely be:

• a blocks center: a table with blocks on it
• a language center: a circle of chairs, intended for people to sit around and talk with the robot
• a manipulatives center, with a variety of different objects of different shapes and sizes, intended to teach visual and motor skills
• a ball play center, where balls are kept in chests and there is space for the robot to kick the balls around
• a dramatics center, where the robot can observe and enact various movements

One Running Example

As we proceed through the various component structures and dynamics of CogPrime in the following chapters, it will be useful to have a few running examples to use to explain how the various parts of the system are supposed to work. One example we will use fairly frequently is drawn from the preschool context: the somewhat open-ended task of "Build me something out of blocks that you haven't built for me before, and then tell me what it is." This is a relatively simple task that combines multiple aspects of cognition in a richly interconnected way, and is the sort of thing that young children will naturally do in a preschool setting.

2.5 Integrative and Synergetic Approaches to Artificial General Intelligence

In Chapter 1 we characterized CogPrime as an integrative approach. And we suggest that the naturalness of integrative approaches to AGI follows directly from comparing the above lists of capabilities and criteria to the array of available AI technologies. No single known algorithm or data structure appears easily capable of carrying out all these functions, so if one wants to proceed now with creating a general intelligence that is even vaguely humanlike, one must integrate various AI technologies within some sort of unifying architecture.

For this reason and others, an increasing amount of work in the AI community these days is integrative in one sense or another. Estimation of Distribution Algorithms integrate probabilistic reasoning with evolutionary learning [Pel05]. Markov Logic Networks [RD06] integrate formal logic and probabilistic inference, as does the Probabilistic Logic Networks framework [GIGH08] utilized in CogPrime and explained further in the book, and other works in the "Progic" area such as [WW06]. Leslie Pack Kaelbling has synthesized low-level robotics methods (particle filtering) with logical inference [ZPIX07]. Dozens of further examples could be given. The construction of practical robotic systems like the Stanley system that won the DARPA Grand Challenge [Tea06] involves the integration of numerous components based on different principles. These algorithmic and pragmatic innovations provide ample raw materials for the construction of integrative cognitive architectures, and are part of the reason why childlike AGI is more approachable now than it was 50 or even 10 years ago.

Further, many of the cognitive architectures described in the current AI literature are "integrative" in the sense of combining multiple, qualitatively different, interoperating algorithms. Chapter 4 gives a high-level overview of existing cognitive architectures, dividing them into symbolic, emergentist (e.g. neural network) and hybrid architectures. The hybrid architectures generally integrate symbolic and neural components, often with multiple subcomponents within each of these broad categories. However, we believe that even these excellent architectures are not integrative enough, in the sense that they lack sufficiently rich and nuanced interactions
between the learning components associated with different kinds of memory, and hence are unlikely to give rise to the emergent structures and dynamics characterizing general intelligence. One of the central ideas underlying CogPrime is to create an integrative cognitive architecture that combines multiple aspects of intelligence, achieved by diverse structures and algorithms, within a common framework designed specifically to support robust synergetic interactions between these aspects.

The simplest way to create an integrative AI architecture is to loosely couple multiple components carrying out various functions, in such a way that the different components pass inputs and outputs amongst each other but do not interfere with or modulate each others' internal functioning in real-time. However, the human brain appears to be integrative in a much tighter sense, involving rich real-time dynamical coupling between various components with distinct but related functions. In [Goe09a] we have hypothesized that the brain displays a property of cognitive synergy, according to which multiple learning processes can not only dispatch subproblems to each other, but also share contextual understanding in real-time, so that each one can get help from the others in a contextually savvy way. By imbuing AI architectures with cognitive synergy, we hypothesize, one can get past the bottlenecks that have plagued AI in the past. Part of the reasoning here, as elaborated in Chapter 9 and [Goe09b], is that real physical and social environments display a rich dynamic interconnection between their various aspects, so that richly dynamically interconnected integrative AI architectures will be able to achieve goals within them more effectively.

And this brings us to the patternist perspective on intelligent systems, alluded to above and fleshed out further in Chapter 3, with its focus on the emergence of hierarchically and heterarchically structured networks of patterns, and pattern-systems modeling self and others. Ultimately the purpose of cognitive synergy in an AGI system is to enable the various AI algorithms and structures composing the system to work together effectively enough to give rise to the right system-wide emergent structures characterizing real-world general intelligence. The underlying theory is that intelligence is not reliant on any particular structure or algorithm, but rather on the emergence of appropriately structured networks of patterns, which can then be used to guide ongoing dynamics of pattern recognition and creation. And the underlying hypothesis is that the emergence of these structures cannot be achieved by a loosely interconnected assemblage of components, no matter how sensible the architecture; it requires a tightly connected, synergetic system.

It is possible to make these theoretical ideas about cognition mathematically rigorous; for instance, Appendix ?? briefly presents a formal definition of cognitive synergy that has been analyzed as part of an effort to prove theorems about the importance of cognitive synergy for giving rise to emergent system properties associated with general intelligence.
However, while we have found such formal analyses valuable for clarifying our designs and understanding their qualitative properties, we have concluded that, for the present, the best way to explore our hypotheses about cognitive synergy and human-like general intelligence is empirically, via building and testing systems like CogPrime.

2.5.1 Achieving Humanlike Intelligence via Cognitive Synergy

Summing up: at the broadest level, there are four primary challenges in constructing an integrative, cognitive-synergy-based approach to AGI:
1. Choosing an overall cognitive architecture that possesses adequate richness and flexibility for the task of achieving childlike cognition.
2. Choosing appropriate AI algorithms and data structures to fulfill each of the functions identified in the cognitive architecture (e.g. visual perception, audition, episodic memory, language generation, analogy...).
3. Ensuring that these algorithms and structures, within the chosen cognitive architecture, are able to cooperate in such a way as to provide appropriate coordinated, synergetic intelligent behavior (a critical aspect, since childlike cognition is an integrated functional response to the world, rather than a loosely coupled collection of capabilities).
4. Embedding one's system in an environment that provides sufficiently rich stimuli and interactions to enable the system to use this cooperation to ongoingly, creatively develop an intelligent internal world-model and self-model.

We argue that CogPrime provides a viable way to address these challenges.
Fig. 2.1: The Nao humanoid robot
Fig. 2.2: The Nao humanoid robot
Chapter 3
A Patternist Philosophy of Mind

3.1 Introduction

In the last chapter we discussed human intelligence from a fairly down-to-earth perspective, looking at the particular intelligent functions that human beings carry out in their everyday lives. And we strongly feel this practical perspective is important: without this concreteness, it's too easy for AGI research to get distracted by appealing (or frightening) abstractions of various sorts. However, it's also important to look at the nature of mind and intelligence from a more general and conceptual perspective, to avoid falling into an approach that follows the particulars of human capability but ignores the deeper structures and dynamics of mind that ultimately allow human minds to be so capable.

In this chapter we very briefly review some ideas from the patternist philosophy of mind, a general conceptual framework on intelligence which has been inspirational for many key aspects of the CogPrime design, and which has been ongoingly developed by one of the authors (Ben Goertzel) during the last two decades (in a series of publications beginning in 1991, most recently The Hidden Pattern [Goe06a]). Some of the ideas described are quite broad and conceptual, and are related to CogPrime only via serving as general inspirations; others are more concrete and technical, and are actually utilized within the design itself.

CogPrime is an integrative design formed via the combination of a number of different philosophical, scientific and engineering ideas. The success or failure of the design doesn't depend on any particular philosophical understanding of intelligence. In that sense, the more abstract notions presented in this chapter should be considered "optional" rather than critical in a CogPrime context. However, due to the core role patternism has played in the development of CogPrime, understanding a few things about general patternist philosophy will be helpful for understanding CogPrime, even for those readers who are not philosophically inclined. Those readers who are philosophically inclined, on the other hand, are urged to read The Hidden Pattern and then interpret the particulars of CogPrime in this light.

3.2 Some Patternist Principles

The patternist philosophy of mind is a general approach to thinking about intelligent systems. It is based on the very simple premise that mind is made of pattern — and that a mind is a
system for recognizing patterns in itself and the world, critically including patterns regarding which procedures are likely to lead to the achievement of which goals in which contexts.

Pattern as the basis of mind is not in itself a very novel idea; this concept is present, for instance, in the 19th-century philosophy of Charles Peirce [Pei34], in the writings of contemporary philosophers Daniel Dennett [Den91] and Douglas Hofstadter [Hof79, Hof96], in Benjamin Whorf's [Who64] linguistic philosophy and Gregory Bateson's [Bat79] systems theory of mind and nature. Bateson spoke of the Metapattern: "that it is pattern which connects." In Goertzel's writings on philosophy of mind, an effort has been made to pursue this theme more thoroughly than has been done before, and to articulate in detail how various aspects of human mind and mind in general can be well understood by explicitly adopting a patternist perspective.¹

¹ In some prior writings the term "psynet model of mind" has been used to refer to the application of patternist philosophy to cognitive theory, but this term has been "deprecated" in recent publications as it seemed to introduce more confusion than clarification.

In the patternist perspective, "pattern" is generally defined as "representation as something simpler." Thus, for example, if one measures simplicity in terms of bit-count, then a program compressing an image would be a pattern in that image. But if one uses a simplicity measure incorporating run-time as well as bit-count, then the compressed version may or may not be a pattern in the image, depending on how one's simplicity measure weights the two factors. This definition encompasses simple repeated patterns, but also much more complex ones. While pattern theory has typically been elaborated in the context of computational theory, it is not intrinsically tied to computation; rather, it can be developed in any context where there is a notion of "representation" or "production" and a way of measuring simplicity. One just needs to be able to assess the extent to which f represents or produces X, and then to compare the simplicity of f and X; and then one can assess whether f is a pattern in X. A formalization of this notion of pattern is given in [Goe06a] and briefly summarized at the end of this chapter.

Next, in patternism the mind of an intelligent system is conceived as the (fuzzy) set of patterns in that system, and the set of patterns emergent between that system and other systems with which it interacts. The latter clause means that the patternist perspective is inclusive of notions of distributed intelligence [Hut96]. Basically, the mind of a system is the fuzzy set of different simplifying representations of that system that may be adopted.

Intelligence is conceived, similarly to in Marcus Hutter's [Hut05] recent work (and as elaborated informally in Chapter 2 above, and formally in Chapter 7 below), as the ability to achieve complex goals in complex environments; where complexity itself may be defined as the possession of a rich variety of patterns. A mind is thus a collection of patterns that is associated with a persistent dynamical process that achieves highly-patterned goals in highly-patterned environments.

An additional hypothesis made within the patternist philosophy of mind is that reflection is critical to intelligence. This lets us conceive an intelligent system as a dynamical system that recognizes patterns in its environment and itself, as part of its quest to achieve complex goals. While this approach is quite general, it is not vacuous; it gives a particular structure to the tasks of analyzing and synthesizing intelligent systems.
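As a concrete toy illustration of the compression-based definition of pattern given above, consider the following minimal sketch. It is an illustrative assumption on our part (not part of the formal theory) to use zlib compression as the producing entity f, raw byte length as the simplicity measure c, and the relative simplicity savings (c(X) - c(f)) / c(X) as the degree of patternhood:

import os
import zlib

def simplicity(x: bytes) -> int:
    # c(x): a crude bit-count style simplicity measure -- length in bytes
    return len(x)

def pattern_intensity(x: bytes) -> float:
    # Take f = the zlib-compressed form of x; together with the fixed
    # decompressor, f "produces" x. f is a pattern in x to the extent
    # that f is simpler than x; here we use the relative simplicity
    # savings (c(x) - c(f)) / c(x) as an illustrative degree.
    f = zlib.compress(x, 9)
    assert zlib.decompress(f) == x   # f really does produce x
    return (simplicity(x) - simplicity(f)) / simplicity(x)

print(pattern_intensity(b"ABAB" * 1000))    # ~0.99: a strong repeated pattern
print(pattern_intensity(os.urandom(4000)))  # ~0 or below: no pattern found

A highly repetitive input yields a degree near 1, while algorithmically random data yields a degree near or below 0, matching the intuition that a pattern is present only when the producer is genuinely simpler than what it produces.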
About any would-be intelligent system, we are led to ask questions such as:

• How are patterns represented in the system? That is, how does the underlying infrastructure of the system give rise to the displaying of a particular pattern in the system's behavior?
• What kinds of patterns are most compactly represented within the system?
• What kinds of patterns are most simply learned?
• What learning processes are utilized for recognizing patterns?
• What mechanisms are used to give the system the ability to introspect (so that it can recognize patterns in itself)?

Now, these same sorts of questions could be asked if one substituted the word "pattern" with other words like "knowledge" or "information". However, we have found that asking these questions in the context of pattern leads to more productive answers, avoiding unproductive byways and also tying in very nicely with the details of various existing formalisms and algorithms for knowledge representation and learning.

Among the many kinds of patterns in intelligent systems, semiotic patterns are particularly interesting ones. Peirce decomposed these into three categories:

• iconic patterns, which are patterns of contextually important internal similarity between two entities (e.g. an iconic pattern binds a picture of a person to that person)
• indexical patterns, which are patterns of spatiotemporal co-occurrence (e.g. an indexical pattern binds a wedding dress and a wedding)
• symbolic patterns, which are patterns indicating that two entities are often involved in the same relationships (e.g. a symbolic pattern binds the number "5" (the symbol) and various sets of 5 objects (the entities that the symbol is taken to represent))

Of course, some patterns may span more than one of these semiotic categories; and there are also some patterns that don't fall neatly into any of these categories. But the semiotic patterns are particularly important ones; and symbolic patterns have played an especially large role in the history of AI, because of the radically different approaches different researchers have taken to handling them in their AI systems. Mathematical logic and related formalisms provide sophisticated mechanisms for combining and relating symbolic patterns ("symbols"), and some AI approaches have focused heavily on these, sometimes more so than on the identification of symbolic patterns in experience or the use of them to achieve practical goals. We will look fairly carefully at these differences in Chapter 4.

Pursuing the patternist philosophy in detail leads to a variety of particular hypotheses and conclusions about the nature of mind. Following from the view of intelligence in terms of achieving complex goals in complex environments comes a view in which the dynamics of a cognitive system are understood to be governed by two main forces:

• self-organization, via which system dynamics cause existing system patterns to give rise to new ones
• goal-oriented behavior, which will be defined more rigorously in Chapter 7, but basically amounts to a system interacting with its environment in a way that appears like an attempt to maximize some reasonably simple function

Self-organized and goal-oriented behavior must be understood as cooperative aspects. If an agent is asked to build a surprising structure out of blocks and does so, this is goal-oriented. But the agent's ability to carry out this goal-oriented task will be greater if it has previously played around with blocks a lot in an unstructured, spontaneous way. And the "nudge toward creativity" given to it by asking it to build a surprising blocks structure may cause it to explore some novel patterns, which then feed into its future unstructured blocks play.

Based on these concepts, as argued in detail in [Goe06a], several primary dynamical principles may be posited, including:
• Evolution, conceived as a general process via which patterns within a large population thereof are differentially selected and used as the basis for formation of new patterns, based on some "fitness function" that is generally tied to the goals of the agent.
  — Example: If trying to build a blocks structure that will surprise Bob, an agent may simulate several procedures for building blocks structures in its "mind's eye", assessing for each one the expected degree to which it might surprise Bob. The search through procedure space could be conducted as a form of evolution, via an algorithm such as MOSES.
• Autopoiesis: the process by which a system of interrelated patterns maintains its integrity, via a dynamic in which whenever one of the patterns in the system begins to decrease in intensity, some of the other patterns increase their intensity in a manner that causes the troubled pattern to increase in intensity again.
  — Example: An agent's set of strategies for building the base of a tower, and its set of strategies for building the middle part of a tower, are likely to relate autopoietically. If the system partially forgets how to build the base of a tower, then it may regenerate this missing knowledge via using its knowledge about how to build the middle part (i.e., it knows it needs to build the base in a way that will support good middle parts). Similarly, if it partially forgets how to build the middle part, then it may regenerate this missing knowledge via using its knowledge about how to build the base (i.e. it knows a good middle part should fit in well with the sorts of base it knows are good).
  — This same sort of interdependence occurs between pattern-sets containing more than two elements.
  — Sometimes (as in the above example) autopoietic interdependence in the mind is tied to interdependencies in the physical world, sometimes not.
• Association. Patterns, when given attention, spread some of this attention to other patterns that they have previously been associated with in some way. Furthermore, there is Peirce's law of mind [Pei34], which could be paraphrased in modern terms as stating that the mind is an associative memory network, whose dynamics dictate that every idea in the memory is an active agent, continually acting on those ideas with which the memory associates it.
  — Example: Building a blocks structure that resembles a tower spreads attention to memories of prior towers the agent has seen, and also to memories of people the agent knows have seen towers, and structures it has built at the same time as towers, structures that resemble towers in various respects, etc.
• Differential attention allocation / credit assignment. Patterns that have been valuable for goal-achievement are given more attention, and are encouraged to participate in giving rise to new patterns.
  — Example: Perhaps in a prior instance of the task "build me a surprising structure out of blocks," searching through memory for non-blocks structures that the agent has played with has proved a useful cognitive strategy. In that case, when the task is posed to the agent again, it should tend to allocate disproportionate resources to this strategy.
• Pattern creation. Patterns that have been valuable for goal-achievement are mutated and combined with each other to yield new patterns.
  — Example: Building towers has been useful in a certain context, but so has building structures with a large number of triangles. Why not build a tower out of triangles? Or maybe a vaguely tower-like structure that uses more triangles than a tower easily could?
  — Example: Building an elongated block structure resembling a table was successful in the past, as was building a structure resembling a very flat version of a chair. Generalizing, maybe building distorted versions of furniture is good. Or maybe it is building distorted versions of any previously perceived objects that is good. Or maybe both, to different degrees...

Next, for a variety of reasons outlined in [Goe06a], it becomes appealing to hypothesize that the network of patterns in an intelligent system must give rise to the following large-scale emergent structures:

• Hierarchical network. Patterns are habitually in relations of control over other patterns that represent more specialized aspects of themselves.
  — Example: The pattern associated with "tall building" has some control over the pattern associated with "tower", as the former represents a more general concept ... and "tower" has some control over "Eiffel tower", etc.
• Heterarchical network. The system retains a memory of which patterns have previously been associated with each other in any way.
  — Example: "Tower" and "snake" are distant in the natural pattern hierarchy, but may be associatively/heterarchically linked due to having a common elongated structure. This heterarchical linkage may be used for many things; e.g. it might inspire the creative construction of a tower with a snake's head.
• Dual network. Hierarchical and heterarchical structures are combined, with the dynamics of the two structures working together harmoniously. Among many possible ways to hierarchically organize a set of patterns, the one used should be one that causes hierarchically nearby patterns to have many meaningful heterarchical connections; and of course, there should be a tendency to search for heterarchical connections among hierarchically nearby patterns.
  — Example: While the set of patterns hierarchically nearby "tower" and the set of patterns heterarchically nearby "tower" will be quite different, they should still have more overlap than random pattern-sets of similar sizes. So, if looking for something else heterarchically near "tower", using the hierarchical information about "tower" should be of some use, and vice versa.
  — In PLN, hierarchical relationships correspond to Atoms A and B such that Inheritance A B and Inheritance B A have highly dissimilar strength; and heterarchical relationships correspond to IntensionalSimilarity relationships. The dual network structure then arises when intensional and extensional inheritance approximately correlate with each other, so that inference about either kind of inheritance assists with figuring out about the other kind.
• Self structure. A portion of the network of patterns forms into an approximate image of the overall network of patterns.
  — Example: Each time the agent builds a certain structure, it observes itself building the structure, and its role as "builder of a tall tower" (or whatever the structure is) becomes part of its self-model. Then when it is asked to build something new, it may consult its self-model to see if it believes itself capable of building that sort of thing (for instance, if it is asked to build something very large, its self-model may tell it that it lacks persistence for such projects, so it may reply "I can try, but I may wind up not finishing it").

As we proceed through the CogPrime design in the following pages, we will see how each of these abstract concepts arises concretely from CogPrime's structures and algorithms. If the theory of [Goe06a] is correct, then the success of CogPrime as a design will depend largely on whether these high-level structures and dynamics can be made to emerge from the synergetic interaction of CogPrime's representation and algorithms, when they are utilized to control an appropriate agent in an appropriate environment.

3.3 Cognitive Synergy

Now we dig a little deeper and present a different sort of "general principle of feasible general intelligence", already hinted at in earlier chapters: the cognitive synergy principle², which is both a conceptual hypothesis about the structure of generally intelligent systems in certain classes of environments, and a design principle used to guide the design of CogPrime. Chapter 8 presents a mathematical formalization of the notion of cognitive synergy; here we present the conceptual idea informally, which makes it more easily digestible but also more vague-sounding.

² While these points are implicit in the theory of mind given in [Goe06a], they are not articulated in this specific form there. So the material presented in this section is a new development within patternist philosophy, developed since [Goe06a] in a series of conference papers such as [Goe09a].

We will focus here on cognitive synergy specifically in the case of "multi-memory systems," which we define as intelligent systems whose combination of environment, embodiment and motivational system makes it important for them to possess memories that divide into partially but not wholly distinct components corresponding to the categories of:

• Declarative memory
  — Examples of declarative knowledge: Towers on average are taller than buildings. I generally am better at building structures I imagine than at imitating structures I'm shown in pictures.
• Procedural memory (memory about how to do certain things)
  — Examples of procedural knowledge: Practical know-how regarding how to pick up an elongated rectangular block, or a square one. Know-how regarding when to approach a problem by asking "What would one of my teachers do in this situation?" versus by thinking through the problem from first principles.
• Sensory and episodic memory
  — Example of sensory knowledge: memory of Bob's face; memory of what a specific tall blocks tower looked like
  — Example of episodic knowledge: memory of the situation in which the agent first met Bob; memory of a situation in which a specific tall blocks tower was built
• Attentional memory (knowledge about what to pay attention to in what contexts)
  — Example of attentional knowledge: When involved with a new person, it's useful to pay attention to whatever that person looks at
• Intentional memory (knowledge about the system's own goals and subgoals)
  — Example of intentional knowledge: If my goal is to please some person whom I don't know that well, then a subgoal may be figuring out what makes that person smile.

In Chapter 9 below we present a detailed argument as to how the requirement for a multi-memory underpinning for general intelligence emerges from certain underlying assumptions regarding the measurement of the simplicity of goals and environments. Specifically, we argue that each of these memory types corresponds to certain modes of communication, so that intelligent agents which have to efficiently handle a sufficient variety of types of communication with other agents are going to have to handle all these types of memory. These types of communication overlap and are often used together, which implies that the different memories and their associated cognitive processes need to work together.

The points made in this section do not rely on that argument regarding the relation of multiple memory types to the environmental situation of multiple communication types. What they do rely on is the assumption that, in the intelligent agent in question, the different components of memory are significantly but not wholly distinct. That is, there are significant "family resemblances" between the memories of a single type, yet there are also thoroughgoing connections between memories of different types.

Repeating the above points in a slightly more organized manner and then extending them, the essential idea of cognitive synergy, in the context of multi-memory systems, may be expressed in terms of the following points:

1. Intelligence, relative to a certain set of environments, may be understood as the capability to achieve complex goals in these environments.
2. With respect to certain classes of goals and environments, an intelligent system requires a "multi-memory" architecture, meaning the possession of a number of specialized yet interconnected knowledge types, including: declarative, procedural, attentional, sensory, episodic and intentional (goal-related). These knowledge types may be viewed as different sorts of patterns that a system recognizes in itself and its environment.
3. Such a system must possess knowledge creation (i.e. pattern recognition / formation) mechanisms corresponding to each of these memory types. These mechanisms are also called "cognitive processes."
4. Each of these cognitive processes, to be effective, must have the capability to recognize when it lacks the information to perform effectively on its own; and in this case, to dynamically and interactively draw information from knowledge creation mechanisms dealing with other types of knowledge.
5. This cross-mechanism interaction must have the result of enabling the knowledge creation mechanisms to perform much more effectively in combination than they would if operated non-interactively. This is "cognitive synergy."

Interactions as mentioned in Points 4 and 5 in the above list are the real conceptual meat of the cognitive synergy idea.
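In software terms, Points 4 and 5 might be caricatured as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical illustrations of the idea, not CogPrime's actual interfaces, and the knowledge-specific logic is left as stubs:

from typing import List, Optional

class CognitiveProcess:
    """A knowledge-creation mechanism attached to one type of memory."""

    def __init__(self, memory_type: str):
        self.memory_type = memory_type
        self.peers: List["CognitiveProcess"] = []  # processes for other memory types

    def confidence(self, query: str) -> float:
        # How confidently this process can handle the query on its own;
        # a real system would estimate this from its memory contents.
        return 0.0

    def solve_alone(self, query: str) -> Optional[str]:
        return None

    def contribute(self, query: str, context: str) -> Optional[str]:
        # Contextually savvy help offered to a stuck peer (Point 5):
        # the peer shares its context, not just an isolated subproblem.
        return None

    def solve(self, query: str, threshold: float = 0.5) -> Optional[str]:
        if self.confidence(query) >= threshold:
            return self.solve_alone(query)
        # Point 4: the process recognizes it is stuck, and interactively
        # draws information from mechanisms handling other knowledge types.
        for peer in self.peers:
            hint = peer.contribute(query, context=self.memory_type)
            if hint is not None:
                return self.solve_alone(f"{query} | hint: {hint}")
        return None

processes = [CognitiveProcess(m) for m in
             ["declarative", "procedural", "sensory", "episodic",
              "attentional", "intentional"]]
for p in processes:
    p.peers = [q for q in processes if q is not p]

The synergy requirement is then that the cross-calls inside solve() make the ensemble perform far better than the sum of its parts would when run non-interactively.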
One way to express the key idea here, in an AI context, is that
most AI algorithms suffer from combinatorial explosions: the number of possible elements to be combined in a synthesis or analysis is just too great, and the algorithms are unable to filter through all the possibilities, given the lack of intrinsic constraint that comes along with a "general intelligence" context (as opposed to a narrow-AI problem like chess-playing, where the context is constrained and hence restricts the scope of possible combinations that needs to be considered). In an AGI architecture based on cognitive synergy, the different learning mechanisms must be designed specifically to interact in such a way as to palliate each others' combinatorial explosions, so that, for instance, each learning mechanism dealing with a certain sort of knowledge must synergize with learning mechanisms dealing with the other sorts of knowledge, in a way that decreases the severity of combinatorial explosion.

One prerequisite for cognitive synergy to work is that each learning mechanism must recognize when it is "stuck," meaning it's in a situation where it has inadequate information to make a confident judgment about what steps to take next. Then, when it does recognize that it's stuck, it may request help from other, complementary cognitive mechanisms.

3.4 The General Structure of Cognitive Dynamics: Analysis and Synthesis

We have discussed the need for synergetic interrelation between cognitive processes corresponding to different types of memory ... and the general high-level cognitive dynamics that a mind must possess (evolution, autopoiesis). The next step is to dig further into the nature of the cognitive processes associated with different memory types, and how they give rise to the needed high-level cognitive dynamics. In this section we present a general theory of cognitive processes based on a decomposition of cognitive processes into the two categories of analysis and synthesis, and a general formulation of each of these categories.³ Specifically we focus here on what we call focused cognitive processes; that is, cognitive processes that selectively focus attention on a subset of the patterns making up a mind. In general these are not the only kind; there may also be global cognitive processes that act on every pattern in a mind. An example of a global cognitive process in CogPrime is the basic attention allocation process, which spreads "importance" among all knowledge in the system's memory. Global cognitive processes are also important, but focused cognitive processes are subtler to understand, which is why we spend more time on them here.

³ While these points are highly compatible with the theory of mind given in [Goe06a], they are not articulated there. The material presented in this section is a new development within patternist philosophy, presented previously only in the article [GPPG06].

3.4.1 Component-Systems and Self-Generating Systems

We begin with autopoiesis — and, more specifically, with the concept of a "component-system", as described in George Kampis's book Self-Modifying Systems in Biology and Cognitive Science [Kam91], and as modified into the concept of a "self-generating system" or SGS in Goertzel's book Chaotic Logic [Goe94]. Roughly speaking, a Kampis-style component-system consists of a set of components that combine with each other to form other compound components. The
metaphor Kampis uses is that of Lego blocks, combining to form bigger Lego structures. Compound structures may in turn be combined together to form yet bigger compound structures. A self-generating system is basically the same concept as a component-system, but understood to be computable, whereas Kampis claims that component-systems are uncomputable.

Next, in SGS theory there is also a notion of reduction (not present in the Lego metaphor): sometimes when components are combined in a certain way, a "reaction" happens, which may lead to the elimination of some of the components. One relevant metaphor here is chemistry. Another is abstract algebra: for instance, if we combine a component f with its "inverse" component f⁻¹, both components are eliminated. Thus, we may think about two stages in the interaction of sets of components: combination, and reduction. Reduction may be thought of as algebraic simplification, governed by a set of rules that apply to a newly created compound component, based on the components that are assembled within it.

Formally, suppose C_1, C_2, ... is the set of components present in a discrete-time component-system at time t. Then, the components present at time t+1 are a subset of the set of components of the form

Reduce(Join(C_i(1), ..., C_i(r)))

where Join is a joining operation, and Reduce is a reduction operator. The joining operation is assumed to map tuples of components into components, and the reduction operator is assumed to map the space of components into itself. Of course, the specific nature of a component system is totally dependent on the particular definitions of the reduction and joining operators; in following chapters we will specify these for the CogPrime system, but for the purpose of the broader theoretical discussion in this section they may be left general.

What is called the "cognitive equation" in Chaotic Logic [Goe94] is the case of an SGS where the patterns in the system at time t have a tendency to correspond to components of the system at future times t + s. So, part of the action of the system is to transform implicit knowledge (patterns among system components) into explicit knowledge (specific system components). We will see one version of this phenomenon in Chapter 14, where we model implicit knowledge using mathematical structures called "derived hypergraphs"; and we will also later review several ways in which CogPrime's dynamics explicitly encourage cognitive-equation type dynamics, e.g.:

• inference, which takes conclusions implicit in the combination of logical relationships, and makes them explicit by deriving new logical relationships from them
• map formation, which takes concepts that have often been active together, and creates new concepts grouping them
• association learning, which creates links representing patterns of association between entities
• probabilistic procedure learning, which creates new models embodying patterns regarding which procedures tend to perform well according to particular fitness functions
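Returning to the formal join/reduce dynamic above, here is a minimal computational sketch of one discrete-time step of such a component-system. The representation of components as symbol tuples, the concatenative joining operation, and the inverse-cancellation reduction rule are all illustrative assumptions, chosen to mirror the f / f⁻¹ example given earlier:

from itertools import permutations
from typing import Tuple

Component = Tuple[str, ...]   # a component is a tuple of primitive symbols

def join(a: Component, b: Component) -> Component:
    # Lego-style joining: combine two components into a compound
    return a + b

def reduce_(compound: Component) -> Component:
    # Algebraic reduction: a symbol standing next to its inverse (written
    # with a trailing apostrophe) annihilates, as with f and f' in the text
    out: list = []
    for sym in compound:
        if out and (sym == out[-1] + "'" or out[-1] == sym + "'"):
            out.pop()
        else:
            out.append(sym)
    return tuple(out)

def step(pool: set) -> set:
    # Components present at time t+1: reduced joins of components at time t
    return {reduce_(join(a, b)) for a, b in permutations(pool, 2)}

print(step({("f",), ("f'",), ("g",)}))
# {(), ('f', 'g'), ('g', 'f'), ("f'", 'g'), ('g', "f'")}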
3.4.2 Analysis and Synthesis

Now we move on to the main point of this section: the argument that all or nearly all focused cognitive processes are expressible using two general process-schemata we call synthesis and analysis.⁴ The notion of "focused cognitive process" will be exemplified more thoroughly below, but in essence what is meant is a cognitive process that begins with a small number of items (drawn from memory) as its focus, and has as its goal discovering something about these items, or discovering something about something else in the context of these items or in a way strongly biased by these items. This is different from a global cognitive process, whose goal is more broadly-based and explicitly involves all or a large percentage of the knowledge in an intelligent system's memory store.

⁴ In [GPPG06], what is here called "analysis" was called "backward synthesis", a name which has some advantages, since it indicated that what's happening is a form of creation; but here we have opted for the more traditional analysis/synthesis terminology.

Among the focused cognitive processes are those governed by the so-called cognitive schematic implication

Context ∧ Procedure → Goal

where the Context involves sensory, episodic and/or declarative knowledge; and attentional knowledge is used to regulate how much resource is given to each such schematic implication in memory. Synergy among the learning processes dealing with the context, the procedure and the goal is critical to the adequate execution of the cognitive schematic using feasible computational resources. This sort of explicitly goal-driven cognition plays a significant though not necessarily dominant role in CogPrime, and is also related to production rule systems and other traditional AI systems, as will be articulated in Chapter 4.

The synthesis and analysis processes as we conceive them, in the general framework of SGS theory, are as follows. First, synthesis, as shown in Figure 3.1, is defined as:

synthesis: Iteratively build compounds from the initial component pool using the combinators, greedily seeking compounds that seem likely to achieve the goal.

Or in more detail:

1. Begin with some initial components (the initial "current pool"), an additional set of components identified as "combinators" (combination operators), and a goal function.
2. Combine the components in the current pool, utilizing the combinators, to form product components in various ways, carrying out reductions as appropriate, and calculating relevant quantities associated with components as needed.
3. Select the product components that seem most promising according to the goal function, and add these to the current pool (or else simply define these as the current pool).
4. Return to Step 2.

And analysis, as shown in Figure 3.2, is defined as:

analysis: Iteratively search (the system's long-term memory) for component-sets that combine using the combinators to form the initial component pool (or subsets thereof), greedily seeking component-sets that seem likely to achieve the goal.

Or in more detail:

1. Begin with some components (the initial "current pool") and a goal function.
2. Seek components so that, if one combines them to form product components using the combinators and then performs appropriate reductions, one obtains (as many as possible of) the components in the current pool.
3. Use the newly found constructions of the components in the current pool to update the quantitative properties of the components in the current pool, and also (via the current pool) the quantitative properties of the components in the initial pool.
4. Out of the components found in Step 2, select the ones that seem most promising according to the goal function, and add these to the current pool (or else simply define these as the current pool).
5. Return to Step 2.

Fig. 3.1: The General Process of Synthesis

More formally, synthesis may be specified as follows. Let X denote the set of combinators, and let Y_0 denote the initial pool of components (the initial focus of the cognitive process). Given Y_t, let Z_t denote the set of components of the form

Reduce(Join(C_i(1), ..., C_i(r)))

where the C_i are drawn from Y_t or from X. We may then say

Y_{t+1} = Filter(Z_t)

where Filter is a function that selects a subset of its arguments. Analysis, on the other hand, begins with a set W of components and a set X of combinators, and tries to find a series Y_t so that, according to the process of synthesis, Y_s = W.

In practice, of course, the implementation of a synthesis process need not involve the explicit construction of the full set Z_t. Rather, the filtering operation takes place implicitly during the construction of Y_{t+1}. The result, however, is that one gets some subset of the compounds producible via joining and reduction from the set of components present in Y_t plus the combinators X.
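The synthesis schema translates directly into a greedy loop. The sketch below makes the Y_{t+1} = Filter(Z_t) iteration explicit; it reuses the toy join and reduce_ operators from the previous sketch, and the goal function shown is an arbitrary stand-in for whatever goal-relevance estimate a real system would supply:

from itertools import permutations

def synthesize(initial_pool, combinators, goal, steps=5, beam=10):
    # Greedy synthesis: iteratively build compounds from the current pool,
    # keeping only the products that look most promising under the goal
    pool = set(initial_pool)                      # Y_0: the initial focus
    for _ in range(steps):
        # Z_t: reduced joins over the current pool plus the combinators
        z = {reduce_(join(a, b))
             for a, b in permutations(pool | set(combinators), 2)}
        # Y_{t+1} = Filter(Z_t): retain only the top-scoring products
        pool = set(sorted(z, key=goal, reverse=True)[:beam])
    return max(pool, key=goal)

# Toy goal function: prefer long compounds with no immediately repeated symbol
goal = lambda c: len(c) if all(x != y for x, y in zip(c, c[1:])) else 0
print(synthesize({("f",), ("g",)}, {("f'",)}, goal))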
Fig. 3.2: The General Process of Analysis

Conceptually, one may view synthesis as a very generic sort of "growth process," and analysis as a very generic sort of "figuring out how to grow something." The intuitive idea underlying the present proposal is that these forward-going and backward-going "growth processes" are among the essential foundations of cognitive control, and that a conceptually sound design for cognitive control should explicitly make use of this fact. To abstract away from the details, what these processes are about is:

• taking the general dynamic of compound-formation and reduction as outlined in Kampis and Chaotic Logic
• introducing goal-directed pruning ("filtering") into this dynamic, so as to account for the limitations of computational resources that are a necessary part of pragmatic intelligence

3.4.3 The Dynamic of Iterative Analysis and Synthesis

While synthesis and analysis are both very useful on their own, they achieve their greatest power when harnessed together. It is my hypothesis that the dynamic pattern of alternating synthesis and analysis has a fundamental role in cognition. Put simply, synthesis creates new mental forms by combining existing ones. Then, analysis seeks simple explanations for the forms in the mind, including the newly created ones; and this explanation itself then comprises additional new forms in the mind, to be used as fodder for the next round of synthesis. Or, to put it yet more simply:
... → Combine → Explain → Combine → Explain → Combine → ...

It is not hard to express this alternating dynamic more formally, as well. Let X denote any set of components. Let F(X) denote a set of components which is the result of synthesis on X. Let B(X) denote a set of components which is the result of analysis of X. We assume also a heuristic biasing the synthesis process toward simple constructs. Let S(t) denote a set of components at time t, representing part of a system's knowledge base. Let I(t) denote components resulting from the external environment at time t. Then, we may consider a dynamical iteration of the form

S(t+1) = B(F(S(t) ∪ I(t)))

This expresses the notion of alternating synthesis and analysis formally, as a dynamical iteration on the space of sets of components. We may then speak about attractors of this iteration: fixed points, limit cycles and strange attractors. One of the key hypotheses I wish to put forward here is that some key emergent cognitive structures are strange attractors of this equation. The iterative dynamic of combination and explanation leads to the emergence of certain complex structures that are, in essence, maintained when one recombines their parts and then seeks to explain the recombinations. These structures are built in the first place through iterative recombination and explanation, and then survive in the mind because they are conserved by this process. They then ongoingly guide the construction and destruction of various other temporary mental structures that are not so conserved.

3.4.4 Self and Focused Attention as Approximate Attractors of the Dynamic of Iterated Forward-Analysis

As noted above, patternist philosophy argues that two key aspects of intelligence are emergent structures that may be called the "self" and the "attentional focus." These, it is suggested, are aspects of intelligence that may not effectively be wired into the infrastructure of an intelligent system, though of course the infrastructure may be configured in such a way as to encourage their emergence. Rather, these aspects, by their nature, are only likely to be effective if they emerge from the cooperative activity of various cognitive processes acting within a broad base of knowledge.

Above we have described the pattern of ongoing habitual oscillation between synthesis and analysis as a kind of "dynamical iteration." Here we will argue that both self and attentional focus may be viewed as strange attractors of this iteration. The mode of argument is relatively informal. The essential processes under consideration are ones that are poorly understood from an empirical perspective, due to the extreme difficulty involved in studying them experimentally. For understanding self and attentional focus, we are stuck in large part with introspection, which is famously unreliable in some contexts, yet still dramatically better than having no information at all.

So, the philosophical perspective on self and attentional focus given here is a synthesis of empirical and introspective notions, drawn largely from the published thinking and research of
others, but with a few original twists. From a CogPrime perspective, its use has been to guide the design process, to provide a grounding for what otherwise would have been fairly arbitrary choices.

3.4.4.1 Self

Another high-level intelligent system pattern mentioned above is the "self", which we here will tie in with the analysis and synthesis processes. The term "self" as used here refers to the "phenomenal self" [Met04] or "self-model". That is, the self is the model that a system builds internally, reflecting the patterns observed in the (external and internal) world that directly pertain to the system itself. As is well known in everyday human life, self-models need not be completely accurate to be useful; and in the presence of certain psychological factors, a more accurate self-model may not necessarily be advantageous. But a self-model that is too badly inaccurate will lead to a badly-functioning system that is unable to effectively act toward the achievement of its own goals.

The value of a self-model for any intelligent system carrying out embodied agentive cognition is obvious. And beyond this, another primary use of the self is as a foundation for metaphors and analogies in various domains. Patterns recognized pertaining to the self are analogically extended to other entities. In some cases this leads to conceptual pathologies, such as the anthropomorphization of trees, rocks and other such objects that one sees in some precivilized cultures. But in other cases this kind of analogy leads to robust sorts of reasoning; for instance, in reading Lakoff and Nunez's [LN00] intriguing explorations of the cognitive foundations of mathematics, it is pretty easy to see that most of the metaphors on which they hypothesize mathematics to be based are grounded in the mind's conceptualization of itself as a spatiotemporally embedded entity, which in turn is predicated on the mind's having a conceptualization of itself (a self) in the first place.

A self-model can in many cases form a self-fulfilling prophecy (to make an obvious double-entendre!). Actions are generated based on one's model of what sorts of actions one can and/or should take; and the results of these actions are then incorporated into one's self-model. If a self-model proves a generally bad guide to action selection, this may never be discovered, unless said self-model includes the knowledge that semi-random experimentation is often useful.

In what sense, then, may it be said that self is an attractor of iterated analysis? Analysis infers the self from observations of system behavior. The system asks: What kind of system might I be, in order to give rise to these behaviors that I observe myself carrying out? Based on asking itself this question, it constructs a model of itself, i.e. it constructs a self. Then, this self guides the system's behavior: it builds new logical relationships between its self-model and various other entities, in order to guide its future actions oriented toward achieving its goals. Based on the behaviors newly induced via this constructive, forward-synthesis activity, the system may then engage in analysis again and ask: What must I be now, in order to have carried out these new actions? And so on.
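The shape of this loop can be caricatured in a few lines of code. This is a minimal sketch whose particulars, representing the self-model as action-frequency counts and "inference" as mere counting, are placeholder assumptions chosen only to exhibit the iteration:

import random

ACTIONS = ["stack blocks", "talk", "draw", "kick ball"]

def act(self_model):
    # Synthesis: choose an action the self-model says "I" am good at,
    # with occasional semi-random experimentation
    if not self_model or random.random() < 0.1:
        return random.choice(ACTIONS)
    return max(self_model, key=self_model.get)

def infer_self(history):
    # Analysis: "what kind of system might I be, to have behaved thus?"
    # Here the inferred self-model is just observed action frequencies.
    model = {}
    for action in history:
        model[action] = model.get(action, 0) + 1
    return model

self_model, history = {}, []
for t in range(1000):
    history.append(act(self_model))
    if t % 10 == 9:                  # periodic reflective analysis
        self_model = infer_self(history)

print(self_model)   # a stable, self-reinforcing summary of "what I do"

Because actions the model favors get taken more often, the inferred model tends to lock in, a simple analogue of the self-fulfilling, attractor-like character of the self-model described above.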
Our hypothesis is that after repeated iterations of this sort, in infancy, finally during early childhood a kind of self-reinforcing attractor occurs, and we have a self-model that is resilient and doesn't change dramatically when new instances of action- or explanation-generation occur. This is not strictly a mathematical attractor, though, because over a long period of time the self may well shift significantly. But, for a mature self, many hundreds of thousands or millions of forward-analysis cycles may occur before the self-model is dramatically modified. For relatively
long periods of time, small changes within the context of the existing self may suffice to allow the system to control itself intelligently.

Humans can also develop what are known as subselves [Row90]. A subself is a partially autonomous self-network focused on particular tasks, environments or interactions. It contains a unique model of the whole organism, and generally has its own set of episodic memories, consisting of memories of those intervals during which it was the primary dynamic mode controlling the organism. One common example is the creative subself — the subpersonality that takes over when a creative person launches into the process of creating something. In these times, a whole different personality sometimes emerges, with a different sort of relationship to the world. Among other factors, creativity requires a certain openness that is not always productive in an everyday-life context, so it's natural for the self-system of a highly creative person to bifurcate into one self-system for everyday life, and another for the protected context of creative activity. This sort of phenomenon might emerge naturally in CogPrime systems as well, if they were exposed to appropriate environments and social situations.

Finally, it is interesting to speculate regarding how self may differ in future AI systems as opposed to in humans. The relative stability we see in human selves may not exist in AI systems that can self-improve and change more fundamentally and rapidly than humans can. There may be a situation in which, as soon as a system has understood itself decently, it radically modifies itself and hence violates its existing self-model. Thus: intelligence without a long-term stable self. In this case the "attractor-ish" nature of the self holds only over much shorter time scales than for human minds or human-like minds. But the alternating process of synthesis and analysis for self-construction is still critical, even though no reasonably stable self-constituting attractor ever emerges. The psychology of such intelligent systems will almost surely be beyond human beings' capacity for comprehension and empathy.

3.4.4.2 Attentional Focus

Finally, we turn to the notion of an "attentional focus" similar to Baars' [Baa97] notion of a Global Workspace, which will be reviewed in more detail in Chapter 4: a collection of mental entities that are, at a given moment, receiving far more than the usual share of an intelligent system's computational resources. Due to the amount of attention paid to items in the attentional focus, at any given moment these items are in large part driving the cognitive processes going on elsewhere in the mind as well, because the cognitive processes acting on the items in the attentional focus are often involved with other mental items, not in the attentional focus, as well (and sometimes this results in pulling these other items into the attentional focus). An intelligent system must constantly shift its attentional focus from one set of entities to another, based on changes in its environment and based on its own shifting discoveries.

In the human mind, there is a self-reinforcing dynamic pertaining to the collection of entities in the attentional focus at any given point in time, resulting from the observation that:

If A is in the attentional focus, and A and B have often been associated in the past, then the odds are increased that B will soon be in the attentional focus.
This basic observation has been refined tremendously via a large body of cognitive psychology work; and neurologically it follows not only from Hebb's [Heb49] classic work on neural reinforcement learning, but also from numerous more modern refinements [SB98]. It implies that two items A and B, if both in the attentional focus, can reinforce each others' presence in the attentional focus, hence forming a kind of conspiracy to keep each other in the limelight. But of course, this kind of dynamic
must be counteracted by a pragmatic tendency to remove items from the attentional focus if giving them attention is not providing sufficient utility in terms of the achievement of system goals.

The synthesis and analysis perspective provides a more systematic view of this self-reinforcing dynamic. Synthesis occurs in the attentional focus when two or more items in the focus are combined to form new items, new relationships, new ideas. This happens continually, as one of the main purposes of the attentional focus is combinational. Analysis then occurs when a combination that has been speculatively formed is linked in with the remainder of the mind (the "unconscious", the vast body of knowledge that is not in the attentional focus at the given moment in time). Analysis basically checks to see what support the new combination has within the existing knowledge store of the system. Thus, forward-analysis basically comes down to "generate and test", where the testing takes the form of attempting to integrate the generated structures with the ideas in the unconscious long-term memory. One of the most obvious examples of this kind of dynamic is creative thinking (Boden, 2003; Goertzel, 1997), where the attentional focus continually combinationally creates new ideas, which are then tested via checking which ones can be validated in terms of (built up from) existing knowledge.

The analysis stage may result in items being pushed out of the attentional focus, to be replaced by others. Likewise may the synthesis stage: the combinations may overshadow and then replace the things combined. However, in human minds and functional AI minds, the attentional focus will not be a complete chaos with constant turnover: sometimes the same set of ideas — or a shifting set of ideas within the same overall family of ideas — will remain in focus for a while. When this occurs, it is because this set or family of ideas forms an approximate attractor for the dynamics of the attentional focus, in particular for the forward-analysis dynamic of speculative combination and integrative explanation. Often, for instance, a small "core set" of ideas will remain in the attentional focus for a while, but will not exhaust the attentional focus: the rest of the attentional focus will then, at any point in time, be occupied with other ideas related to the ones in the core set. Often this may mean that, for a while, the whole of the attentional focus will move around quasi-randomly through a "strange attractor" consisting of the set of ideas related to those in the core set.

3.4.5 Conclusion

The ideas presented above (the notions of synthesis and analysis, and the hypothesis of self and attentional focus as attractors of the iterative forward-analysis dynamic) are quite generic, and are hypothetically proposed to be applicable to any cognitive system, natural or artificial. Later chapters will discuss the manifestation of the above ideas in the context of CogPrime. We have found that the analysis/synthesis approach is a valuable tool for conceptualizing CogPrime's cognitive dynamics, and we conjecture that a similar utility may be found more generally.
Next, so as not to end the section on too blasé of a note, we will also make a stronger hypothesis: that, in order for a physical or software system to achieve intelligence that is roughly human-level in both capability and generality, using computational resources on the same order of magnitude as the human brain, this system must:

• manifest the dynamic of iterated synthesis and analysis, as modes of an underlying "self-generating system" dynamic
• do so in such a way as to lead to self and attentional focus as emergent structures that serve as approximate attractors of this dynamic, over time periods that are long relative to the basic "cognitive cycle time" of the system's forward-analysis dynamics

To prove the truth of a hypothesis of this nature would seem to require mathematics fairly far beyond anything that currently exists. Nonetheless, we feel it is important to formulate and discuss such hypotheses, so as to point the way for future investigations both theoretical and pragmatic.

3.5 Perspectives on Machine Consciousness

Finally, we can't let a chapter on philosophy — even a brief one — end without some discussion of the thorniest topic in the philosophy of mind: consciousness. Rather than seeking to resolve or comprehensively review this most delicate issue, we will restrict ourselves to giving, in Appendix ??, an overview of many of the common views on the subject; and here in the main text we discuss the relationship between consciousness theory, the patternist philosophy of cognition, and the practical work of designing and building AGI.

One fairly concrete idea about consciousness, which relates closely to certain aspects of the CogPrime design, is that the subjective experience of being conscious of some entity X is correlated with the presence of a very intense pattern in one's overall mind-state, corresponding to X. This simple idea is also the essence of neuroscientist Susan Greenfield's theory of consciousness [Gre01] (but in her theory, "overall mind-state" is replaced with "brain-state"), and has much deeper historical roots in philosophy of mind which we shall not venture to unravel here.

This observation relates to the idea of "moving bubbles of awareness" in intelligent systems. If an intelligent system consists of multiple processing or data elements, and during each (sufficiently long) interval of time some of these elements get much more attention than others, then one may view the system as having a certain "attentional focus" during each interval. The attentional focus is itself a significant pattern in the system (the pattern being "these elements habitually get more processor and memory", roughly speaking). As the attentional focus shifts over time, one has a "moving bubble of pattern", which then corresponds experientially to a "moving bubble of awareness."

This notion of a "moving bubble of awareness" ties in very closely with global workspace theory [Baa97] (briefly mentioned above), a cognitive theory that has broad support from neuroscience and cognitive science and has also served as the motivation for Stan Franklin's LIDA AI system [BF09], to be discussed in Chapter ??. The global workspace theory views the mind as consisting of a large population of small, specialized processes — a society of agents. These agents organize themselves into coalitions, and coalitions that are relevant to contextually novel phenomena, or contextually important goals, are pulled into the global workspace (which is identified with consciousness). This workspace broadcasts the message of the coalition to all the unconscious agents, and recruits other agents into consciousness. Various sorts of contexts — e.g. goal contexts, perceptual contexts, conceptual contexts and cultural contexts — play a role in determining which coalitions are relevant, and form the unconscious "background" of the conscious global workspace.
New perceptions are often, but not necessarily, pushed into the workspace. Some of the agents in the global workspace are concerned with action selection, i.e. with controlling and passing parameters to a population of possible actions. The contents of the workspace at any given time have a certain cohesiveness and interdependency, the so-called
"unity of consciousness." In essence the contents of the global workspace form a moving bubble of attention or awareness.

In CogPrime, this moving bubble is achieved largely via economic attention network (ECAN) equations [GPI+10] that propagate virtual currency between nodes and links representing elements of memories, so that the attentional focus consists of the wealthiest nodes and links. Figures 3.3 and 3.4 illustrate the existence and flow of attentional focus in OpenCog. On the other hand, in Hameroff's recent model of the brain [Ham10], the brain's moving bubble of attention is achieved through dendro-dendritic connections and the emergent dendritic web.

Fig. 3.3: Graphical depiction of the momentary bubble of attention in the memory of an OpenCog AI system. Circles and lines represent nodes and links in OpenCogPrime's memory, and stars denote those nodes with a high level of attention (represented in OpenCog by the ShortTermImportance node variable) at the particular point in time. [The figure groups the nodes into specific objects, abstract concepts (some corresponding to named concepts, some not), composite actions, perception/action/complex-feeling nodes, and low-level feeling nodes such as "pixel at (100,50) is RED at 1:42:01, Sept 15, 2006" and "joint_53_actuator is ON at 2:42:01, Sept 15, 2006".]

In this perspective, self, free will and reflective consciousness are specific phenomena occurring within the moving bubble of awareness. They are specific ways of experiencing awareness, corresponding to certain abstract types of physical structures and dynamics, which we shall endeavor to identify in detail in Appendix ??.
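To give a concrete feel for the economic metaphor behind the ECAN mechanism mentioned above, here is a minimal Python sketch in which currency (ShortTermImportance) spreads along weighted links and the attentional focus is simply the set of wealthiest atoms. The update rule here is a simplified illustration of the idea, not the published ECAN equations of [GPI+10]; the parameter names and the rent/spreading scheme are our own.

```python
import heapq

class Atom:
    """A node or link in memory, carrying a ShortTermImportance (STI) 'wealth'."""
    def __init__(self, name, sti=0.0):
        self.name, self.sti = name, sti
        self.neighbors = []   # list of (other_atom, link_weight) pairs

def ecan_cycle(atoms, spread_fraction=0.2, rent=0.05, focus_size=5):
    """One attention-allocation cycle: spread currency, collect rent,
    and return the current attentional focus (the wealthiest atoms)."""
    deltas = {a: 0.0 for a in atoms}
    for a in atoms:
        total_w = sum(w for _, w in a.neighbors)
        if total_w == 0.0:
            continue
        outflow = spread_fraction * a.sti        # wealth spread to neighbors
        deltas[a] -= outflow
        for nbr, w in a.neighbors:
            deltas[nbr] += outflow * (w / total_w)
    for a in atoms:
        a.sti = max(0.0, a.sti + deltas[a] - rent)  # rent keeps total wealth bounded
    return heapq.nlargest(focus_size, atoms, key=lambda a: a.sti)
```

Iterating ecan_cycle while perception keeps injecting currency into stimulus-related atoms yields exactly the kind of moving bubble depicted in Figures 3.3 and 3.4.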
Fig. 3.4: Graphical depiction of the momentary bubble of attention in the memory of an OpenCog AI system, a few moments after the bubble shown in Figure 3.3, indicating the movement of the bubble of attention. Depictive conventions are the same as in Figure 3.3. This shows an idealized situation where the declarative knowledge remains invariant from one moment to the next and only the focus of attention shifts; in reality both will evolve together.

3.6 Postscript: Formalizing Pattern

Finally, before winding up our very brief tour through patternist philosophy of mind, we will briefly visit patternism's more formal side. Many of the key aspects of patternism have been rigorously formalized. Here we give only a few very basic elements of the relevant mathematics, which will be used later on in the exposition of CogPrime. (Specifically, the formal definition of pattern emerges in the CogPrime design in the definition of a fitness function for "pattern mining" algorithms and Occam-based concept creation algorithms, and in the definition of intensional inheritance within PLN.) We give some definitions, drawn from Appendix 1 of [Goe06a]:

Definition 1 Given a metric space $(M,d)$, and two functions $c: M \rightarrow [0,\infty]$ (the "simplicity measure") and $F: M \rightarrow M$ (the "production relationship"), we say that $P \in M$ is a pattern in $X \in M$ to the degree

$$\iota_X^P = \left[ \frac{1}{1 + d\left(F(P), X\right)} \; \cdot \; \frac{c(X) - c(P)}{c(X)} \right]^+$$
This degree is called the pattern intensity of $P$ in $X$. It quantifies the extent to which $P$ is a pattern in $X$. Supposing that $F(P) = X$, then the first factor in the definition equals 1, and we are left with only the second term, which measures the degree of compression obtained via representing $X$ as the result of $P$ rather than simply representing $X$ directly. The greater the compression ratio obtained via using $P$ to represent $X$, the greater the intensity of $P$ as a pattern in $X$. The first factor, in the case $F(P) \neq X$, adjusts the pattern intensity downwards to account for the amount of error with which $F(P)$ approximates $X$. If one holds the second factor fixed and thinks about varying the first factor, then: the greater the error, the lossier the compression, and the lower the pattern intensity.

For instance, if one wishes one may take $c$ to denote algorithmic information measured on some reference Turing machine, and $F(X)$ to denote what appears on the second tape of a two-tape Turing machine $t$ time-steps after placing $X$ on its first tape. Other more naturalistic computational models are also possible here and are discussed extensively in Appendix 1 of [Goe06a].

Definition 2 The structure of $X \in M$ is the fuzzy set $\mathrm{St}_X$ defined via the membership function

$$\chi_{\mathrm{St}_X}(P) = \iota_X^P$$

This lets us formalize our definition of "mind" alluded to above: the mind of $X$ as the set of patterns associated with $X$. We can formalize this, for instance, by considering $P$ to belong to the mind of $X$ if it is a pattern in some $Y$ that includes $X$. There are then two numbers to look at: $\iota_Y^P$ and $P(Y|X)$ (the percentage of $Y$ that is also contained in $X$). To define the degree to which $P$ belongs to the mind of $X$ we can then combine these two numbers using some function $f$ that is monotone increasing in both arguments. This highlights the somewhat arbitrary semantics of "of" in the phrase "the mind of X." Which of the patterns binding $X$ to its environment are part of $X$'s mind, and which are part of the world? This isn't necessarily a good question, and the answer seems to depend on what perspective you choose, represented formally in the present framework by what combination function $f$ you choose (for instance if $f(a,b) = a^r b^{1-r}$ then it depends on the choice of $0 < r < 1$).

Next, we can formalize the notion of a "pattern space" by positing a metric on patterns, thus making pattern space a metric space, which will come in handy in some places in later chapters:

Definition 3 Assuming $M$ is a countable space, the structural distance is a metric $d_{\mathrm{St}}$ defined on $M$ via

$$d_{\mathrm{St}}(X,Y) = T\left(\chi_{\mathrm{St}_X}, \chi_{\mathrm{St}_Y}\right)$$

where $T$ is the Tanimoto distance. The Tanimoto distance between two real vectors $A$ and $B$ is defined as

$$T(A, B) = \frac{A \cdot B}{\|A\|^2 + \|B\|^2 - A \cdot B}$$

and since $M$ is countable this can be applied to fuzzy sets such as $\mathrm{St}_X$ via considering the latter as vectors. (As an aside, this can be generalized to uncountable $M$ as well, but we will not require this here.)
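As a concrete illustration of Definitions 1-3, the following Python sketch computes pattern intensity using compressed length as a stand-in for the simplicity measure $c$, a crude byte-string metric for $d$, and a caller-supplied production function $F$. The particular choices of $c$ and $d$ here are illustrative assumptions on our part, not part of the formal definitions.

```python
import zlib

def c(x: bytes) -> float:
    """Simplicity measure: compressed length as a rough proxy for algorithmic information."""
    return float(len(zlib.compress(x)))

def d(x: bytes, y: bytes) -> float:
    """A crude metric on byte strings: mismatches on the common prefix plus the length gap."""
    return sum(a != b for a, b in zip(x, y)) + abs(len(x) - len(y))

def pattern_intensity(P: bytes, X: bytes, F) -> float:
    """Definition 1: an error factor (equal to 1 when F(P) == X) times a compression factor."""
    error_factor = 1.0 / (1.0 + d(F(P), X))
    compression_factor = max(0.0, (c(X) - c(P)) / c(X))
    return error_factor * compression_factor

def tanimoto(A, B):
    """Tanimoto distance between two real vectors, per Definition 3."""
    dot = sum(a * b for a, b in zip(A, B))
    return dot / (sum(a * a for a in A) + sum(b * b for b in B) - dot)

# Example: P is a short 'seed' and F a production process that expands it into X.
X = b"abcabcabcabcabcabcabcabcabcabcabcabc"
P = b"abc"
F = lambda p: p * 12
print(pattern_intensity(P, X, F))   # error factor is 1 here, since F(P) == X exactly
```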
Using this definition of pattern, combined with the formal theory of intelligence given in Chapter 7, one may formalize the various hypotheses made in the previous section, regarding the emergence of different kinds of networks and structures as patterns in intelligent systems. However, it appears quite difficult to prove the formal versions of these hypotheses given current mathematical tools, which renders such formalizations of limited use.

Finally, consider the case where the metric space $M$ has a partial ordering $\le$ on it; we may then define

Definition 4 $R \in M$ is a subpattern in $X \in M$ to the degree

$$\iota_X^R = \frac{\int_{P \in M} \mathrm{true}(R \le P)\, \iota_X^P \, dP}{\int_{P \in M} \iota_X^P \, dP}$$

This degree is called the subpattern intensity of $R$ in $X$. Roughly speaking, the subpattern intensity measures the percentage of patterns in $X$ that contain $R$ (where "containment" is judged by the partial ordering $\le$). But the percentage is measured using a weighted average, where each pattern is weighted by its intensity as a pattern in $X$. A subpattern may or may not be a pattern on its own. A nonpattern that happens to occur within many patterns may be an intense subpattern.

Whether the subpatterns in $X$ are to be considered part of the "mind" of $X$ is a somewhat superfluous question of semantics. Here we choose to extend the definition of mind given in [Goe06a] to include subpatterns as well as patterns, because this makes it simpler to describe the relationship between hypersets and minds, as we will do in Appendix ??.
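For finite pattern sets, the subpattern intensity of Definition 4 reduces to an intensity-weighted fraction, as in this small sketch (the function names and the finite restriction are our own):

```python
def subpattern_intensity(R, patterns, intensity, contains):
    """Finite analogue of Definition 4: the fraction of X's patterns P with R <= P,
    weighted by each P's intensity as a pattern in X. 'contains(P, R)' encodes the
    partial ordering, and 'intensity[P]' the pattern intensities iota_X^P."""
    total = sum(intensity[P] for P in patterns)
    if total == 0.0:
        return 0.0
    return sum(intensity[P] for P in patterns if contains(P, R)) / total
```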
Chapter 4
Brief Survey of Cognitive Architectures

4.1 Introduction

While we believe CogPrime is the most thorough attempt to date at an architecture for advanced AGI, we certainly recognize there have been many valuable attempts in the past with similar aims; and we also have great respect for other AGI efforts occurring in parallel with CogPrime development, based on alternative, sometimes overlapping, theoretical presuppositions and practical choices. In most of this book we will ignore these other current and historical efforts except where they are directly useful for CogPrime — there are many literature reviews already published, and this is a research treatise, not a textbook. In this chapter, however, we will break from this pattern and give a rough high-level overview of the various AGI architectures at play in the field today. The overview definitely has a bias toward other work with some direct relevance to CogPrime, but not an overwhelming bias; we also discuss a number of approaches that are unrelated to, and even in some cases conceptually orthogonal to, our own.

CogPrime builds on prior AI efforts in a variety of ways. Most of the specific algorithms and structures in CogPrime have their roots in prior AI work; and in addition, the CogPrime cognitive architecture has been heavily inspired by some other holistic cognitive architectures, especially (but not exclusively) MicroPsi [Bac09], LIDA [BF09] and DeSTIN [ARK09a, ARC09]. In this chapter we will briefly review some existing cognitive architectures, with especial but not exclusive emphasis on the latter three.

We will articulate some rough mappings between elements of these other architectures and elements of CogPrime — some in this chapter, and some in Chapter 5. However, these mappings will mostly be left informal and very incompletely specified. The articulation of detailed inter-architecture mappings is an important project, but would be a substantial additional project going well beyond the scope of this book.

We will not give a thorough review of the similarities and differences between CogPrime and each of these architectures, but only mention some of the highlights. The reader desiring a more thorough review of cognitive architectures is referred to Wlodek Duch's review paper from the AGI-08 conference [DOP08]; and also to Alexei Samsonovich's review paper [Sam10], which compares a number of cognitive architectures in terms of a feature checklist, and was created collaboratively with the creators of the architectures.

Duch, in his survey of cognitive architectures [DOP08], divides existing approaches into three paradigms — symbolic, emergentist and hybrid — as broadly indicated in Figure 4.1. Drawing on his survey and updating slightly, we give here some key examples of each, and then explain why
CogPrime represents a significantly more effective approach to embodied human-like general intelligence. In our treatment of emergentist architectures, we pay particular attention to developmental robotics architectures, which share considerably with CogPrime in terms of underlying philosophy, but differ via not integrating a symbolic "language and inference" component such as CogPrime includes.

In brief, we believe that the hybrid approach is the most pragmatic one given the current state of AI technology, but that the emergentist approach gets something fundamentally right, by focusing on the emergence of complex dynamics and structures from the interactions of simple components. So CogPrime is a hybrid architecture which (according to the cognitive synergy principle) binds its components together very tightly dynamically, allowing the emergence of complex dynamics and structures in the integrated system. Most other hybrid architectures are less tightly coupled and hence seem ill-suited to give rise to the needed emergent complexity. The other hybrid architectures that do possess the needed tight coupling, such as MicroPsi [Bac09], strike us as underdeveloped and founded on insufficiently powerful learning algorithms.

Fig. 4.1: Duch's simplified taxonomy of cognitive architectures. CogPrime falls into the "hybrid" category, but differs from other hybrid architectures in its focus on synergetic interactions between components and their potential to give rise to appropriate system-wide emergent structures enabling general intelligence. [The figure contrasts the characteristic memory types (e.g. rule-based, graph-based, localist, localist-distributed, symbolic-connectionist) and learning types (inductive and analytical; associative and competitive; bottom-up and top-down) of the three paradigms.]

4.2 Symbolic Cognitive Architectures

A venerable tradition in AI focuses on the physical symbol system hypothesis [New90], which states that minds exist mainly to manipulate symbols that represent aspects of the world or themselves. A physical symbol system has the ability to input, output, store and alter symbolic entities, and to execute appropriate actions in order to reach its goals. Generally, symbolic cognitive architectures focus on a "working memory" that draws on long-term memory as needed, and utilize centralized control over perception, cognition and action. Although in principle such architectures could be arbitrarily capable (since symbolic systems have universal
representational and computational power, in theory), in practice symbolic architectures tend to be weak in learning, creativity, procedure learning, and episodic and associative memory. Decades of work in this tradition have not resolved these issues, which has led many researchers to explore other options. A few of the more important symbolic cognitive architectures are:

• SOAR [LRN87], a classic example of an expert rule-based cognitive architecture designed to model general intelligence. It has recently been extended to handle sensorimotor functions, though in a somewhat cognitively unnatural way; and it is not yet strong in areas such as episodic memory, creativity, handling uncertain knowledge, and reinforcement learning.

• ACT-R [AL03] is fundamentally a symbolic system, but Duch classifies it as a hybrid system because it incorporates connectionist-style activation spreading in a significant role; and there is an experimental, thoroughly connectionist implementation to complement the primary mainly-symbolic implementation. Its combination of SOAR-style "production rules" with large-scale connectionist dynamics allows it to simulate a variety of human psychological phenomena, but abstract reasoning, creativity and transfer learning are still missing.

• EPIC [RCK01], a cognitive architecture aimed at capturing human perceptual, cognitive and motor activities through several interconnected processors working in parallel. The system is controlled by production rules for cognitive processors and a set of perceptual (visual, auditory, tactile) and motor processors operating on symbolically coded features rather than raw sensory data. It has been connected to SOAR for problem solving, planning and learning.

• ICARUS [Lan05], an integrated cognitive architecture for physical agents, with knowledge specified in the form of reactive skills, each denoting goal-relevant reactions to a class of problems. The architecture includes a number of modules: a perceptual system, a planning system, an execution system, and several memory systems. Concurrent processing is absent, attention allocation is fairly crude, and uncertain knowledge is not thoroughly handled.

• SNePS (Semantic Network Processing System) [SE07] is a logic, frame and network-based knowledge representation, reasoning, and acting system that has undergone over three decades of development. While it has been used for some interesting prototype experiments in language processing and virtual agent control, it has not yet been used for any large-scale or real-world application.

• Cyc [LG90] is an AGI architecture based on predicate logic as a knowledge representation, and using logical reasoning techniques to answer questions and derive new knowledge from old. It has been connected to a natural language engine, and designs have been created for the connection of Cyc with Albus's 4D-RCS [AM01]. Cyc's most unique aspect is the large database of commonsense knowledge that Cycorp has accumulated (millions of pieces of knowledge, entered by specially trained humans in predicate logic format); part of the philosophy underlying Cyc is that once a sufficient quantity of knowledge is accumulated in the knowledge base, the problem of creating human-level general intelligence will become much less difficult, due to the ability to leverage this knowledge.
While these architectures contain many valuable ideas and have yielded some interesting results, we feel they are incapable on their own of giving rise to the emergent structures and dynamics required to yield humanlike general intelligence using feasible computational resources. However, we are more sanguine about the possibility of ideas and components from symbolic architectures playing a role in human-level AGI via incorporation in hybrid architectures. We now review a few symbolic architectures in slightly more detail.
4.2.1 SOAR

The cognitive architectures best known among AI academics are probably Soar and ACT-R, both of which are explicitly being developed with the dual goals of creating human-level AGI and modeling all aspects of human psychology. Neither the Soar nor the ACT-R community feels itself particularly near these long-term goals, yet they do take them seriously.

Soar is based on IF-THEN rules, otherwise known as "production rules." On the surface this makes it similar to old-style expert systems, but Soar is much more than an expert system; it's at minimum a sophisticated problem-solving engine. Soar explicitly conceives problem solving as a search through solution space for a "goal state" representing a (precise or approximate) problem solution. It uses a methodology of incremental search, where each step is supposed to move the system a little closer to its problem-solving goal, and each step involves a potentially complex "decision cycle." In the simplest case, the decision cycle has two phases:

• Gathering appropriate information from the system's long-term memory (LTM) into its working memory (WM)
• A decision procedure that uses the gathered information to decide an action

If the knowledge available in LTM isn't enough to solve the problem, then the decision procedure invokes search heuristics like hill-climbing, which try to create new knowledge (new production rules) that will help move the system closer to a solution. If a solution is found by chaining together multiple production rules, then a chunking mechanism is used to combine these rules into a single rule for future use. One could view the chunking mechanism as a way of converting explicit knowledge into implicit knowledge, similar to "map formation" in CogPrime (see Chapter 42 of Part 2), but in the current Soar design and implementation it is a fairly crude mechanism. (A toy rendering of this decision cycle and chunking process is sketched just after this section's lists.)

In recent years Soar has acquired a number of additional methods and modalities, including some visual reasoning methods and some mechanisms for handling episodic and procedural knowledge. These expand the scope of the system, but the basic production rule and chunking mechanisms briefly described above remain the core "cognitive algorithm" of the system.

From a CogPrime perspective, what Soar offers is certainly valuable, e.g.

• heuristics for transferring knowledge from LTM into WM
• chaining and chunking of implications
• methods for interfacing between other forms of knowledge and implications

However, a very short and very partial list of the major differences between Soar and CogPrime would include:

• CogPrime contains a variety of other core cognitive mechanisms beyond the management and chunking of implications
• the variety of "chunking" type methods in CogPrime goes far beyond the sort of localized chunking done in Soar
• CogPrime is committed to representing uncertainty at the base level, whereas Soar's production rules are crisp
• the mechanisms for LTM-WM interaction are rather different in CogPrime, being based on complex nonlinear dynamics as represented in Economic Attention Allocation (ECAN)
• currently Soar does not contain creativity-focused heuristics like blending or evolutionary learning in its core cognitive dynamic.
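The toy Python sketch below renders the two-phase decision cycle and chunking as just described. It is our own schematic rendering, for illustration only; Soar's actual rule syntax, impasse handling and chunking machinery are far more elaborate.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Rule:
    """An IF-THEN production rule, with a utility used by the decision procedure."""
    condition: Callable[[Any], bool]
    action: Callable[[Any], Any]
    utility: float = 0.0

def decision_cycle(state, goal, ltm, wm, chunks, max_steps=100):
    """Toy Soar-style cycle: (1) gather relevant rules from LTM into WM;
    (2) repeatedly decide on the best applicable rule; chunk successful chains."""
    wm.extend(r for r in ltm if r.condition(state) and r not in wm)   # phase 1
    chain, current = [], state
    for _ in range(max_steps):
        if current == goal:
            break
        applicable = [r for r in wm if r.condition(current)]
        if not applicable:
            return None   # an impasse: here Soar would invoke heuristics like hill-climbing
        rule = max(applicable, key=lambda r: r.utility)               # phase 2
        current = rule.action(current)
        chain.append(rule)
    if current == goal and len(chain) > 1:
        # Chunking: collapse the successful multi-rule chain into one rule for future use.
        seq = tuple(chain)
        def chunk_action(s):
            for r in seq:
                s = r.action(s)
            return s
        chunks.append(Rule(chain[0].condition, chunk_action,
                           max(r.utility for r in chain)))
    return current
```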
4.2.2 ACT-R

In the grand scope of cognitive architectures, ACT-R is quite similar to Soar, but there are many micro-level differences. ACT-R is defined in terms of declarative and procedural knowledge, where procedural knowledge takes the form of Soar-like production rules, and declarative knowledge takes the form of chunks. It contains a variety of mechanisms for learning new rules and chunks from old; and also contains sophisticated probabilistic equations for updating the activation levels associated with items of knowledge (these equations being roughly analogous in function to, though quite different from, the ECAN equations in CogPrime). Figure 4.2 displays the current architecture of ACT-R.

The flow of cognition in the system is in response to the current goal, currently active information from declarative memory, information attended to in perceptual modules (vision and audition are implemented), and the current state of motor modules (hand and speech are implemented). The early work with ACT-R was based on comparing system performance to human behavior, using only behavioral measures, such as the timing of keystrokes or patterns of eye movements. Using such measures, it was not possible to test detailed assumptions about which modules were active in the performance of a task. More recently the ACT-R community has been engaged in a process of using imaging data to provide converging evidence on module activity. Figure 4.3 illustrates the associations they have made between the modules in Figure 4.2 and brain regions. Coordination among all of these components occurs through actions of the procedural module, which is mapped to the basal ganglia.

Fig. 4.2: High-level architecture of ACT-R

In practice ACT-R, even more so than Soar, seems to be used more as a programming framework for cognitive modeling than as an AI system. One can fairly easily use ACT-R to program models of specific human mental behaviors, which may then be matched against
psychological data. Opinions differ as to whether this sort of modeling is valuable for achieving AGI goals. CogPrime is not designed to support this kind of modeling, as it intentionally does many things very differently from humans.

ACT-R in its original form did not say much about perceptual and motor operations, but recent versions have incorporated EPIC, an independent cognitive architecture focused on modeling these aspects of human behavior.

Fig. 4.3: Conjectured mapping between ACT-R and the brain. [The figure labels brain regions such as the VLPFC and the fusiform gyrus (visual module).]

4.2.3 Cyc and Texai

Our review of cognitive architectures would be incomplete without mentioning Cyc [LG90], one of the best known and best funded AGI-oriented projects in history. While the main focus of the Cyc project has been on the hand-coding of large amounts of declarative knowledge, there is also a cognitive architecture of sorts there. The center of Cyc is an engine for logical deduction, acting on knowledge represented in predicate logic. A natural language engine has been associated with the logic engine, which enables one to ask English questions and get English replies.

Stephen Reed, while an engineer at Cycorp, designed a perceptual-motor front end for Cyc based on James Albus's Reference Model Architecture; the ensuing system, called CognitiveCyc, would have been the first full-fledged cognitive architecture based on Cyc, but was not implemented. Reed left Cycorp and is now building a system called Texai, which has many similarities to Cyc (and relies upon the OpenCyc knowledge base, a subset of Cyc's overall knowledge base), but incorporates a CognitiveCyc-style cognitive architecture.
4.2.4 NARS

Pei Wang's NARS logic [Wan06] played a large role in the development of PLN, CogPrime's uncertain logic component, a relationship that is discussed in depth in [GMIH08] and won't be re-emphasized here. However, NARS is more than just an uncertain logic; it is also an overall cognitive architecture (which is centered on NARS logic, but also includes other aspects). CogPrime bears little relation to NARS except in the specific similarities between PLN logic and NARS logic, but the other aspects of NARS are worth briefly recounting here.

NARS is formulated as a system for processing tasks, where a task consists of a question or a piece of new knowledge. The architecture is focused on declarative knowledge, but some pieces of knowledge may be associated with executable procedures, which allows NARS to carry out control activities (in roughly the same way that a Prolog program can). At any given time a NARS system contains

• working memory: a small set of tasks which are active, kept for a short time, and closely related to new questions and new knowledge
• long-term memory: a huge set of knowledge which is passive, kept for a long time, and not necessarily related to current questions and knowledge

The working and long-term memory spaces of NARS may each be thought of as a set of chunks, where each chunk consists of a set of tasks and a set of knowledge. NARS's basic cognitive process is:

1. choose a chunk
2. choose a task from that chunk
3. choose a piece of knowledge from that chunk
4. use the task and knowledge to do inference
5. send the new tasks to corresponding chunks

Depending on the nature of the task and knowledge, the inference involved may be one of the following:

• if the task is a question, and the knowledge happens to be an answer to the question, a copy of the knowledge is generated as a new task
• backward inference
• revision (merging two pieces of knowledge with the same form but different truth values)
• forward inference
• execution of a procedure associated with a piece of knowledge

Unlike many other systems, NARS doesn't decide what type of inference is used to process a task when the task is accepted, but works in a data-driven way — that is, it is the task and knowledge that dynamically determine what type of inference will be carried out.

The "choice" processes mentioned above are done via assigning relative priorities to

• chunks (where the priority is called activity)
• tasks (where it is called urgency)
• knowledge (where it is called importance)

and then distributing the system's resources accordingly, based on a probabilistic algorithm; a schematic rendering of this control loop follows below. (It's interesting to note that while NARS uses probability theory as part of its control mechanism, the logic it uses to represent its own knowledge about the world is nonprobabilistic. This is considered conceptually consistent, in the context of NARS theory, because system control is viewed as a domain where the system's knowledge is more complete, and thus more amenable to probabilistic reasoning.)
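In the Python sketch below, priority-weighted probabilistic choice drives each step of the loop. The data structures and the stub infer step are illustrative assumptions on our part; NARS's actual inference rules are of course far richer.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Task:
    content: str
    urgency: float

@dataclass
class Knowledge:
    content: str
    importance: float

@dataclass
class Chunk:
    activity: float
    tasks: list = field(default_factory=list)
    knowledge: list = field(default_factory=list)

def choose(items, priority):
    """Probabilistic choice weighted by relative priority (activity/urgency/importance)."""
    return random.choices(items, weights=[priority(i) for i in items])[0]

def infer(task, knowledge):
    """Stub: in NARS the task and knowledge jointly determine, in a data-driven way,
    whether this is question answering, forward or backward inference, revision,
    or procedure execution. Here we just derive one lower-urgency task."""
    return [Task(f"derived({task.content}; {knowledge.content})", 0.5 * task.urgency)]

def nars_cycle(chunks):
    chunk = choose(chunks, lambda ch: ch.activity)          # 1. choose a chunk
    task = choose(chunk.tasks, lambda t: t.urgency)         # 2. choose a task from it
    item = choose(chunk.knowledge, lambda k: k.importance)  # 3. choose a piece of knowledge
    for new_task in infer(task, item):                      # 4. do inference
        choose(chunks, lambda ch: ch.activity).tasks.append(new_task)  # 5. route new tasks
```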
4.2.5 GLAIR and SNePS

Another logic-focused cognitive architecture, very different from NARS in detail, is Stuart Shapiro's GLAIR cognitive architecture, which is centered on the SNePS paraconsistent logic [SE07].

Like NARS, the core "cognitive loop" of GLAIR is based on reasoning: either thinking about some percept (e.g. linguistic input, or sense data from the virtual or physical world), or answering some question. This inference-based cognition process is turned into an intelligent agent control process via coupling it with an acting component, which operates according to a set of policies, each one of which tells the system when to take certain internal or external actions (including internal reasoning actions) in response to its observed internal and external situation.

GLAIR contains multiple layers:

• the Knowledge Layer (KL), which contains the beliefs of the agent, and is where reasoning, planning, and act selection are performed
• the Sensori-Actuator Layer (SAL), which contains the controllers of the sensors and effectors of the hardware or software robot
• the Perceptuo-Motor Layer (PML), which grounds the KL symbols in perceptual structures and subconscious actions, contains various registers for providing the agent's sense of situatedness in the environment, and handles translation and communication between the KL and the SAL

The logical Knowledge Layer incorporates multiple memory types using a common representation (including declarative, procedural, episodic, attentional and intentional knowledge, and meta-knowledge). To support this broad range of knowledge types, a broad range of logical inference mechanisms are used, so that the KL may be variously viewed as predicate logic based, frame based, semantic network based, or from other perspectives.

What makes GLAIR more robust than most logic-based AI approaches is the novel paraconsistent logical formalism used in the knowledge base, which means (among other things) that uncertain, speculative or erroneous knowledge may exist in the system's memory without leading the system to create a broadly erroneous view of the world or carry out egregiously unintelligent actions. CogPrime is not thoroughly logic-focused like GLAIR is, but in its logical aspect it seeks a similar robustness through its use of PLN logic, which embodies properties related to paraconsistency.

Compared to CogPrime, we see that GLAIR has a similarly integrative approach, but the integration of different sorts of cognition is done more strictly within the framework of logical knowledge representation. (A toy sketch of the policy-driven acting loop mentioned above follows below.)
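The acting component's policy mechanism can be sketched schematically as follows; the Policy structure and loop are our own toy rendering of "when situation S holds, take act A", not GLAIR's actual formalism.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Policy:
    """When 'condition' holds of the observed internal + external situation,
    take 'act', which may be an external action or an internal reasoning action."""
    condition: Callable[[Any], bool]
    act: Callable[[Any], Any]

def acting_component(situation, policies, max_steps=100):
    """Couple inference-driven cognition to action: repeatedly fire the first
    applicable policy on the current situation, until none applies."""
    for _ in range(max_steps):
        applicable = [p for p in policies if p.condition(situation)]
        if not applicable:
            break
        situation = applicable[0].act(situation)
    return situation
```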
4.3 Emergentist Cognitive Architectures

Another species of cognitive architecture expects abstract symbolic processing to emerge from lower-level "subsymbolic" dynamics, which sometimes (but not always) are designed to simulate neural networks or other aspects of human brain function. These architectures are typically strong at recognizing patterns in high-dimensional data, reinforcement learning and associative memory; but no one has yet shown how to achieve high-level functions such as abstract reasoning or complex language processing using a purely subsymbolic approach. A few of the more important subsymbolic, emergentist cognitive architectures are:

• DeSTIN [ARK09a, ARC09], which is part of CogPrime, may also be considered as an autonomous AGI architecture, in which case it is emergentist and contains mechanisms to encourage language, high-level reasoning and other abstract aspects of intelligence to emerge from hierarchical pattern recognition and related self-organizing network dynamics. In CogPrime, DeSTIN is used as part of a hybrid architecture, which greatly reduces the reliance on DeSTIN's emergent properties.

• Hierarchical Temporal Memory (HTM) [I06] is a hierarchical temporal pattern recognition architecture, presented as both an AI approach and a model of the cortex. So far it has been used exclusively for vision processing, and we will discuss its shortcomings later in the context of our treatment of DeSTIN.

• SAL [JL08], based on the earlier and related IBCA (Integrated Biologically-based Cognitive Architecture), is a large-scale emergent architecture that seeks to model distributed information processing in the brain, especially the posterior and frontal cortex and the hippocampus. So far the architectures in this lineage have been used to simulate various human psychological and psycholinguistic behaviors, but haven't been shown to give rise to higher-level behaviors like reasoning or subgoaling.

• NOMAD (Neurally Organized Mobile Adaptive Device) automata and their successors [KE06] are based on Edelman's "Neural Darwinism" model of the brain, and feature large numbers of simulated neurons evolving by natural selection into configurations that carry out sensorimotor and categorization tasks. The emergence of higher-level cognition from this approach seems rather unlikely.

• Ben Kuipers and his colleagues [MK07, MIX08, MIX09] have pursued an extremely innovative research program which combines qualitative reasoning and reinforcement learning to enable an intelligent agent to learn how to act, perceive and model the world. Kuipers' notion of "bootstrap learning" involves allowing the robot to learn almost everything about its world, including for instance the structure of 3D space and other things that humans and other animals obtain via their genetic endowments. Compared to Kuipers' approach, CogPrime falls in line with most other approaches which provide more "hard-wired" structure, following the analogy to biological organisms that are born with more innate biases.

There is also a set of emergentist architectures focused specifically on developmental robotics, which we will review below in a separate subsection, as all of these share certain common characteristics.

Our general perspective on the emergentist approach is that it is philosophically correct but currently pragmatically inadequate.
Eventually, some emergentist approach could surely succeed at giving rise to humanlike general intelligence — the human brain, after all, is plainly an emergentist system. However, we currently lack understanding of how the brain gives rise to abstract reasoning and complex language, and none of the existing emergentist systems
seem remotely capable of giving rise to such phenomena. It seems to us that the creation of a successful emergentist AGI will have to wait for either a detailed understanding of how the brain gives rise to abstract thought, or a much more thorough mathematical understanding of the dynamics of complex self-organizing systems.

The concept of cognitive synergy is more relevant to emergentist than to symbolic architectures. In a complex emergentist architecture with multiple specialized components, much of the emergence is expected to arise via synergy between different richly interacting components. Symbolic systems, at least in the forms currently seen in the literature, seem less likely to give rise to cognitive synergy, as their dynamics tend to be simpler. And hybrid systems, as we shall see, are somewhat diverse in this regard: some rely heavily on cognitive synergies and others consist of more loosely coupled components.

We now review the DeSTIN emergentist architecture in more detail, and then turn to the developmental robotics architectures.

4.3.1 DeSTIN: A Deep Reinforcement Learning Approach to AGI

The DeSTIN architecture, created by Itamar Arel and his colleagues, addresses the problem of general intelligence using hierarchical spatiotemporal networks designed to enable scalable perception, state inference and reinforcement-learning-guided action in real-world environments. DeSTIN has been developed with the plan of gradually extending it into a complete system for humanoid robot control, founded on the same qualitative information-processing principles as the human brain (though without striving for detailed biological realism). However, the practical work with DeSTIN to date has focused on visual and auditory processing; and in the context of the present proposal, the intention is to utilize DeSTIN for perception- and actuation-oriented processing, hybridizing it with CogPrime, which will handle abstract cognition and language. Here we will discuss DeSTIN primarily in the perception context, only briefly mentioning the application to actuation, which is conceptually similar.

In DeSTIN (see Figure 4.4), perception is carried out by a deep spatiotemporal inference network, which is connected to a similarly architected critic network that provides feedback on the inference network's performance, and an action network that controls actuators based on the activity in the inference network (Figure 4.5 depicts a standard action hierarchy, of which the hierarchy in DeSTIN is an example). The nodes in these networks perform probabilistic pattern recognition according to algorithms to be described below; and the nodes in each of the networks may receive states of nodes in the other networks as inputs, providing rich interconnectivity and synergetic dynamics.

4.3.1.1 Deep versus Shallow Learning for Perceptual Data Processing

The most critical feature of DeSTIN is its uniquely robust approach to modeling the world based on perceptual data. Mimicking the efficiency and robustness with which the human brain analyzes and represents information has been a core challenge in AI research for decades. For instance, humans are exposed to massive amounts of visual and auditory data every second of every day, and are somehow able to capture critical aspects of it in a way that allows for appropriate future recollection and action selection. For decades, it has been known that the
brain is a massively parallel fabric, in which computation processes and memory storage are highly distributed. But massive parallelism is not in itself a solution — one also needs the right architecture, which DeSTIN provides, building on prior work in the area of deep learning.

Fig. 4.4: High-level architecture of DeSTIN. [The figure shows a deep learning system performing state inference from observations, coupled to a critic and an actor; the actor emits actions and receives corrections, and the environment returns observations and rewards.]

Humanlike intelligence is heavily adapted to the physical environments in which humans evolved; and one key aspect of sensory data coming from our physical environments is its hierarchical structure. However, most machine learning and pattern recognition systems are "shallow" in structure, not explicitly incorporating the hierarchical structure of the world in their architecture. In the context of perceptual data processing, the practical result of this is the need to couple each shallow learner with a pre-processing stage, wherein high-dimensional sensory signals are reduced to a lower-dimensional feature space that can be understood by the shallow learner. The hierarchical structure of the world is thus crudely captured in the hierarchy of "preprocessor plus shallow learner." In this sort of approach, much of the intelligence of the system shifts to the feature extraction process, which is often imperfect and always application-domain specific.

Deep machine learning has emerged as a more promising framework for dealing with complex, high-dimensional real-world data. Deep learning systems possess a hierarchical structure that intrinsically biases them to recognize the hierarchical patterns present in real-world data. Thus, they hierarchically form a feature space that is driven by regularities in the observations, rather than by hand-crafted techniques. They also offer robustness to many of the distortions and transformations that characterize real-world signals, such as noise, displacement, scaling, etc. Deep belief networks [HOT06] and Convolutional Neural Networks [LBDE90] have been demonstrated to successfully address pattern inference in high-dimensional data (e.g. images). They owe their success to their underlying paradigm of partitioning large data structures into smaller, more manageable units, and discovering the dependencies that may or may not exist
between such units. However, this paradigm has its limitations; for instance, these approaches do not represent temporal information with the same ease as spatial structure. Moreover, some key constraints are imposed on the learning schemes driving these architectures, namely the need for layer-by-layer training, and oftentimes pre-training. DeSTIN overcomes the limitations of prior deep learning approaches to perception processing, and also extends beyond perception to action and reinforcement learning.

Fig. 4.5: A standard, general-purpose hierarchical control architecture, in which each layer couples sensors and actuators and exchanges sensations and actions with the controlled system, process, or environment. DeSTIN's control hierarchy exemplifies this architecture, with the difference lying mainly in the DeSTIN control hierarchy's tight integration with the state inference (perception) and critic (reinforcement) hierarchies.

4.3.1.2 DeSTIN for Perception Processing

The hierarchical architecture of DeSTIN's spatiotemporal inference network comprises an arrangement into multiple layers of "nodes" comprising multiple instantiations of an identical cortical circuit. Each node corresponds to a particular spatiotemporal region, and uses a statistical learning algorithm to characterize the sequences of patterns that are presented to it by nodes in the layer beneath it. More specifically,

• At the very lowest layer of the hierarchy, nodes receive as input raw data (e.g. pixels of an image) and continuously construct a belief state that attempts to characterize the sequences of patterns viewed.
• The second layer, and all those above it, receive as input the belief states of nodes at the corresponding lower layers, and attempt to construct belief states that capture regularities in their inputs.

• Each node also receives as input the belief state of the node above it in the hierarchy (which constitutes "contextual" information).

Fig. 4.6: Small-scale instantiation of the DeSTIN perceptual hierarchy, with feedback (contextual) signals flowing down the hierarchy and an observation (e.g. a 32x32 image) entering at the bottom. Each box represents a node, which corresponds to a spatiotemporal region (nodes higher in the hierarchy corresponding to larger regions). O denotes the current observation in the region, C is the state of the higher-layer node, and S and S' denote state variables pertaining to two subsequent time steps. In each node, a statistical learning algorithm is used to predict subsequent states based on prior states, current observations, and the state of the higher-layer node.

More specifically, each of the DeSTIN nodes, referring to a specific spacetime region, contains a set of state variables conceived as clusters, each corresponding to a set of previously-observed sequences of events. These clusters are characterized by centroids (and are hence assumed roughly spherical in shape), and each of them comprises a certain "spatiotemporal form" recognized by the system in that region. Each node then has the task of predicting the likelihood of a certain centroid being most apropos in the near future, based on the past history of observations in the node. This prediction may be done by simple probability tabulation, or via application of supervised learning algorithms such as recurrent neural networks. These clustering and prediction processes occur separately in each node, but the nodes are linked together via bidirectional dynamics: each node feeds input to its parents, and receives "advice" from its parents that is used to condition its probability calculations in a contextual way.
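A minimal sketch of one node's online centroid learning follows, using winner-take-all updates and a softmax-style belief over centroid distances. This is an illustrative stand-in, under our own parameter choices, for the online clustering just described; the published DeSTIN implementations differ in detail.

```python
import numpy as np

class DestinNode:
    """One node of the hierarchy: online clustering of the inputs arriving
    from its children, with roughly spherical centroid-based clusters."""
    def __init__(self, n_centroids, dim, lr=0.05, seed=0):
        self.centroids = np.random.default_rng(seed).normal(size=(n_centroids, dim))
        self.lr = lr

    def observe(self, x):
        """Move the winning centroid toward the observation, then return a
        belief-like distribution over centroids, to be fed upward as input."""
        dists = np.linalg.norm(self.centroids - x, axis=1)
        winner = int(np.argmin(dists))
        self.centroids[winner] += self.lr * (x - self.centroids[winner])
        scores = np.exp(-dists)
        return scores / scores.sum()
```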
These clustering and prediction processes are executed formally by the following basic belief update rule, which governs the learning process and is identical for every node in the architecture. The belief state is a probability mass function over the sequences of stimuli that the node learns to represent. Consequently, each node is allocated a predefined number of state variables, each denoting a dynamic pattern, or sequence, that is autonomously learned. The DeSTIN update rule maps the current observation ($o$), belief state ($b$), and the belief state of a higher-layer node or context ($c$), to a new (updated) belief state ($b'$), such that

$$b'(s') = \Pr(s' \mid o, b, c) = \frac{\Pr(s' \cap o \cap b \cap c)}{\Pr(o \cap b \cap c)}, \qquad (4.1)$$

alternatively expressed as

$$b'(s') = \frac{\Pr(o \mid s', b, c)\, \Pr(s' \mid b, c)\, \Pr(b, c)}{\Pr(o \mid b, c)\, \Pr(b, c)}. \qquad (4.2)$$

Under the assumption that observations depend only on the true state, i.e. $\Pr(o \mid s', b, c) = \Pr(o \mid s')$, we can further simplify the expression such that

$$b'(s') = \frac{\Pr(o \mid s')\, \Pr(s' \mid b, c)}{\Pr(o \mid b, c)}, \qquad (4.3)$$

where $\Pr(s' \mid b, c) = \sum_{s \in S} \Pr(s' \mid s, c)\, b(s)$, yielding the belief update rule

$$b'(s') = \frac{\Pr(o \mid s') \sum_{s \in S} \Pr(s' \mid s, c)\, b(s)}{\sum_{s'' \in S} \Pr(o \mid s'') \sum_{s \in S} \Pr(s'' \mid s, c)\, b(s)}, \qquad (4.4)$$

where $S$ denotes the sequence set (i.e. the belief dimension), such that the denominator term is a normalization factor. One interpretation of eq. (4.4) would be that the static pattern similarity metric, $\Pr(o \mid s')$, is modulated by a construct that reflects the system dynamics, $\Pr(s' \mid s, c)$. As such, the belief state inherently captures both spatial and temporal information. In our implementation, the belief state of the parent node, $c$, is chosen using the selection rule

$$c = \arg\max_{s} b_p(s), \qquad (4.5)$$

where $b_p$ is the belief distribution of the parent node.

A close look at eq. (4.4) reveals that there are two core constructs to be learned, $\Pr(o \mid s')$ and $\Pr(s' \mid s, c)$. In the current DeSTIN design, the former is learned via online clustering, while the latter is learned based on experience, by inductively learning a rule that predicts the next state $s'$ given the prior state $s$ and the context $c$.
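Equations (4.4) and (4.5) translate directly into a few lines of numerical code, e.g. in Python with NumPy; the array shapes and names below are our own conventions, not DeSTIN's.

```python
import numpy as np

def destin_belief_update(b, o_like, trans, c):
    """Eq. (4.4): b'(s') is proportional to Pr(o|s') * sum_s Pr(s'|s,c) b(s).
    b:      current belief over the S sequences (length-|S| vector)
    o_like: o_like[s'] = Pr(o|s'), the static pattern similarity term
    trans:  trans[c][s, s'] = Pr(s'|s,c), one transition matrix per context
    c:      index of the parent context, per eq. (4.5)"""
    unnormalized = o_like * (b @ trans[c])
    return unnormalized / unnormalized.sum()   # denominator of eq. (4.4)

def parent_context(b_parent):
    """Eq. (4.5): the context passed down is the parent's most likely state."""
    return int(np.argmax(b_parent))
```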
The overall result is a robust framework that autonomously (i.e. with no human-engineered pre-processing of any type) learns to represent complex data patterns, and thus serves the critical role of building and maintaining a model of the state of the world. In a vision processing context, for example, it allows for powerful unsupervised classification. If shown a variety of real-world scenes, it will automatically form internal structures corresponding to the various natural categories of objects shown in the scenes, such as trees, chairs, people, etc.; and also the various natural categories of events it sees, such as reaching, pointing, falling. And, as will be discussed below, it can use feedback from DeSTIN's action and critic networks to further shape its internal world-representation based on reinforcement signals.

Benefits of DeSTIN for Perception Processing

DeSTIN's perceptual network offers multiple key attributes that render it more powerful than other deep machine learning approaches to sensory data processing:

1. The belief space that is formed across the layers of the perceptual network inherently captures both spatial and temporal regularities in the data. Given that many applications require that temporal information be discovered for robust inference, this is a key advantage over existing schemes.
2. Spatiotemporal regularities in the observations are captured in a coherent manner (rather than being represented via two separate mechanisms).
3. All processing is both top-down and bottom-up, and both hierarchical and heterarchical, based on nonlinear feedback connections directing activity and modulating learning in multiple directions through DeSTIN's cortical circuits.
4. Support for multi-modal fusion is intrinsic within the framework, yielding a powerful state inference system for real-world, partially-observable settings.
5. Each node is identical, which makes it easy to map the design to massively parallel platforms, such as graphics processing units.

Points 2-4 in the above list describe how DeSTIN's perceptual network displays its own "cognitive synergy" in a way that fits naturally into the overall synergetic dynamics of the CogPrime architecture. Using this cognitive synergy, DeSTIN's perceptual network addresses a key aspect of general intelligence: the ability to robustly infer the state of the world with which the system interacts, in an accurate and timely manner.

4.3.1.3 DeSTIN for Action and Control

DeSTIN's perceptual network performs unsupervised world-modeling, which is a critical aspect of intelligence but of course is not the whole story. DeSTIN's action network, coupled with the perceptual network, orchestrates actuator commands into complex movements, but also carries out other functions that are more cognitive in nature.

For instance, people learn to distinguish between cups and bowls in part via hearing other people describe some objects as cups and others as bowls. To emulate this kind of learning, DeSTIN's critic network provides positive or negative reinforcement signals based on whether the action network has correctly identified a given object as a cup or a bowl, and this signal then impacts the nodes in the action network. The critic network takes a simple external "degree of success or failure" signal and turns it into multiple reinforcement signals to be fed into the multiple layers of the action network. The result is that the action network self-organizes so
as to include an implicit "cup versus bowl" classifier, whose inputs are the outputs of some of the nodes in the higher levels of the perceptual network. This classifier belongs in the action network because it is part of the procedure by which the DeSTIN system carries out the action of identifying an object as a cup or a bowl. This example illustrates how the learning of complex concepts and procedures is divided fluidly between the perceptual network, which builds a model of the world in an unsupervised way, and the action network, which learns how to respond to the world in a manner that will receive positive reinforcement from the critic network.

4.3.2 Developmental Robotics Architectures

A particular subset of emergentist cognitive architectures are sufficiently important that we consider them separately here: these are developmental robotics architectures, focused on controlling robots without significant "hard-wiring" of knowledge or capabilities, allowing robots to learn (and learn how to learn, etc.) via their engagement with the world. A significant focus is often placed here on "intrinsic motivation," wherein the robot explores the world guided by internal goals like novelty or curiosity, forming a model of the world as it goes along, based on the modeling requirements implied by its goals. Many of the foundations of this research area were laid by Juergen Schmidhuber's work in the 1990s [Sch91b, Sch91a, Sch95, Sch02], but now, with more powerful computers and robots, the area is leading to more impressive practical demonstrations. We mention here a handful of the important initiatives in this area:

• Juyang Weng's Dav [IIZT+02] and SAIL [WIZ+00] projects involve mobile robots that explore their environments autonomously, and learn to carry out simple tasks by building up their own world-representations through both unsupervised and teacher-driven processing of high-dimensional sensorimotor data. The underlying philosophy is based on human child development [WT06], the knowledge representations involved are neural network based, and a number of novel learning algorithms are involved, especially in the area of vision processing.

• FLOWERS [BO09], an initiative at the French research institute INRIA, led by Pierre-Yves Oudeyer, is also based on a principle of trying to reconstruct the processes of development of the human child's mind, spontaneously driven by intrinsic motivations. Kaplan [Kap08] has taken this project in a direction closely related to our own via the creation of a "robot playroom." Experiential language learning has also been a focus of the project [OK06], driven by innovations in speech understanding.

• IM-CLEVER¹, a new European project coordinated by Gianluca Baldassarre and conducted by a large team of researchers at different institutions, is focused on creating software enabling an iCub [MSV+08] humanoid robot to explore the environment and learn to carry out human-childlike behaviors based on its own intrinsic motivations. As this project is the closest to our own we will discuss it in more depth below.

Like CogPrime, IM-CLEVER is a humanoid robot intelligence architecture guided by intrinsic motivations, and using hierarchical architectures for reinforcement learning and sensory abstraction.

¹ http://im-clever.noze.it/project/project-description
IM-CLEVER's motivational structure is based in part on Schmidhuber's information-theoretic model of curiosity [Sch06]; and CogPrime's Psi-based motivational structure utilizes probabilistic measures of novelty, which are mathematically related to Schmidhuber's measures. On the other hand, IM-CLEVER's use of reinforcement learning follows Schmidhuber's earlier work on RL for cognitive robotics [BS04, BZGS06], Barto's work on intrinsically motivated reinforcement learning [SB06, SM05], and Lee's work on developmental reinforcement learning [LMC07b, LMC07a]; whereas CogPrime's assemblage of learning algorithms is more diverse, including probabilistic logic, concept blending and other symbolic methods (in the OCP component) as well as more conventional reinforcement learning methods (in the DeSTIN component).

In many respects IM-CLEVER bears a moderately strong resemblance to DeSTIN, whose integration with CogPrime is discussed in Chapter 26 of Part 2 (although IM-CLEVER has much more focus on biological realism than DeSTIN). Apart from numerous technical differences, the really big distinction between IM-CLEVER and CogPrime is that in the latter we are proposing to hybridize a hierarchical-abstraction/reinforcement-learning system (such as DeSTIN) with a more abstract symbolic cognition engine that explicitly handles probabilistic logic and language. IM-CLEVER lacks this hybridization with a symbolic system, taking more of a pure emergentist strategy. Like DeSTIN considered as a standalone architecture, IM-CLEVER does entail a high degree of cognitive synergy, between components dealing with perception, world-modeling, action and motivation. However, the "emergentist versus hybrid" distinction marks a large qualitative difference between the two approaches.

In all, while we largely agree with the philosophy underlying developmental robotics, our intuition is that the learning and representational mechanisms underlying the current systems in this area are probably not powerful enough to lead to human-child-level intelligence. We expect that these systems will develop interesting behaviors but fall short of robust preschool-level competency, especially in areas like language and reasoning, where symbolic systems have typically proved more effective. This intuition is what impels us to pursue a hybrid approach, such as CogPrime. But we do feel that eventually, once the mechanisms underlying brains are better understood and robotic bodies are richer in sensation and more adept in actuation, some sort of emergentist, developmental-robotics approach can be successful at creating humanlike, human-level AGI.

4.4 Hybrid Cognitive Architectures

In response to the complementary strengths and weaknesses of the symbolic and emergentist approaches, in recent years a number of researchers have turned to integrative, hybrid architectures, which combine subsystems operating according to the two different paradigms. The combination may be done in many different ways, e.g. connection of a large symbolic subsystem with a large subsymbolic subsystem, or the creation of a population of small agents each of which is both symbolic and subsymbolic in nature.

Nils Nilsson expressed the motivation for hybrid AGI systems very clearly in his article at the AI-50 conference (which celebrated the 50th anniversary of the AI field) [Nil09].
While affirming the value of the Physical Symbol System Hypothesis that underlies symbolic AI, he argues that “the PSSH explicitly assumes that, whenever necessary, symbols will be grounded in objects in the environment through the perceptual and effector capabilities of a physical symbol system.” Thus, he continues, HOUSE_OVERSIGHT_012989
“I grant the need for non-symbolic processes in some intelligent systems, but I think they supplement rather than replace symbol systems. I know of no examples of reasoning, understanding language, or generating complex plans that are best understood as being performed by systems using exclusively non-symbolic processes.... AI systems that achieve human-level intelligence will involve a combination of symbolic and non-symbolic processing.”

A few of the more important hybrid cognitive architectures are:

• CLARION [SZ04] is a hybrid architecture that combines a symbolic component for reasoning on “explicit knowledge” with a connectionist component for managing “implicit knowledge.” Learning of implicit knowledge may be done via neural nets, reinforcement learning, or other methods. The integration of symbolic and subsymbolic methods is powerful, but a great deal is still missing, such as episodic knowledge and learning, and creativity. Learning in the symbolic and subsymbolic portions is carried out separately rather than dynamically coupled, minimizing “cognitive synergy” effects.

• DUAL [NK04] is the most impressive system to come out of Marvin Minsky’s “Society of Mind” paradigm. It features a population of agents, each of which combines symbolic and connectionist representation, self-organizing to collectively carry out tasks such as perception, analogy and associative memory. The approach seems innovative and promising, but it is unclear how it will scale to high-dimensional data or complex reasoning problems, due to the lack of a more structured high-level cognitive architecture.

• LIDA [BF09] is a comprehensive cognitive architecture heavily based on Bernard Baars’ “Global Workspace Theory”. It articulates a “cognitive cycle” integrating various forms of memory and intelligent processing in a single processing loop. The architecture ties in well with both neuroscience and cognitive psychology, but it deals most thoroughly with “lower level” aspects of intelligence, handling more advanced aspects like language and reasoning only somewhat sketchily. There is a clear mapping between LIDA structures and processes and corresponding structures and processes in OCP, so that it is only a mild stretch to view CogPrime as an instantiation of the general LIDA approach that extends further both at the lower level (to enable robot action and sensation via DeSTIN) and at the higher level (to enable advanced language and reasoning via OCP mechanisms that have no direct LIDA analogues).

• MicroPsi [Bac09] is an integrative architecture based on Dietrich Dörner’s Psi model of motivation, emotion and intelligence. It has been tested on some practical control applications, and also on simulating artificial agents in a simple virtual world. MicroPsi’s comprehensiveness and basis in neuroscience and psychology are impressive, but in the current version of MicroPsi, learning and reasoning are carried out by algorithms that seem unlikely to scale. OCP incorporates the Psi model for motivation and emotion, so that MicroPsi and CogPrime may be considered very closely related systems. But similar to LIDA, MicroPsi currently focuses on the “lower level” aspects of intelligence, not yet directly handling advanced processes like language and abstract reasoning.

• PolyScheme [Cas07] integrates multiple representation, reasoning and inference schemes for general problem solving.
Each PolyScheme “specialist” models a different aspect of the world using specific representation and inference techniques, interacting with other specialists and learning from them. PolyScheme has been used to model infant reasoning, including object identity, events, causality, and spatial relations. The integration of
reasoning methods is powerful, but the overall cognitive architecture is simplistic compared to other systems, and seems focused more on problem-solving than on the broader problem of intelligent agent control.

• Shruti [SA93] is a fascinating biologically-inspired model of human reflexive inference, which represents relations, types, entities and causal rules in a connectionist architecture using focal-clusters. However, much like Hofstadter’s earlier Copycat architecture [Hof95], Shruti seems more interesting as a prototype exploration of ideas than as a practical AGI system; at least, after a significant period of development it has not proved significantly effective in any applications.

• James Albus’s 4D/RCS robotics architecture shares a great deal with some of the emergentist architectures discussed above: e.g. it has the same hierarchical pattern recognition structure as DeSTIN and HTM, and the same three cross-connected hierarchies as DeSTIN, and shares with the developmental robotics architectures a focus on real-time adaptation to the structure of the world. However, 4D/RCS is not foundationally learning-based, but relies on a hard-wired architecture and algorithms intended to mimic the qualitative structure of relevant parts of the brain (and intended to be augmented by learning, which differentiates it from emergentist approaches).

As our own CogPrime approach is a hybrid architecture, it will come as no surprise that we believe several of the existing hybrid architectures are fundamentally going in the right direction. However, nearly all the existing hybrid architectures have severe shortcomings, which we feel will prevent them from achieving robust humanlike AGI.

Many of the hybrid architectures are in essence “multiple, disparate algorithms carrying out separate functions, encapsulated in black boxes and communicating results with each other.” For instance, PolyScheme, ACT-R and CLARION all display this “modularity” property to a significant extent. These architectures lack the rich, real-time interaction between the internal dynamics of various memory and learning processes that we believe is critical to achieving humanlike general intelligence using realistic computational resources. On the other hand, those architectures that feature richer integration — such as DUAL, Shruti, LIDA and MicroPsi — have the flaw of relying (at least in their current versions) on overly simplistic learning algorithms, which drastically limits their scalability.

It does seem plausible to us that some of these hybrid architectures could be dramatically extended or modified so as to produce humanlike general intelligence. For instance, one could replace LIDA’s learning algorithms with others that interrelate with each other in a nuanced, synergetic way; or one could replace MicroPsi’s simple learning and reasoning methods with much more powerful and scalable ones acting on the same data structures. However, making these changes would dramatically alter the cognitive architectures in question on multiple levels.

4.4.1 Neural versus Symbolic; Global versus Local

The “symbolic versus emergentist” dichotomy that we have used to structure our review of cognitive architectures is neither absolute nor fully precisely defined; it is more of a heuristic distinction.
In this section, before plunging into the details of particular hybrid cognitive architectures, we review two other related dichotomies that are useful for understanding hybrid systems: neural versus symbolic systems, and globalist versus localist knowledge representation.
4.4.1.1 Neural-Symbolic Integration

The distinction between neural and symbolic systems has gotten fuzzier and fuzzier in recent years, with developments such as:

• Logic-based systems being used to control embodied agents (hence using logical terms to deal with data that is perception- or actuation-oriented in nature, rather than being symbolic in the semiotic sense); see [SS03a] and [GMIT08].

• Hybrid systems combining neural net and logical parts, or using logical or neural net components interchangeably in the same role [LAon].

• Neural net systems being used for strongly symbolic tasks such as automated grammar learning ([Elm91], plus more recent work).

Figure 4.7 presents a schematic diagram of a generic neural-symbolic system, generalizing from [BH05], a paper that gives an elegant categorization of neural-symbolic AI systems. Figure 4.8 depicts several broad categories of neural-symbolic architecture.

Fig. 4.7: Generic neural-symbolic architecture: a symbolic (localist) system and a neural (globalist) system, each with its own representation and learning processes, coupled via bidirectional interaction.

Bader and Hitzler categorize neural-symbolic systems according to three orthogonal axes: interrelation, language and usage. “Language” refers to the type of language used in the symbolic component, which may be logical, automata-based, formal grammar-based, etc. “Usage” refers to the purpose to which the neural-symbolic interrelation is put. We tend to use “learning” as an encompassing term for all forms of ongoing knowledge-creation, whereas Bader and Hitzler distinguish learning from reasoning.

Of Bader and Hitzler’s three axes, the one that interests us most here is “interrelation”, which refers to the way the neural and symbolic components of the architecture intersect with each other. They distinguish “hybrid” architectures, which contain separate but equal, interacting neural and symbolic components, from “integrative” architectures, in which the symbolic component essentially rides piggyback on the neural component, extracting information from it and helping it carry out its learning, but playing a clearly derived and secondary role. We prefer Sun’s (2001) term “monolithic” to Bader and Hitzler’s “integrative” to describe this type of system, as the latter term seems best reserved for its broader meaning.
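To make the monolithic category more tangible, here is a deliberately minimal sketch, entirely our own invention, of the “piggyback” pattern: the symbolic layer does not run alongside the network as a peer, but is read off the trained neural component. The weights, feature names and margin below are made up for illustration:

```python
# Hypothetical miniature of a "monolithic" neural-symbolic system in
# Bader and Hitzler's sense: symbolic rules are extracted from, and
# secondary to, the neural component.

def extract_rules(weights, bias, feature_names, margin=0.4):
    """Derive one-literal propositional rules from a trained linear
    unit: any feature whose weight alone pushes the unit's output
    past the margin becomes the antecedent of an extracted rule."""
    rules = []
    for w, name in zip(weights, feature_names):
        if w + bias > margin:
            rules.append(f"{name} => dog")  # derived, secondary symbols
    return rules

# Pretend these weights were learned by some subsymbolic process.
weights = [0.9, 0.1, 0.7]
bias = -0.2
print(extract_rules(weights, bias, ["has_fur", "has_wheels", "barks"]))
# ['has_fur => dog', 'barks => dog']
```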
Fig. 4.8: Broad categories of neural-symbolic architecture: monolithic (the symbolic component “sits on top of” the neural component); hybrid (the neural and symbolic components confront the world side by side); and tightly interactive hybrid (the neural and symbolic components interact frequently, on the same time scale as their internal learning operations).

Within the scope of hybrid neural-symbolic systems, there is another axis which Bader and Hitzler do not focus on, because the main interest of their review is in monolithic systems. We call this axis “interactivity”, by which we mean the frequency of high-information-content, high-influence interaction between the neural and symbolic components of the hybrid system. In a low-interaction hybrid system, the neural and symbolic components don’t exchange large amounts of mutually influential information all that frequently; they basically act like independent system components that do their learning/reasoning/thinking on their own, periodically sending each other their conclusions. In some cases, interaction may be asymmetric: one component may frequently send a lot of influential information to the other, but not vice versa. However, our hypothesis is that the most capable neural-symbolic systems are going to be the symmetrically highly interactive ones.

In a symmetric high-interaction hybrid neural-symbolic system, the neural and symbolic components exchange influential information sufficiently frequently that each one plays a major role in the other one’s learning/reasoning/thinking processes. Thus, the learning processes of each component must be considered as part of the overall dynamic of the hybrid system. The two components aren’t just feeding their outputs to each other as inputs; they’re mutually guiding each other’s internal processing.
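The following sketch, again wholly our own invention rather than a rendering of any particular system, shows the shape of this symmetric high-interaction pattern: on every cognitive cycle, each component’s intermediate state (not just its finished conclusions) biases the other’s processing:

```python
# Hypothetical sketch of a symmetrically highly interactive hybrid:
# both component classes and all data here are invented.

class NeuralComponent:
    def __init__(self):
        self.activation = {}                     # pattern -> level

    def step(self, percept, symbolic_bias):
        # Symbolic guidance boosts the patterns the logic side
        # currently considers relevant, shaping perception as it happens.
        for pattern in [percept] + symbolic_bias:
            self.activation[pattern] = self.activation.get(pattern, 0.0) + 1.0
        ranked = sorted(self.activation, key=self.activation.get, reverse=True)
        return ranked[:3]                        # most active patterns

class SymbolicComponent:
    def __init__(self):
        self.facts = set()

    def step(self, active_patterns):
        # Strongly active perceptual patterns become premises; symbols
        # judged relevant are fed back as bias for the neural side.
        self.facts.update(active_patterns)
        return [f for f in self.facts if "block" in f]

neural, symbolic = NeuralComponent(), SymbolicComponent()
bias = []
for percept in ["red-block", "blue-ball", "red-block"]:
    active = neural.step(percept, bias)   # neural step, guided by logic
    bias = symbolic.step(active)          # symbolic step, guided by the net
```

In a low-interaction hybrid, by contrast, the two `step` calls would run for many cycles in isolation, exchanging only occasional summary conclusions.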
One can make a speculative argument for the relevance of this kind of architecture to neuroscience. It seems plausible that this kind of neural-symbolic system roughly emulates the kind of interaction that exists between the brain’s neural subsystems implementing localist symbolic processing, and the brain’s neural subsystems implementing globalist, classically “connectionist” processing. It seems most likely that, in the brain, symbolic functionality emerges from an underlying layer of neural dynamics. However, it is also reasonable to conjecture that this symbolic functionality is confined to a functionally distinct subsystem of the brain, which then interacts with other subsystems in the brain much in the manner that the symbolic and neural components of a symmetric high-interaction neural-symbolic system interact.

Neuroscience speculations aside, however, our key conjecture regarding neural-symbolic integration is that this sort of system presents a promising direction for artificial general intelligence research. In Chapter 26 of Volume 2 we will give a more concrete idea of what a symmetric high-interaction hybrid neural-symbolic architecture might look like, exploring the potential for this sort of hybridization between the OpenCogPrime AGI architecture (which is heavily symbolic in nature) and hierarchical attractor neural net based architectures such as DeSTIN.

4.5 Globalist versus Localist Representations

Another interesting distinction, related to but different from “symbolic versus emergentist” and “neural versus symbolic”, may be drawn between cognitive systems (or subsystems) where memory is essentially global, and those where memory is essentially local. In this section we will pursue this distinction in various guises, along with the less familiar notion of glocal memory.

This globalist/localist distinction is most easily conceptualized by reference to memories corresponding to categories of entities or events in an external environment. In an AI system that has an internal notion of “activation” — i.e. in which some of its internal elements are more active than others at any given point in time — one can define the internal image of an external event or entity as the fuzzy set of internal elements that tend to be active when that event or entity is presented to the system’s sensors. If one has a particular set S of external entities or events of interest, then the degree of memory localization of such an AI system relative to S may be conceived as inversely related to the percentage of the system’s internal elements that have a high degree of membership in the internal image of an average element of S: the smaller the fraction of elements involved in a typical image, the more localized the memory.

Of course, this characterization of localization has its limitations, such as the possibility of ambiguity regarding what the “system elements” of a given AI system are, and its exclusive focus on internal images of external phenomena rather than representation of internal abstract concepts. However, our goal here is not to formulate an ultimate, rigorous and thorough ontology of memory systems, but only to pose a “rough and ready” categorization so as to properly frame our discussion of some specific AGI issues relevant to CogPrime. Clearly the ideas pursued here will benefit from further theoretical exploration and elaboration.

In this sense, a Hopfield neural net [Ami89] would be considered “globalist”, since it has a low degree of memory localization (most internal images heavily involve a large number of system elements); whereas Cyc would be considered “localist”, as it has a very high degree of memory localization (most internal images are heavily focused on a small set of system elements). However, although Hopfield nets and Cyc form handy examples, the “globalist vs. localist” distinction as described above is not identical to the “neural vs. symbolic” distinction, for it is in principle quite possible to create localist systems using formal neurons, and also to create globalist systems using formal logic.
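The informal measure just defined translates directly into a small computation. In the following sketch the element counts, activation levels and the 0.5 membership cutoff are all our own illustrative choices, contrasting a Hopfield-like globalist memory with a Cyc-like localist one:

```python
# Toy translation of the memory-localization measure defined above.

def memory_spread(images, n_elements, membership_cutoff=0.5):
    """images: one dict per external stimulus, mapping element id ->
    membership level in that stimulus's internal image. Returns the
    mean fraction of system elements with high membership; the degree
    of memory localization is inversely related to this number."""
    fractions = []
    for image in images:
        high = sum(1 for level in image.values() if level >= membership_cutoff)
        fractions.append(high / n_elements)
    return sum(fractions) / len(fractions)

# Hopfield-like system: most of the 10 elements join every image.
globalist_images = [{e: 0.8 for e in range(10)}, {e: 0.9 for e in range(10)}]
# Cyc-like system: each image concentrates on one or two elements.
localist_images = [{0: 0.9, 1: 0.8}, {7: 0.95}]

print(round(memory_spread(globalist_images, 10), 2))  # 1.0  -> low localization
print(round(memory_spread(localist_images, 10), 2))   # 0.15 -> high localization
```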
Nor is “globalist versus localist” quite identical to “symbolic versus emergentist”, because the latter concerns coordinated system dynamics and behavior, not just knowledge representation. CogPrime combines both symbolic and (loosely) neural representations, and also combines globalist and localist representations in a way that we will call “glocal” and analyze more deeply in Chapter 13; but there are many other ways these various
properties could be manifested by AI systems. Rigorously studying the corpus of existing (or hypothetical!) cognitive architectures using these ideas would be a large task, which we do not undertake here.

In the next sections we review several hybrid architectures in more detail, focusing most deeply on LIDA and MicroPsi, which have been directly inspirational for CogPrime.

4.5.1 CLARION

Ron Sun’s CLARION architecture (see Figure 4.9) is interesting in its combination of symbolic and neural aspects — a combination that is used in a sophisticated way to embody the distinction and interaction between implicit and explicit mental processes. From a CLARION perspective, architectures like Soar and ACT-R are severely limited in that they deal only with explicit knowledge and associated learning processes.

CLARION consists of a number of distinct subsystems, each of which contains a dual representational structure, including a “rules and chunks” symbolic knowledge store somewhat similar to ACT-R’s, and a neural net knowledge store embodying implicit knowledge (a miniature rendering of this dual structure is sketched below). The main subsystems are:

• An action-centered subsystem to control actions;
• A non-action-centered subsystem to maintain general knowledge;
• A motivational subsystem to provide underlying motivations for perception, action, and cognition;
• A meta-cognitive subsystem to monitor, direct, and modify the operations of all the other subsystems.

Fig. 4.9: The CLARION cognitive architecture: a top level of explicit representation and a bottom level of implicit representation, each divided into action-centered and non-action-centered portions.
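The following deliberately miniature sketch conveys the flavor of CLARION’s dual representation: a simple incremental value update stands in for the connectionist bottom level, and strong implicit associations are promoted “bottom-up” into explicit rules, a mechanism CLARION possesses in far more sophisticated form. The class name, learning rate and threshold are our own inventions:

```python
# Hypothetical miniature of a CLARION-style dual-representation
# subsystem; names, rates and thresholds are ours, not Sun's.

class DualSubsystem:
    def __init__(self, extraction_threshold=0.8):
        self.rules = set()       # explicit "rules and chunks" store
        self.strengths = {}      # implicit state-action strengths
        self.extraction_threshold = extraction_threshold

    def reinforce(self, state, action, reward, rate=0.3):
        key = (state, action)
        old = self.strengths.get(key, 0.0)
        # Implicit learning: incremental value update, standing in
        # for CLARION's neural-net bottom level.
        self.strengths[key] = old + rate * (reward - old)
        # Bottom-up extraction: promote a strong implicit association
        # into an explicit symbolic rule at the top level.
        if self.strengths[key] >= self.extraction_threshold:
            self.rules.add(f"IF {state} THEN {action}")

acs = DualSubsystem()            # a toy action-centered subsystem
for _ in range(8):
    acs.reinforce("see(red-block)", "grasp", reward=1.0)
print(acs.rules)                 # {'IF see(red-block) THEN grasp'}
```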
4.5.2 The Society of Mind and the Emotion Machine

In his influential but controversial book The Society of Mind [Min86], Marvin Minsky described a model of human intelligence as something built up from the interactions of numerous simple agents. He spells out in great detail how various particular cognitive functions may be achieved via agents and their interactions. He leaves no room for any central algorithms or structures of thought, famously arguing: “What magical trick makes us intelligent? The trick is that there is no trick. The power of intelligence stems from our vast diversity, not from any single, perfect principle.”

This perspective was extended in the more recent work The Emotion Machine [Min07], where Minsky argued that emotions are “ways to think” evolved to handle the different “problem types” that exist in the world. The brain is posited to have rule-based mechanisms (selectors) that turn on emotions to deal with various problems.

Overall, both of these works serve better as works of speculative cognitive science than as works of AI or cognitive architecture per se. As neurologist Richard Restak said in his review of The Emotion Machine, “Minsky does a marvelous job parsing other complicated mental activities into simpler elements. ... But he is less effective in relating these emotional functions to what’s going on in the brain.” As Restak added, Minsky is also not so effective at relating these emotional functions to straightforwardly implementable algorithms or data structures.

Push Singh, in his PhD thesis and follow-up work [SBC05], did the best job so far of creating a concrete AI design based on Minsky’s ideas. While Singh’s system was certainly interesting, it was also noteworthy for its lack of any learning mechanisms, and for its exclusive focus on explicit rather than implicit knowledge. Due to Singh’s tragic death, his work was never brought anywhere near completion. It seems fair to say that no serious cognitive architecture based closely on Minsky’s ideas has yet been proposed.

4.5.3 DUAL

The closest thing to a Minsky-ish cognitive architecture is probably DUAL, which takes the Society of Mind concept and adds to it a number of other interesting ideas. DUAL integrates symbolic and connectionist approaches at a deeper level than CLARION, and has been used to model various cognitive functions such as perception, analogy and judgment.

Computations in DUAL emerge from the self-organized interaction of many micro-agents, each of which is a hybrid symbolic/connectionist device. Each DUAL agent plays the role of a neural network node, with an activation level and activation-spreading dynamics, but also plays the role of a symbol, manipulated using formal rules (a toy rendering of such a micro-agent is sketched at the end of this subsection). The agents exchange messages and activation via links that can be learned and modified, and they form coalitions which collectively represent concepts, episodes, and facts.

The structure of the model is sketchily depicted in Figure 4.10, which covers the application of DUAL to a toy environment called TextWorld. The visual input corresponding to a stimulus is presented on a two-dimensional visual array representing the front end of the system. Perceptual primitives like blobs and terminations are immediately generated by cheap parallel computations. Attention is controlled at each time by an object which allocates it selectively to some area of the stimulus.
A detailed symbolic representation is constructed for this area, which tends to fade away as attention is withdrawn from it and allocated to another one.
Categorization of visual memory contents takes place by retrieving object and scene categories from DUAL’s semantic memory and mapping them onto current visual memory representations.

Fig. 4.10: The three main components of the DUAL model: the retinotopic visual array (RVA), the visual working memory (VWM) and DUAL’s semantic memory. Attention is allocated to an area of the visual array by the object in VWM controlling attention, while scene and object categories corresponding to the contents of VWM are retrieved from the semantic memory.

In principle the DUAL framework seems quite powerful; using the language of CogPrime, however, it seems to us that the learning mechanisms of DUAL have not been formulated in such a way as to give rise to powerful, scalable cognitive synergy. It would likely be possible to create very powerful AGI systems within DUAL, and perhaps some very CogPrime-like systems as well. But the systems that have been created or designed within DUAL so far seem not to be that powerful in their potential or scope.
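As promised above, here is a toy rendering of a DUAL-style micro-agent: simultaneously a connectionist node (activation, spreading) and a symbolic frame (slots manipulable by rules), with coalitions as mutually linked, jointly active agents. All names, weights and thresholds here are our own illustrative choices, not taken from DUAL itself:

```python
# Hypothetical miniature of a DUAL-style hybrid micro-agent.

class MicroAgent:
    def __init__(self, symbol, slots=None):
        self.symbol = symbol          # symbolic aspect: a frame
        self.slots = slots or {}
        self.activation = 0.0         # connectionist aspect
        self.links = []               # (neighbor, weight) pairs

    def spread(self):
        # Pass a fraction of activation to linked agents; in DUAL,
        # only sufficiently active agents do symbolic work at all,
        # making the speed of symbolic processing context-sensitive.
        for neighbor, weight in self.links:
            neighbor.activation += self.activation * weight

cup = MicroAgent("cup", {"is-a": "container"})
handle = MicroAgent("handle", {"part-of": "cup"})
cup.links.append((handle, 0.6))
cup.activation = 1.0
cup.spread()

# A "coalition" here is simply the set of jointly active agents.
coalition = [a for a in (cup, handle) if a.activation > 0.5]
print([a.symbol for a in coalition])  # ['cup', 'handle']
```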
4.5.4 4D/RCS

In a rather different direction, James Albus, while at the National Bureau of Standards, developed a very thorough and impressive architecture for intelligent robotics called 4D/RCS, which was implemented in a number of machines including unmanned automated vehicles. This architecture lacks critical aspects of intelligence such as learning and creativity, but combines perception, action, planning and world-modeling in a highly effective and tightly-integrated fashion.

The architecture has three hierarchies of memory/processing units: one for perception, one for action and one for modeling and guidance. Each unit has a certain spatiotemporal scope, and (except for the lowest level) supervenes over children whose spatiotemporal scope is a subset of its own. The action hierarchy takes care of decomposing tasks into subtasks, whereas the sensation hierarchy takes care of grouping signals into entities and events. The modeling/guidance hierarchy mediates interactions between perception and action based on its understanding of the world and the system’s goals. A toy rendering of the nested planning scopes appears below. In his book [AM01] Albus describes methods for extending 4D/RCS into a complete cognitive architecture, but these extensions have not been elaborated in full detail nor implemented.

Fig. 4.11: Albus’s 4D/RCS architecture for a single vehicle: a hierarchy of sensory processing (SP), world modeling (WM) and behavior generation (BG) units, running from a surrogate battalion level (plans for the next 24 hours, over 500 km maps) down through surrogate platoon, surrogate section and vehicle levels to subsystem and primitive levels with 0.5-second and 0.05-second plans driving actuator outputs.
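The nested-scope idea is easy to render concretely. In this sketch, each behavior-generation node plans over a time horizon roughly an order of magnitude shorter than its parent’s, decomposing the parent’s task into subtasks; the horizons are loosely modeled on the figure above, while the class and task names are invented:

```python
# Hypothetical sketch of 4D/RCS-style nested spatiotemporal scope.

class RCSNode:
    def __init__(self, name, horizon_seconds, children=()):
        self.name = name
        self.horizon = horizon_seconds
        self.children = list(children)

    def decompose(self, task, depth=0):
        # Each level re-plans the task at a finer spatiotemporal scope.
        print("  " * depth + f"{self.name} ({self.horizon}s plan): {task}")
        for child in self.children:
            child.decompose(f"refine: {task}", depth + 1)

servo = RCSNode("servo", 0.05)
primitive = RCSNode("primitive", 0.5, [servo])
subsystem = RCSNode("subsystem", 5, [primitive])
vehicle = RCSNode("vehicle", 50, [subsystem])
vehicle.decompose("traverse field to waypoint")
```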
4.5.5 PolyScheme

Nick Cassimatis’s PolyScheme architecture [Cas07] shares with GLAIR the use of multiple logical reasoning methods on a common knowledge store. While its underlying ideas are quite general, currently PolyScheme is being developed in the context of the “object tracking” domain (construed very broadly). As a logic framework PolyScheme is fairly conventional (unlike GLAIR or NARS with their novel underlying formalisms), but PolyScheme has some unique conceptual aspects, for instance its connection with Cassimatis’s theory of mind, which holds that the same core set of logical concepts and relationships underlies both language and physical reasoning [Cas04]. This ties in with the use of a common knowledge store for multiple cognitive processes; for instance, it suggests that:

• the same core relationships can be used for physical reasoning and parsing, but each of these domains may involve some additional relationships;

• language processing may be done via physical-reasoning-based cognitive processes, plus the additional activity of some language-specific processes.
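The specialist pattern itself, several reasoners with different techniques consulting one another over a shared store of propositions, can be sketched in miniature. The specialist classes and the propositions below are invented for illustration, not drawn from PolyScheme’s actual codebase:

```python
# Hypothetical miniature of PolyScheme's specialist pattern.

class PhysicsSpecialist:
    def opine(self, store):
        # Naive physical reasoning: unsupported objects fall.
        if "on(ball, table)" in store and "removed(table)" in store:
            return {"falls(ball)"}
        return set()

class IdentitySpecialist:
    def opine(self, store):
        # Object identity: the occluded ball is still the same ball.
        if "occluded(ball)" in store:
            return {"persists(ball)"}
        return set()

store = {"on(ball, table)", "removed(table)", "occluded(ball)"}
specialists = [PhysicsSpecialist(), IdentitySpecialist()]

# Each specialist reads the common store and contributes conclusions
# that the others can react to on the next round.
for s in specialists:
    store |= s.opine(store)
print(sorted(store))
```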





































































































