Wednesday 11 May 2016

Thoughts on DNN and AGI

DNNs/deep learning have impressed us with their ability to recognize images and audio with unprecedented accuracy.

Systems like AlphaGo (which beat the best human player at Go) are trained on huge clusters of powerful machines over very large datasets. Google spends millions, if not billions, on such projects.

But the knowledge consumed by AlphaGo cannot be understood ("do we need to?" is an ethical question, though). In other words, we cannot instruct the system to perform a task procedurally. Humans still have to be trained to develop a certain skillset; that is how we learn to work as a team, to read, and to integrate with other systems.

Training intelligent systems involves one of the following methods:
  1. Provide the system with a large dataset and learning algorithms, and let the system figure out the best possible representation for the knowledge.
  2. Provide the system with a large dataset and learning algorithms, along with a suitable knowledge representation model.

There are two kinds of knowledge embedded in the data. Image and audio streams are raw in nature, i.e. they are concrete knowledge, whereas text and the content of speech and visuals comprise abstract knowledge. In fact, they contain knowledge at multiple levels of abstraction.

To put it in a different perspective: if we consider image/audio recognition to be mining a pattern from a clump of data, then we can think of recognizing the abstract entities or ideas in the content of images/audio as mining patterns within patterns within patterns, and so on.

A DNN distributes its knowledge over the network in the form of weights. The weights are the knowledge. This works well for concrete data like images and audio, but for abstract ideas it may not. Even if it does, it will be with great difficulty. I will explain why I think so.

Let's consider how we feed inputs to a DNN. It takes a vector as input. Images and audio naturally lend themselves to vector form, but for abstract content we must devise a vector representation of the abstract entities ourselves.
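A minimal sketch of the contrast, in Python (the toy image and vocabulary here are invented for illustration):

```python
import numpy as np

# A tiny grayscale "image": pixel intensities are already numbers,
# so flattening gives a vector a DNN can consume directly.
image = np.array([[0.0, 0.5],
                  [1.0, 0.25]])
image_vector = image.flatten()             # -> [0.0, 0.5, 1.0, 0.25]

# A word has no such natural numeric form; we must impose one,
# e.g. an arbitrary index into a vocabulary, then a one-hot vector.
vocabulary = {"cat": 0, "dog": 1, "metaphor": 2}
word_vector = np.zeros(len(vocabulary))
word_vector[vocabulary["metaphor"]] = 1.0  # a convention, not a natural fact
```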

For instance, in the case of word embeddings, each word of the language is assigned an integer. How we assign an integer to a particular word varies from you to me to another person. Suppose we train the system and make it do something useful. The knowledge gathered by the system is still not shareable: since the knowledge is represented by the weight matrix, a subset of the matrix taken outside the whole matrix is probably meaningless. For each element in the matrix, every other element sets the context.
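A hedged sketch of the sharing problem: two hypothetical vocabularies assign different integers to the same words, with random matrices standing in for trained embedding weights. A row pulled out of one matrix is meaningless under the other's indexing:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two people build embeddings for the same words but assign ids differently.
vocab_a = {"cat": 0, "dog": 1, "fish": 2}
vocab_b = {"dog": 0, "fish": 1, "cat": 2}

embeddings_a = rng.normal(size=(3, 4))  # "trained" under vocab_a's indexing
embeddings_b = rng.normal(size=(3, 4))  # "trained" under vocab_b's indexing

# Looking up "cat" in embeddings_a with vocab_b's index silently returns
# a different word's vector: the indexing convention is part of the
# knowledge, and a row only has meaning relative to the rows it was
# trained against.
cat_row = embeddings_a[vocab_a["cat"]]    # correct
wrong_row = embeddings_a[vocab_b["cat"]]  # actually "fish" under vocab_a
```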

In contrast to DNNs, other AI systems, such as OpenCog, represent knowledge in the form of atoms in a hypergraph. The entire knowledge base is contained in what is called the AtomSpace, which stores all kinds of knowledge, declarative and procedural. The AtomSpace can be instructed to perform something through rewriting the graph, i.e. the knowledge.
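The following is not the real OpenCog API; it is only a plain-Python illustration of the idea that knowledge lives in discrete, inspectable atoms that can be rewritten directly:

```python
# Toy stand-ins for atoms: nodes name entities, links relate atoms.
class Node:
    def __init__(self, name):
        self.name = name

class Link:
    def __init__(self, kind, targets):
        self.kind = kind          # e.g. "inherits"
        self.targets = targets    # a link may connect any number of atoms

# A toy "atomspace": the whole knowledge base in one container.
atomspace = []

cat = Node("cat")
animal = Node("animal")
atomspace += [cat, animal, Link("inherits", [cat, animal])]

# "Instructing" the system is a graph rewrite: atoms can be added,
# removed, or modified procedurally, which a weight matrix does not allow.
atomspace.append(Link("inherits", [Node("dog"), animal]))
```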

Mining patterns within patterns can be done relatively easily with such a representation: scientists can test different learning algorithms and understand how they behave, i.e. we can be the psychologists of AGI machines.

Although it may be possible to build a human-like brain with just DNNs, it will not be accessible to everyone, due to the huge cost involved. The community at present cannot afford that cost, be it money, time, or education. So I believe it is better to employ a hybrid approach: use DNNs for recognizing concrete knowledge, and OpenCog-like systems for more abstract ideas, like metaphors.
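A rough sketch of what such a hybrid pipeline could look like; `recognize` is a hypothetical stand-in for a trained DNN classifier, and the tuple-based atomspace is a toy:

```python
def recognize(image_pixels):
    # Stand-in for a DNN: maps raw, concrete data to a label.
    # A real system would run a trained vision model here.
    return "cat"

def add_percept(atomspace, label):
    # Hand the DNN's concrete output to the symbolic layer, where
    # abstract operations (inheritance, analogy, metaphor) can act on it.
    atomspace.append(("percept", label))

atomspace = []
add_percept(atomspace, recognize(image_pixels=[[0.1, 0.9]]))
```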

How do we make AGI possible as a community? Set up a BOINC-like infrastructure for AGI training? Distribute the hypergraph over a P2P network like the meshnet? How do we avoid corporate lock-in?

2 comments:

  1. "Since the knowledge is represented by the weight matrix, a subset of the matrix outside the whole matrix is probably meaningless. For each element in the matrix, every other element sets the context."
    That is true. But how is it any different from reusing knowledge from a system where each symbol (word) is represented by an atom? Also, distributed representations of symbols (words) have a lot of advantages over symbolic representations. Consider a subset of the knowledge (a collection of word vectors). This subset still represents meaningful relationships among the symbols in it.
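    For example (made-up vectors, chosen so that "cat" and "dog" stay close while "car" is distant):

    ```python
    import numpy as np

    # Hypothetical trained word vectors; the numbers are invented.
    vectors = {
        "cat": np.array([0.9, 0.1, 0.0]),
        "dog": np.array([0.8, 0.2, 0.1]),
        "car": np.array([0.0, 0.1, 0.9]),
    }

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    # Even as a subset cut out of a much larger vocabulary, the relative
    # similarities among these vectors survive.
    print(cosine(vectors["cat"], vectors["dog"]))  # high
    print(cosine(vectors["cat"], vectors["car"]))  # low
    ```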
    Higher layers of abstraction can be achieved by embedding higher-level symbols: sentences, documents, etc. But it isn't as organic as how a hypergraph-based system does it, because a hypergraph-based knowledge-storage mechanism is built with multiple layers of abstraction in mind.
    We do need a system that represents symbols in a distributed manner and is equipped to deal with multiple layers of abstraction. We also need tools to analyze, understand and infer from the knowledge matrix/tensor/graph of the system.
    "How do make AGI possible as a community? Setup a BOINC like infrastructure for AGI training? Distribute the Hypergraph over a P2P network like the meshnet? How do we avoid the corporate lock-in?"

    I believe we need a blockchain-like system that will let each node in the network (via the internet) gather data individually and train a distributed learning system. In exchange, each node can make use of the learning system to achieve its goals, via APIs. Access to the system can be regulated based on each node's contribution to training it.

  2. "But how is it any different from reusing knowledge from a system where each symbol(word) is represented by an atom."

    Taking the AtomSpace as an abstract entity implemented over a hypergraph, an atom makes more sense through its linkage to other atoms than on its own.

    "Also, distributed representation of symbols(words) has a lot of advantages over symbolic representations."

    I agree: it is easier to compute the response to a stimulus with this distributed representation, and the representation lends itself to being computed on a parallel computing platform like a GPU. But the same can be said about atoms on clustered computers, which can evaluate multiple responses to the same stimulus in parallel.

    "Consider a subset of the knowledge (collection of word vectors). This subset still represents meaningful relationships among the symbols in it."

    I agree, but the subset probably cannot be integrated with other systems, at least not without a layer of "something", such as APIs.
