Thursday 27 April 2017

MNIST Study in PyTorch - Rings and Bells

MNIST is the "hello world" of the ML world. The MNIST dataset is a collection of images of the digits 0..9 together with their labels. The images are of size 28x28. Let's take a few samples from the dataset to get a feel for what it looks like.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

class Args:
    pass

args = Args()
args.batch_size = 32
args.cuda = True
args.lr = 0.001
args.momentum = 0.01
args.epochs = 10
args.log_interval = 10

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False, 
                    transform=transforms.Compose([
                        transforms.ToTensor(),
                        transforms.Normalize((0.1307,), (0.3081,))
                    ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
import pprint
import numpy

num_of_samples = 5

fig = plt.figure(1,(8., 8.))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(num_of_samples, num_of_samples),   
                 axes_pad=0.1)

output = numpy.zeros(num_of_samples ** 2)
for i, (data, target) in enumerate(test_loader):
    if i < 1: #dirty trick to take just one sample
        for j in range(num_of_samples ** 2):
            grid[j].matshow(data[j][0].numpy())
            output[j] = target[j]
    else:
        break
           

output = output.reshape(num_of_samples, num_of_samples)
print(output)
plt.show()


[[ 6.  9.  9.  5.  4.]
 [ 3.  6.  5.  0.  1.]
 [ 8.  1.  3.  6.  2.]
 [ 9.  4.  8.  8.  6.]
 [ 0.  6.  4.  2.  3.]]


You can see that the image of number <> is associated with the number <>. It is a list of (image of digit, digit) pairs. As usual, we are going to feed the neural network the image from the left and its label from the right. We will train a simple feed-forward network; call it Model0. Alongside it we will define ten more variants, Model1 through Model10, which differ only in the number and size of their layers.


class Model0(nn.Module):
    def __init__(self):
        super(Model0, self).__init__()
        self.output_layer = nn.Linear(28*28, 10)
       
    def forward(self, x):
        x = self.output_layer(x)
        return F.log_softmax(x)
 
class Model1(nn.Module):
    def __init__(self):
        super(Model1, self).__init__()
        self.input_layer = nn.Linear(28*28, 5)
        self.output_layer = nn.Linear(5, 10)
 
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)

class Model2(nn.Module):
    def __init__(self):
        super(Model2, self).__init__()
        self.input_layer = nn.Linear(28*28, 6)
        self.output_layer = nn.Linear(6, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)

class Model3(nn.Module):    
    def __init__(self):
        super(Model3, self).__init__()
        self.input_layer = nn.Linear(28*28, 7)
        self.output_layer = nn.Linear(7, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)

class Model4(nn.Module):
    def __init__(self):
        super(Model4, self).__init__()
        self.input_layer = nn.Linear(28*28, 8)
        self.output_layer = nn.Linear(8, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)


class Model5(nn.Module):
    def __init__(self):
        super(Model5, self).__init__()
        self.input_layer = nn.Linear(28*28, 9)
        self.output_layer = nn.Linear(9, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)

class Model6(nn.Module):    
    def __init__(self):
        super(Model6, self).__init__()
        self.input_layer = nn.Linear(28*28, 10)
        self.output_layer = nn.Linear(10, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)

class Model7(nn.Module):
    def __init__(self):
        super(Model7, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)

class Model8(nn.Module):
    def __init__(self):
        super(Model8, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.hidden_layer = nn.Linear(100, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x)

class Model9(nn.Module):
    def __init__(self):
        super(Model9, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.hidden_layer = nn.Linear(100, 100)
        self.hidden_layer1 = nn.Linear(100, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer(x)
        x = self.hidden_layer1(x)
        x = self.output_layer(x)        
        return F.log_softmax(x)

class Model10(nn.Module):
    def __init__(self):
        super(Model10, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.hidden_layer = nn.Linear(100, 100)
        self.hidden_layer1 = nn.Linear(100, 100)
        self.hidden_layer2 = nn.Linear(100, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer(x)
        x = self.hidden_layer1(x)
        x = self.hidden_layer2(x)
        x = self.output_layer(x)
        return F.log_softmax(x)
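
As an aside (a sketch, not part of the original post): the eleven classes above differ only in their layer sizes, so the same family of models could be generated with a small helper.

def make_mlp(sizes):
    # e.g. make_mlp([28*28, 100, 100, 10]) mirrors Model8; like the classes
    # above, the layers are stacked without nonlinearities in between
    layers = [nn.Linear(n_in, n_out)
              for n_in, n_out in zip(sizes, sizes[1:])]
    layers.append(nn.LogSoftmax())
    return nn.Sequential(*layers)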

And let's train them.

def train(epoch, model, print_every=10):
    optimizer = optim.SGD(model.parameters(),
                          lr=args.lr, momentum=args.momentum)
    for i in range(epoch):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            if args.cuda:
                data, target = data.cuda(), target.cuda()

            data = data.view(data.size(0), -1)  # flatten 28x28 images into 784-dim vectors
            data, target = Variable(data), Variable(target)
            optimizer.zero_grad()
            output = model(data)

            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()

        if i % print_every == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                i, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))

models = [Model0(), Model1(), Model2(), Model3(), Model4(), Model5(),
          Model6(), Model7(), Model8(), Model9(), Model10()]
if args.cuda:
    models = [model.cuda() for model in models]

for i, model in enumerate(models):
    train(1000, model)
    # persist the trained weights; the load below assumes these files exist
    torch.save(model.state_dict(), 'mnist_mlp_multiple_model{}.pth'.format(i))

for i, model in enumerate(models):
    model.load_state_dict(torch.load('mnist_mlp_multiple_model{}.pth'.format(i)))
Let's see how our networks predict the images.
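
A minimal sketch of how the prediction grid below might be produced (a reconstruction, not the original code, assuming the num_of_samples grid setup from the top of the post; here using the first model, models[0]):

for data, target in test_loader:
    if args.cuda:
        data = data.cuda()
    data = Variable(data.view(data.size(0), -1), volatile=True)
    pred = models[0](data).data.max(1)[1].view(-1)  # index of the max log-probability
    print(pred[:num_of_samples ** 2].cpu().view(num_of_samples, num_of_samples))
    break  # one batch is enough for a peek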
[[ 6.  2.  9.  1.  8.]
 [ 5.  6.  5.  7.  5.]
 [ 4.  8.  6.  3.  0.]
 [ 6.  1.  0.  9.  3.]
 [ 7.  2.  8.  4.  4.]]
Most of the predictions look right. Let's run this over the entire test dataset.
def test(model):
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()

        data = data.view(data.size(0), -1)
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target).data[0]
        pred = output.data.max(1)[1]  # index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()

    test_loss /= len(test_loader)
    print('Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
        test_loss,
        correct,
        len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))
    return 100. * correct / len(test_loader.dataset)



accuracy = []
for model in models:
    accuracy.append(test(model))

pprint.pprint(accuracy)

plt.plot(range(len(accuracy)), accuracy, linewidth=1.0)
plt.axis([0, 10, 0, 100])
plt.show()


# zoom in on the 90-93% band to see the differences between the models
plt.plot(range(len(accuracy)), accuracy, linewidth=1.0)
plt.axis([0, 10, 90, 93])
plt.show()

Saturday 25 March 2017

[WIP] CHERU::SecondaryBrain

What are the common activities that we do on the computer?
  • Read articles, books
  • Listen to music and watch videos
  • Write blogs, opinions
  • Use Internet to communicate with other people or other computers
But where do we keep all the information we consume? Our brain. At the rate we consume information, though, it becomes impossible to verify whether it is true or to retain all of it in memory. "No," you might say, "we do store documents like PDFs, pictures and docs on the computer." Yes we do, but there is a huge difference between the way our brain stores information and the way we store information in the computer.

I said 'how we store the information in the computer', because the computer does not store information by itself. What do I mean by this? Our brain stores information in the form of a network of linked concepts, unlike computers, where we store information in the form of documents and images. Because of this difference between how our brain and our computers organise content, we store information redundantly. We are underutilising the facilities offered by the computer. The computer can do much more than what it is doing for us now: it can act as a secondary brain. Let me illustrate the idea with paper instead of a computer, and explain why the computer is uniquely suited to acting as a secondary brain.

Let us say we are going shopping with a long list of groceries. What do we do? We write the items down on paper, because it is a little difficult to remember them all. I admit that some of us can remember all the items, and with some memory exercises almost all of us are capable of the same. But is there any use in remembering that list of groceries? Is there a point in spending time memorising it?

Let us take a complex example. In blindfold chess, the players are blindfolded and call out which piece to move and where to move it. A third person actually carries out the moves. Now think about how the players have to keep track of the piece positions and simulate the moves before calling out the next one. Compare that with how easy it would be to look at the board and carry out the same calculations.

In the above examples, we offload the unnecessary things onto outside elements like paper and chess boards, leaving room for more important things in the brain. I think it is safe to assume that by now you have understood the usefulness of very simple tools like paper and chess boards. Imagine what the computer can do, and what we can do with computers. Unlike paper and chess boards, computers can carry out calculations on their own (computers play chess too), however simple they may seem compared to our brain. This makes the computer an effective tool to act as a secondary brain. What I mean by 'secondary brain' will become apparent as we travel along.

This document will serve as an informal specification of how I imagine the secondary brain might work. See you in the next post.

#CHERU::SecondaryBrain

Friday 24 March 2017

[WIP] Deep Learning hardware for the Commons

Deep learning requires a huge amount of processing power. Putting such hardware in every household is infeasible in the near future (even if there is a breakthrough in hardware technologies for ML, it will mostly be costlier than already available options).

All ML systems consist of two phases: training and prediction (or classification, in a general application). Fortunately, the actual prediction process requires less processing power than training.

It might be useful to cooperatively design, build and run a powerful machine for training purposes, and then use the model developed from training in less powerful machines like a Raspberry Pi.
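
As a concrete illustration of that split (a sketch with hypothetical file names, borrowing the PyTorch idiom from the MNIST post above): save only the learned weights on the training machine, then rebuild the model and load the weights on the small device, mapping everything onto the CPU.

import torch

# on the big training machine: persist only the learned parameters
torch.save(model.state_dict(), 'trained_model.pth')

# on the raspberry-pi: rebuild the same architecture and load the weights on CPU
model = Model0()  # whatever architecture was used for training
model.load_state_dict(torch.load('trained_model.pth',
                                 map_location=lambda storage, loc: storage))
model.eval()  # prediction only; no training on the small device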

Note that the training process need not end once and for all. We can collect new data samples after deployment; not every application area caters to this process, but there are certainly areas where we can harness it.
 
Large System with all the rings and bells.

A compute cluster with heterogeneous nodes: nodes with different combinations of CPUs/GPUs, and nodes with different kinds of FPGA-based accelerators (along with a compiler from TensorFlow model graphs to FPGAs).

FPGAs can also be used to run trained models for deployment.


Tuesday 7 February 2017

MeshNetwork: The Hardware Hunt


In my first post on the community mesh network, I mentioned that we are at a disadvantage politically and in a relatively better place technologically.

Well, I was wrong. Though we have gathered some knowledge about mesh networks over this period, we are still years behind installing a city-scale or even a street-scale mesh network.

The hardware was the problem. We are not in a position to afford a dual band router (one which operates on both the 2.4GHz and 5GHz bands simultaneously/concurrently), and single band routers degrade the bandwidth geometrically with the diameter of the network. The price of an OpenWrt-compatible concurrent dual band router available in India starts around Rs.11000, which is too expensive for just a router.

So we had to look at alternative options for building a concurrent dual band WiFi device (we haven't looked at other physical layer protocols as of now; I think we should). The following are the alternatives we have come up with.

  1. Wifi-dongle with RaspberryPi[2]
  2. Project Turris
  3. Gl-inet devices
  4. A normal PC with PCI(e) Wifi adapter
  5. Software Defined Radio

Wifi-dongle with RPi (Rs.5000)

  Our survey of internet forums suggested that the RPi will not be able to handle the needs of a router. We had planned to test this claim but have not pursued it, because we do not have access to a dual band WiFi dongle yet (yes, we are broke).

Project Turris (Not available for sale in India)

 This is a project to develop an open source router and NAS server. They have designed their own hardware and distribute it with a custom operating system called, surprise surprise, Turris OS. It is not available for sale in India.

Gl-inet AR300MD (Not yet in production)

 They manufacture OpenWrt-compatible networking devices. We are interested in their AR300MD variant, which is not available yet. Yes, the D stands for dual band. We have asked them about its launch, but there is no definitive idea on that yet.

Wifi-adapter/PC (Rs.11000)

A cheap dual core Intel or AMD desktop paired with a dual band WiFi adapter makes a viable candidate for a meshnet. The desktop costs 5k and the WiFi adapter costs 6.5k, so the total cost is around 10k to 12k. I think it makes sense to spend 11k on a full desktop instead of just a router box. Of course it will consume more power than a router. If we can make use of power optimizations (like dynamic clocking) in Linux, this is a good bet.

SDR (no clear idea)

What is SDR? Go to Wikipedia. It is ultimately the best platform for a community meshnet, given its versatile capabilities: radio components that have typically been implemented in hardware are instead implemented by means of software on a personal computer or embedded system.

We just had a meetup today where many of its merits surfaced. Imagine a device every 10 meters capable of monitoring itself and its environment, which can adapt itself to the situation so that communication is never brought down by technical or political nuances. Would it be possible to transport IP packets over FM radio? It might be. I don't know yet.

Which one to choose?

Forgive me for this boring essay so far. "Why are you stating these obvious things?" you may ask. The actual post starts here; that was just the prelude. Please bear with me for a few more words. I can assure you that it will be worth your time (if you're into mesh-nets).

Our idea is to kickstart this mesh community with PC-based mesh nodes. A desktop with a WiFi card can serve as a beast of a router. Well, not just a router; it can do much more than that. We can host content on the desktop, so every node in the mesh is now a web server. It can act as an SDR, practically providing us with some of the benefits of SDR (the WiFi card is not a programmable component; what if we could design an SDR card that could be plugged into a desktop?).

At this point you will say, "come on man, who will buy a full desktop instead of a cheap router, especially when they already have a computer at home?"

Well, you're right. That is why I think we should impart this idea to school students. We can also talk to computer sales centers to see if we can cut a deal in order to proliferate this device. But again, its merits weigh high in my balance.

Up until this point I have been talking about merits in the sense of a single device. The desktop could run far more sophisticated protocols and algorithms to accomplish much more than what a router can do. The mesh nodes can balance loads and schedule operations intelligently. A node operator can inform the network how long a particular node will be up, and the network shall adapt itself according to such instructions (or clues?).

As for content distribution, a torrent-like protocol can be used to share the commons' content. Anyone could author articles. Journalism, decentralized. A social network which we envision (a new post coming up), designed to run over such a mesh network, will make an awesome tool for journalism.

A couple of caveats: power consumption and bulk. That is it on the hardware side.

We have also started to work on documentation and training sessions for beginners. Ganesh and Mugil are kind enough to offer us a session introducing communication engineering, and Roopak and his friends from Trichy are coming to Pondicherry for the same. Awaiting the day.

As with most of my essays, this one too might be random ramblings; please let me know your thoughts.

[1] So far we haven't looked into linux distros other than OpenWRT
[2] Or any other embedded platform which can run linux

Thursday 7 July 2016

The First One on Chatbots

A chatbot is a computer program that talks like a human to the user, usually via a chat-room/messenger interface. It is a daunting task for a computer to understand what a user is saying and respond in a sensible manner, since our languages are complex and we do not follow strict grammar in casual conversation. But there are some shortcuts through which it can fool the user into believing that the responder on the other side is a human and not a robot.

Sane (and even some insane) humans have enough intelligence infrastructure in their brains to understand what other humans are saying (assuming they speak the same language) and respond appropriately. But how can a computer do that? AI scientists have been trying to mimic/embody human or human-like intelligence in computers for a while now, with varying measures of success. How then can chatbots accomplish their goal of talking like a human? Well, as mentioned above, there are tricks and shortcuts with which we can "program" chatbots to be intelligent enough.

There are various ways, devised and discovered through our experience with the field of artificial intelligence, to embody intelligence in chatbots. The oldest (to my knowledge) is pattern matching with some level of context management, and that is what we are going to look into. For instance,

Kayalvizhi: What is your name?
Edinburgh: My name is Edinburgh.
Edinburgh: What is your name?
Kayalvizhi: My name is Kayalvizhi.

I admit that is a lame example, but it illustrates the point. Our usual conversations tend to follow patterns, and human languages have far more words than are actually used in day-to-day life. With this fact in hand we can employ pattern matching to construct a meaningful message to the user. Let's see how we can build a bot for the above conversation in RiveScript.

RiveScript

[From Rivescript website] RiveScript is a simple scripting language for chatbots with a friendly, easy to learn syntax. RiveScript exposes a simple plain text scripting language that's easy to learn and begin writing in quickly. RiveScript has a handful of simple rules that can be combined in powerful ways to build an impressive chatbot personality. Write triggers in a simplified regular expression format to match complex sets of word patterns in one go. RiveScript takes a "Unix-like" approach to its development: the core library is small and self-contained and it does one thing very well—takes human input and gives an intelligent response. This flexibility enables RiveScript to be used how you need it to.

Basically, RiveScript consists of triggers, responses and something called topics.

Triggers are the messages typed in by the user. The chatbot reads a message, matches it against the list of (already programmed) triggers, finds the best match, and then responds with the response string from the matched trigger.

one.rive -- our first script
+ my name is edinburgh
- that is a nice name

Whenever you say "my name is edinburgh", the bot will reply "that is a nice name". This is a very dumb bot; it understands only one sentence: "my name is edinburgh".

Let's modify it a little, so that anyone can talk to our bot.
two.rive
+ my name is *
- hi there, how are you?

Conversation 2

Kuzhali: my name is kuzhali
Bot    : hi there, how are you?

Cheran : my name is cheran
Bot    : hi there, how are you?

Mark   : my name is mark
Bot    : hi there, how are you?

Toyota : my name is toyota
Bot    : hi there, how are you?
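
These scripts need an interpreter to actually run. Here is a minimal sketch using the Python rivescript module (pip install rivescript; the file name two.rive refers to the script above):

from rivescript import RiveScript

bot = RiveScript()
bot.load_file('two.rive')  # the trigger/response script from above
bot.sort_replies()         # must be called after loading, before replying

print(bot.reply('localuser', 'my name is kuzhali'))
# -> hi there, how are you?

RiveScript also captures what the wildcard matched, so the response in two.rive could greet the user by name: - hi there, <star>, how are you?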

Wednesday 11 May 2016

Thoughts on DNN and AGI

DNNs/deep learning have impressed us with their capability to recognize images and audio with unprecedented accuracy.

Systems like AlphaGo (which beat the best human player at Go) are trained on huge clusters of powerful machines over very, very large datasets. Google spends millions, if not billions, on such projects.

But the knowledge consumed by AlphaGo cannot be understood ("do we need to?" is an ethical question, though). In other words, we cannot instruct the system to perform a task procedurally. Humans, by contrast, can be trained to develop a certain skillset; that is how we learn to work as a team, to read, and to integrate with other systems.

Training intelligent systems involves one of the following methods.
  1. Provide the system with a large dataset and learning algorithms, and let the system figure out the best possible representation for knowledge.
  2. Provide the system with a large dataset and learning algorithms, along with a suitable knowledge representation model.

There are two kinds of knowledge embedded in the data. Image and audio streams are raw in nature, i.e. they are concrete knowledge, whereas text and the content of speech and visuals are comprised of abstract knowledge. In fact, they contain knowledge at multiple levels of abstraction.

To put it in a different perspective: if we consider image/audio recognition to be mining a pattern from a clump of data, then we can think of recognizing the abstract entities or ideas in the content of images/audio as mining patterns within patterns within patterns, and so on.

A DNN distributes its knowledge over the network in the form of weights; the weights are the knowledge. This works well for concrete data like images and audio. But for abstract ideas it may not, and even if it does, it will be with great difficulty. I will explain why I think so.

Let's look at how we feed inputs to a DNN. It takes a vector as input. Images and audio naturally lend themselves to vectors, but for abstract content we need to represent the abstract entities in vector form.

For instance, in the case of word embeddings, the words of the language are assigned integers, and how we assign an integer to a particular word varies from you to me to another person. We train the system and make it do something useful, but the knowledge gathered by the system is not shareable: since the knowledge is represented by the weight matrix, a subset of the matrix taken outside the whole matrix is probably meaningless. For each element in the matrix, every other element sets the context.
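
To make that arbitrariness concrete, here is a toy sketch (not from the original post) of two people indexing the same vocabulary and ending up with different integers for the same word:

def build_vocab(words):
    return {word: index for index, word in enumerate(words)}

vocab_yours = build_vocab(['cat', 'sat', 'mat'])
vocab_mine  = build_vocab(['mat', 'cat', 'sat'])

# same word, different integer -- any weights learned on top of one
# assignment are meaningless under the other
print('{} vs {}'.format(vocab_yours['cat'], vocab_mine['cat']))  # 0 vs 1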

In contrast to DNNs, other AI systems, such as OpenCog, represent knowledge in the form of atoms in a hypergraph. The entire knowledge base is contained in what is called the Atomspace. The Atomspace stores all kinds of knowledge, declarative and procedural, and it can be instructed to perform something through rewriting the graph, i.e. the knowledge.

Mining patterns within patterns can be done relatively easily with such a representation: scientists can test different learning algorithms and understand how they behave, i.e. we can be the psychologists of AGI machines.

Although it may be possible to build a human-like brain with just DNNs, it will not be accessible to everyone, due to the huge cost involved. The community at present cannot afford that cost, be it in money, time or education. So I believe it is better to employ a hybrid approach: use DNNs for recognizing concrete knowledge, and OpenCog-like systems for more abstract ideas, like metaphors.

How do we make AGI possible as a community? Set up a BOINC-like infrastructure for AGI training? Distribute the hypergraph over a P2P network like the meshnet? How do we avoid corporate lock-in?