
Thursday 27 April 2017

MNIST Study in Pytorch - Rings and Bells

MNIST is the "hello world" of machine learning. The MNIST dataset is a collection of images of the digits 0 to 9, together with their labels. Each image is 28x28 pixels. Let's take a few samples from the dataset to get a feel for what it looks like.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

class Args:
    pass

args = Args()
args.batch_size = 32
args.cuda = torch.cuda.is_available()  # use the GPU only if one is present
args.lr = 0.001
args.momentum = 0.01
args.epochs = 10
args.log_interval = 10

kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}

# 0.1307 and 0.3081 are the mean and standard deviation of the MNIST
# training pixels (after ToTensor scales them to [0, 1])
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=True, download=True,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('../data', train=False,
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=args.batch_size, shuffle=True, **kwargs)
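The two numbers passed to transforms.Normalize are the mean and standard deviation of the MNIST training pixels. Here is a minimal sketch of how one could compute such statistics for any loader of image tensors; pixel_mean_std is a hypothetical helper, not part of torchvision. Run over a loader of the MNIST training set built with only transforms.ToTensor() (no Normalize), it should come out close to 0.1307 and 0.3081. The usage below uses a tiny synthetic dataset instead so it runs without a download.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def pixel_mean_std(loader):
    """Mean and standard deviation over every pixel of every image in `loader`.

    Hypothetical helper: accumulates sums in float64 so that a large dataset
    does not lose precision.
    """
    total, total_sq, count = 0.0, 0.0, 0
    for images, _ in loader:
        images = images.double()
        total += images.sum().item()
        total_sq += (images ** 2).sum().item()
        count += images.numel()
    mean = total / count
    std = (total_sq / count - mean ** 2) ** 0.5  # population standard deviation
    return mean, std


# Tiny synthetic check: two 1x1x2 "images" holding the pixel values 0, 1, 1, 0
images = torch.tensor([[0.0, 1.0], [1.0, 0.0]]).view(2, 1, 1, 2)
labels = torch.zeros(2)
mean, std = pixel_mean_std(DataLoader(TensorDataset(images, labels), batch_size=1))
print(mean, std)  # 0.5 0.5
```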

import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
import pprint
import numpy

num_of_samples = 5

fig = plt.figure(1, (8., 8.))
grid = ImageGrid(fig, 111,
                 nrows_ncols=(num_of_samples, num_of_samples),
                 axes_pad=0.1)

# Take one batch from the test loader and show its first 25 images
data, target = next(iter(test_loader))
output = numpy.zeros(num_of_samples ** 2)
for j in range(num_of_samples ** 2):
    grid[j].matshow(data[j][0].numpy())
    output[j] = target[j].item()

# Arrange the labels in the same 5x5 layout as the image grid
output = output.reshape(num_of_samples, num_of_samples)
print(output)
plt.show()


[[ 6.  9.  9.  5.  4.]
 [ 3.  6.  5.  0.  1.]
 [ 8.  1.  3.  6.  2.]
 [ 9.  4.  8.  8.  6.]
 [ 0.  6.  4.  2.  3.]]


You can see that the image of number <> is associated with number <>. In other words, the dataset is a list of (image of a number, number) pairs. As usual, we are going to feed the neural network the image on the left and its label on the right. We will start by training a simple feed-forward network and call it Model0.


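class Model0(nn.Module):
    """A single linear layer: 784 pixels in, 10 class scores out."""
    def __init__(self):
        super(Model0, self).__init__()
        self.output_layer = nn.Linear(28*28, 10)

    def forward(self, x):
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)  # log-probabilities over the 10 digits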
 
class Model1(nn.Module):
    def __init__(self):
        super(Model1, self).__init__()
        self.input_layer = nn.Linear(28*28, 5)
        self.output_layer = nn.Linear(5, 10)
 
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)

class Model2(nn.Module):
    def __init__(self):
        super(Model2, self).__init__()
        self.input_layer = nn.Linear(28*28, 6)
        self.output_layer = nn.Linear(6, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)

class Model3(nn.Module):    
    def __init__(self):
        super(Model3, self).__init__()
        self.input_layer = nn.Linear(28*28, 7)
        self.output_layer = nn.Linear(7, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)

class Model4(nn.Module):
    def __init__(self):
        super(Model4, self).__init__()
        self.input_layer = nn.Linear(28*28, 8)
        self.output_layer = nn.Linear(8, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)


class Model5(nn.Module):
    def __init__(self):
        super(Model5, self).__init__()
        self.input_layer = nn.Linear(28*28, 9)
        self.output_layer = nn.Linear(9, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)

class Model6(nn.Module):    
    def __init__(self):
        super(Model6, self).__init__()
        self.input_layer = nn.Linear(28*28, 10)
        self.output_layer = nn.Linear(10, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)

class Model7(nn.Module):
    def __init__(self):
        super(Model7, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)

class Model8(nn.Module):
    def __init__(self):
        super(Model8, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.hidden_layer = nn.Linear(100, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)

class Model9(nn.Module):
    def __init__(self):
        super(Model9, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.hidden_layer = nn.Linear(100, 100)
        self.hidden_layer1 = nn.Linear(100, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer(x)
        x = self.hidden_layer1(x)
        x = self.output_layer(x)        
        return F.log_softmax(x, dim=1)

class Model10(nn.Module):
    def __init__(self):
        super(Model10, self).__init__()
        self.input_layer = nn.Linear(28*28, 100)
        self.hidden_layer = nn.Linear(100, 100)
        self.hidden_layer1 = nn.Linear(100, 100)
        self.hidden_layer2 = nn.Linear(100, 100)
        self.output_layer = nn.Linear(100, 10)
        
    def forward(self, x):
        x = self.input_layer(x)
        x = self.hidden_layer(x)
        x = self.hidden_layer1(x)
        x = self.hidden_layer2(x)
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)
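None of Model1 to Model10 applies a nonlinearity (for example F.relu) between its nn.Linear layers. Without one, a stack of linear layers is still a single affine map: second(first(x)) equals x @ (W2 @ W1).T + (W2 @ b1 + b2). So Model6 to Model10 describe the same family of functions as Model0, while Model1 to Model5 are narrower (their combined weight matrix has rank at most 5 to 9). The sketch below checks the collapse numerically for two layers shaped like Model7's; the variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
first = nn.Linear(28 * 28, 100)   # like Model7.input_layer
second = nn.Linear(100, 10)       # like Model7.output_layer

# Fold the two affine maps into one: W = W2 @ W1, b = W2 @ b1 + b2
merged = nn.Linear(28 * 28, 10)
with torch.no_grad():
    merged.weight.copy_(second.weight @ first.weight)
    merged.bias.copy_(second.weight @ first.bias + second.bias)

x = torch.randn(4, 28 * 28)
stacked_out = second(first(x))
merged_out = merged(x)
print(torch.allclose(stacked_out, merged_out, atol=1e-5))
```

Putting an activation such as F.relu between the layers is what lets the deeper models represent functions a single linear layer cannot.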

And let's train them:

def train(epochs, model, print_every=10):
    optimizer = optim.SGD(model.parameters(),
                          lr=args.lr, momentum=args.momentum)
    for i in range(epochs):
        model.train()
        for batch_idx, (data, target) in enumerate(train_loader):
            if args.cuda:
                data, target = data.cuda(), target.cuda()

            # Flatten each 1x28x28 image into a 784-dim vector; using the actual
            # batch size keeps a smaller final batch from breaking view()
            data = data.view(data.size(0), -1)
            optimizer.zero_grad()
            output = model(data)

            loss = F.nll_loss(output, target)
            loss.backward()
            optimizer.step()

        # Report the loss of the last batch every `print_every` epochs
        if i % print_every == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                i, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))
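Each model ends in F.log_softmax and train uses F.nll_loss. Together these compute the usual cross-entropy loss, which PyTorch also offers in one call as F.cross_entropy applied to the raw scores. The short check below uses random scores and labels rather than a trained model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)            # raw scores for 8 images over 10 digits
target = torch.randint(0, 10, (8,))    # random "true" labels

# Two steps, as in the models above: log-probabilities, then negative log-likelihood
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), target)
# One step: cross_entropy applies log_softmax internally
loss_one_step = F.cross_entropy(logits, target)

print(loss_two_step.item(), loss_one_step.item())
```

Keeping log_softmax inside the model, as the post does, has the advantage that the model's output can be read directly as log-probabilities at test time.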

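models = [Model0(), Model1(), Model2(), Model3(), Model4(), Model5(),
          Model6(), Model7(), Model8(), Model9(), Model10()]
if args.cuda:
    models = [model.cuda() for model in models]

# Train each model and save its weights so they can be reloaded later
for i, model in enumerate(models):
    train(1000, model)
    torch.save(model.state_dict(), 'mnist_mlp_multiple_model{}.pth'.format(i))

# Reload the saved weights (e.g. in a later session)
for i, model in enumerate(models):
    model.load_state_dict(torch.load('mnist_mlp_multiple_model{}.pth'.format(i)))

Let's see how our network predicts the images.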
[[ 6.  2.  9.  1.  8.]
 [ 5.  6.  5.  7.  5.]
 [ 4.  8.  6.  3.  0.]
 [ 6.  1.  0.  9.  3.]
 [ 7.  2.  8.  4.  4.]]
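The code that produced the prediction grid above isn't shown. Here is a minimal sketch of one way to get such a grid from a trained model and a batch of images; predict_grid is a hypothetical helper. With the models above you would call it as predict_grid(models[0], data) on a batch from next(iter(test_loader)), moving data to the GPU first if args.cuda is set. The usage below uses an untrained stand-in model and random images so it runs on its own.

```python
import torch
import torch.nn as nn


def predict_grid(model, images, n=5):
    """Predicted digit for each of the first n*n images, arranged as an n x n grid."""
    model.eval()
    with torch.no_grad():
        flat = images[:n * n].view(n * n, -1)  # flatten each 1x28x28 image to 784 values
        log_probs = model(flat)                # shape (n*n, 10)
        preds = log_probs.argmax(dim=1)        # most likely digit for each image
    return preds.view(n, n)


# Usage with an untrained stand-in model and random "images"
torch.manual_seed(0)
stand_in = nn.Sequential(nn.Linear(28 * 28, 10), nn.LogSoftmax(dim=1))
grid = predict_grid(stand_in, torch.randn(32, 1, 28, 28))
print(grid)
```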
Most of the predictions look right. Let's run this over the entire test dataset.
def test(model):
    model.eval()
    test_loss = 0
    correct = 0
    # No gradients are needed for evaluation
    with torch.no_grad():
        for data, target in test_loader:
            if args.cuda:
                data, target = data.cuda(), target.cuda()

            data = data.view(data.size(0), -1)
            output = model(data)
            test_loss += F.nll_loss(output, target).item()
            pred = output.argmax(dim=1)  # index of the highest log-probability
            correct += pred.eq(target).sum().item()

    # nll_loss returns a per-batch mean, so this averages the batch means
    test_loss /= len(test_loader)
    accuracy = 100. * correct / len(test_loader.dataset)
    print('Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'.format(
        test_loss, correct, len(test_loader.dataset), accuracy))
    return accuracy
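test divides the summed per-batch mean losses by the number of batches. With batch_size = 32 the 10,000 test images don't split evenly (the last batch holds 16), so that is a close approximation rather than the exact per-image average. A minimal sketch of the exact version, summing with reduction='sum' and dividing by the number of images, is below; average_nll is a hypothetical helper and the batches are random.

```python
import torch
import torch.nn.functional as F


def average_nll(batches):
    """Exact per-sample mean NLL over (log_probs, target) batches of any size."""
    total, count = 0.0, 0
    for log_probs, target in batches:
        total += F.nll_loss(log_probs, target, reduction='sum').item()
        count += target.size(0)
    return total / count


# Two uneven batches (3 images, then 1) of random log-probabilities
torch.manual_seed(0)
log_probs = F.log_softmax(torch.randn(4, 10), dim=1)
target = torch.tensor([3, 1, 4, 1])
batches = [(log_probs[:3], target[:3]), (log_probs[3:], target[3:])]

exact = average_nll(batches)
mean_of_means = sum(F.nll_loss(lp, t).item() for lp, t in batches) / len(batches)
print(exact, mean_of_means)  # generally differ when the batch sizes differ
```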



accuracy = []
for model in models:
    accuracy.append(test(model))

pprint.pprint(accuracy)

# Accuracy of Model0 ... Model10 on the full 0-100% scale
plt.plot(range(len(accuracy)), accuracy, linewidth=1.0)
plt.axis([0, 10, 0, 100])
plt.show()

# The same curve, zoomed in to the 90-93% range
plt.plot(range(len(accuracy)), accuracy, linewidth=1.0)
plt.axis([0, 10, 90, 93])
plt.show()

MNIST Study in PyTorch

MNIST is the "hello world" of machine learning. The MNIST dataset is a collection of images of the digits 0 to 9, together with their labels. Each image is 28x28 pixels. Let's take a few samples from the dataset to get a feel for what it looks like.

[[ 6.  9.  9.  5.  4.]
 [ 3.  6.  5.  0.  1.]
 [ 8.  1.  3.  6.  2.]
 [ 9.  4.  8.  8.  6.]
 [ 0.  6.  4.  2.  3.]]


You can see that the image of number <> is associated with number <>. In other words, the dataset is a list of (image of a number, number) pairs. As usual, we are going to feed the neural network the image on the left and its label on the right. We will start by training a simple feed-forward network and call it Model0.

class Model0(nn.Module):
    def __init__(self):
        super(Model0, self).__init__()     
        self.output_layer = nn.Linear(28*28, 10)
   
    def forward(self, x):
        x = self.output_layer(x)
        return F.log_softmax(x, dim=1)
And let's train it:

def train(model):
    # lr, momentum and train_loader are assumed to be defined at module level
    optimizer = optim.SGD(model.parameters(),
                          lr=lr,
                          momentum=momentum)
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        # Flatten each 28x28 image into a 784-dim vector
        data = data.view(data.size(0), -1)
        output = model(data)

        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()

model = Model0()
train(model)

Let's see how our network predicts the images.

[[ 6.  2.  9.  1.  8.]
 [ 5.  6.  5.  7.  5.]
 [ 4.  8.  6.  3.  0.]
 [ 6.  1.  0.  9.  3.]
 [ 7.  2.  8.  4.  4.]]
Most of the predictions look right. Let's run this over the entire test dataset.
def test_tuts(model):
    model.eval()
    test_loss, correct = 0, 0

    with torch.no_grad():  # no gradients are needed for evaluation
        for data, target in test_loader:
            data = data.view(data.size(0), -1)
            output = model(data)

            pred = output.argmax(dim=1)  # most likely digit per image
            correct += pred.eq(target).sum().item()
            test_loss += F.nll_loss(output, target).item()

    test_loss /= len(test_loader)  # average of the per-batch mean losses
    print(
        'Avg loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)'
        .format(
            test_loss,
            correct,
            len(test_loader.dataset),
            100. * correct / len(test_loader.dataset)
        )
    )

   

Saturday 25 March 2017

[WIP] CHERU::SecondaryBrain

What are the common activities that we do on the computer?
  • Read articles, books
  • Listen to music and watch videos
  • Write blogs, opinions
  • Use the Internet to communicate with other people or other computers
But where do we keep all the information we consume? In our brain. But at the rate we consume information, it becomes impossible to verify whether it is true, or to retain all of it in memory. You might object that we do store documents like PDFs, pictures and docs on the computer. Yes, we do, but there is a huge difference between the way our brain stores information and the way we store it on a computer.

I said 'how we store the information on a computer' because the computer does not store information by itself. What do I mean by this? Our brain stores information as a network of linked concepts, whereas on computers we store it as documents and images. Because of this difference in how our brain and our computers organize content, we store information redundantly and under-utilise the facilities the computer offers. The computer can do much more than it is doing for us now: it can act as a secondary brain. Let me illustrate the idea with paper instead of a computer, and then explain why the computer is uniquely suited to act as a secondary brain.

Let us say we are going shopping for a long list of groceries. What do we do? We write the items down on paper, because it is a little difficult to remember them all. I admit that some of us can remember all the items, and with some memory exercises almost all of us could do the same. But is there any use in remembering a grocery list? Is there any point in spending time memorising it?

Let us take a more complex example. In blindfold chess, the players are blindfolded and say aloud which piece to move and where. A third person actually carries out the moves. Now think about how the players have to keep track of the piece positions and simulate the moves before calling out the next move. Compare that with how easy it is to look at the board and calculate the moves.

In the above examples, we offload unnecessary things onto external elements like paper and the chess board, leaving room in the brain for more important things. By now you have probably understood the usefulness of very simple tools like paper and chess boards. Imagine what a computer can do, and what we can do with computers. Unlike paper and chess boards, computers can carry out calculations on their own (computers play chess too), simple as they may seem compared to our brain. This makes the computer an effective tool to act as a secondary brain. What I mean by a secondary brain will become apparent as we go along.

This document will serve as an informal specification of how I imagine the secondary brain might work. See you in the next post.

#CHERU::SecondaryBrain

Friday 24 March 2017

[WIP] Deep Learning hardware for the Commons

Deep learning requires a huge amount of processing power. Building such machines in every household is infeasible in the near future (even if there is a breakthrough in hardware technologies for ML, it will most likely be costlier than the options already available).

ML systems typically consist of two phases: training and prediction (or classification, in the usual applications). Fortunately, the prediction phase requires much less processing power than training.

It might be useful to cooperatively design, build and run a powerful machine for training, and then use the trained models on less powerful machines like the Raspberry Pi.
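The post doesn't fix a framework. Assuming PyTorch (as used in the MNIST posts on this blog), the split between a big training machine and a small device could look like the sketch below: the weights are saved on the training side and loaded onto the CPU with map_location='cpu' on the device. export_weights, load_for_inference and make_model are hypothetical names, and the in-memory buffer stands in for a file copied between the two machines.

```python
import io

import torch
import torch.nn as nn


def export_weights(model):
    """On the training machine: serialize the learned parameters to bytes."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return buffer.getvalue()


def load_for_inference(make_model, weights):
    """On the small device: rebuild the architecture and load the weights onto the CPU."""
    model = make_model()
    state = torch.load(io.BytesIO(weights), map_location='cpu')
    model.load_state_dict(state)
    model.eval()  # inference mode: disables dropout, uses running batch-norm stats
    return model


def make_model():
    return nn.Linear(28 * 28, 10)  # stand-in for any trained architecture


trained = make_model()             # pretend this was trained on the big machine
device_model = load_for_inference(make_model, export_weights(trained))

x = torch.randn(2, 28 * 28)
with torch.no_grad():              # prediction needs no gradient bookkeeping
    prediction = device_model(x).argmax(dim=1)
print(prediction)
```

Only the state_dict crosses between machines, so the device needs the model's code but never the optimizer or training data.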

Note that training need not happen once and for all. We can collect new data samples after deployment and keep training; not every application area lends itself to this, but there are certainly areas where we can harness it.
 
Large System with all the rings and bells.

A compute cluster with heterogeneous nodes: nodes with different combinations of CPU/GPU, and nodes with different kinds of FPGA-based accelerators (along with a compiler from TensorFlow model graphs to FPGA).

FPGAs can also be used to run trained models for deployment.


Tuesday 7 February 2017

MeshNetwork: The Hardware Hunt


In my first post on the community mesh network, I mentioned that we are at a disadvantage politically but in a relatively better place technologically.

Well, I was wrong. Though we have gathered some knowledge about mesh networks over this period, we are still years away from installing a city-scale or even a street-scale mesh network.

The hardware was the problem. With single-band routers the bandwidth degrades geometrically with the diameter of the network, so we need concurrent dual-band routers (which operate on both the 2.4 GHz and 5 GHz bands simultaneously). We are not in a position to afford them: the price of an OpenWrt-compatible concurrent dual-band router in India starts at around Rs. 11000, which is too expensive for just a router.

So we had to look at alternative options: how do we build a concurrent dual-band Wifi device? (We haven't looked at other physical-layer protocols yet; I think we should.) These are the alternatives we have come up with:

  1. Wifi-dongle with RaspberryPi[2]
  2. Project Turris
  3. Gl-inet devices
  4. A normal PC with PCI(e) Wifi adapter
  5. Software Defined Radio

Wifi-dongle with RPi (Rs.5000)

  Our survey of internet forums suggests that the RPi will not be able to handle the load of a router. We had planned to test this claim, but have not pursued it because we do not have access to a dual-band Wifi dongle yet (yes, we are broke).

Project Turris (Not available for sale in India)

 This is a project to develop an open-source router and NAS server. They have designed their own hardware and distribute it with a custom operating system called, surprise surprise, Turris OS. It is not available for sale in India.

Gl-inet AR300MD (Not yet in production)

 They manufacture OpenWrt-compatible networking devices. We are interested in their AR300MD variants, which are not available yet. Yes, D stands for dual band. We have asked them about its launch, but there is no definite date yet.

Wifi-adapter/PC (Rs.11000)

A cheap dual-core Intel or AMD desktop paired with a dual-band Wifi adapter makes a viable candidate for the meshnet. The desktop costs about 5k and the Wifi adapter about 6.5k, so the total is roughly 10k to 12k. I think it makes sense to spend 11k on a full desktop instead of just a router box. Of course, it will consume more power than a router, but if we can make use of the power optimizations in Linux (like dynamic clocking), this is a good bet.

SDR (no clear idea)

What is SDR? See Wikipedia for the details; in short, it is a radio communication system in which components that have typically been implemented in hardware are instead implemented by means of software on a personal computer or embedded system. Given its versatile capabilities, it is ultimately the best platform for a community meshnet.

We had a meetup today where many of its merits surfaced. Imagine a device every 10 meters capable of monitoring itself and its environment, and adapting to the situation so that communication is never brought down by technical or political hurdles. Would it be possible to transport IP packets over FM radio? It might be. I don't know yet.

Which one to choose?

Forgive me for this boring essay so far. "Why are you stating these obvious things?" you may ask. That was just the prelude; the actual post starts here. Please bear with me for a few more words. I can assure you that it will be worth your time (if you're into mesh networks).

Our idea is to kickstart this mesh community with PC-based mesh nodes. A desktop with a Wifi card can serve as a beast of a router. Well, not just a router; it can do much more than that. We can host content on the desktop, so every node in the mesh becomes a web server. It can also act as an SDR, practically giving us some of the benefits of SDR (only some, because the Wifi card is not a programmable component). What if we could design an SDR card that could be plugged into a desktop?

At this point you will say, "Come on, man, who will buy a full desktop instead of a cheap router, especially when they already have a computer at home?"

Well, you're right. That is why I think we should take this idea to school students. We can also talk to computer sales centers to see if we can cut a deal to proliferate this device. But again, its merits weigh heavily in my judgement.

So far I have talked about the merits of a single device. The desktop could also run far more sophisticated protocols and algorithms than a router can. The mesh nodes can balance loads and schedule operations intelligently. A node operator can inform the network how long a particular node will be up, and the network shall adapt itself according to such instructions (or clues?).
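As a toy illustration of a network adapting to such announcements, here is a hypothetical next-hop choice: each neighbour advertises how long its operator says it will stay up and how loaded it is, and traffic goes to the least-loaded neighbour that promises to outlast the transfer. Neighbour, up_until, load and choose_next_hop are all made-up names; this is a sketch of the idea, not an existing mesh protocol.

```python
from dataclasses import dataclass


@dataclass
class Neighbour:
    name: str
    up_until: float  # hypothetical: time (in seconds) until which the operator says the node stays up
    load: float      # hypothetical: current load, 0.0 (idle) to 1.0 (saturated)


def choose_next_hop(neighbours, now, expected_duration):
    """Least-loaded neighbour that promises to stay up until the transfer should finish."""
    deadline = now + expected_duration
    candidates = [n for n in neighbours if n.up_until >= deadline]
    if not candidates:
        return None  # nobody promises to stay up long enough
    return min(candidates, key=lambda n: n.load)


neighbours = [
    Neighbour("a", up_until=100.0, load=0.2),
    Neighbour("b", up_until=1000.0, load=0.05),
    Neighbour("c", up_until=2000.0, load=0.1),
]
print(choose_next_hop(neighbours, now=0.0, expected_duration=50.0).name)    # b
print(choose_next_hop(neighbours, now=0.0, expected_duration=1500.0).name)  # c
```

A short transfer goes to the idlest node, while a long one is steered to the only node whose operator promised to keep it running that long.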

As for content distribution, a torrent-like protocol can be used to share the commons' content. Anyone could author articles: decentralized journalism. A social network that we envision (a new post is coming up), designed to run over such a mesh network, would make an awesome tool for journalism.

A couple of caveats: power consumption and bulk. That is it on the hardware side.

We have also started working on documentation and training sessions for beginners. Ganesh and Mugil have kindly offered us an introductory session on communication engineering, and Roopak and his friends from Trichy are coming to Pondicherry for it. Awaiting the day.

Like most of my essays, this one might be random ramblings; please let me know your thoughts.

[1] So far we haven't looked into Linux distros other than OpenWrt.
[2] Or any other embedded platform that can run Linux.