Monday, September 17, 2018

Install LightGBM


If you don't mind doing a conda install, try:
import sys
!conda install --yes --prefix {sys.prefix} -c conda-forge lightgbm

This resolved the problem for me (run it in a Jupyter notebook cell).
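
To verify the install, here is a minimal smoke test (a sketch that assumes scikit-learn is also available; the dataset and parameter values are only illustrative):

```
# Sanity check: import lightgbm and train a tiny model on a toy dataset.
# Assumes scikit-learn is installed; all values below are illustrative.
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

train_set = lgb.Dataset(X_train, label=y_train)
params = {"objective": "binary", "verbosity": -1}
booster = lgb.train(params, train_set, num_boost_round=50)

print(booster.predict(X_test)[:5])  # predicted probabilities for a few test rows
```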




brew install openssl
brew install openssl gcc@4.9

HOMEBREW_BUILD_FROM_SOURCE=1 brew install gcc --without-glibc

brew search gcc

gcc-4.4 --version
gcc-4.4 (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

[ling@dev6-ling-4105472d.bf2 A:DEVEL ~]$ brew install openssl
==> Downloading https://www.openssl.org/source/openssl-1.0.2p.tar.gz
######################################################################## 100.0%
==> perl ./Configure --prefix=/home/ling/.linuxbrew/Cellar/openssl/1.0.2p --openssldir=/home/ling/.linuxbrew/etc/openssl no-ssl2 no-ssl3 no-zlib shared enable-cms -Wa,--noexecstack 
==> make depend
==> make
==> make test
==> make install MANDIR=/home/ling/.linuxbrew/Cellar/openssl/1.0.2p/share/man MANSUFFIX=ssl
==> Downloading https://curl.haxx.se/ca/cacert-2017-01-18.pem
######################################################################## 100.0%
==> Caveats
A CA file has been bootstrapped using certificates from the SystemRoots
keychain. To add additional certificates (e.g. the certificates added in
the System keychain), place .pem files in
  /home/ling/.linuxbrew/etc/openssl/certs

and run
  /home/ling/.linuxbrew/opt/openssl/bin/c_rehash
==> Summary

🍺  /home/ling/.linuxbrew/Cellar/openssl/1.0.2p: 1,792 files, 14.1MB, built in 5 minutes 33 seconds


[ling@dev6-ling-4105472d.bf2 A:DEVEL ~]$ ruby --version
ruby 2.2.0p0 (2014-12-25 revision 49005) [x86_64-linux]


https://github.com/Linuxbrew/brew/wiki/Symlink-GCC

The system is: Linux - 2.6.32-573.8.1.el6.x86_64 - x86_64

sudo pip install  glibc

brew install gcc
brew install cmake
git clone --recursive https://github.com/Microsoft/LightGBM ; cd LightGBM
export CXX=g++-7 CC=gcc-7
mkdir build ; cd build
cmake ..
make -j4
pip install lightgbm


cd LightGBM/python-package
export CXX=g++-7 CC=gcc-7
python setup.py install

conda install cmake
conda install glibc

/home/ling/.linuxbrew/bin/gcc-4.4

ftp http://www.cmake.org/files/v2.8/cmake-2.8.7.tar.gz

https://github.com/Microsoft/LightGBM/issues/701



yum install firefox
yum -y install firefox

yum remove firefox
yum -y install firefox

yum update mysql

yum list openssh

yum search vsftpd

yum info firefox

yum list | less

yum list installed | less

yum provides /etc/httpd/conf/httpd.conf

yum check-update

yum update

yum grouplist

yum group install 'MySQL Database'

yum group update 'DNS Name Server'

yum group remove 'DNS Name Server'

yum repolist

yum repolist all

yum clean all

yum history


Wednesday, August 1, 2018

Scala

Clone scalding -

git clone https://github.com/twitter/scalding.git

Install homebrew -
sh -c "$(curl -fsSL https://raw.githubusercontent.com/Linuxbrew/install/master/install.sh)"

Or in Mac

/usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
sbt assembly
#mv ~/.sbt/repositories ~
mv ~/repositories ~/.sbt/

vim tutorial/WONDERLAND.md

To build and launch the repl:
```
./sbt scalding-repl/assembly
./scripts/scald.rb --repl --local
```
chmod +x run-repl.sh

**What is SBT?**

When you write small programs that consist of only one or a few source files, it's easy enough to compile them by typing scalac MyProgram.scala at the command line. But when you start working on a bigger project with dozens or even hundreds of source files, compiling them all by hand becomes too tedious, so you use a build tool to manage the compilation.

sbt is such a tool. There are other tools too; well-known build tools from the Java world include Ant and Maven.

How it works is that you create a project file that describes your project; with sbt, this file is called build.sbt. It holds the project's settings (name, Scala version, library dependencies, and so on), while the source files themselves are picked up by convention from directories such as src/main/scala. sbt reads this file and then knows what to do to compile the complete project.

Besides managing your project, some build tools, including sbt, can automatically manage dependencies for you. This means that if you need to use some libraries written by others, sbt can automatically download the right versions of those libraries and include them in your project for you.

Further Reading about SBT Directories:

- https://www.scala-sbt.org/1.x/docs/Directories.html

**Steps to run a job?**

Step 1: Compile the code using sbt compile in the directory that contains the build.sbt file

What does sbt compile actually do?
* `compile` compiles the main sources (in the src/main/scala directory) into the "target" directory
    - You can see the target dir in your project folders

- Step 2: Enter sbt command
- What does the sbt command actually do?
    - Starts an sbt server
    - Allows you to run sbt commands 
        - Examples
            - Compile
            - Test
            - Verify
            - runMain
- Step 3: Change sbt projects
- Read more about projects here: https://alvinalexander.com/scala/how-to-create-sbt-projects-with-subprojects
- This lets sbt know which set of "src" files to use
- Step 4: runMain
- Execute the main function in a package

**http://twitter.github.io/scala_school/basics.html**

## About this class

The first few weeks will cover basic syntax and concepts, then we’ll start to open it up with more exercises.

Some examples will be given as if written in the interpreter and others as if written in a source file.

Having an interpreter available makes it easy to explore a problem space.

### Why Scala?

- Expressive
  - First-class functions
  - Closures
- Concise
  - Type inference
  - Literal syntax for function creation
- Java interoperability
  - Can reuse java libraries
  - Can reuse java tools
  - No performance penalty

### How Scala?

- Compiles to java bytecode
- Works with any standard JVM
  - Or even some non-standard JVMs like Dalvik
  - Scala compiler written by author of Java compiler

### Think Scala

Scala is not just a nicer Java. You should learn it with a fresh mind; you will get more out of these classes.

### Get Scala

Scala School’s examples work with [Scala 2.9.x](https://www.scala-lang.org/download/2.9.3.html) . If you use Scala 2.10.x or newer, *most* examples work OK, but not all.

### Start the Interpreter

Start the included `sbt console`.

```
$ sbt console

[...]

Welcome to Scala version 2.8.0.final (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_20).
Type in expressions to have them evaluated.
Type :help for more information.

scala>
```

## Expressions

```
scala> 1 + 1
res0: Int = 2
```

res0 is an automatically created value name given by the interpreter to the result of your expression. It has the type Int and contains the Integer 2.

(Almost) everything in Scala is an expression.

## Values

You can give the result of an expression a name.

```
scala> val two = 1 + 1
two: Int = 2
```

You cannot change the binding to a val.

### Variables

If you need to change the binding, you can use a `var` instead.

```
scala> var name = "steve"
name: java.lang.String = steve

scala> name = "marius"
name: java.lang.String = marius
```

## Functions

You can create functions with def.

```
scala> def addOne(m: Int): Int = m + 1
addOne: (m: Int)Int
```

In Scala, you need to specify the type signature for function parameters. The interpreter happily repeats the type signature back to you.

```
scala> val three = addOne(2)
three: Int = 3
```

You can leave off parens on functions with no arguments.

```
scala> def three() = 1 + 2
three: ()Int

scala> three()
res2: Int = 3

scala> three
res3: Int = 3
```

### Anonymous Functions

You can create anonymous functions.

```
scala> (x: Int) => x + 1
res2: (Int) => Int = <function1>
```

This function adds 1 to an Int named x.

```
scala> res2(1)
res3: Int = 2
```

You can pass anonymous functions around or save them into vals.

```
scala> val addOne = (x: Int) => x + 1
addOne: (Int) => Int = <function1>

scala> addOne(1)
res4: Int = 2
```

If your function is made up of many expressions, you can use {} to give yourself some breathing room.

```
def timesTwo(i: Int): Int = {
  println("hello world")
  i * 2
}
```

This is also true of an anonymous function.

```
scala> { i: Int =>
  println("hello world")
  i * 2
}
res0: (Int) => Int = <function1>
```

You will see this syntax often used when passing an anonymous function as an argument.

### Partial application

You can partially apply a function with an underscore, which gives you another function. Scala uses the underscore to mean different things in different contexts, but you can usually think of it as an unnamed magical wildcard. In the context of `{ _ + 2 }` it means an unnamed parameter. You can use it like so:

```
scala> def adder(m: Int, n: Int) = m + n
adder: (m: Int,n: Int)Int
scala> val add2 = adder(2, _:Int)
add2: (Int) => Int = <function1>

scala> add2(3)
res50: Int = 5
```

You can partially apply any argument in the argument list, not just the last one.

### Curried functions

Sometimes it makes sense to let people apply some arguments to your function now and others later.

Here’s an example of a function that lets you build multipliers of two numbers together. At one call site, you’ll decide which is the multiplier and at a later call site, you’ll choose a multiplicand.

```
scala> def multiply(m: Int)(n: Int): Int = m * n
multiply: (m: Int)(n: Int)Int
```

You can call it directly with both arguments.

```
scala> multiply(2)(3)
res0: Int = 6
```

You can fill in the first parameter and partially apply the second.

```
scala> val timesTwo = multiply(2) _
timesTwo: (Int) => Int = <function1>

scala> timesTwo(3)
res1: Int = 6
```

You can take any function of multiple arguments and curry it. Let’s try with our earlier `adder`

```
scala> val curriedAdd = (adder _).curried
curriedAdd: Int => (Int => Int) = <function1>

scala> val addTwo = curriedAdd(2)
addTwo: Int => Int = <function1>

scala> addTwo(4)
res22: Int = 6
```

### Variable length arguments

There is a special syntax for methods that can take parameters of a repeated type. To apply String’s `capitalize` function to several strings, you might write:

```
def capitalizeAll(args: String*) = {
  args.map { arg =>
    arg.capitalize
  }
}

scala> capitalizeAll("rarity", "applejack")
res2: Seq[String] = ArrayBuffer(Rarity, Applejack)
```

## Classes

```
scala> class Calculator {
     |   val brand: String = "HP"
     |   def add(m: Int, n: Int): Int = m + n
     | }
defined class Calculator

scala> val calc = new Calculator
calc: Calculator = Calculator@e75a11

scala> calc.add(1, 2)
res1: Int = 3

scala> calc.brand
res2: String = "HP"
```

Contained are examples defining methods with def and fields with val. Methods are just functions that can access the state of the class.

### Constructor

Constructors aren’t special methods, they are the code outside of method definitions in your class. Let’s extend our Calculator example to take a constructor argument and use it to initialize internal state.

```
class Calculator(brand: String) {
  /**
   * A constructor.
   */
  val color: String = if (brand == "TI") {
    "blue"
  } else if (brand == "HP") {
    "black"
  } else {
    "white"
  }

  // An instance method.
  def add(m: Int, n: Int): Int = m + n
}
```

Note the two different styles of comments.

You can use the constructor to construct an instance:

```
scala> val calc = new Calculator("HP")
calc: Calculator = Calculator@1e64cc4d

scala> calc.color
res0: String = black
```

### Expressions

Our Calculator example gave an example of how Scala is expression-oriented. The value color was bound based on an if/else expression. Scala is highly expression-oriented: most things are expressions rather than statements.

### Aside: Functions vs Methods

Functions and methods are largely interchangeable. Because functions and methods are so similar, you might not remember whether that *thing* you call is a function or a method. When you bump into a difference between methods and functions, it might confuse you.

```
scala> class C {
     |   var acc = 0
     |   def minc = { acc += 1 }
     |   val finc = { () => acc += 1 }
     | }
defined class C

scala> val c = new C
c: C = C@1af1bd6

scala> c.minc // calls c.minc()

scala> c.finc // returns the function as a value:
res2: () => Unit = <function0>
```

When you can call one “function” without parentheses but not another, you might think *Whoops, I thought I knew how Scala functions worked, but I guess not. Maybe they sometimes need parentheses?* You might understand functions, but be using a method.

In practice, you can do great things in Scala while remaining hazy on the difference between methods and functions. If you’re new to Scala and read [explanations of the differences](https://www.google.com/search?q=difference+scala+function+method), you might have trouble following them. That doesn’t mean you’re going to have trouble using Scala. It just means that the difference between functions and methods is subtle enough such that explanations tend to dig into deep parts of the language.

## Inheritance

```
class ScientificCalculator(brand: String) extends Calculator(brand) {
  def log(m: Double, base: Double) = math.log(m) / math.log(base)
}
```

**See Also** Effective Scala points out that a [Type alias](https://twitter.github.com/effectivescala/#Types and Generics-Type aliases) is better than `extends` if the subclass isn’t actually different from the superclass. A Tour of Scala describes [Subclassing](https://www.scala-lang.org/node/125).

### Overloading methods

```
class EvenMoreScientificCalculator(brand: String) extends ScientificCalculator(brand) {
  def log(m: Int): Double = log(m, math.exp(1))
}
```

### Abstract Classes

You can define an *abstract class*, a class that defines some methods but does not implement them. Instead, subclasses that extend the abstract class define these methods. You can’t create an instance of an abstract class.

```
scala> abstract class Shape {
     |   def getArea():Int    // subclass should define this
     | }
defined class Shape

scala> class Circle(r: Int) extends Shape {
     |   def getArea():Int = { r * r * 3 }
     | }
defined class Circle

scala> val s = new Shape
<console>:8: error: class Shape is abstract; cannot be instantiated
       val s = new Shape
               ^

scala> val c = new Circle(2)
c: Circle = Circle@65c0035b
```

## Traits

`traits` are collections of fields and behaviors that you can extend or mixin to your classes.

```
trait Car {
  val brand: String
}

trait Shiny {
  val shineRefraction: Int
}
class BMW extends Car {
  val brand = "BMW"
}
```

One class can extend several traits using the `with` keyword:

```
class BMW extends Car with Shiny {
  val brand = "BMW"
  val shineRefraction = 12
}
```

**See Also** Effective Scala has opinions about [trait](https://twitter.github.com/effectivescala/#Object oriented programming-Traits).

**When do you want a Trait instead of an Abstract Class?** If you want to define an interface-like type, you might find it difficult to choose between a trait or an abstract class. Either one lets you define a type with some behavior, asking extenders to define some other behavior. Some rules of thumb:

- Favor using traits. It’s handy that a class can extend several traits; a class can extend only one class.
- If you need a constructor parameter, use an abstract class. Abstract class constructors can take parameters; trait constructors can’t. For example, you can’t say `trait t(i: Int) {}`; the `i` parameter is illegal.

You are not the first person to ask this question. See fuller answers at [stackoverflow:Scala traits vs abstract classes](https://stackoverflow.com/questions/1991042/scala-traits-vs-abstract-classes), [Difference between Abstract Class and Trait](https://stackoverflow.com/questions/2005681/difference-between-abstract-class-and-trait), and [Programming in Scala: To trait, or not to trait?](https://www.artima.com/pins1ed/traits.html#12.7)

## Types

Earlier, you saw that we defined a function that took an `Int` which is a type of Number. Functions can also be generic and work on any type. When that occurs, you’ll see a type parameter introduced with the square bracket syntax. Here’s an example of a Cache of generic Keys and Values.

```
trait Cache[K, V] {
  def get(key: K): V
  def put(key: K, value: V)
  def delete(key: K)
}
```

Methods can also have type parameters introduced.

```
def remove[K](key: K)
```

Built at [@twitter](https://twitter.com/twitter) by [@stevej](https://twitter.com/stevej), [@marius](https://twitter.com/marius), and [@lahosken](https://twitter.com/lahosken) with much help from [@evanm](https://twitter.com/evanm), [@sprsquish](https://twitter.com/sprsquish), [@kevino](https://twitter.com/kevino), [@zuercher](https://twitter.com/zuercher), [@timtrueman](https://twitter.com/timtrueman), [@wickman](https://twitter.com/wickman), [@mccv](https://twitter.com/mccv) and [@garciparedes](https://github.com/garciparedes); Russian translation by [appigram](https://github.com/appigram); Chinese simple translation by [jasonqu](https://github.com/jasonqu); Korean translation by [enshahar](https://github.com/enshahar);



Tuesday, November 14, 2017

sublime 3

https://pypi.python.org/pypi/rsub/1.0.2
ssh -R 52698:localhost:52698  ling@ling.bf2.tumblr.net
rsub -f root_setup.sh

(base) new-host-2:~ ling$ open /Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl
(base) new-host-2:~ ling$ ln -s "/Applications/Sublime Text.app/Contents/SharedSupport/bin/subl" /usr/local/bin/sublime


export PATH=/usr/local/bin:$PATH
open ~/.bash_profile



Monday, August 14, 2017

Why your Neural Network is not working

New activation function
Adaptive learning rate
Early stopping
Regularization
Dropout

Training not good

- Activation function
sigmoid
With too many layers you hit the vanishing gradient problem: layers near the input see tiny gradients and learn very slowly (staying almost random), while layers near the output see larger gradients, learn fast, and converge early. This leads to bad performance even on the training set.
Intuitively, the sigmoid squashes (negative infinity, positive infinity) into (0, 1), so each layer shrinks the gradient; after many layers the gradient is tiny (see the short derivation below).
relu
- fast to compute, has a biological motivation, behaves like an infinite sum of sigmoids with different biases, and handles the vanishing gradient issue
leaky relu
maxout - learns the activation function itself
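
A rough sketch of why this happens, not tied to any particular network: the sigmoid derivative is bounded by 1/4, and backpropagation through L sigmoid layers multiplies roughly L such factors together.

```
\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad
\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) \le \tfrac{1}{4}

\frac{\partial \ell}{\partial z_1} \;\propto\; \prod_{k=1}^{L} w_k\, \sigma'(z_k)
```

With moderate weights each factor is well below 1, so the gradient reaching the early layers shrinks roughly geometrically with depth; ReLU has derivative 1 on its active region, which avoids this particular shrinkage.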

- Adaptive learning rate
The error surface can be very complex when training a neural network.
Some regions call for a smaller learning rate and others for a larger one, so an adaptive method such as RMSProp, which scales the learning rate per parameter, helps.
Adam = RMSProp + momentum (see the update equations below).
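
For reference, the Adam update in its standard form (g_t is the gradient at step t; alpha, beta_1, beta_2, and epsilon are the usual hyperparameters):

```
m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2

\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \frac{\alpha\, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
```

The m_t term is the momentum part and v_t is the RMSProp-style per-parameter scaling.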

Evaluation not good

- Early stopping
The total loss keeps decreasing on the training set while it starts to increase on the validation set; training should stop at the point where the validation loss is lowest, before the model overfits.

- Regularization
Add a penalty term so that the loss being minimized favors weights that not only minimize the original cost but also stay close to zero (see the L2-regularized objective below).
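
The usual L2-regularized objective, where lambda controls how strongly the weights are pulled toward zero:

```
\tilde{L}(\theta) = L(\theta) + \lambda \sum_i \theta_i^2
```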

1 Start with a simple model that is known to work for this type of data (for example, VGG for images). Use a standard loss if possible.
2 Turn off all bells and whistles, e.g. regularization and data augmentation.
3 If finetuning a model, double-check the preprocessing; it should be the same as what was used to train the original model.
4 Verify that the input data is correct.
5 Start with a really small dataset (2–20 samples). Overfit on it and gradually add more data.
6 Start gradually adding back all the pieces that were omitted: augmentation/regularization, custom loss functions, try more complex models.

Why your Neural Network is not working
1. Check your input data
Check if the input data you are feeding the network makes sense. For example, I’ve more than once mixed the width and the height of an image. Sometimes, I would feed all zeroes by mistake. Or I would use the same batch over and over. So print/display a couple of batches of input and target output and make sure they are OK.
2. Try random input
Try passing random numbers instead of actual data and see if the error behaves the same way. If it does, it’s a sure sign that your net is turning data into garbage at some point. Try debugging layer by layer /op by op/ and see where things go wrong.
3. Check the data loader
Your data might be fine but the code that passes the input to the net might be broken. Print the input of the first layer before any operations and check it.
4. Make sure input is connected to output
Check if a few input samples have the correct labels. Also make sure shuffling input samples works the same way for output labels.
5. Is the relationship between input and output too random?
Maybe the non-random part of the relationship between the input and output is too small compared to the random part (one could argue that stock prices are like this). That is, the inputs are not sufficiently related to the output. There isn’t a universal way to detect this, as it depends on the nature of the data.
6. Is there too much noise in the dataset?
This happened to me once when I scraped an image dataset off a food site. There were so many bad labels that the network couldn’t learn. Check a bunch of input samples manually and see if labels seem off.
The cutoff point is up for debate, as this paper got above 50% accuracy on MNIST using 50% corrupted labels.
7. Shuffle the dataset
If your dataset hasn’t been shuffled and has a particular order to it (ordered by label) this could negatively impact the learning. Shuffle your dataset to avoid this. Make sure you are shuffling input and labels together.
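A minimal way to do that with numpy (the arrays here are just placeholders): apply one permutation to both the inputs and the labels.

```
import numpy as np

X_train = np.arange(20).reshape(10, 2)  # placeholder features
y_train = np.arange(10)                 # placeholder labels

# One permutation applied to both arrays keeps each sample aligned with its label.
perm = np.random.permutation(len(X_train))
X_train, y_train = X_train[perm], y_train[perm]
```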
8. Reduce class imbalance
Are there 1,000 class A images for every class B image? Then you might need to balance your loss function or try other class imbalance approaches.
9. Do you have enough training examples?
If you are training a net from scratch (i.e. not finetuning), you probably need lots of data. For image classification, people say you need 1,000 images per class or more.
10. Make sure your batches don’t contain a single label
This can happen in a sorted dataset (i.e. the first 10k samples contain the same class). Easily fixable by shuffling the dataset.
11. Reduce batch size
This paper points out that having a very large batch can reduce the generalization ability of the model.
12. Standardize the features
Did you standardize your input to have zero mean and unit variance?
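A small sketch with placeholder arrays; the important detail is that the mean and standard deviation come from the training split only and are then reused (which also matters for point 15 below).

```
import numpy as np

X_train = np.random.rand(100, 5)  # placeholder training data
X_val = np.random.rand(20, 5)     # placeholder validation data

mean = X_train.mean(axis=0)       # statistics from the training set only
std = X_train.std(axis=0) + 1e-8  # small constant avoids division by zero

X_train = (X_train - mean) / std
X_val = (X_val - mean) / std      # same statistics applied to validation data
```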
13. Do you have too much data augmentation?
Augmentation has a regularizing effect. Too much of this combined with other forms of regularization (weight L2, dropout, etc.) can cause the net to underfit.
14. Check the preprocessing of your pretrained model
If you are using a pretrained model, make sure you are using the same normalization and preprocessing as the model was when training. For example, should an image pixel be in the range [0, 1], [-1, 1] or [0, 255]?
15. Check the preprocessing for train/validation/test set
“… any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation/test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. “
Also, check for different preprocessing in each sample or batch.
16. Try solving a simpler version of the problem
This will help with finding where the issue is. For example, if the target output is an object class and coordinates, try limiting the prediction to object class only.
17. Look for correct loss “at chance”
Again from the excellent CS231n: Initialize with small parameters, without regularization. For example, if we have 10 classes, at chance means we will get the correct class 10% of the time, and the Softmax loss is the negative log probability of the correct class so: -ln(0.1) = 2.302.
After this, try increasing the regularization strength which should increase the loss.
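The same "at chance" number can be computed directly for any class count:

```
import numpy as np

num_classes = 10
chance_loss = -np.log(1.0 / num_classes)
print(chance_loss)  # about 2.302 for 10 classes
```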
18. Check your loss function
If you implemented your own loss function, check it for bugs and add unit tests. Often, my loss would be slightly incorrect and hurt the performance of the network in a subtle way.
19. Verify loss input
If you are using a loss function provided by your framework, make sure you are passing to it what it expects. For example, in PyTorch I would mix up NLLLoss and CrossEntropyLoss: the former expects log-probabilities (a log-softmax output), while the latter takes raw logits and applies log-softmax internally.
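A quick check of that relationship (assuming PyTorch; random tensors stand in for real network outputs):

```
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)           # raw outputs for 4 samples, 10 classes
targets = torch.randint(0, 10, (4,))  # random class labels

# cross_entropy on raw logits equals nll_loss applied to log-softmax outputs.
ce = F.cross_entropy(logits, targets)
nll = F.nll_loss(F.log_softmax(logits, dim=1), targets)
print(ce.item(), nll.item())  # the two values match
```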
20. Adjust loss weights
If your loss is composed of several smaller loss functions, make sure their magnitude relative to each is correct. This might involve testing different combinations of loss weights.
21. Monitor other metrics
Sometimes the loss is not the best predictor of whether your network is training properly. If you can, use other metrics like accuracy.
22. Test any custom layers
Did you implement any of the layers in the network yourself? Check and double-check to make sure they are working as intended.
23. Check for “frozen” layers or variables
Check if you unintentionally disabled gradient updates for some layers/variables that should be learnable.
24. Increase network size
Maybe the expressive power of your network is not enough to capture the target function. Try adding more layers or more hidden units in fully connected layers.
25. Check for hidden dimension errors
If your input looks like (k, H, W) = (64, 64, 64) it’s easy to miss errors related to wrong dimensions. Use weird numbers for input dimensions (for example, different prime numbers for each dimension) and check how they propagate through the network.
26. Explore Gradient checking
If you implemented Gradient Descent by hand, gradient checking makes sure that your backpropagation works like it should. More info: 1 2 3.
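A bare-bones central-difference check against a hand-written gradient; a simple quadratic stands in for a real loss here.

```
import numpy as np

def loss(w):
    return np.sum(w ** 2)  # stand-in loss; its analytic gradient is 2*w

def analytic_grad(w):
    return 2 * w

w = np.random.randn(5)
eps = 1e-5
numeric = np.zeros_like(w)
for i in range(len(w)):
    w_plus, w_minus = w.copy(), w.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric[i] = (loss(w_plus) - loss(w_minus)) / (2 * eps)

# Relative error should be tiny for a smooth loss like this one.
denom = np.abs(numeric) + np.abs(analytic_grad(w)) + 1e-12
print(np.max(np.abs(numeric - analytic_grad(w)) / denom))
```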
27. Solve for a really small dataset
Overfit a small subset of the data and make sure it works. For example, train with just 1 or 2 examples and see if your network can learn to differentiate these. Move on to more samples per class.
28. Check weights initialization
If unsure, use Xavier or He initialization. Also, your initialization might be leading you to a bad local minimum, so try a different initialization and see if it helps.
29. Change your hyperparameters
Maybe you are using a particularly bad set of hyperparameters. If feasible, try a grid search.
30. Reduce regularization
Too much regularization can cause the network to underfit badly. Reduce regularization such as dropout, batch norm, weight/bias L2 regularization, etc. In the excellent “Practical Deep Learning for coders” course, Jeremy Howard advises getting rid of underfitting first. This means overfitting the training data sufficiently, and only then addressing overfitting.
31. Give it time
Maybe your network needs more time to train before it starts making meaningful predictions. If your loss is steadily decreasing, let it train some more.
32. Switch from Train to Test mode
Some frameworks have layers like Batch Norm, Dropout, and other layers behave differently during training and testing. Switching to the appropriate mode might help your network to predict properly.
33. Visualize the training
Monitor the activations, weights, and updates of each layer. Make sure their magnitudes match. For example, the magnitude of the updates to the parameters (weights and biases) should be around 1e-3 relative to the parameter magnitudes.
Consider a visualization library like Tensorboard and Crayon. In a pinch, you can also print weights/biases/activations.
Be on the lookout for layer activations with a mean much larger than 0. Try Batch Norm or ELUs.
Deeplearning4j points out what to expect in histograms of weights and biases:
“For weights, these histograms should have an approximately Gaussian (normal) distribution, after some time. For biases, these histograms will generally start at 0, and will usually end up being approximately Gaussian (One exception to this is for LSTM). Keep an eye out for parameters that are diverging to +/- infinity. Keep an eye out for biases that become very large. This can sometimes occur in the output layer for classification if the distribution of classes is very imbalanced.”
Check layer updates, they should have a Gaussian distribution.
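In Keras (used later in these notes), one way to log weight histograms is the TensorBoard callback; the log directory is arbitrary and the fit call is left commented out because it depends on a model defined elsewhere.

```
from keras.callbacks import TensorBoard

# histogram_freq=1 writes weight/activation histograms once per epoch.
tensorboard = TensorBoard(log_dir='./logs', histogram_freq=1)
# model.fit(X_train, Y_train, epochs=4, validation_data=(X_test, Y_test),
#           callbacks=[tensorboard])
```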
34. Try a different optimizer
Your choice of optimizer shouldn’t prevent your network from training unless you have selected particularly bad hyperparameters. However, the proper optimizer for a task can be helpful in getting the most training in the shortest amount of time. The paper which describes the algorithm you are using should specify the optimizer. If not, I tend to use Adam or plain SGD with momentum.
Check this excellent post by Sebastian Ruder to learn more about gradient descent optimizers.
35. Exploding / Vanishing gradients
Check layer updates, as very large values can indicate exploding gradients. Gradient clipping may help.
Check layer activations. From Deeplearning4j comes a great guideline: “A good standard deviation for the activations is on the order of 0.5 to 2.0. Significantly outside of this range may indicate vanishing or exploding activations.”
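If clipping is needed, Keras optimizers accept clipnorm / clipvalue arguments (the values here are only examples):

```
from keras import optimizers

# Clip each gradient tensor's norm to 1.0, or clip each element to +/-0.5.
sgd = optimizers.SGD(lr=0.01, momentum=0.9, clipnorm=1.0)
adam = optimizers.Adam(lr=0.001, clipvalue=0.5)
# model.compile(loss='categorical_crossentropy', optimizer=sgd)
```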
36. Increase/Decrease Learning Rate
A low learning rate will cause your model to converge very slowly.
A high learning rate will quickly decrease the loss in the beginning but might have a hard time finding a good solution.
Play around with your current learning rate by multiplying it by 0.1 or 10.
37. Overcoming NaNs
Getting a NaN (Not a Number) is a much bigger issue when training RNNs (from what I hear). Some approaches to fix it:
Decrease the learning rate, especially if you are getting NaNs in the first 100 iterations.
NaNs can arise from division by zero, or from taking the natural log of zero or of a negative number.
Russell Stewart has great pointers on how to deal with NaNs.
Try evaluating your network layer by layer and see where the NaNs appear.

Machine Learning Steps

Typical big data jobs include programming, resource provisioning, handling growing scale, reliability, deployment & configuration, utilization improvements, performance tuning, monitoring.

Extracting, loading, transforming, cleaning and validating data for use in analytics
Designing pipelines and architectures for data processing
Creating and maintaining machine learning and statistical models
Querying datasets, visualizing query results and creating reports

Sample
- partitioning of data sets (into train, test, and validate data sets)
- random sampling
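
One common way to get the three partitions with scikit-learn is two successive random splits (the dataset and proportions below are just an example):

```
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First carve off the test set, then split the remainder into train/validate.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60% / 20% / 20%
```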

Explore
- Data visualization
- Clustering, Associations

Modify
- Variable Selection
- Data transformation
- Data imputation

Outlier detection
- Residual analysis for outlier detection

https://towardsdatascience.com/a-bunch-of-tips-and-tricks-for-training-deep-neural-networks-3ca24c31ddc8

Wednesday, August 9, 2017

d3

 brew install n
 n lts
 sudo n lts
 node --version
 npm --version
 mkdir d3-project
 cd d3-project
 npm init -y

 npm install "babel-core@^6" "babel-loader@^6" "babel-preset-es2017@^6" "babel-preset-stage-0@^6" "webpack@^2" "webpack-dev-server@^2" css-loader style-loader json-loader --save-dev


python -m http.server 8000

Collapsible Tree -
https://bl.ocks.org/mbostock/4339083

Tuesday, June 20, 2017

Data Innovation



Data innovation will provide competitive differentiation and enhance the end user experience

Delivering compelling, differentiated customer experiences in our key brands is key to winning market share, and many of these experiences require extensive, innovative use of data. With the mail property as an example, we look in this section at that property’s roadmap for 2017 and highlight the key roadmap features that will require significant data support. The degree to which data is linked to differentiated capabilities is not because mail as a property is different from other applications; rather, the innovative use of data, when fully factored into strategy, is expected to play a central role in promoting growth in most areas of the business.

* Rich Compose - This feature helps users take advantage of rich content when sending messages, which could include attachments, GIFs, images, and video. Content recommendations would be data driven, requiring understanding of the user and their immediate intent.

* Contacts - Building up an understanding of a mail user’s contacts requires sophisticated analysis of that user’s emails, including who they send mail to or receive mail from. Additionally, messages sent to the user from social network contacts can generate emails to the user from the social network which can be analyzed to gain knowledge of these social network contacts. Deriving the contact details associated with a comprehensive set of contacts (name, relationship, email, phone number, address) from the data in each inbox brings powerful benefits to the mail user. Additionally, from the mobile mail application we can often inherit contact lists, call-logs and message histories, with associated caller-ID. This data can improve the handling of unknown or undesired callers. Advanced CallerID experiences require data analysis, digestion and recommendations, particularly for B2C experiences, for example when your airline calls you. Finally, we have the opportunity to construct a “global graph” from all of our users which can be leveraged both to protect as well as to provide differentiated features for each individual user.

* Coupons - Many users care a great deal about deals that have the potential to reduce the cost of goods and services. Our mailboxes contain an enormous number of discount offers made available to our customers. Organizing these coupons and making them available to customers at the appropriate time, based on relevance and expiration date for example, has the potential to delight our customers and to increase mail usage. Developing powerful coupon recommendation capabilities will require understanding when the coupons are set to expire, where geographically they are valid, and when they are of highest value (for example we could alert users when they are near the store where their coupon is valid). Finally, our goal is to develop a global coupon extraction system, so all of our users can receive recommendations that tap into the overall Coupon pool.

* Photos - Understanding the content within a photo has significant benefits to enhancing search relevance. A mail user who searches for “beach vacation” would be delighted to find the sought-after email where the only relevant information was an attached photo that shows a tropical beach. Leveraging vision systems to analyze user photos enables a set of compelling use cases (including search, auto-tagging, and object recognition). Providing the ability to group photos by similarity, date, or included objects has the potential to allow powerful new ways for users to leverage their mail.

* Content Organization - To assist our users in organizing their mailboxes so that the content can be more easily accessed, we are working to provide differentiated experiences. Email classification underpins some of the planned improvements. Our goal is to build browsing experiences for common use cases, such as flights, hotels, subscriptions, and finance. Providing a simplified way to unsubscribe from a subscription would be an example of a specific goal.

* Personalization - Characterizing our users and giving them a personalized mail experience based on their usage has the potential to make mail more usable and efficient. For example, users who are running a business on our mail system have very different needs and expectations than grandparents who are staying in touch with their families. Recognizing and categorizing users and personalizing their experience has the potential to drive higher satisfaction and retention.

* Notifications - For all of our planned scenarios, we also need to consider when to trigger notifications, at the right time, at the right interval. It is imperative that we not overload the users with such signals. We need to analyze all available data to understand when notification will be welcome, and will trigger engagement, and not have the opposite effect. The collection of GPS signals can be very helpful to generating a notification at the optimal time to inform a user about a relevant local deals or coupon.

Great brands are powered by cutting-edge technology.
  • Technology fuels the mobile experiences people love—from the software that delivers the content you crave to the code that powers your favorite communication apps.
  • We’re engineering one of the industry’s most comprehensive ad technology stacks across mobile, video, search, native and programmatic to maximize results for our partners.
  • Our research and engineering talent solves some of the industry’s biggest challenges in infrastructure, data, AI, machine learning and more.
  • We’re building on our rich foundation of technical innovation as we break new ground with Verizon.

We design for consumers first.
  • We listen to consumers through research, user engagement and user feedback to build the experiences they love.
  • We build mobile experiences for everyone. Products that are developed with every individual in mind, including users with disabilities, are better products.
  • We abide by the highest standards of accountability and transparency to protect our consumers’ and customers’ data.
  • We benefit from Verizon’s experience and resources to further strengthen our security.

We build technology for scale.
  • We build products that reach one billion users, and share our technologies with the world.
  • Every part of our business is fueled by machine learning, improving our brands and products with better image recognition, advertising targeting, search rankings, content personalization and e-commerce recommendations.
  • We’re a partner to all. We frequently work with the open source community, tech industry counterparts and academic peers to build the best possible products.

Sunday, June 4, 2017

Keras

import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (7,7) # Make the figures a bit bigger


from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

# ## Load Training Data
nb_classes = 10 # number of outputs = number of digits

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print("X_train original shape", X_train.shape)
print("y_train original shape", y_train.shape)


for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(X_train[i], cmap='gray', interpolation='none')
    plt.title("Class {}".format(y_train[i]))

# ## Format the data for training

## reshape 28*28 = 784
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')


## normalize
X_train /= 255
X_test /= 255
print("Training matrix shape", X_train.shape)
print("Testing matrix shape", X_test.shape)


# In[6]:
## one-hot format - convert class vectors to binary class matrix
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# ## Build the NN
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()

# ## Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# ## Train the model
history = model.fit(X_train, Y_train,
                    batch_size=128,
                    epochs=4,
                    verbose=1,
                    validation_data=(X_test, Y_test))

# ## Evaluate the performance
score = model.evaluate(X_test, Y_test, verbose=1)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])

# ## Inspecting the output
predicted_classes = model.predict_classes(X_test)
# Check which items we got right / wrong
correct_indices = np.nonzero(predicted_classes == y_test)[0]
incorrect_indices = np.nonzero(predicted_classes != y_test)[0]
plt.figure()
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(X_test[correct].reshape(28,28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[correct], y_test[correct]))
plt.show()

plt.figure()
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(X_test[incorrect].reshape(28,28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], y_test[incorrect]))

plt.show()

# ## List all data in history
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()


https://github.com/wxs/keras-mnist-tutorial/blob/master/MNIST%20in%20Keras.ipynb