Tuesday, November 14, 2017

sublime 3

https://pypi.python.org/pypi/rsub/1.0.2
ssh -R 52698:localhost:52698  ling@ling.bf2.tumblr.net
rsub -f root_setup.sh

(base) new-host-2:~ ling$ open /Applications/Sublime\ Text.app/Contents/SharedSupport/bin/subl
(base) new-host-2:~ ling$ ln -s "/Applications/Sublime Text.app/Contents/SharedSupport/bin/subl" /usr/local/bin/sublime


export PATH=/usr/local/bin:$PATH
open ~/.bash_profile



Monday, August 14, 2017

Why your Neural Network is not working

New activation function
Adaptive learning rate
Early stopping
Regularization
Dropout

Training not good

- activation function
sigmoid 
with many layers, sigmoid leads to the vanishing gradient problem: layers near the input see tiny gradients and learn very slowly (staying almost random), while layers near the output see larger gradients, learn fast, and converge early, which leads to bad training performance
an intuitive way to see it from the derivatives: sigmoid maps (negative infinity, positive infinity) to (0, 1), so each layer shrinks the signal, and after many layers the gradient is small
relu
- fast to compute, biologically motivated, equivalent to an infinite set of sigmoids with different biases, and mitigates the vanishing gradient problem
leaky relu
maxout - learn activation functions

- adaptive learning rate
Error surface can be very complex when training NN.
Some regions of the surface need a smaller learning rate and others a larger one, so an adaptive method such as RMSProp, which scales the learning rate per parameter, can help.
Adam - RMSProp + Momentum
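
A minimal numpy sketch of one Adam update step (names and hyperparameters here are illustrative, not tied to any particular library):

import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # m is the running mean of gradients (the Momentum part),
    # v is the running mean of squared gradients (the RMSProp part)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)   # bias correction for step t (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v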

Evaluation not good

- early stop
The total loss keeps decreasing on the training set while it starts to increase on the validation set; training should stop around the point where the validation loss is at its minimum.
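
In Keras (used later in these notes) this can be done with an EarlyStopping callback; a sketch, assuming a compiled model and a held-out validation set (X_val/Y_val are placeholders):

from keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor='val_loss', patience=3)  # stop after 3 epochs with no improvement
# model.fit(X_train, Y_train, validation_data=(X_val, Y_val),
#           epochs=100, callbacks=[early_stop])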

- regularization
Minimize a new loss function so that the weights found not only minimize the original cost but also stay close to zero.
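
For example, with L2 regularization the new objective adds a penalty on the squared weights; a toy numpy sketch (the helper and the strength lam are illustrative):

import numpy as np

def regularized_loss(original_loss, weights, lam=1e-4):
    # original cost plus a term that pulls the weights toward zero
    return original_loss + lam * np.sum(weights ** 2)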

1 Start with a simple model that is known to work for this type of data (for example, VGG for images). Use a standard loss if possible.
2 Turn off all bells and whistles, e.g. regularization and data augmentation.
3 If finetuning a model, double check the preprocessing, for it should be the same as the original model’s training.
4 Verify that the input data is correct.
5 Start with a really small dataset (2–20 samples). Overfit on it and gradually add more data.
6 Start gradually adding back all the pieces that were omitted: augmentation/regularization, custom loss functions, try more complex models.

Why your Neural Network is not working
1. Check your input data
Check if the input data you are feeding the network makes sense. For example, I’ve more than once mixed the width and the height of an image. Sometimes, I would feed all zeroes by mistake. Or I would use the same batch over and over. So print/display a couple of batches of input and target output and make sure they are OK.
2. Try random input
Try passing random numbers instead of actual data and see if the error behaves the same way. If it does, it’s a sure sign that your net is turning data into garbage at some point. Try debugging layer by layer (or op by op) and see where things go wrong.
3. Check the data loader
Your data might be fine but the code that passes the input to the net might be broken. Print the input of the first layer before any operations and check it.
4. Make sure input is connected to output
Check if a few input samples have the correct labels. Also make sure shuffling input samples works the same way for output labels.
5. Is the relationship between input and output too random?
Maybe the non-random part of the relationship between the input and output is too small compared to the random part (one could argue that stock prices are like this). In other words, the inputs are not sufficiently related to the outputs. There isn’t a universal way to detect this as it depends on the nature of the data.
6. Is there too much noise in the dataset?
This happened to me once when I scraped an image dataset off a food site. There were so many bad labels that the network couldn’t learn. Check a bunch of input samples manually and see if labels seem off.
The cutoff point is up for debate, as this paper got above 50% accuracy on MNIST using 50% corrupted labels.
7. Shuffle the dataset
If your dataset hasn’t been shuffled and has a particular order to it (ordered by label) this could negatively impact the learning. Shuffle your dataset to avoid this. Make sure you are shuffling input and labels together.
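For instance, a minimal numpy sketch that shuffles inputs and labels with the same permutation (toy arrays used here):

import numpy as np

X = np.arange(10).reshape(5, 2)   # toy inputs
y = np.arange(5)                  # toy labels
perm = np.random.permutation(len(X))
X_shuffled, y_shuffled = X[perm], y[perm]   # pairs stay aligned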
8. Reduce class imbalance
Are there 1,000 class A images for every class B image? Then you might need to balance your loss function or try other class imbalance approaches.
9. Do you have enough training examples?
If you are training a net from scratch (i.e. not finetuning), you probably need lots of data. For image classification, people say you need 1,000 images per class or more.
10. Make sure your batches don’t contain a single label
This can happen in a sorted dataset (i.e. the first 10k samples contain the same class). Easily fixable by shuffling the dataset.
11. Reduce batch size
This paper points out that having a very large batch can reduce the generalization ability of the model.
12. Standardize the features
Did you standardize your input to have zero mean and unit variance?
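A small sketch with toy data; note that the statistics come from the training set only and are reused for the test set (see item 15 below):

import numpy as np

rng = np.random.RandomState(0)
X_train = rng.normal(5.0, 2.0, size=(100, 3))   # toy training features
X_test = rng.normal(5.0, 2.0, size=(20, 3))     # toy test features

mean, std = X_train.mean(axis=0), X_train.std(axis=0) + 1e-8   # epsilon avoids division by zero
X_train_std = (X_train - mean) / std
X_test_std = (X_test - mean) / std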
13. Do you have too much data augmentation?
Augmentation has a regularizing effect. Too much of this combined with other forms of regularization (weight L2, dropout, etc.) can cause the net to underfit.
14. Check the preprocessing of your pretrained model
If you are using a pretrained model, make sure you are using the same normalization and preprocessing as the model was when training. For example, should an image pixel be in the range [0, 1], [-1, 1] or [0, 255]?
15. Check the preprocessing for train/validation/test set
“… any preprocessing statistics (e.g. the data mean) must only be computed on the training data, and then applied to the validation/test data. E.g. computing the mean and subtracting it from every image across the entire dataset and then splitting the data into train/val/test splits would be a mistake. “
Also, check for different preprocessing in each sample or batch.
16. Try solving a simpler version of the problem
This will help with finding where the issue is. For example, if the target output is an object class and coordinates, try limiting the prediction to object class only.
17. Look for correct loss “at chance”
Again from the excellent CS231n: Initialize with small parameters, without regularization. For example, if we have 10 classes, at chance means we will get the correct class 10% of the time, and the Softmax loss is the negative log probability of the correct class so: -ln(0.1) = 2.302.
After this, try increasing the regularization strength which should increase the loss.
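A quick sanity check of the chance-level number above:

import math

n_classes = 10
print(-math.log(1.0 / n_classes))   # ~2.302, the expected softmax loss at chance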
18. Check your loss function
If you implemented your own loss function, check it for bugs and add unit tests. Often, my loss would be slightly incorrect and hurt the performance of the network in a subtle way.
19. Verify loss input
If you are using a loss function provided by your framework, make sure you are passing to it what it expects. For example, in PyTorch I would mix up NLLLoss and CrossEntropyLoss: the former expects log-probabilities (a log-softmax output), while the latter expects raw logits.
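A quick PyTorch check of that difference, on toy tensors:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)            # raw scores for 4 samples, 10 classes
target = torch.randint(0, 10, (4,))

a = F.cross_entropy(logits, target)                    # expects raw logits
b = F.nll_loss(F.log_softmax(logits, dim=1), target)   # expects log-probabilities
print(torch.allclose(a, b))                            # True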
20. Adjust loss weights
If your loss is composed of several smaller loss functions, make sure their magnitudes relative to each other are correct. This might involve testing different combinations of loss weights.
21. Monitor other metrics
Sometimes the loss is not the best predictor of whether your network is training properly. If you can, use other metrics like accuracy.
22. Test any custom layers
Did you implement any of the layers in the network yourself? Check and double-check to make sure they are working as intended.
23. Check for “frozen” layers or variables
Check if you unintentionally disabled gradient updates for some layers/variables that should be learnable.
24. Increase network size
Maybe the expressive power of your network is not enough to capture the target function. Try adding more layers or more hidden units in fully connected layers.
25. Check for hidden dimension errors
If your input looks like (k, H, W) = (64, 64, 64) it’s easy to miss errors related to wrong dimensions. Use weird numbers for input dimensions (for example, different prime numbers for each dimension) and check how they propagate through the network.
26. Explore Gradient checking
If you implemented Gradient Descent by hand, gradient checking makes sure that your backpropagation works like it should. More info: 1 2 3.
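A minimal centered-difference gradient check in numpy (f is the loss as a function of the parameters, grad its analytic gradient; both are illustrative):

import numpy as np

def gradient_check(f, grad, w, eps=1e-5):
    num_grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += eps
        w_minus.flat[i] -= eps
        num_grad.flat[i] = (f(w_plus) - f(w_minus)) / (2 * eps)
    return np.max(np.abs(num_grad - grad(w)))   # should be tiny, e.g. < 1e-6

w = np.random.randn(4)
print(gradient_check(lambda w: np.sum(w ** 2), lambda w: 2 * w, w))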
27. Solve for a really small dataset
Overfit a small subset of the data and make sure it works. For example, train with just 1 or 2 examples and see if your network can learn to differentiate these. Move on to more samples per class.
28. Check weights initialization
If unsure, use Xavier or He initialization. Also, your initialization might be leading you to a bad local minimum, so try a different initialization and see if it helps.
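The two schemes differ only in how the random weights are scaled; a numpy sketch for one fully connected layer (layer sizes are illustrative):

import numpy as np

fan_in, fan_out = 784, 300
W_xavier = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / (fan_in + fan_out))  # Xavier/Glorot (tanh/sigmoid)
W_he = np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)                  # He (ReLU-family)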
29. Change your hyperparameters
Maybe you are using a particularly bad set of hyperparameters. If feasible, try a grid search.
30. Reduce regularization
Too much regularization can cause the network to underfit badly. Reduce regularization such as dropout, batch norm, weight/bias L2 regularization, etc. In the excellent “Practical Deep Learning for coders” course, Jeremy Howard advises getting rid of underfitting first: overfit the training data sufficiently, and only then address overfitting.
31. Give it time
Maybe your network needs more time to train before it starts making meaningful predictions. If your loss is steadily decreasing, let it train some more.
32. Switch from Train to Test mode
In most frameworks, layers like Batch Norm and Dropout behave differently during training and testing. Switching to the appropriate mode might help your network predict properly.
33. Visualize the training
Monitor the activations, weights, and updates of each layer and make sure their magnitudes are in reasonable agreement. For example, the magnitude of the parameter updates (weights and biases) should be roughly 1e-3 of the magnitude of the parameters themselves.
Consider a visualization library such as TensorBoard or Crayon. In a pinch, you can also print weights/biases/activations.
Be on the lookout for layer activations with a mean much larger than 0. Try Batch Norm or ELUs.
Deeplearning4j points out what to expect in histograms of weights and biases:
“For weights, these histograms should have an approximately Gaussian (normal) distribution, after some time. For biases, these histograms will generally start at 0, and will usually end up being approximately Gaussian (One exception to this is for LSTM). Keep an eye out for parameters that are diverging to +/- infinity. Keep an eye out for biases that become very large. This can sometimes occur in the output layer for classification if the distribution of classes is very imbalanced.”
Check layer updates, they should have a Gaussian distribution.
34. Try a different optimizer
Your choice of optimizer shouldn’t prevent your network from training unless you have selected particularly bad hyperparameters. However, the proper optimizer for a task can be helpful in getting the most training in the shortest amount of time. The paper which describes the algorithm you are using should specify the optimizer. If not, I tend to use Adam or plain SGD with momentum.
Check this excellent post by Sebastian Ruder to learn more about gradient descent optimizers.
35. Exploding / Vanishing gradients
Check layer updates, as very large values can indicate exploding gradients. Gradient clipping may help.
Check layer activations. From Deeplearning4j comes a great guideline: “A good standard deviation for the activations is on the order of 0.5 to 2.0. Significantly outside of this range may indicate vanishing or exploding activations.”
36. Increase/Decrease Learning Rate
A low learning rate will cause your model to converge very slowly.
A high learning rate will quickly decrease the loss in the beginning but might have a hard time finding a good solution.
Play around with your current learning rate by multiplying it by 0.1 or 10.
37. Overcoming NaNs
Getting a NaN (Not a Number) is a much bigger issue when training RNNs (from what I hear). Some approaches to fix it:
Decrease the learning rate, especially if you are getting NaNs in the first 100 iterations.
NaNs can arise from division by zero, or from taking the natural log of zero or of a negative number.
Russell Stewart has great pointers on how to deal with NaNs.
Try evaluating your network layer by layer and see where the NaNs appear.

Machine Learning Steps

Typical big data jobs include programming, resource provisioning, handling growing scale, reliability, deployment & configuration, utilization improvements, performance tuning, monitoring.

Extracting, loading, transforming, cleaning and validating data for use in analytics
Designing pipelines and architectures for data processing
Creating and maintaining machine learning and statistical models
Querying datasets, visualizing query results and creating reports

Sample
- partitioning of data sets (into train, test, and validate data sets)
- random sampling

Explore
- Data visualization
- Clustering, Associations

Modify
- Variable Selection
- Data transformation
- Data imputation

Outlier detection
 residual analysis for outlier detection
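
A small sketch of the idea: fit a simple model, then flag points whose standardized residuals are unusually large (all data here is synthetic):

import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0, 10, 100)
y = 3 * x + rng.normal(0, 1, 100)
y[::25] += 15                              # inject a few artificial outliers

slope, intercept = np.polyfit(x, y, 1)     # ordinary least-squares line
resid = y - (slope * x + intercept)
z = (resid - resid.mean()) / resid.std()
outliers = np.where(np.abs(z) > 3)[0]      # points more than 3 standard deviations off the fit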

https://towardsdatascience.com/a-bunch-of-tips-and-tricks-for-training-deep-neural-networks-3ca24c31ddc8

Wednesday, August 9, 2017

d3

 brew install n
 n lts
 sudo n lts
 node --version
 npm --version
 mkdir d3-project
 cd d3-project
 npm init -y

 npm install "babel-core@^6" "babel-loader@^6" "babel-preset-es2017@^6" "babel-preset-stage-0@^6" "webpack@^2" "webpack-dev-server@^2" css-loader style-loader json-loader --save-dev


python -m http.server 8000

Collapsible Tree -
https://bl.ocks.org/mbostock/4339083

Tuesday, June 20, 2017

Data Innovation



Data innovation will provide competitive differentiation and enhance the end user experience

Delivering compelling, differentiated customer experiences in our key brands is key to winning market share, and many of these experiences require extensive, innovative use of data. Using the Mail property as an example, this section looks at its 2017 roadmap and highlights the roadmap features that will require significant data support. The degree to which data is linked to differentiated capabilities is not unique to Mail; when fully factored into strategy, the innovative use of data is expected to play a central role in promoting growth in most areas of the business.

* Rich Compose - This feature helps users take advantage of rich content when sending messages, which could include attachments, GIFs, images, and video. Content recommendations would be data driven, requiring understanding of the user and their immediate intent.

* Contacts - Building up an understanding of a mail user’s contacts requires sophisticated analysis of that user’s emails, including who they send mail to or receive mail from. Additionally, the notification emails that social networks send to the user about messages from their contacts can be analyzed to gain knowledge of these social network contacts. Deriving the contact details associated with a comprehensive set of contacts (name, relationship, email, phone number, address) from the data in each inbox brings powerful benefits to the mail user. Additionally, from the mobile mail application we can often inherit contact lists, call logs and message histories, with associated caller ID. This data can improve the handling of unknown or undesired callers. Advanced caller-ID experiences require data analysis, digestion and recommendations, particularly for B2C experiences, for example when your airline calls you. Finally, we have the opportunity to construct a “global graph” from all of our users which can be leveraged both to protect and to provide differentiated features for each individual user.

* Coupons - Many users care a great deal about deals that have the potential to reduce the cost of goods and services. Our mailboxes contain an enormous number of discount offers made available to our customers. Organizing these coupons and making them available to customers at the appropriate time, based on relevance and expiration date for example, has the potential to delight our customers and to increase mail usage. Developing powerful coupon recommendation capabilities will require understanding when the coupons are set to expire, where geographically they are valid, and when they are of highest value (for example we could alert users when they are near the store where their coupon is valid). Finally, our goal is to develop a global coupon extraction system, so all of our users can receive recommendations that tap into the overall Coupon pool.

* Photos - Understanding the content within a photo has significant benefits for search relevance. A mail user who searches for “beach vacation” would be delighted to find the sought-after email where the only relevant information was an attached photo showing a tropical beach. Leveraging vision systems to analyze user photos enables a set of compelling use cases (including search, auto-tagging, and object recognition). Providing the ability to group photos by similarity, date, or included objects has the potential to allow powerful new ways for users to leverage their mail.

* Content Organization - To assist our users in organizing their mailboxes so that the content can be more easily accessed, we are working to provide differentiated experiences. Email classification underpins some of the planned improvements. Our goal is to build browsing experiences for common use cases, such as flights, hotels, subscriptions, and finance. Providing a simplified way to unsubscribe from a subscription would be an example of a specific goal.

* Personalization - Characterizing our users and giving them a personalized mail experience based on their usage has the potential to make mail more usable and efficient. For example, users who are running a business on our mail system have very different needs and expectations than grandparents who are staying in touch with their families. Recognizing and categorizing users and personalizing their experience has the potential to drive higher satisfaction and retention.

* Notifications - For all of our planned scenarios, we also need to consider when to trigger notifications, at the right time and at the right interval. It is imperative that we not overload users with such signals. We need to analyze all available data to understand when a notification will be welcome and will trigger engagement, rather than have the opposite effect. The collection of GPS signals can be very helpful for generating a notification at the optimal time to inform a user about a relevant local deal or coupon.

Great brands are powered by cutting-edge technology.
  • Technology fuels the mobile experiences people love—from the software that delivers the content you crave to the code that powers your favorite communication apps.
  • We’re engineering one of the industry’s most comprehensive ad technology stacks across mobile, video, search, native and programmatic to maximize results for our partners.
  • Our research and engineering talent solves some of the industry’s biggest challenges in infrastructure, data, AI, machine learning and more.
  • We’re building on our rich foundation of technical innovation as we break new ground with Verizon.

We design for consumers first.
  • We listen to consumers through research, user engagement and user feedback to build the experiences they love.
  • We build mobile experiences for everyone. Products that are developed with every individual in mind, including users with disabilities, are better products.
  • We abide by the highest standards of accountability and transparency to protect our consumers’ and customers’ data.
  • We benefit from Verizon’s experience and resources to further strengthen our security.

We build technology for scale.
  • We build products that reach one billion users, and share our technologies with the world.
  • Every part of our business is fueled by machine learning, improving our brands and products with better image recognition, advertising targeting, search rankings, content personalization and e-commerce recommendations.
  • We’re a partner to all. We frequently work with the open source community, tech industry counterparts and academic peers to build the best possible products.

Sunday, June 4, 2017

Keras

import numpy as np
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (7,7) # Make the figures a bit bigger


from keras.datasets import mnist
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation
from keras.utils import np_utils

# ## Load Training Data
nb_classes = 10 # number of outputs = number of digits

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
print("X_train original shape", X_train.shape)
print("y_train original shape", y_train.shape)


for i in range(9):
    plt.subplot(3,3,i+1)
    plt.imshow(X_train[i], cmap='gray', interpolation='none')
    plt.title("Class {}".format(y_train[i]))

# ## Format the data for training

## reshape 28*28 = 784
X_train = X_train.reshape(60000, 784)
X_test = X_test.reshape(10000, 784)
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')


## normalize
X_train /= 255
X_test /= 255
print("Training matrix shape", X_train.shape)
print("Testing matrix shape", X_test.shape)


# In[6]:
## one-hot format - convert class vectors to binary class matrix
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

# ## Build the NN
model = Sequential()
model.add(Dense(512, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.2))

model.add(Dense(10))
model.add(Activation('softmax'))
model.summary()

# ## Compile the model
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# ## Train the model
history = model.fit(X_train, Y_train,
                    batch_size=128,
                    epochs=4,
                    verbose=1,
                    validation_data=(X_test, Y_test))

# ## Evaluate the performance
score = model.evaluate(X_test, Y_test, verbose=1)
print("\nTest score:", score[0])
print('Test accuracy:', score[1])

# ## Inspecting the output
predicted_classes = model.predict_classes(X_test)
# Check which items we got right / wrong
correct_indices = np.nonzero(predicted_classes == y_test)[0]
incorrect_indices = np.nonzero(predicted_classes != y_test)[0]
plt.figure()
for i, correct in enumerate(correct_indices[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(X_test[correct].reshape(28,28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[correct], y_test[correct]))
plt.show()

plt.figure()
for i, incorrect in enumerate(incorrect_indices[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(X_test[incorrect].reshape(28,28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], y_test[incorrect]))

plt.show()

# ## List all data in history
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()


https://github.com/wxs/keras-mnist-tutorial/blob/master/MNIST%20in%20Keras.ipynb



Wednesday, May 24, 2017

Bayesian using Stan

#########################################################
## stan
#########################################################
data{
  int<lower=0> N;
  vector[N] Value;
  vector[N] SqFt;
}
parameters
{
  real alpha;
  real beta;
  real<lower=0> sigma;
}

model
{
  Value ~ normal(alpha + beta*SqFt, sigma);
}



#########################################################
## R
#########################################################
library(rstan)
housing <- read.table("housing1.csv", sep=",", header=TRUE, stringsAsFactors = FALSE)
head(housing)
mod1 <- lm(ValuePerSqFt ~ SqFt, data=housing)
summary(mod1)

fit = stan(file='y.stan', data=list(N=nrow(housing),
           Value=housing$Value,
           SqFt=housing$SqFt), iter=10)


Monday, May 8, 2017

Windowing in Hive

1 Partition specification: It includes a column reference from the table. It cannot be an aggregation or another window specification.

- SELECT fname,ip, COUNT(pid) OVER (PARTITION BY ip) FROM sales;
- SELECT fname,ip,zip,pid, COUNT(pid) OVER (PARTITION BY ip, zip) FROM sales;
select user_id, 
  client_session_id,
  event_name,
  count(event_name) over (partition by user_id, client_session_id)
from tmp_churn_mar2017_little_sister
where user_id = 255177207;

2 Order specification: It comprises a combination of one or more columns. The ordering can be ASC or DESC; the default is ASC.

- SELECT fname,pid, COUNT(pid) OVER (PARTITION BY ip ORDER BY fname) FROM sales;
- SELECT fname,ip,pid, COUNT(pid) OVER (PARTITION BY ip, pid ORDER BY fname) FROM sales;
- select user_id, 
  client_session_id,
  event_name,
  count(event_name) over (partition by user_id, client_session_id order by event_timestamp)
from tmp_churn_mar2017_little_sister
where user_id = 255177207;

3 Window frame: A frame has a start boundary and an optional end boundary. Frame type: window frames can be of type ROWS or RANGE.

- SELECT fname, ip, COUNT(pid) OVER (PARTITION BY ip ORDER BY fname ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM sales;
select user_id, 
  client_session_id,
  event_name,
  count(event_name) over (partition by user_id, client_session_id order by event_timestamp ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
from tmp_churn_mar2017_little_sister
where user_id = 255177207;

Frame boundary: A frame boundary is associated with a direction and an amount. The direction can be PRECEDING or FOLLOWING, and the amount can be an integer value or the keyword UNBOUNDED.

Effective window frames:
BETWEEN <start boundary> AND CURRENT ROW: When only the start boundary of a frame is specified.
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: When only the order is specified but no window frame is specified.
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING: When no order and no window frame are specified.

- SELECT fname, ip, COUNT(pid) OVER (PARTITION BY ip ORDER BY fname ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) FROM sales;
- SELECT fname, ip ,COUNT(pid) OVER (PARTITION BY ip ORDER BY fname ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) FROM sales;
- SELECT fname, ip, COUNT(pid) OVER (PARTITION BY ip ORDER BY fname ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM sales;

4 Source name for window definition: A row R in the input table belongs to a partition as defined in the partition specification.

SELECT fname,pid, LEAD(pid) OVER (PARTITION BY ip ORDER BY ip) FROM sales;
SELECT fname,pid, LAG(pid) OVER (PARTITION BY ip ORDER BY ip) FROM sales;

select user_id,
  client_session_id,
  event_name,
  event_timestamp,
  LAG(event_name) OVER (PARTITION BY user_id, client_session_id order by event_timestamp) previous_event_name,
  count(event_name) over (partition by user_id, client_session_id order by event_timestamp) event_ts_order
from tmp_churn_mar2017_little_sister
where user_id = 255177207;

5 Second largest
select max(Salary)
from Employee
where Salary < (select max(salary) from Employee)

6 Nth largest
select a.Salary
from Employee a
join Employee b
where a.Salary <= b.Salary
group by a.Salary
having count(distinct b.Salary) = N 

7 Top three Salaries
select a.dept, a.Salary
from Employee a
join Employee b
where a.Salary <= b.Salary
group by a.dept, a.Salary
having count(distinct b.Salary) <=3

8 Median
select Id, Company, Salary
from Employee e
where abs(
  (select count(*) from Employee e1 where e.Company = e1.Company and e.Salary >= e1.Salary) -
  (select count(*) from Employee e2 where e.Company = e2.Company and e.Salary <= e2.Salary)
) <= 1
group by Id, Company

Sunday, April 30, 2017

Basic Bootstrap

Q: How to derive distribution estimates based on relatively small sample?

A: Using bootstrap to derive a statistic or model parameter.

The basic idea of bootstrapping is that inference about a population from sample data, (sample → population), can be modeled by resampling the sample data and performing inference about a sample from resampled data, (resampled → sample).

Bootstrap does not necessarily involve any assumptions about the data, or the sample statistic, being normally distributed. In practice, it is not necessary to actually replicate the sample a huge number of times. We simply replace each observation after each draw—we sample with replacement. In this way, we effectively create an infinite population in which the probability of an element being drawn remains unchanged from draw to draw. The algorithm for a bootstrap resampling of the mean is as follows, for a sample of size N:
  1.  Draw a sample value, record, replace it
  2.  Repeat N times
  3.  Record the mean of the N resampled values
  4.  Repeat steps 1-3 B times
  5.  Use the B results to:
    1.  Calculate their standard deviation (this estimates sample mean standard error)
    2.  Produce a histogram or boxplot
    3.  Find a confidence interval
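
A small numpy sketch of steps 1-5 for the mean of a synthetic sample (sample size and B are illustrative):

import numpy as np

rng = np.random.RandomState(0)
sample = rng.exponential(scale=2.0, size=50)      # a sample of size N = 50

B = 5000
boot_means = np.empty(B)
for b in range(B):
    resample = rng.choice(sample, size=len(sample), replace=True)  # steps 1-2: draw N values with replacement
    boot_means[b] = resample.mean()                                # step 3: record the mean of the resample

se = boot_means.std(ddof=1)                       # estimated standard error of the sample mean
ci = np.percentile(boot_means, [2.5, 97.5])       # a 95% percentile confidence interval
print(se, ci)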

Thursday, March 30, 2017

Facebook Prophet

pip install pystan

pip install fbprophet


import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt


df = pd.read_csv("~/downloads/res2.csv", index_col=[0])
df.head(2)

data = pd.DataFrame({'ds': df.dt,
                   'y': np.log(df['amount'])})
data.head(2)


m = Prophet()
m.fit(data)


future = m.make_future_dataframe(periods=4)
future.tail()


forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

m.plot(forecast)

m.plot_components(forecast);
plt.show()




Friday, February 24, 2017

Neural Networks and Deep Learning 3 - DNN

import numpy as np
import numpy.random as rnd
import os

import matplotlib
import matplotlib.pyplot as plt

Activation Functions

def logit(z):
    return 1/(1+np.exp(-z))

z = np.linspace(-5, 5, 200)

plt.plot([-5, 5], [0, 0], 'k-')
plt.plot([-5, 5], [1, 1], 'k--')
plt.plot([0, 0], [-0.2, 1.2], 'k-')
plt.plot([-5, 5], [-3/4, 7/4], 'g--')
plt.plot(z, logit(z), "b-", linewidth=2)
props = dict(facecolor='black', shrink=0.1)
plt.annotate('Saturating', xytext=(3.5, 0.7), xy=(5, 1), arrowprops=props, fontsize=14, ha="center")
plt.annotate('Saturating', xytext=(-3.5, 0.3), xy=(-5, 0), arrowprops=props, fontsize=14, ha="center")
plt.annotate('Linear', xytext=(2, 0.2), xy=(0, 0.5), arrowprops=props, fontsize=14, ha="center")
plt.grid(True)
plt.title("Sigmoid activation function", fontsize=14)
plt.axis([-5, 5, -0.2, 1.2])
plt.show()

def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha*z, z)

plt.plot(z, leaky_relu(z, 0.05), "b-", linewidth=2)
plt.plot([-5, 5], [0, 0], 'k-')
plt.plot([0, 0], [-0.5, 4.2], 'k-')
plt.grid(True)
props = dict(facecolor='black', shrink=0.1)
plt.annotate('Leak', xytext=(-3.5, 0.5), xy=(-5, -0.2), arrowprops=props, fontsize=14, ha="center")
plt.title("Leaky ReLU activation function", fontsize=14)
plt.axis([-5, 5, -0.5, 4.2])
plt.show()

def elu(z, alpha=1):
    return np.where(z<0, alpha*(np.exp(z)-1), z)

plt.plot(z, elu(z), "b-", linewidth=2)
plt.plot([-5, 5], [0, 0], 'k-')
plt.plot([-5, 5], [-1, -1], 'k--')
plt.plot([0, 0], [-2.2, 3.2], 'k-')
plt.grid(True)
props = dict(facecolor='black', shrink=0.1)
plt.title(r"ELU activation function ($\alpha=1$)", fontsize=14)
plt.axis([-5, 5, -2.2, 3.2])
plt.show()


from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")


def leaky_relu(z, name=None):
    return tf.maximum(0.01*z, z, name=name)

import tensorflow as tf

from tensorflow.contrib.layers import fully_connected

tf.reset_default_graph()

n_inputs = 28*28  # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = fully_connected(X, n_hidden1, activation_fn=leaky_relu, scope="hidden1")
    hidden2 = fully_connected(hidden1, n_hidden2, activation_fn=leaky_relu, scope="hidden2")
    logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
   
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 100

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(len(mnist.test.labels)//batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

    save_path = saver.save(sess, "my_model_final.ckpt")

Batch Normalization

from tensorflow.contrib.layers import fully_connected, batch_norm
from tensorflow.contrib.framework import arg_scope

tf.reset_default_graph()

n_inputs = 28 * 28  # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10
learning_rate = 0.01
momentum = 0.25

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')

with tf.name_scope("dnn"):
    he_init = tf.contrib.layers.variance_scaling_initializer()
    batch_norm_params = {
        'is_training': is_training,
        'decay': 0.9,
        'updates_collections': None,
        'scale': True,
    }

    with arg_scope(
            [fully_connected],
            activation_fn=tf.nn.elu,
            weights_initializer=he_init,
            normalizer_fn=batch_norm,
            normalizer_params=batch_norm_params):
        hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
        hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
        logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
   
init = tf.global_variables_initializer()
saver = tf.train.Saver()


n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(len(mnist.test.labels)//batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

save_path = saver.save(sess, "my_model_final.ckpt")


tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')

with tf.name_scope("dnn"):
    he_init = tf.contrib.layers.variance_scaling_initializer()
    batch_norm_params = {
        'is_training': is_training,
        'decay': 0.9,
        'updates_collections': None,
        'scale': True,
    }

    with arg_scope(
            [fully_connected],
            activation_fn=tf.nn.elu,
            weights_initializer=he_init,
            normalizer_fn=batch_norm,
            normalizer_params=batch_norm_params,
            weights_regularizer=tf.contrib.layers.l1_regularizer(0.01)):
        hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
        hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
        logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    base_loss = tf.reduce_mean(xentropy, name="base_loss")
    loss = tf.add_n([base_loss] + reg_losses, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
   
init = tf.global_variables_initializer()
saver = tf.train.Saver()

Gradient Clipping

n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(len(mnist.test.labels)//batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

    save_path = saver.save(sess, "my_model_final.ckpt")

[v.name for v in tf.global_variables()]


with tf.variable_scope("", default_name="", reuse=True):  # root scope
    weights1 = tf.get_variable("hidden1/weights")
    weights2 = tf.get_variable("hidden2/weights")

tf.reset_default_graph()

x = tf.constant([0., 0., 3., 4., 30., 40., 300., 400.], shape=(4, 2))
c = tf.clip_by_norm(x, clip_norm=10)
c0 = tf.clip_by_norm(x, clip_norm=350, axes=0)
c1 = tf.clip_by_norm(x, clip_norm=10, axes=1)

with tf.Session() as sess:
    xv = x.eval()
    cv = c.eval()
    c0v = c0.eval()
    c1v = c1.eval()

print(xv)

print(cv)

print(np.linalg.norm(cv))

print(c0v)

print(np.linalg.norm(c0v, axis=0))

print(c1v)

print(np.linalg.norm(c1v, axis=1))


tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')

def max_norm_regularizer(threshold, axes=1, name="max_norm", collection="max_norm"):
    def max_norm(weights):
        clip_weights = tf.assign(weights, tf.clip_by_norm(weights, clip_norm=threshold, axes=axes), name=name)
        tf.add_to_collection(collection, clip_weights)
        return None # there is no regularization loss term
    return max_norm

with tf.name_scope("dnn"):
    with arg_scope(
            [fully_connected],
            weights_regularizer=max_norm_regularizer(1.5)):
        hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
        hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
        logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope="outputs")

clip_all_weights = tf.get_collection("max_norm")
       
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)
    threshold = 1.0
    grads_and_vars = optimizer.compute_gradients(loss)
    capped_gvs = [(tf.clip_by_value(grad, -threshold, threshold), var)
                  for grad, var in grads_and_vars]
    training_op = optimizer.apply_gradients(capped_gvs)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
   
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(len(mnist.test.labels)//batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

    save_path = saver.save(sess, "my_model_final.ckpt")

Dropout

from tensorflow.contrib.layers import dropout

tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")
is_training = tf.placeholder(tf.bool, shape=(), name='is_training')

initial_learning_rate = 0.1
decay_steps = 10000
decay_rate = 1/10
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(initial_learning_rate, global_step,
                                           decay_steps, decay_rate)

keep_prob = 0.5

with tf.name_scope("dnn"):
    he_init = tf.contrib.layers.variance_scaling_initializer()
    with arg_scope(
            [fully_connected],
            activation_fn=tf.nn.elu,
            weights_initializer=he_init):
        X_drop = dropout(X, keep_prob, is_training=is_training)
        hidden1 = fully_connected(X_drop, n_hidden1, scope="hidden1")
        hidden1_drop = dropout(hidden1, keep_prob, is_training=is_training)
        hidden2 = fully_connected(hidden1_drop, n_hidden2, scope="hidden2")
        hidden2_drop = dropout(hidden2, keep_prob, is_training=is_training)
        logits = fully_connected(hidden2_drop, n_outputs, activation_fn=None, scope="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.MomentumOptimizer(learning_rate, momentum)
    training_op = optimizer.minimize(loss, global_step=global_step)  

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
   
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(len(mnist.test.labels)//batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={is_training: True, X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={is_training: False, X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={is_training: False, X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

    save_path = saver.save(sess, "my_model_final.ckpt")

train_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                               scope="hidden[2]|outputs")

training_op2 = optimizer.minimize(loss, var_list=train_vars)


for i in tf.global_variables():
    print(i.name)

for i in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES):
    print(i.name)


for i in train_vars:
    print(i.name)



Thursday, February 23, 2017

Neural Networks and Deep Learning 2 - ANN

import numpy as np
import numpy.random as rnd
import os
import matplotlib
import matplotlib.pyplot as plt


Perceptrons

from sklearn.datasets import load_iris
iris = load_iris()
X = iris.data[:,(2,3)]
y = (iris.target == 0).astype(np.int)

from sklearn.linear_model import Perceptron

per_clf = Perceptron(random_state=42)
per_clf.fit(X, y)
y_pred = per_clf.predict([[2, 0.5]])
y_pred

a = -per_clf.coef_[0][0] / per_clf.coef_[0][1]
b = -per_clf.intercept_ / per_clf.coef_[0][1]

axes = [0, 5, 0, 2]

x0, x1 = np.meshgrid(
        np.linspace(axes[0], axes[1], 500).reshape(-1, 1),
        np.linspace(axes[2], axes[3], 200).reshape(-1, 1),
    )
X_new = np.c_[x0.ravel(), x1.ravel()]
y_predict = per_clf.predict(X_new)
zz = y_predict.reshape(x0.shape)

plt.figure(figsize=(10, 4))
plt.plot(X[y==0, 0], X[y==0, 1], "bs", label="Not Iris-Setosa")
plt.plot(X[y==1, 0], X[y==1, 1], "yo", label="Iris-Setosa")

plt.plot([axes[0], axes[1]], [a * axes[0] + b, a * axes[1] + b], "k-", linewidth=3)
from matplotlib.colors import ListedColormap
custom_cmap = ListedColormap(['#9898ff', '#fafab0'])

plt.contourf(x0, x1, zz, cmap=custom_cmap, linewidth=5)
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.legend(loc="lower right", fontsize=14)
plt.axis(axes)
plt.show()

Activation Functions

def logit(z):
    return 1/(1+np.exp(-z))
def relu(z):
    return np.maximum(0,z)
def derivative(f, z, eps=0.000001):
    return (f(z + eps) - f(z - eps))/(2 * eps)
def tanh(z):
    return np.tanh(z)

z = np.linspace(-5, 5, 200)

plt.figure(figsize=(11,4))

plt.subplot(121)
plt.plot(z, np.sign(z), "r-", linewidth=2, label="Step")
plt.plot(z, logit(z), "g--", linewidth=2, label="Logit")
plt.plot(z, tanh(z), "b-", linewidth=2, label="Tanh")
plt.plot(z, relu(z), "m-.", linewidth=2, label="ReLU")
plt.grid(True)
plt.legend(loc="center right", fontsize=14)
plt.title("Activation functions", fontsize=14)
plt.axis([-5, 5, -1.2, 1.2])

plt.subplot(122)
plt.plot(z, derivative(np.sign, z), "r-", linewidth=2, label="Step")
plt.plot(0, 0, "ro", markersize=5)
plt.plot(0, 0, "rx", markersize=10)
plt.plot(z, derivative(logit, z), "g--", linewidth=2, label="Logit")
plt.plot(z, derivative(np.tanh, z), "b-", linewidth=2, label="Tanh")
plt.plot(z, derivative(relu, z), "m-.", linewidth=2, label="ReLU")
plt.grid(True)
#plt.legend(loc="center right", fontsize=14)
plt.title("Derivatives", fontsize=14)
plt.axis([-5, 5, -0.2, 1.2])

plt.show()


def heaviside(z):
    return (z >= 0).astype(z.dtype)

def sigmoid(z):
    return 1/(1+np.exp(-z))

def mlp_xor(x1, x2, activation=heaviside):
    return activation(-activation(x1 + x2 - 1.5) + activation(x1 + x2 - 0.5) - 0.5)

x1s = np.linspace(-0.2, 1.2, 100)
x2s = np.linspace(-0.2, 1.2, 100)
x1, x2 = np.meshgrid(x1s, x2s)

z1 = mlp_xor(x1, x2, activation=heaviside)
z2 = mlp_xor(x1, x2, activation=sigmoid)

plt.figure(figsize=(10,4))

plt.subplot(121)
plt.contourf(x1, x2, z1)
plt.plot([0, 1], [0, 1], "gs", markersize=20)
plt.plot([0, 1], [1, 0], "y^", markersize=20)
plt.grid(True)

plt.subplot(122)
plt.contourf(x1, x2, z2)
plt.plot([0, 1], [0, 1], "gs", markersize=20)
plt.plot([0, 1], [1, 0], "y^", markersize=20)
plt.grid(True)
plt.show()

Use TF.Learn 

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("/tmp/data/")
X_train = mnist.train.images
X_test = mnist.test.images
y_train = mnist.train.labels.astype("int")
y_test = mnist.test.labels.astype("int")

import tensorflow as tf

feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300, 100], n_classes=10,
                                         feature_columns=feature_columns)
dnn_clf.fit(x=X_train, y=y_train, batch_size=50, steps=40000)


from sklearn.metrics import accuracy_score

y_pred = list(dnn_clf.predict(X_test))
accuracy = accuracy_score(y_test, y_pred)
accuracy

from sklearn.metrics import log_loss

y_pred_proba = list(dnn_clf.predict_proba(X_test))
log_loss(y_test, y_pred_proba)

dnn_clf.evaluate(X_test, y_test)

Use Plain TensorFlow

import tensorflow as tf

def neuron_layer(X, n_neurons, name, activation=None):
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 1 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev)
        W = tf.Variable(init, name="weights")
        b = tf.Variable(tf.zeros([n_neurons]), name="biases")
        Z = tf.matmul(X, W) + b
        if activation=="relu":
            return tf.nn.relu(Z)
        else:
            return Z

tf.reset_default_graph()

n_inputs = 28*28  # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, "hidden1", activation="relu")
    hidden2 = neuron_layer(hidden1, n_hidden2, "hidden2", activation="relu")
    logits = neuron_layer(hidden2, n_outputs, "output")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
 
init = tf.global_variables_initializer()
saver = tf.train.Saver()


n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)

    save_path = saver.save(sess, "./my_model_final.ckpt")


with tf.Session() as sess:
    saver.restore(sess, save_path) #"my_model_final.ckpt")
    X_new_scaled = mnist.test.images[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    print(np.argmax(Z, axis=1))
    print(mnist.test.labels[:20])

Use Fully connected 

tf.reset_default_graph()

from tensorflow.contrib.layers import fully_connected

n_inputs = 28*28  # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int64, shape=(None), name="y")

with tf.name_scope("dnn"):
    hidden1 = fully_connected(X, n_hidden1, scope="hidden1")
    hidden2 = fully_connected(hidden1, n_hidden2, scope="hidden2")
    logits = fully_connected(hidden2, n_outputs, activation_fn=None, scope="outputs")

with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
 
init = tf.global_variables_initializer()
saver = tf.train.Saver()

n_epochs = 20
batch_size = 50

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for iteration in range(mnist.train.num_examples // batch_size):
            X_batch, y_batch = mnist.train.next_batch(batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_train = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_test = accuracy.eval(feed_dict={X: mnist.test.images, y: mnist.test.labels})
        print(epoch, "Train accuracy:", acc_train, "Test accuracy:", acc_test)
    save_path = saver.save(sess, "./my_model_final.ckpt")




Tuesday, February 21, 2017

Neural Networks and Deep Learning 1 - Up and Running with TensorFlow


import tensorflow as tf

tf.reset_default_graph()

x = tf.Variable(3, name='x')
y = tf.Variable(4, name='y')
f = x*x*y + y + 2
f

sess = tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
print(sess.run(f))
sess.close()

with tf.Session() as sess:
    x.initializer.run()
    y.initializer.run()
    result = f.eval()
result

init = tf.global_variables_initializer()
sess = tf.InteractiveSession()
init.run()
result = f.eval()
sess.close()
result

Managing Graphs

tf.reset_default_graph()
x1 = tf.Variable(1)
x1.graph is tf.get_default_graph()

graph = tf.Graph()
with graph.as_default():
    x2 = tf.Variable(2)
x2.graph is tf.get_default_graph()


x2.graph is graph
w = tf.constant(3)
x = w + 2
y = x + 5
z = x * 3

with tf.Session() as sess:
    print(x.eval())
    print(y.eval())
    print(z.eval())

with tf.Session() as sess:
    y_val, z_val = sess.run([y,z])
    print(y_val)
    print(z_val)


Linear Regression with TensorFlow

import numpy as np
import numpy.random as rnd
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
m,n = housing.data.shape
housing_data_plus_bias = np.c_[np.ones((m, 1)), housing.data]

Using the Normal Equation
tf.reset_default_graph()
X = tf.constant(housing_data_plus_bias, dtype=tf.float64, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float64, name="y")
XT = tf.transpose(X)
theta = tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT, X)), XT), y)

with tf.Session() as sess:
    result = theta.eval()
print(result)

Compare with pure Numpy
X = housing_data_plus_bias
y = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
print(theta_numpy)

Compare with Scikit-learn
from sklearn.linear_model import LinearRegression
lin_reg = LinearRegression()
lin_reg.fit(housing.data, housing.target.reshape(-1, 1))
print(np.r_[lin_reg.intercept_.reshape(-1, 1), lin_reg.coef_.T])

Using Batch Gradient Descent

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaled_housing_data = scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias = np.c_[np.ones((m, 1)), scaled_housing_data]

print(scaled_housing_data_plus_bias.shape)
print(scaled_housing_data_plus_bias.mean())
print(scaled_housing_data_plus_bias.mean(axis=0))
print(scaled_housing_data_plus_bias.mean(axis=1))

Manually Computing the Gradients

tf.reset_default_graph()
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = 2/m * tf.matmul(tf.transpose(X), error)
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()
print("Best theta:")
print(best_theta)


tf.reset_default_graph()
n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
gradients = tf.gradients(mse, [theta])[0]
training_op = tf.assign(theta, theta - learning_rate * gradients)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()
print("Best theta:")
print(best_theta)

Using the Gradient Descent Optimizer

tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
        sess.run(training_op)
    best_theta = theta.eval()
print("Best theta:")
print(best_theta)

Using a momentum optimizer
tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate, momentum=0.25)
training_op = optimizer.minimize(mse)


init = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        sess.run(training_op)
   
    best_theta = theta.eval()
print("Best theta:")
print(best_theta)
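Other tf.train optimizers can be swapped in the same way; for example, using Adam instead of momentum only requires changing the optimizer line (a drop-in sketch, everything else unchanged):

optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)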

Feeding data to the training algorithm

Placeholder nodes
tf.reset_default_graph()
A = tf.placeholder(tf.float32, shape=(None, 3))
B = A + 5
with tf.Session() as sess:
    B_val_1 = B.eval(feed_dict={A: [[1, 2, 3]]})
    B_val_2 = B.eval(feed_dict={A: [[4, 5, 6], [7, 8, 9]]})

print(B_val_1)  # A + 5 for the single row [1, 2, 3] -> [6, 7, 8]
print(B_val_2)  # A + 5 for both rows -> [9, 10, 11] and [12, 13, 14]

Mini-batch Gradient Descent

tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

def fetch_batch(epoch, batch_index, batch_size):
    rnd.seed(epoch * n_batches + batch_index)
    indices = rnd.randint(m, size=batch_size)
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()
print("Best theta:")
print(best_theta)
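Note that fetch_batch samples row indices with replacement, so within one epoch some rows can be picked several times and others not at all. A hypothetical alternative that walks through a fresh permutation each epoch could look like this (the function name is illustrative, not from the original code):

def fetch_batch_without_replacement(epoch, batch_index, batch_size):
    rnd.seed(epoch)                      # same permutation for every batch of this epoch
    permutation = rnd.permutation(m)
    start = batch_index * batch_size
    indices = permutation[start:start + batch_size]
    X_batch = scaled_housing_data_plus_bias[indices]
    y_batch = housing.target.reshape(-1, 1)[indices]
    return X_batch, y_batch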

Saving and restoring a model

tf.reset_default_graph()

n_epochs = 1000
learning_rate = 0.01

X = tf.constant(scaled_housing_data_plus_bias, dtype=tf.float32, name="X")
y = tf.constant(housing.target.reshape(-1, 1), dtype=tf.float32, name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        if epoch % 100 == 0:
            print("Epoch", epoch, "MSE =", mse.eval())
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)
   
    best_theta = theta.eval()
    save_path = saver.save(sess, "my_model_final.ckpt")

print("Best theta:")
print(best_theta)
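To restore the model later, the same Saver reloads the checkpoint written above; a minimal sketch (best_theta_restored is just an illustrative name):

with tf.Session() as sess:
    saver.restore(sess, "my_model_final.ckpt")  # no init needed, restore() loads the saved variables
    best_theta_restored = theta.eval()

print(np.allclose(best_theta, best_theta_restored))  # should print True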

Visualizing the graph and training curves using TensorBoard

tf.reset_default_graph()

from datetime import datetime

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
error = y_pred - y
mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
summary_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())


n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                summary_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

summary_writer.flush()
summary_writer.close()
print("Best theta:")
print(best_theta)
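The logs written above can be inspected by launching TensorBoard against the root log directory and opening the address it prints (by default http://localhost:6006):

tensorboard --logdir tf_logs/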

Name scopes

tf.reset_default_graph()

now = datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir = "tf_logs"
logdir = "{}/run-{}/".format(root_logdir, now)

n_epochs = 1000
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=(None, n + 1), name="X")
y = tf.placeholder(tf.float32, shape=(None, 1), name="y")
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0, seed=42), name="theta")
y_pred = tf.matmul(X, theta, name="predictions")
with tf.name_scope('loss') as scope:
    error = y_pred - y
    mse = tf.reduce_mean(tf.square(error), name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init = tf.global_variables_initializer()

mse_summary = tf.summary.scalar('MSE', mse)
summary_writer = tf.summary.FileWriter(logdir, tf.get_default_graph())

n_epochs = 10
batch_size = 100
n_batches = int(np.ceil(m / batch_size))

with tf.Session() as sess:
    sess.run(init)

    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch, y_batch = fetch_batch(epoch, batch_index, batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                summary_writer.add_summary(summary_str, step)
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})

    best_theta = theta.eval()

summary_writer.flush()
summary_writer.close()
print("Best theta:")
print(best_theta)

print(error.op.name)

print(mse.op.name)

tf.reset_default_graph()

a1 = tf.Variable(0, name="a")      # name == "a"
a2 = tf.Variable(0, name="a")      # name == "a_1"

with tf.name_scope("param"):       # name == "param"
    a3 = tf.Variable(0, name="a")  # name == "param/a"

with tf.name_scope("param"):       # name == "param_1"
    a4 = tf.Variable(0, name="a")  # name == "param_1/a"

for node in (a1, a2, a3, a4):
    print(node.op.name)


Modularity

tf.reset_default_graph()

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")

w1 = tf.Variable(tf.random_normal((n_features, 1)), name="weights1")
w2 = tf.Variable(tf.random_normal((n_features, 1)), name="weights2")
b1 = tf.Variable(0.0, name="bias1")
b2 = tf.Variable(0.0, name="bias2")

linear1 = tf.add(tf.matmul(X, w1), b1, name="linear1")
linear2 = tf.add(tf.matmul(X, w2), b2, name="linear2")

relu1 = tf.maximum(linear1, 0, name="relu1")
relu2 = tf.maximum(linear1, 0, name="relu2")  # Oops, cut&paste error: this reuses linear1 instead of linear2

output = tf.add_n([relu1, relu2], name="output")



tf.reset_default_graph()

def relu(X):
    w_shape = int(X.get_shape()[1]), 1
    w = tf.Variable(tf.random_normal(w_shape), name="weights")
    b = tf.Variable(0.0, name="bias")
    linear = tf.add(tf.matmul(X, w), b, name="linear")
    return tf.maximum(linear, 0, name="relu")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")
summary_writer = tf.summary.FileWriter("logs/relu1", tf.get_default_graph())



tf.reset_default_graph()

def relu(X):
    with tf.name_scope("relu"):
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, 0, name="max")

n_features = 3
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

summary_writer = tf.summary.FileWriter("logs/relu2", tf.get_default_graph())


summary_writer.close()


Sharing Variables

tf.reset_default_graph()

def relu(X, threshold):
    with tf.name_scope("relu"):
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, threshold, name="max")

threshold = tf.Variable(0.0, name="threshold")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X, threshold) for i in range(5)]
output = tf.add_n(relus, name="output")


tf.reset_default_graph()

def relu(X):
    with tf.name_scope("relu"):
        if not hasattr(relu, "threshold"):
            relu.threshold = tf.Variable(0.0, name="threshold")
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, relu.threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")


tf.reset_default_graph()

def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"):
    threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
relus = [relu(X) for i in range(5)]
output = tf.add_n(relus, name="output")

summary_writer = tf.summary.FileWriter("logs/relu6", tf.get_default_graph())
summary_writer.close()


tf.reset_default_graph()

def relu(X):
    with tf.variable_scope("relu"):
        threshold = tf.get_variable("threshold", shape=(), initializer=tf.constant_initializer(0.0))
        w_shape = int(X.get_shape()[1]), 1
        w = tf.Variable(tf.random_normal(w_shape), name="weights")
        b = tf.Variable(0.0, name="bias")
        linear = tf.add(tf.matmul(X, w), b, name="linear")
        return tf.maximum(linear, threshold, name="max")

X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("", default_name="") as scope:
    first_relu = relu(X)     # create the shared variable
    scope.reuse_variables()  # then reuse it
    relus = [first_relu] + [relu(X) for i in range(4)]
output = tf.add_n(relus, name="output")

summary_writer = tf.summary.FileWriter("logs/relu8", tf.get_default_graph())
summary_writer.close()


tf.reset_default_graph()

with tf.variable_scope("param"):
    x = tf.get_variable("x", shape=(), initializer=tf.constant_initializer(0.))
    #x = tf.Variable(0., name="x")
with tf.variable_scope("param", reuse=True):
    y = tf.get_variable("x")

with tf.variable_scope("", default_name="", reuse=True):
    z = tf.get_variable("param/x", shape=(), initializer=tf.constant_initializer(0.))

print(x is y)
print(x.op.name)
print(y.op.name)
print(z.op.name)