Monday, November 28, 2016

Image Recognition in Python

Classification using Deep Learning

The training phase for an image classification problem has 2 main steps:
  1. Feature Extraction: In this phase, we utilize domain knowledge to extract new features that will be used by the machine learning algorithm. HoG and SIFT are examples of features used in image classification.
  2. Model Training: In this phase, we utilize a clean dataset composed of the images' features and the corresponding labels to train the machine learning model.
In the prediction phase, we apply the same feature extraction process to the new images, and we pass the features to the trained machine learning algorithm to predict the label.
The main difference between traditional machine learning and deep learning algorithms is in the feature engineering. In traditional machine learning algorithms, we need to hand-craft the features. By contrast, in deep learning algorithms feature engineering is done automatically by the algorithm. Feature engineering is difficult, time-consuming and requires domain expertise. The promise of deep learning is more accurate machine learning algorithms compared to traditional machine learning, with little or no feature engineering.
Artificial neurons are inspired by biological neurons, and try to capture their behaviour in a computational form. An artificial neuron has a finite number of inputs with weights associated with them, and an activation function (also called a transfer function). The output of the neuron is the result of the activation function applied to the weighted sum of inputs. Artificial neurons are connected with each other to form artificial neural networks.
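As a minimal sketch (the inputs, weights, bias and activation below are illustrative, not from any particular network), a single artificial neuron is just a few lines of NumPy:

```python
import numpy as np

def neuron(inputs, weights, bias, activation):
    # Weighted sum of the inputs, passed through the activation function.
    return activation(np.dot(weights, inputs) + bias)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([0.8, 0.2, -0.5])   # weights
print(neuron(x, w, bias=0.1, activation=sigmoid))
```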

Feedforward Neural Networks

Feedforward Neural Networks are the simplest form of Artificial Neural Networks.
These networks have 3 types of layers: Input layer, hidden layer and output layer. In these networks, data moves from the input layer through the hidden nodes (if any) and to the output nodes.
An example of this architecture is a fully-connected feedforward neural network with 2 hidden layers. "Fully-connected" means that each node is connected to all the nodes in the next layer.
Note that the number of hidden layers and their size are the only free parameters. The larger and deeper the hidden layers, the more complex patterns we can model, in theory.

Activation Functions

Activation functions transform the weighted sum of inputs that goes into the artificial neurons. These functions should be non-linear to encode complex patterns of the data. The most popular activation functions are Sigmoid, Tanh and ReLU. ReLU is the most popular activation function in deep neural networks.
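For concreteness, here are all three as element-wise NumPy functions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negative inputs

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z), tanh(z), relu(z))
```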

Training Artificial Neural Networks


The goal of the training phase is to learn the network's weights. We need 2 elements to train an artificial neural network:
  • Training data: In the case of image classification, the training data is composed of images and the corresponding labels.
  • Loss function: A function that measures the inaccuracy of predictions.
Once we have the 2 elements above, we train the ANN using an algorithm called backpropagation together with gradient descent (or one of its variants).
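As a toy illustration of the gradient descent part (a single linear neuron with squared loss rather than a full network; the data and learning rate are made up):

```python
import numpy as np

# Toy gradient descent on a single linear neuron with squared loss.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0])
w, b, lr = np.zeros(2), 0.0, 0.1

for epoch in range(1000):
    pred = X.dot(w) + b                   # forward pass
    err = pred - y
    grad_w = 2 * X.T.dot(err) / len(y)    # gradient of MSE w.r.t. w
    grad_b = 2 * err.mean()               # gradient of MSE w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)  # converges toward w = [2, 1], b = 0
```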

Convolutional Neural Networks

Convolutional neural networks are a special type of feedforward network. These models are designed to emulate the behaviour of the visual cortex. CNNs perform very well on visual recognition tasks. CNNs have special layers called convolutional layers and pooling layers that allow the network to encode certain image properties.

Convolution Layer

This layer consists of a set of learnable filters that we slide over the image spatially, computing dot products between the entries of the filter and the input image. The filters should extend to the full depth of the input image. For example, if we want to apply a filter of size 5x5 to a colored image of size 32x32, then the filter should have depth 3 (5x5x3) to cover all 3 color channels (Red, Green, Blue) of the image. These filters will activate when they see some specific structure in the images.
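A naive NumPy sketch of this sliding dot product (one filter, valid padding, stride 1; real frameworks add a bias, many filters, and heavy vectorization):

```python
import numpy as np

def conv2d_single(image, kernel):
    # image: H x W x D, kernel: k x k x D (square, full depth).
    # Dot product of the filter with each spatial position.
    H, W, D = image.shape
    k = kernel.shape[0]
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k, :] * kernel)
    return out

# A 5x5x3 filter on a 32x32x3 image yields a 28x28 activation map.
activation_map = conv2d_single(np.random.rand(32, 32, 3), np.random.rand(5, 5, 3))
print(activation_map.shape)  # (28, 28)
```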

Pooling Layer

Pooling is a form of non-linear down-sampling. The goal of the pooling layer is to progressively reduce the spatial size of the representation to reduce the number of parameters and the computation in the network, and hence to also control overfitting. There are several functions to implement pooling, among which max pooling is the most common one. Pooling is often applied with filters of size 2x2 applied with a stride of 2 at every depth slice. A pooling layer of size 2x2 with stride of 2 shrinks the input image to 1/4 of its original size.
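A matching NumPy sketch of 2x2 max pooling with stride 2 on a single depth slice:

```python
import numpy as np

def max_pool(feature_map, size=2, stride=2):
    # Keep the maximum of each size x size patch, moving by stride.
    H, W = feature_map.shape
    out = np.zeros((H // stride, W // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feature_map[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max()
    return out

print(max_pool(np.arange(16.0).reshape(4, 4)))  # 4x4 -> 2x2
```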

Convolutional Neural Networks Architecture

The simplest architecture of a convolutional neural network starts with an input layer (images) followed by a sequence of convolutional layers and pooling layers, and ends with fully-connected layers. The convolutional layers are usually followed by a layer of ReLU activation functions.

The convolutional, pooling and ReLU layers act as learnable feature extractors, while the fully-connected layers act as a machine learning classifier. Furthermore, the early layers of the network encode generic patterns of the images, while later layers encode the detailed patterns of the images.

Note that only the convolutional layers and fully-connected layers have weights. These weights are learned in the training phase.

Caffe Overview
Caffe is a deep learning framework developed by the Berkeley Vision and Learning Center (BVLC). It is written in C++ and has Python and Matlab bindings.

There are 4 steps in training a CNN using Caffe:
  • Step 1 - Data preparation: In this step, we clean the images and store them in a format that can be used by Caffe. We will write a Python script that will handle both image pre-processing and storage.
  • Step 2 - Model definition: In this step, we choose a CNN architecture and we define its parameters in a configuration file with extension .prototxt.
  • Step 3 - Solver definition: The solver is responsible for model optimization. We define the solver parameters in a configuration file with extension .prototxt.
  • Step 4 - Model training: We train the model by executing one Caffe command from the terminal. After training the model, we will get the trained model in a file with extension .caffemodel.
After the training phase, we will use the .caffemodel trained model to make predictions on new unseen data. We will write a Python script to do this.
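A rough sketch of that prediction script (the file names are placeholders, and the blob names 'data' and 'prob' follow Caffe's reference classification models; adjust them to match your own .prototxt):

```python
import caffe

caffe.set_mode_cpu()
net = caffe.Net('deploy.prototxt',    # model definition (Step 2)
                'model.caffemodel',   # trained weights (Step 4)
                caffe.TEST)

# Preprocess: Caffe expects C x H x W, BGR, 0-255 (mean subtraction omitted).
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))      # H x W x C -> C x H x W
transformer.set_raw_scale('data', 255)            # [0, 1] -> [0, 255]
transformer.set_channel_swap('data', (2, 1, 0))   # RGB -> BGR

image = caffe.io.load_image('cat.jpg')
net.blobs['data'].data[...] = transformer.preprocess('data', image)
output = net.forward()
print('Predicted class:', output['prob'].argmax())
```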

Friday, November 25, 2016

Basic Github

Typical development workflow
- developer establishes a local environment
- developer initializes git local environment
- developer clones common repo to create a local version
- on the local repo, the developer creates a branch to work within
- developer creates new code files or changes existing files while syncing with remote trunk
- developer commits any new code and changes within the branch container
- developer pushes the new branch to the project's remote repo
- developer performs a pull request
- new code is reviewed and either determined to need additional work or deemed acceptable
- If accepted, new code is merged into remote trunk

1 Create the remote repository, and get the URL such as  
git@github.com:/youruser/somename.git or https://github.com/youruser/somename.git
If your local Git repo is already set up, skip steps 2 and 3
2 Locally, at the root directory of your source,
git init
3 Locally, add and commit what you want in your initial repo (for everything,
git add . then 
git commit -m 'initial commit comment')
4 To attach your remote repo with the name 'origin' (like cloning would do):
git remote add origin git@github.kohls.com:tkmaemd/XXX.git
5 Execute
git pull origin master
to pull the remote branch so that they are in sync.
6 To push up your master branch (change master to something else for a different branch), check the remote and push:
git remote -v
git push origin master

git pull --rebase origin master
git push -u origin master


# Purge a file (users.csv) from the entire history:
git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch users.csv'


### Git workflow

Untracked, unstaged, staged

### Basic workflow
git status
git add .
git status
git commit -m 'update models'
git status
git pull origin master
git push origin master

ls -lah
git branch dev
git checkout dev
git add .
git status
git commit -a -m '***'
git checkout master
git pull origin master
git merge dev
git push origin master

### More advanced workflow

mkdir log_update
cd log_update
git init
git clone ……git_test.repo.git
cd git_test.repo.git
git branch
git branch log_update
git branch
git checkout log_update
git status
git add .
git commit -m 'create new features'

git checkout master
git pull
git pull
git checkout log_update
git merge master -m 'merging master into log_update'
git push
# click 'compare and pull request' to ask for reviews
# click 'merge pull request'


Github tricks

#1 Edit code directly on GitHub.com
#2 Paste images into comments
#3 Pretty-print code with fenced code blocks - https://github.com/github/linguist/blob/fc1404985abb95d5bc33a0eba518724f1c3c252e/vendor/README.md
```jsx
```
#4 Close issues from PRs with keywords - https://help.github.com/articles/closing-issues-using-keywords/
#5 Link to a comment - click the timestamp next to the username on a comment to get its link
#6 Link to code - open a file and click a line number on the left, or hold shift to select multiple lines
#7 Use the GitHub address bar - to compare a branch with master, append /compare/branch-name to your repo URL; to compare two branches, enter /compare/integration-branch...my-branch
#8 Create checkbox task lists
#9 Project management in GitHub - https://help.github.com/articles/searching-issues-and-pull-requests/
#10 GitHub wiki - https://github.com/davidgilbertson/about-github/wiki
#11 Static blogs - https://github.com/davidgilbertson/about-github
#12 Use GitHub as a CMS (content management system) -
https://www.npmjs.com/package/marked
https://chrome.google.com/webstore/detail/octotree/bkhaagjahfmjljalopjnoealnfndnagc?hl=en-US

% Day 1
 % 1 Configuration
git --version

git config --global user.email "user email"
git config --global user.email "***@kohls.com"

git config --global user.name "user name"
git config --global user.name "***"
git config --local user.name "user name"
git config --local user.name "***"

git config --local user.email "user email"
git config --local user.email "***@kohls.com"

mkdir 'google trend api'
touch index.html
touch index.css
touch about-us.html
touch about-us.css

Pull Requests

% 2 Initializing a Repository in an Existing Directory
-- Initialize the local directory as a Git repository.
git init

-- Add the files in your new local repository. This stages them for the first commit.
git add ind*
-- git commit moves files from the staging area into the local repo's history. The commit command creates a new version/snapshot of the project in the repo.

git status
git add .
git status
git commit -m "add about page with css."

git log
git show 9065
git config --global alias.lg "log --oneline --decorate --graph --all -10"
git config --global alias.lg "log"

Specify files for git to ignore:
touch .gitignore
node_modules/

subl index.html
git add .
git status
git commit -m "edit some info to the index page"
subl about-us.html
git commit -am "add some info for about-us"

-- Generating a new SSH key
ssh-keygen -t rsa -b 4096 -C "***@kohls.com"

-- Adding your SSH key to the ssh-agent
eval "$(ssh-agent -s)"

-- Adding your SSH key to the ssh-agent
ssh-add ~/.ssh/id_rsa

-- Adding a new SSH key to your GitHub account
pbcopy < ~/.ssh/id_rsa.pub

-- At the top of your GitHub repository's Quick Setup page, click to copy the remote repository URL.

-- In Terminal, add the URL for the remote repository where your local repository will be pushed.

git remote add origin (remote repository url)

git remote add origin git@github.kohls.com:tkmaemd/Google-Trend-API.git

-- Check remote origin
git remote -v

-- Push the changes in your local repository to GitHub.
git push origin master

clear
git log
git config --global alias.lg "log --decorate --graph --all -10"

% 3 Change file names
git mv index.html home.htm
git status
git commit -m "rename index.html to home.htm"

change index.css --> home.css
git status
git add .
git status
git add -A

% 4 Branching (Add & Delete)
A branch is a version of the project diverged from master.
A merge brings the code from a branch together with the code in the master version.

git branch cart
touch cart.htm
touch cart.css
git commit -am "add info in the cart file"
git lg
git status
git add .
git lg
ls
git checkout cart
git branch

git checkout master
git merge cart
git branch
git branch -d cart
git branch -D cart

% 5 Diff
git diff
git diff --staged
git diff --stat
git diff --color-words

% 6 Reset & Conflicts
git status
git reset --hard
git status

git checkout master
git branch
git merge history
git diff --stat

% 7 Clone; Pull & Push
git clone https://github.com/githubteacher/******.git
rm poetry

## generate the public/private ssh key
cd ~/.ssh
ssh-keygen -t rsa -C "**********@gmail.com"
cat ~/.ssh/id_rsa.pub > my.txt

git clone git@github.com:githubteacher/**********.git
git clone git@github.kohls.com:tkmaemd/Google-Trend-API.git
git commit -m "my new addition to the poetry"

git remote add origin git@github.kohls.com:tkmaemd/Trend-Report.git

Push the changes in your local repository to GitHub.
git remote add origin git@github.com:*********/**********.git
git push -u origin master

git pull origin master
or
git branch --set-upstream-to=origin/master master
git pull

git clone git@github.com:linghduoduo/Test.git
%git remote add origin git@github.com:*********/**********.git
git remote set-url origin git@github.com:*********/**********.git
git push -u origin master

% Day 2
ls
git status
git init yahoo2
cd yahoo2
ls -a
git config --global --edit
touch index.html
touch index.css
touch about_us.html
touch about_us.css
git add in*
git status
git commit -m  "create index/home page for webs"
git status
git commit -am "added about us page"
git status
git add about_us.html
git status
git reset
git add .
git status
git commit -m "add about us page"
git status

git branch
git branch cart
git branch
git checkout cart
git branch
git branch -d cart
git checkout master
git branch -d cart
git branch
git checkout cart
git checkout -b cart
git branch
touch cart.html
touch cart.css
git add .
git status
git commit -m "First cut of shopping cart"
touch cart.js
git status
git add .
git commit
git status

git config --global alias.lg2 "log --oneline --decorate --all --graph -30"
git lg2
git checkout master
subl index.html
"Here is our phone number 555-555-5555"
git status
git commit -am "Added phone number to  the home page"
git lg2
git checkout cart
subl cart.html
"Finish shoping cart"
git status
git commit -am "Finished up the shopping cart file"
git status
git checkout master
git merge cart
git lg2
git branch
git branch -d cart
git branch

%create branch under branch
git checkout -b contact
git branch
touch contact.html
touch contact.css
git commit -am "add first cut of contact"
touch contact.js
git add .
git commit -m "add javascript"
git branch
git checkout -b contact_coffee
git mv contact.js contact.coffee.js
git commit -m "rename javascript to coffeescript"
subl contact.coffee.js
git add .
git commit -m "implement coffee version script"
git branch
%git branch -D contact_coffee
git merge contact_coffee
git merge contact_coffee --no-ff
git branch -d contact_coffee
git checkout master

git remote add origin git@github.com:**********/yahoo2.git
git push -u origin master

git branch
git branch -a
git pull
git checkout master
git branch
git pull
git lg2

git push
git status
git pull
git status

git config --global push.default simple
git push
touch index.html
subl index.html
"and fax is XXXXXXX"
git add .
git commit -m "add fax number"
git pull
git push

%checkout
git status
subl index.html
"Mess up"
git reset --hard
git status
subl index.html
"Mess up"
subl index.css
"Mess up"
git checkout -- index.html
git diff
git commit -am "clean index.css"
git lg2
ls
git checkout 45a6
cat index.html
git checkout master
git checkout

%merge
git branch
git merge contact
git lg2
git branch -d contact
git lg2
git show XXXXX
git merge test --no-ff

git checkout -b test2
subl index.html
git add .
git commit -m "change index"
subl index.html
git add .
git commit -m "change index"
git lg
git branch
git merge test2
%conflict in html
git difftool --tool-help
git help difftool
git status
git mergetool -t opendiff
git mergetool -t vimdiff
git difftool -t vimdiff
git status
subl index.html
%actual merge manually

%Q&A
subl .git/config
git remote -v
subl .git/config

git branch
git branch -d test2
git branch
git branch -a
%remote master: remotes/origin/master
git pull
git lg
%fetch info from github: git remote add <name> <url>
git branch -r
git branch -a

mkdir student
cd student
%copy other people's code
git clone https://github.com/PeterBell/yahoo2.git
git lg
touch myfile.text
git commit -m "my new file"
cd ..
%fork & clone
pwd
cd yahoo
git clone https://github.com/PeterBell/yahoo2.git
cd ../../yahoo3
touch per5143.txt
subl per5143.txt
git add .
git commit -m "peter's commit to the yahoo3 project"
subl .git/config
git pull
git remote add pertermaster http://github.com
git push pertermaster
git pull pertermaster

%stash
cd ./../yahoo2
git status
git push
username
password
clear
git lg
git stash list
subl index.html
"Our address is 1 NY Plaza"
git status
subl index.css
"add new css"
git commit -am "fix css"
git push
username
password
git stash list
git stash pop
git status
git add .
git commit -m "wip"
subl index.html
git stash list
git stash
subl contact.coffee.js
"add new coffee file"
git stash
git stash pop
git stash list
git stash apply
subl index.html
git diff
git stash list
git branch
git checkout -b test3
git status
git stash pop
git stash list
git reset --hard
git status
git stash pop
git commit -am "make a change to the home page"
subl contact.html
"this is the new phone"
git stash
git stash list
git stash apply
git commit -m "add new phone"
git status
git branch test3
git lg
git stash list
git reset --hard
git status
git commit -am "added field to the contact form"
git checkout
git stash pop
git diff
git commit -m "added new field to contact us form"
git lg
git stash list

%changing history of log
touch user.txt
git add .
git commit -m "list of usersss"
git lg
git commit --amend
%interactive mode, fix the "list of usersss" message
ls
touch store.html
subl store.html
"store locator is link"
git status
git commit -am "added new store locator and a link page"
git status
git add .
git status
git commit --amend
%interactive mode, fix the error
git status
git lg
git show XXXXX
git lg
ls
touch store.css
touch store.js
git add .
git commit -m "added styling and js to store locator"
git lg
git diff
git reset --soft HEAD~1
git lg
git status
git commit -m "added store locator and link"
%git reset --hard HEAD~1
git status
git commit -m "undid the list"
git reset --hard
git revert XXXXX
git reflog
git checkout master
git branch
git branch -D bad_code
git show XXXXXXX
git checkout -b good_code XXXXXXXXXX
git reflog
subl index.html
git status
git reset --hard
git lg

Git Basics

# Edit file
 vi joke.txt
 git diff
 git commit -a
 git status
# Add new file
 vi new.txt
 git add new.txt
 git commit
 git status
# Remove file
 git rm new.txt
 git commit
 git status
# Move file
 git mv old.txt new.txt
 git commit
 git status

Daily workflow

# Get latest and greatest code from origin
 git checkout master
 git pull
# Create a new workspace
 git checkout -b bug1234
# Fix bug 1234 and commit changes
 vi bugfix.txt
 git commit -a
# Back to master to sync with origin
 git checkout master
 git pull
# Back to workspace to fold in latest code
# Rebase upstream changes into my downstream branch
 git checkout bug1234
 git rebase master
# Validate my change against latest stable code
 run unittest.txt
# Ready to send downstream changes to master
# Merge my workspace and master so they have identical commits
 git checkout master
 git merge bug1234
# Push my downstream changes up to origin
 git push
# Delete my workspace
 git branch -d bug1234

# Unstage changes
git reset [file]
# Undoes all changes
git reset --hard [commit]
# Revert a single file
git checkout -- [file]

# Revert to a commit
git revert -n [commit]
# Diff options
git diff [commit] [commit]
git diff master:file branch:file
git diff HEAD^ HEAD
git diff master..branch
git diff --cached
git diff --summary
git diff --name-only
git diff --name-status
git diff -w # ignore all whitespace
git diff --relative[=path] (run from subdir or set path)
# Log|Shortlog options
# --author=jenny, --pretty=oneline, --abbrev-commit,
# --no- merges, --stat, --since, --topo-order|--date-order
git log -- <filename> # history of a file, deleted too
git log dir/ # commits that modify any file under dir/
git log test..master # commits on master but not test
git log master..test # commits on test but not master
git log master...test # commits on either test or master
# but not both
git log -S'foo()' # commits that add or remove any file data
# matching the string 'foo()'
git show :/fix # last commit w/"fix" in msg
# Compare master vs branch
git diff master..branch
git diff master..branch | grep "^diff" # changed files only
git shortlog master..branch
git show-branch
git whatchanged master..mybranch
git cherry -v <upstream> [<head>] # commits not merged upstream
git config core.autocrlf input
git config core.safecrlf true
git config --global push.default tracking # only push current
# Sync branch to master
git checkout master
git pull
# Clean up previous commits before sending upstream
git rebase -i HEAD~n
git rebase -i master mybranch
# Pull requests/tracking branches
[git remote add -f foobar git://github...] # set up remote
git branch --track newbranch foobar/whichbranch
# Push to remote branch
git push [remote] HEAD:[remote-branch]
git push origin HEAD
git push origin :branch (delete remote branch)
# Stashing
git stash list
git stash show -p stash@{2}
git stash [pop|apply] stash@{2}
git stash drop stash@{2}
# Merge upstream changes with WIP
git stash save "Log msg."
git [pull|rebase master]
git stash apply
# Merge files from another branch into master
git checkout master
git checkout feature path/to/file path/to/another/file
# Copy commit from another branch
git cherry-pick -x [commit] # -x appends orig commit message
# Branching
git branch [-a | -r]
git checkout -b newbranch
git branch -d oldbranch
git branch -m oldbranch newbranch
# Interrupt WIP with quick fix
git stash save "Log msg."
vi file;
git commit -a
git stash pop
# Test incremental changes to a single file
git add --patch [file]
git stash save --keep-index "Log msg."
[test patch]
git commit
git stash pop
...repeat...

Tuesday, November 22, 2016

IPython and Using Notebooks

IPython is an open source platform for interactive and parallel computing. It started with the realization that the standard Python interpreter was too limited for sustained interactive use, especially in the areas of scientific and parallel computing.

For OS X users, it is usually recommended to use a package manager such as MacPorts or Homebrew, install Python therein, and avoid using the system Python. A better solution for novices is to install an independent Python distribution such as Anaconda or Enthought Canopy.

Installing Anaconda

Anaconda is a free distribution of Python packages distributed by Continuum Analytics. Conda can be used for package management as well as environment management.

bash Anaconda3-4.3.0-MacOSX-x86_64.sh

Installing Homebrew

Homebrew is the "missing package manager" for OS X.

ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

To check for any issues with the install run

brew doctor

To search for an application:

brew search

To install an application:

brew install <application-name>

To list all apps installed by Homebrew

brew list

To remove an installed application

brew remove <application-name>

To update Homebrew

brew update

To see what else you can do

man brew

/usr/local/Library/LinkedKegs contains a list of, well, linked kegs, so this should do the trick:

ls -1 /usr/local/Library/LinkedKegs | while read line; do
  echo "$line"
  brew unlink "$line"
  brew link --force "$line"
done

Installing Python

conda create -n py36 python=3.6 anaconda
source activate py36

Installing IPython

conda install ipython

IPython comes with a test suite called iptest.

iptest

Updating Python

All-in-one distributions -

When pip and easy-install are not enough, both Anaconda and Canopy have their own built-in package management systems.

Anaconda provides a powerful command-line tool called conda. conda can be used for package management as well as environment management. Every program runs in an environment that includes the version of Python, IPython, and all included packages.

conda update conda

conda update python

Note that conda update python updates Python within the same major version; it will not move an environment from Python 2 to 3.

Install Python Packages

pip list
pip install --upgrade <package>
pip install -r requirements.txt

To activate the environment

cd /Users/tkmaemd/anaconda/envs/py35/bin
source activate py35
ipython

To deactivate environment

cd /Users/tkmaemd/anaconda/envs/py35/bin
source deactivate py35

Shell integration
ipython
In [1]:   # input prompt; past inputs are cached in In
Out[1]:   # past outputs are cached in Out
?map      # show an object's documentation
??map     # show documentation plus source, when available

Magic commands

OS equivalents: %cd, %env, and %pwd
Working with code: %run, %edit, %save, %load, %load_ext, and %%capture
Logging: %logstart, %logstop, %logon, %logoff, and %logstate
Debugging: %debug, %pdb, %run, and %tb
Documentation: %pdef, %pdoc, %pfile, %pprint, %psource, %pycat, and %%writefile
Profiling: %prun, %time, %run, and %timeit
Working with other languages: %%script, %%html, %%javascript, %%latex, %%perl, and %%ruby

Installing R in Jupyter

1 installing via supplied binary packages

install.packages(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'devtools', 'uuid', 'digest'))
devtools::install_github('IRkernel/IRkernel')

2 Making the kernel available to Jupyter

IRkernel::installspec()

3 install basic R packages by conda

conda install -c r r-essentials

Extra for magic commands

With magic commands, IPython becomes a more full-featured development environment. A development session might include the following steps:

  1. Set up the OS-level environment with the %cd, %env, and ! commands.
  2. Set up the Python environment with %load and %load_ext.
  3. Create a program using %edit.
  4. Run the program using %run.
  5. Log the input/output with %logstart, %logstop, %logon, and %logoff.
  6. Debug with %pdb.
  7. Create documentation with %pdoc and %pdef.
This is not a tenable workflow for a large project, but for exploratory coding of smaller modules, magic commands provide a lightweight support structure.

Some observations are in order:
  • Note that the function is, for the most part, standard Python. Also note the use of the !systeminfo shell command. You can freely mix both standard Python and IPython in IPython.
  • The name of the function will be the name of the line magic.
  • The line parameter contains the rest of the line (in case any parameters are passed).
  • A parameter is required, although it need not be used.
  • The Out associated with calling this line magic is the return value of the magic.
  • Any print statements executed as part of the magic are displayed on the terminal but are not part of Out (or _).
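As a minimal sketch of how such a line magic is defined (the name sysinfo and its body are made up for illustration; run it inside IPython, since the ! syntax is not plain Python):

```python
from IPython.core.magic import register_line_magic

@register_line_magic
def sysinfo(line):
    # The function name becomes the magic name: %sysinfo.
    # `line` receives the rest of the line as a string.
    info = !systeminfo              # shell command mixed with Python
    return info[:3] if line == 'short' else info
```

Calling %sysinfo short then returns the first three lines of output, and that return value becomes the Out entry.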
Debug example

x = 0
1/x        # raises ZeroDivisionError

%debug     # enter the post-mortem debugger
h          # help
w          # where am I (show the stack)
p x        # print x
q          # drop out of the debugger
%pdb       # toggle automatic debugger entry on exceptions


A full complement of commands is available for navigation:

u/d for moving up/down in the call stack.
s to step into the next statement. This will step into any functions.
n to continue execution until the next line in the current function is reached or it returns. This will execute any functions along the way, without stopping to debug them.
r continues execution until the current function returns.
c continues execution until the next breakpoint (or exception).
j <line> jumps to line number <line> and executes it. Any lines between the current line and <line> are skipped over. The j works both forward and reverse.

And handling breakpoints:

b for setting a breakpoint. The b <line> will set a breakpoint at line number <line>. Each breakpoint is assigned a unique reference number that other breakpoint commands use.
tbreak <line> is like b, but the breakpoint is temporary and is cleared after the first time it is encountered.
cl <bpNumber> clears a breakpoint, by reference number.
ignore <bpNumber> <count> is for ignoring a particular breakpoint for a certain number (<count>) of times.
disable <bpNumber> for disabling a breakpoint. Unlike clearing, the breakpoint remains and can be re-enabled.
enable <bpNumber> re-enables a breakpoint.

Examining values:

a to view the arguments to the current function
whatis <arg> prints the type of <arg>
p <expression> prints the value of <expression>
Mastering IPython 4.0

Chapter 1 Using Python for HPC

High Performance Computing

An API allowed people to store data on those machines (the Amazon Simple Storage Service, or S3), and another API allowed people to run programs on the same machines (the Amazon Elastic Compute Cloud, or EC2). Together, these made up the start of the Amazon Cloud.

Fortran provided answers to problems of readability, portability, and efficiency within the computing environments that existed in early machines. Python/IPython, while not originally designed for runtime efficiency, takes these new considerations into account.

Chapter 2 Advanced Shell Topics

IPython beyond Python

There are too many magic commands to go over in detail, but there are some related families to be aware of:
OS equivalents: !ls, %cd, %env, and %pwd
Working with code: %run, %edit, %save, %load, %load_ext, and %%capture
Logging: %logstart, %logstop, %logon, %logoff, and %logstate
Debugging: %debug, %pdb, %run, and %tb
Documentation: %pdef, %pdoc, %pfile, %pprint, %psource, %pycat, and %%writefile
Profiling: %prun, %time, %run, and %timeit
Working with other languages: %%script, %%html, %%javascript, %%latex, %%perl, and %%ruby

Terminal Python
stdin&stdout
Python execution
JSON
IPython Kernel 

Chapter 3  Stepping Up to IPython for Parallel Computing

Serial Processes

Program counters and address spaces
Batch systems
Multitasking (Cooperative multitasking / Preemptive multitasking) and preemption

Threading

Threading in Python
Limitations of threading
Global Interpreter Lock

Using multiple processors

The IPython parallel architecture

Getting started with ipyparallel

Parallel magic commands

Types of parallelism

Data Parallelism

Application steering


Sunday, November 20, 2016

WGSN Insight

WGSN Product Breakdown - 23 Categories

Automotive
Bed & Bath
Colour
Consumer Electronics
Decorative Accessories
Experience Design
Fashion Connection
Food & Drink
Furniture & Lighting
Garden & Outdoor
Hospitality
Interior Style
Kids’ Room
Kitchen & Tabletop
Materials & Surfaces
Paper & Packaging
Pets
Print & Pattern
Seasonal Gifting
Textiles
Vintage & Craft
Walls & Floors
Wellness

Insight - Transformative consumer and market intelligence
- In-depth insight into the consumer of today and tomorrow.
- Complete coverage of trends in retail, consumer markets and marketing.
- Global team of top industry experts and on-the-ground trend hunters.
- Original content with fresh perspectives to spark outside-the-box thinking.

Fashion - The world's #1 fashion trend forecaster.
- Enhance your planning with color and trend forecasts 2+ years ahead.
- Get inspired by more than 22m images and thousands of royalty free CADs and designs.
- Drive sales by staying on-trend with over 250 new reports each month.
- Save half a day every week with our productivity tools and city guides.

Lifestyle & Interiors - trend service for the consumer lifestyle and interiors industry.
- Plan ahead with color and trend reports, with specific edits for interiors.
- Develop inspired design with in-depth content in 23 sections, from automotive to wellness
- Drive revenue by staying on-trend with over 50 new, in-depth market reports each month.
- Save time with our trade show summaries, so you don't have to be there.


Instock - The big data analytics platform for critical retail decisions.
- Make faster buying and merchandising decisions with access to a daily feed of e-commerce data.
- Understand your market and product position with analyses of more than 12,000 brands and retailers.
- Make smarter trading decisions with regular stock drop reports and more than 100m retail SKUs monitored and analysed.
- Improve range planning by analyzing competitor data by color, price and product mix.

Styletrial - Rapid consumer feedback to improve buying, merchandising and pricing.
- Reduce investment risk by testing new product and packaging ideas before you go to market.
- Improve certainty of buying and merchandising decisions by accessing millions of US and UK consumers.
- Ensure alignment of price and target audience to your product offering.
- Make more rapid decisions by receiving actionable feedback with results within five days.

Mindset - Tailored trend consulting by world-class experts.
- Improve your strategy by accessing our dedicated team of market and consumer insight specialists.
- Hone your brand proposition based on a tailor-made interpretation of current fashion and lifestyle trends.
- Improve the performance of your products and your team with our innovation workshops.
- Enhance your retail or trade show offer with our tailor-made trend zones and retail edits.

WGSN Future Key Takeaways

An experiential and innovative environment

Experimentation is key for retailers to be successful in the future, and needs to be built into business models. Over the course of a short period, the business was able to experiment with a number of emerging technologies, and was able to show in a live situation what worked and what didn’t.

Jeun Ho Tsang, the co-founder of London-based experimental store laboratory The Dandy Lab, explained 83% of retailers are failing to innovate. Jeun Ho Tsang said RFID loyalty cards proved successful as it meant the business knew what its customers had seen previously and their colour preferences. This enabled it to create a better relationship with them. Mobile payments also had a significant uptake, with 42% of shoppers signing up. Customers initially use the app to pay, and then continued to do so on repeat visits to scan and find out more about items.

The Role of the Store

Shumacher said that by 2020, 39% of purchases will be influenced by omnichannel, and the move to an experience economy means changing how we view the store. Many stores won’t sell product in 20 years’ time, but they will remain one of the most important components of the brand experience. Where today physical retail’s success is down to sales, success metrics in the store of the future will be things like customer experience per square metre, active participation, social interaction, and how well retailers have staged the product. Retailers need to understand what a good brand relationship is, how it is changing, and how consumer expectations are rising. Brands that are doing well are those with a “really strong sense of purpose” which aligns with that of the customer base. “A brand purpose is useful, and a short cut to an emotional relationship,” said Betmead. “What matters is that you care about something that they care about.”

Instagram Stories: Brand Narratives
Instagram gives users across the globe an insider's look into celebrity, fashion, luxury and more.
A new brand created a story to announce a new product offering. The first clip was to draw viewers in, provoking curiosity and enticing them to keep watching. Next, it featured images of the product (with clever use of emoji). This type of announcement allows audiences to feel as if they were let in on a secret of sorts, likely increasing the audience's receptiveness to the brand.

Consumer Attitudes

Chinese Millennials

As the importance of lifestyle continues to grow in China, active and semi-fine jewelry have emerged as key retail categories to watch. Currently considered either an affordable fast-fashion accessory or a serious financial investment, contemporary-level jewelry is still a relatively new concept for the Chinese consumer.

The Lonely Generation
Social, community-focused shopping experiences have become a retail priority for brands operating in China as a way to attract consumers. In line with the global importance of hybrid lifestyle stores, women-only concepts and in-store coffee shops are on the rise as retailers seek to fulfill the human need to connect.
On social media channels such as Instagram and Weibo, photo captions and hashtags feature phrases such as #lonely, #lonelytodeath, #lonelyphotographer and #lonelygourmand.

Self-Obsessed
Following an operating room selfie scandal and a story about changing room exhibitionists, the generation is now beginning to challenge its own digital narcissism as it becomes more self-aware and strives for personal improvement.

Economy as Culture
Driven by the need to provide for the past and the future generations, personal financing has become a form of pop culture among Millennials. Popular hobbies include investing in stocks and venturing into entrepreneurship.

Tech Entrepreneurs
Tech has emerged as a key industry to watch for sustaining the country's future financial growth. Expect to see this group grow among Generation Z as favorable entrepreneurial policies are set to roll out in higher education institutions in the near future.

Urban Inspiration
Urban centers in China represent a dream for a better quality of life for Millennial consumers. As an iconic symbol of innovation and opportunity, the relationship with the city serves as an important inspiration for art and contemporary films.

Women-only Socials
Gender-specific group chats focusing on personal and professional support have also been growing to fulfill the need for a safe, growth-oriented community. This generation's interest in feminism is on the rise.

Digital Experience
The most popular topics on the platform include WeChat e-commerce, health, red envelopes, travel, humor and mobile phone costs.

Fashion is a multi-billion dollar industry with social and economic implications worldwide. The fashion industry has traditionally placed high value on human creativity and has been slower to realize the potential of data analytics. With the advent of modern cognitive computing technologies (data mining and knowledge discovery, machine learning, deep learning, computer vision, natural language understanding etc.) and vast amounts of (structured and unstructured) fashion data the impact on fashion industry could be transformational. Already fashion e-commerce portals are using data to be branded as not just an online warehouse, but also as a fashion destination. Luxury fashion houses are planning to recreate physical in-store experience for their virtual channels, and a slew of technology startups are providing trending, forecasting, and styling services to fashion industry.


Cold Start Analysis

Increase the duration of moving window.

Develop hierarchy of keyword groups and calculate PTQS

Infer PTQS from partner attributes

The hierarchy structure can deal with the cold-start and smoothing to some extent.

Isotonic regression in scikit-learn

http://tullo.ch/articles/speeding-up-isotonic-regression/

Isotonic regression is a useful non-parametric regression technique for fitting an increasing function to a given dataset.

A classic use is in improving the calibration of a probabilistic classifier. Say we have a set of 0/1 data-points (e.g. ad clicks), and we train a probabilistic classifier on this dataset.

Unfortunately, we find that our classifier is poorly calibrated - for cases where it predicts about 50% probability of a click, there is actually a 20% probability of a click, and so on.

With a trained isotonic regression model, our final output is the composition of the classifier's prediction with the isotonic regression function.

For an example of this usage, see the Google Ad Click Prediction - A View from the Trenches paper from KDD 2013, which covers this technique in section 7. The AdPredictor ICML paper also uses this technique for calibrating a Naive Bayes predictor.
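A small scikit-learn sketch of this calibration step, on synthetic scores and click labels:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.RandomState(0)
raw_scores = rng.rand(1000)                              # classifier outputs
clicks = (rng.rand(1000) < raw_scores ** 2).astype(int)  # miscalibrated labels

# Fit a non-decreasing map from raw score to observed click rate.
ir = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds='clip')
ir.fit(raw_scores, clicks)

# Final prediction = isotonic function composed with the classifier's score.
print(ir.predict(np.array([0.2, 0.5, 0.9])))
```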

We'll now detail how we made the scikit-learn implementation of isotonic regression more than ~5,000x faster, while reducing the number of lines of code in the implementation.

The nature of a conversion event can vary widely across advertisers. Conversion events can be defined by: submission of a completed form, a purchase event, subscribing to a service, etc. Each of these has a different intrinsic conversion rate.

A partner generates traffic from several websites, which may vary widely in traffic quality. Source tag may be a more natural granularity; however, source tags are susceptible to manipulation.

Classified and structured match

Product match



Domain match

Optimal Frequency

1 Introduction

The first transaction after running an EM (email marketing) campaign really counts.

Determine the optimal frequency and impose a sensible cap, which enables us to decrease cost per sale.

This study focuses on optimal frequency from a direct response standpoint, namely, how to increase the efficiencies of a campaign to deliver leads and sales. After analyzing campaign data, we are able to look into the impact of frequency on redemptions and sales.

2 The wrong path to optimal frequency

Consider a consumer who redeemed after the third mail but subsequently receives other mails. Attributing the redemption to the total number of mails would grossly overestimate the frequency level at which the consumer redeemed.

We are able to identify the most common frequency level prior to redemption. Since the vast majority of mails go to consumers who never redeem and are ignored, realistic redemption rates are impossible to estimate directly.

3 The right path to optimal frequency

Cumulative redemption rates reveal the true optimal frequency level. By looking at cumulative mails and redemptions, a model is created of how redemptions are harvested with each incremental mail. In effect, this methodology simulates what would have happened had the campaign been frequency-capped at different levels.
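A minimal pandas sketch of that simulation, assuming an event log with one row per delivered mail and a flag on the mail that redeemed (all column names and data are illustrative):

```python
import pandas as pd

# One row per delivered mail; 'redeemed' flags the mail that converted.
log = pd.DataFrame({
    'consumer_id': [1, 1, 1, 2, 2, 3, 3, 3, 3],
    'redeemed':    [0, 0, 1, 0, 0, 0, 1, 0, 0],
})

# Frequency = position of each mail within its consumer's sequence.
log['frequency'] = log.groupby('consumer_id').cumcount() + 1

# Cumulative mails and redemptions up to each frequency cap.
by_freq = log.groupby('frequency')['redeemed'].agg(['size', 'sum'])
by_freq.columns = ['mails', 'redemptions']
capped = by_freq.cumsum()
capped['redemption_rate'] = capped['redemptions'] / capped['mails']
print(capped)  # one row per simulated frequency cap
```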

The redemption rate on the first email was the highest, though the first three all had at least 100% lift on average. At any given moment only a fraction of consumers will immediately respond to your solicitation. Thus, a direct marketing campaign’s performance will depend on its ability to maximize reach at the optimal frequency level and boost the frequency of consumers that have only seen a few mails.

4 The most efficient frequency vs. the most profitable frequency

The frequency level with the highest response rate may not necessarily be the same frequency level to maximize your profits. There will always be a trade-off advertisers have to manage between efficiency and volume. Restricting frequency to only one email per consumer might achieve a lowest possible cost-per response, but you may end up with a very low total number of responses.

5 What this means for marketers

It is important for advertisers to recognize and react to the amount of money being wasted on excessively high frequencies. The culprits are not the consumers who receive four, five, or six mails, but rather the thousands of consumers who receive hundreds of mails without any response. We suggest basic frequency caps. Imagine what a frequency cap could mean when a consumer receives 1000 emails – a cap at 10 emails would free up enough volume to reach at least 100 additional potential customers.

Knowing how various caps will impact campaign performance, monitoring where gross waste is significant, and understanding whether negotiated caps are truly in effect will help in planning and buying media more intelligently.

Quantify the trade-offs between frequency levels and response rates.

Quantify how much pricing premiums for capped inventory are actually worth.

Strategically pick frequency levels that maximize total response yields, while still meeting the cost per response goals.

Identify customer purchase frequency increased as a result of a specific marketing campaign.

Identify email campaign assisting direct mail campaign for in-store purchase.

Typical visit frequency

Consider the case of a user who deletes cookies every day.

If a particular site has a group of very addicted users who return frequently, even if a small number delete cookies daily, the result will be significantly inflated numbers of cookies relative to the number of actual people who visited the site. In such cases it is not unusual to see an average of two or more cookies for every user over the course of a month.

Of course, most users only result in one cookie, but a small number generate many cookies.

Controlled experiments have been done by economists and social scientists to show the effect of a user's past exposure. While the results from the studies differ in different scenarios, largely it has been shown that with increased past exposure the user is more likely to respond positively. Intuitively, past exposure to an ad might help in several ways, such as increased brand awareness and familiarity with the product, or even an increased probability that the user notices the ad. At the same time, some studies have also shown an “ad fatigue” effect where users might tire of an ad if it is displayed too often.

Some sites tend to have mostly passers-by, i.e., visitors that only go to the site once over a given time period; these have little impact on the total number of cookies for the site.

There exist some addicted users who visit particular sites frequently, which leads to significantly inflated cookie counts for addicted users relative to the number of actual people who visited the sites.

The frequency distribution of user visits in RON (run of network) is skewed; that is, a small portion of users are frequent visitors, while the remaining are infrequent visitors. Hence, the sample size available to estimate item affinity per user is small for a large number of users.

Google Analytics Reports - Key KPIs

Report and analyze

Monitor account health and performance

Monthly performance report

Site and app activity in real-time report

Audience reports

Overview of audience reports

Active users

Lifetime value

Demographics and interests reports

User flow report

User views and cross device reports

Advertising reports

AdWords reports

Audience for dynamic remarketing

Audience

Acquisition reports

Behavior reports

Conversion reports

Mobile App reports

Flow visualization reports

Campaign Report

Campaign ID

Campaign DESC

Audience ID

Audience DESC

Audience Size Current - # of email users

Campaign Utilization % - percentage of email users in the universe

Max eCPM – planned cost per email

Net delivered emails

Targeted audience – total number of unique email from the time period

Average frequency

Unique opens

Unique open rates

Unique clicks

Unique click rates

Unique conversions

Unique conversion rates

Average days to convert

Total spending

Total revenue

Margin

Daily Email Behavioral Attribute Report

Number of emails per day

Number of opens per day

Average unique open rate

Number of clicks per day

Average unique click rate

Number of conversions per day

Average unique conversion rate

Number of unique email address having purchase in the past 30 days

Temporal Email Behavioral Aggregation Attribute Report

Coupon Dashboard

Coupon Wallet

Dynamic Pricing

Main Event Reporting

Sales Datamart

ADM process

Use Cases by Artificial Intelligence


--- Retail 

Retail Analytics Models

- Data mining for customer relationship management (CRM)
1 Matching campaigns to customers
  a. cross sell campaigns
  b. up sell campaigns
  c. usage stimulation campaigns
  d. loyalty program

2 Segmenting the customer base
  a. finding behavioral segments
  b. tying market research segments to behavioral data

3 Reducing exposure to risk
  a. predicting who will default
  b. improving collection

4 Determining customer lifetime value
a. Using current customers to learn about prospects
b. Start tracking customers before they become customers
c. Gather information from new customers
d. Acquisition-time variables can predict future outcomes

5 Cross-selling, up-selling, and making recommendations
  a. what to offer
  b. when to offer it
  c. finding the right time for an offer
  d. making recommendations

6 Detecting churn
  a. recognizing churn
  b. why churn matters
  c. different kinds of churn including voluntary/forced, involuntary, expected

7 Different kinds of churn model
  a. predicting who will leave
  b. predicting how long a customer will stay (a sketch of a simple churn classifier follows below)
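As a rough sketch of the first kind of model (who will leave), using scikit-learn on a synthetic customer table (all column names and data are made up):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)
n = 1000
customers = pd.DataFrame({
    'tenure_months': rng.randint(1, 60, n),
    'monthly_spend': rng.gamma(2.0, 50.0, n),
    'support_calls': rng.poisson(1.0, n),
})
# Synthetic label: churn is more likely with short tenure and many calls.
logit = 0.5 * customers['support_calls'] - 0.05 * customers['tenure_months']
churned = (rng.rand(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    customers, churned, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print('Holdout accuracy:', model.score(X_test, y_test))
print('Churn probabilities:', model.predict_proba(X_test)[:3, 1])
```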

Retail Analysis

Successful retailers are shifting away from omnichannel towards serving the customer at the point of sale, and are looking at ways to best link up all the customer touchpoints to create a personalised and convenient shopping experience.

1 Demand-Driven Forecasting

This analysis involves gathering demographic data and economic indicators to build a picture of spending habits across the targeted market. For example, ahead of a hurricane it would be valuable to discover demand patterns that are not obvious. To do this, analysts might examine the huge volume of retailer data from prior, similar situations to identify unusual local demand for products. From such patterns, the company might be able to anticipate unusual demand for products and rush stock to the stores ahead of the hurricane’s landfall.

One major problem for fashion companies is placing their production orders without actual knowledge of demand. Moreover, demand is influenced by additional factors such as the economic situation, public holidays or changing weather conditions. Furthermore, items of a fashion collection are mostly replaced by the following season's collection, and therefore companies often face a lack of historical sales data.

Use case: Weather based Recommendation

Use case: Ad-word optimization and ad buying. Calculating the right price for different keywords/ad slots.

Use case: Inventory Management (how many units)

In particular, perishable goods

2 Merchandising and Planning

With 1bn monthly active users on WhatsApp, 900m on Facebook Messenger, and the number of Snapchat users nearing 200m, retailers can no longer limit their communications to traditional channels such as email and phone for customer service. Trend forecasting algorithms comb brand perception surveys, social media posts and web browsing habits to work out what’s causing a buzz, and ad-buying data is analyzed to see what marketing departments will be pushing.

Brands and marketers engage in “sentiment analysis”, using sophisticated machine learning-based algorithms to determine the context when a product is discussed, and this data can be used to accurately predict what the top-selling products in a category are likely to be. Meanwhile, by understanding customers' enthusiasm, that is, what draws a customer to shop at the brand, marketers can better craft messages and personalized visuals via direct mail, email, and mobile channels.

Can we identify fashion related discussion on Twitter?
- brand related tweets
Can we identify certain features of products such as colors, material or fashion styles?
- product type related tweets
Can we identify certain features of brands?
- event related tweets: fashion week
Can we identify associations between the mentioned attributes products/brands?

Use case: Survey study of First Insight on online product testing

Use case: Merchandizing to start stocking & discontinuing product lines

3 Pricing Strategy

Retailers spend millions on their real time merchandising systems, allowing action to be taken based on insights in a matter of minutes. Big Data also plays a part in helping to determine when prices should be dropped – known as “mark down optimization”. Prior to the age of analytics most retailers would just reduce prices at the end of a buying season for a particular product line, when demand has almost gone. However analytics has shown that a more gradual reduction in price, from the moment demand starts to sag, generally leads to increased revenues. Experiments by US retailer Stage Stores found that this approach, backed by a predictive approach to determine the rise and fall of demand for a product, beat a traditional “end of season sale” approach 90% of the time.

Use case: Everlane cost breakdown

What for: Optimize per time period, per item, per store

This space was dominated by Retek, which Oracle purchased in 2005 (now Oracle Retail). JDA (supply chain software) is also a player.

4 Targeted marketing

Targeted advertising platforms of the type pushed by retailers offer businesses of all sizes the chance to benefit from big data-driven segmented marketing strategies. There is also still a great deal of untapped potential in social media, customer feedback comments, video footage, recorded telephone conversations and locational GPS data for those who put it to best use, with social analytics helping anyone work out where their customers are waiting for them on social media. Finding behavioral segments and linking market segments to behavioral data are crucial to marketing and advertising.

Use case: RFM Segmentation Method (see the scoring sketch after this list)

How often are customers shopping with retailers (loyal or infrequent customers).

Total Shopping – all shopping in any channel is used to score the customer/address.

On Coupon Shopping – Only on coupon transactions are used to score the customer/address (previously the only segmentation available to the company for direct mail targeting).

ECOM Shopping – Only ECOM transactions (currently web and TBS) are used to score the customer/address.

In-Store Shopping – Only In-store transactions are used to score the customer/address.
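A minimal pandas sketch of RFM (recency, frequency, monetary) scoring; the transactions table and the tercile scheme are illustrative:

```python
import pandas as pd

# One row per identified purchase (columns are illustrative).
tx = pd.DataFrame({
    'customer_id': [1, 1, 2, 3, 3, 3],
    'date': pd.to_datetime(['2016-09-01', '2016-11-15', '2016-06-20',
                            '2016-10-05', '2016-10-20', '2016-11-10']),
    'amount': [40.0, 25.0, 120.0, 15.0, 30.0, 22.0],
})
now = pd.Timestamp('2016-11-20')

rfm = tx.groupby('customer_id').agg(
    recency=('date', lambda d: (now - d.max()).days),
    frequency=('date', 'count'),
    monetary=('amount', 'sum'),
)
# Score each dimension into terciles (1 = worst, 3 = best).
rfm['R'] = pd.qcut(rfm['recency'], 3, labels=[3, 2, 1]).astype(int)
rfm['F'] = pd.qcut(rfm['frequency'].rank(method='first'), 3,
                   labels=[1, 2, 3]).astype(int)
rfm['M'] = pd.qcut(rfm['monetary'], 3, labels=[1, 2, 3]).astype(int)
rfm['RFM'] = rfm['R'] * 100 + rfm['F'] * 10 + rfm['M']
print(rfm)
```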

Use case: Customer segmentation based recommendation

5 Cross-Selling and Up-selling Marketing Campaigns

Up-selling - Given a customer’s characteristics, what is the likelihood that they‘ll upgrade in the future.

Given a customer’s past browsing history, purchase history and other characteristics, what are they likely to want to purchase in the future? Identifying credible associations between two or more different items can help business stakeholders make decisions such as what discount to offer, when to distribute coupons, when to put a product on sale, or how to present items in store displays. Employ targeted cross-selling tactics to increase sales among existing customers. Market basket analysis can discover and visualize item associations and purchase sequences, and the resulting rules integrate seamlessly as inputs for enriched predictive modeling, e.g. predicting the average basket size per store and how that changes regionally.

Use Case: Market Basket

Key Features of E-Miner Market Basket Analysis

1) Associations and sequence discovery / visualization

2) Interactively subset rules based on lift, confidence, support, chain length, etc. (see the sketch after this list)

3) Seamless integration of rules with other inputs for enriched predictive modeling

4) Hierarchical associations: Derive rules at multiple levels. Specify parent and child mappings for the dimensional input table.

5) Discount targeting: What is the probability of inducing the desired behavior with a discount?

6) The average basket size per store and how it changes regionally.
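To show the arithmetic behind the lift, confidence, and support metrics named in the feature list, here is a plain-Python sketch that derives pairwise association rules from a handful of made-up baskets. A real analysis would run inside a tool such as E-Miner or a full apriori implementation; this only illustrates the rule metrics.

```python
# Pairwise association rules (support, confidence, lift) computed
# directly from transaction baskets; the baskets are invented examples.
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
n = len(baskets)

item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(frozenset(p) for b in baskets
                      for p in combinations(sorted(b), 2))

for pair, count in pair_counts.most_common():
    a, b = sorted(pair)
    support = count / n
    confidence = count / item_counts[a]          # P(b | a)
    lift = confidence / (item_counts[b] / n)     # > 1 => positive association
    if support >= 0.4:                           # subset rules by support
        print(f"{a} -> {b}: support={support:.2f} "
              f"confidence={confidence:.2f} lift={lift:.2f}")
```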

Use case: Product mix

What for: What mix of products offers the lowest churn? E.g., giving a combined policy discount for home + auto yields low churn.

Usage: online algorithm and static report

Use case: Wallet share estimation

What for: knowing what proportion of a customer's spending in a category accrues to a company allows that company to identify upsell and cross-sell opportunities.

Usage: Can be both an online algorithm and a static report showing the characteristics of low wallet share customers

How much of a customer's wallet do offline and online stores garner?

6 Customer Acquisition and Retention

Customer acquisition is the simplest setting where one encounters the exploration-exploitation dilemma. A good way to find good prospects is to look in the same places that today's best customers came from. That means having some way of determining who the best customers are today. It also means keeping a record of how current customers were acquired and what they looked like at the time of acquisition. The danger of relying on current customers to learn where to look for prospects is that the current customers reflect past marketing decisions. Studying current customers will not suggest looking for new prospects any place that hasn't already been tried. Nevertheless, the performance of current customers is a great way to evaluate the existing acquisition channels. Ideally you should start tracking customers before they become customers. Gather information from new customers at the time they are acquired, and model the relationship between acquisition-time data and future outcomes of interest.
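Since the paragraph above frames acquisition as an exploration-exploitation problem, here is a minimal epsilon-greedy sketch applied to choosing among acquisition channels. The channel names and "true" conversion rates are invented to make the simulation runnable; in practice those rates are unknown and only revealed by experimenting.

```python
# Epsilon-greedy exploration-exploitation over acquisition channels.
# Channel names and conversion rates are illustrative assumptions.
import random

CHANNELS = ["search", "social", "direct_mail", "referral"]
TRUE_RATES = {"search": 0.05, "social": 0.03,
              "direct_mail": 0.02, "referral": 0.08}  # unknown in practice

counts = {c: 0 for c in CHANNELS}
successes = {c: 0 for c in CHANNELS}
EPSILON = 0.1  # fraction of budget spent exploring

random.seed(42)
for _ in range(10000):
    if random.random() < EPSILON:
        channel = random.choice(CHANNELS)        # explore
    else:                                        # exploit best channel so far
        channel = max(CHANNELS, key=lambda c:
                      successes[c] / counts[c] if counts[c] else 1.0)
    counts[channel] += 1
    successes[channel] += random.random() < TRUE_RATES[channel]

for c in CHANNELS:
    rate = successes[c] / counts[c] if counts[c] else 0.0
    print(f"{c:12s} trials={counts[c]:5d} observed_rate={rate:.3f}")
```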

Usage: can be both an online algorithm and a static report showing the characteristics of likely engaged customers. (The lifecycle definitions below are translated into code after the list.)

New: Customer/Address made their first identified transaction in the selected fiscal period (fiscal month).

Existing: ‘Base customer or address’ or ‘Existing <= 24 months’: may or may not have made a transaction in the selected fiscal period (fiscal month), with their last identified transaction in the last 2-24 months.

Reactivated: Customer/Address made a transaction in the selected fiscal period (fiscal month), with their last prior transaction more than 24 fiscal months ago.

Lapsed: Customer/Address made their last identified transaction in the last 25-48 fiscal months.

Attrited: Customer/Address made their last identified transaction more than 48 months ago.
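The five lifecycle definitions above translate almost directly into code. Below is a minimal sketch doing so; the function name and the months_since_* inputs are illustrative choices, and edge cases outside the stated rules are resolved arbitrarily.

```python
# Direct translation of the lifecycle definitions above.
# months_since_first / months_since_last are fiscal months relative
# to the selected fiscal period; the function name is illustrative.
def lifecycle_segment(months_since_first, months_since_last,
                      purchased_this_period):
    if purchased_this_period:
        if months_since_first == 0:
            return "New"             # first identified transaction this period
        if months_since_last > 24:
            return "Reactivated"     # returning after > 24 fiscal months
        return "Existing"
    if 2 <= months_since_last <= 24:
        return "Existing"            # base customer, no purchase this period
    if 25 <= months_since_last <= 48:
        return "Lapsed"
    return "Attrited"                # last transaction > 48 months ago

print(lifecycle_segment(0, 0, True))     # New
print(lifecycle_segment(36, 30, True))   # Reactivated
print(lifecycle_segment(40, 30, False))  # Lapsed
```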

https://news.greylock.com/why-onboarding-is-the-most-crucial-part-of-your-growth-strategy-8f9ad3ec8d5e

What is your frequency target? (How often should we expect the user to receive value?)
What is your key action? (The action signifies the user is receiving enough value to remain engaged)

The principles of successful onboarding

Principle #1: Get to product value as fast as possible — but not faster
A lot of companies have a "cold start problem": they start the user in an empty state where the product doesn't work until the user does something. This frequently leaves users confused about what to do. If we know a successful onboarding experience leads to the key action adopted at the target frequency, we can focus on best practices to maximize the number of people who reach that point.

Principle #2: Remove all friction that distracts the user from experiencing product value
Retention is driven by a maniacal focus on the core product experience. That is more likely to mean reducing friction in the product than adding features to it. New users are not like existing users. They are trying to understand the basics of how to use a product and what to do next. You have built features for existing users who already understand the basics and now want more value. New users not only don't need those yet; including them makes it harder to understand the basics. So, a key element of successful onboarding is removing everything but the basics of the product until those basics are understood. At Pinterest, this meant removing descriptions underneath Pins as well as who Pinned the item, because the core product value had to do with finding images you liked, and removing descriptions and social attribution allowed new users to see more images in the feed.

Principle #3: Don’t be afraid to educate contextually
There's a quote popular in Silicon Valley that says if your design requires education, it's a bad design. It sounds smart, but it's actually dangerous. Product education frequently helps users understand how to get value out of a product and creates long-term engagement. While you should always strive for a design that doesn't need explanation, you should not be afraid to educate if it helps in this way.

Onboarding is both the most difficult and ultimately most rewarding part of the funnel to improve to increase a company’s growth. And it’s where most companies fall short. By focusing on your onboarding, you can delight users more often and be more confident exposing your product to more people. For more advice on onboarding, please read Scott Belsky’s excellent article on the first mile of product.

7 Customer Churn Analysis and Reactivation Likelihood

Churn prediction is one of the most popular use cases for people who want to leverage machine learning. It has a large business value and benefit attached to it, especially in industries like telecom and banking. Challenges such as the skewed nature of the available data set and deciding which models to use are subject to much debate. Working out the characteristics of churners allows a company to make product adjustments, and an online algorithm allows them to reach out to likely churners.

Usage: can be both an online algorithm and a static report showing the characteristics of likely churners
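As a minimal sketch of handling the skewed class distribution mentioned above, the following snippet trains a logistic-regression churn model with balanced class weights on synthetic data. The features (tenure, monthly spend, support calls) and the data-generating process are assumptions made up for illustration.

```python
# Churn-model sketch addressing class skew via class weighting;
# the features and data are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
n = 5000
# Hypothetical features: tenure (months), monthly spend, support calls.
X = np.column_stack([rng.integers(1, 60, n),
                     rng.normal(50, 15, n),
                     rng.poisson(1.0, n)])
# Skewed labels: churn is rarer, more likely with short tenure / many calls.
logit = -2.5 - 0.03 * X[:, 0] + 0.8 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
# class_weight='balanced' reweights the minority (churn) class.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te), digits=3))
```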

Reactivation likelihood

What is the reactivation likelihood for a given customer?

8 Fault detection

There is a lot said online these days, and it is hard to determine what is true and what is fake. We have bots smart enough to publish content like human beings, and there are social aspects attached to the ratings of various entities online. Determining the veracity of online information is a major challenge where machine learning will be leveraged.

Usage: use shrinkage analysis for theft analytics/prevention

Usage: can be both an online algorithm and a static report showing the characteristics of likely fault alerts

9 Customer Behavior / Customer Segmentation

Deciding which customers are likely to want a particular product, and the best way to go about putting it in front of them, is key here. At a basic level, it means knowing important factors about the customers such as gender, demographics, location, website browsing habits, search habits, and where they shop in store. It also means measuring the influence of all touch points on a customer's journey to purchase (online, offline, and across devices) using sophisticated measurement systems, tracking the customer's journey through each channel (TV, display, search, email, and direct mail) to provide a holistic view of how a valuable customer makes a purchase. To this end retailers rely heavily on recommendation engine technology online, and on data collected through transactional records and loyalty programs both offline and online. Better retention programs increase loyalty and customer lifetime value among existing customers.

What for: understand qualitatively different customer groups, so that we can give them different treatments (perhaps even by different groups in the company). Questions to be addressed include what makes people buy, stop buying, etc.

Where else do customers spend money, in and out of the category?

Different income levels of consumers across the board, and where Walmart is losing or gaining market share among specific income levels.

Analyze regional markets, climate patterns and consumer spending when planning promotions and pricing strategies.
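One common way to surface qualitatively different customer groups is unsupervised clustering. Below is a hedged k-means sketch; the three behavioral features, the synthetic data, and the choice of four segments are illustrative assumptions rather than a prescribed recipe.

```python
# K-means sketch for behavioral customer segmentation;
# features, data, and k=4 are illustrative choices.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Hypothetical per-customer features:
# annual spend, visits per month, share of online purchases.
X = np.column_stack([rng.gamma(2.0, 500.0, 2000),
                     rng.poisson(3.0, 2000),
                     rng.beta(2.0, 5.0, 2000)])

X_scaled = StandardScaler().fit_transform(X)  # put features on one scale
kmeans = KMeans(n_clusters=4, n_init=10, random_state=1).fit(X_scaled)

# Profile each segment by its mean raw feature values.
for k in range(4):
    seg = X[kmeans.labels_ == k]
    print(f"segment {k}: n={len(seg):4d} "
          f"spend={seg[:, 0].mean():7.0f} "
          f"visits={seg[:, 1].mean():.1f} "
          f"online_share={seg[:, 2].mean():.2f}")
```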

10 Customer Lifetime Value / Predicting Lifetime Value (LTV) 

Case study: Customer lifetime value propensity application based on marketing customer type

What for: if you can predict the characteristics of high-LTV customers, this supports customer segmentation, identifies upsell opportunities, and supports other marketing initiatives.


Usage: can be both an online algorithm and a static report showing the characteristics of high LTV customers
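A minimal sketch of the LTV idea: regress an observed value measure on acquisition-time features, then inspect which features characterize high-LTV customers. The features, the synthetic ground truth, and the model choice are assumptions for illustration only.

```python
# LTV sketch: regress observed 12-month value on early behavioral
# features; data and feature names are synthetic assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(2)
n = 3000
# Features at acquisition: first-order value, first-30-day orders,
# coupon-driven acquisition flag.
X = np.column_stack([rng.gamma(2.0, 30.0, n),
                     rng.poisson(1.5, n),
                     rng.integers(0, 2, n)])
# Assumed ground truth: LTV grows with early value/orders, dips for
# purely coupon-acquired customers.
ltv = 3 * X[:, 0] + 80 * X[:, 1] - 50 * X[:, 2] + rng.normal(0, 40, n)

model = GradientBoostingRegressor(random_state=2).fit(X, ltv)
print("feature importances (first_order_value, early_orders, coupon_flag):")
print(model.feature_importances_.round(3))
```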

11 Campaign Effectiveness

Over time, how are campaigns affecting market share among the competitive set across regions, DMAs, and certain zip codes?

It provides an insider's look into the best practices, challenges and examples of brands taking advantage of the closed-off social platform.

12 SKU lifetime Value - Inventory Management (how many units)

In particular, perishable goods

13 Channel Optimization

What is the optimal way to reach a customer with certain characteristics?

14 Location of New Stores / Store Placement

For a given trade area, is there any significant relationship between the sales volume of stores and customers' sentiment towards the brand? Are there significant correlations? How does the above relationship change in the presence of location-based demographic and socioeconomic factors such as ‘Total number of households’, ‘Average home value’, and ‘Total retail spending’ in the trade area of the store location? With these factors included, the association between customer sentiment and retail sales improves.

Pioneered by Tesco. Dominated by Buxton. Site selection in the restaurant industry is widely performed via Pitney Bowes. Where do I open my next store, based on similar competitors' success in certain DMAs and ZIP codes?

15 Warranty analytics

Rates of failure of different components

What types of customers buying what types of products are likely to actually redeem a warranty?

16 Product layout in stores

Plan-o-gramming

Hierarchical taxonomy of interests

Commercially relevant categories including retail, travel, automotive, finance, CPG, pharmaceuticals, small business, sports, etc.

18 top-level categories

Up to 6 levels deep, 90% of categories are at depth 4 or lower

Around 500 categories are being actively used worldwide

Assumptions on collected data:

1 We have to ignore temporal factors that could potentially influence sentiment, such as festivals and holidays, as well as other economic factors including the unemployment rate, poverty level, etc.

2 Use a 100-mile radius around the store location to collect tweets, to maintain a sample on the order of 1,000 tweets for each location.

3 A trade area for each store is assumed to be a circle of 10-mile radius around the store location. Demographic factors and the sentiments of people residing in a trade area are characteristics of potential customers.

4 For a given store location, although the sample of tweets collected for the analysis represents data from a much larger area, the sentiments of people are assumed to remain fairly uniform within the trade area, including the studied 10-mile radius.



-- Product Design

Color Play - create your own palettes using the full Pantone, CSI, and CNCS color libraries
Clip images from anywhere on the web and add them to your own library
Plan research trips using expert city-by-city travel guides

Marketing
● Predict Lifetime Value (LTV): Predict the characteristics of high LTV customers; this supports customer segmentation, identifies up-sell opportunities, and supports other marketing initiatives.
● Churn: Determine the characteristics of customers who churn (i.e., customer defection); this enables a company to develop adjustments to an online algorithm that allows them to reach out to churners.
● Customer segmentation: Allows you to understand qualitatively different customer groups to answer questions like what makes people buy, stop buying, etc.
● Product mix: What mix of products offers the lowest churn rate? For example, giving a combined insurance policy discount for home and auto yields a low churn rate.
● Cross-selling/up-selling and recommendation algorithms: Given a customer’s past browsing history, purchase history, and other behavioral characteristics, what are they likely to want to purchase (or upgrade to) in the future?
● Discount targeting: What is the probability of encouraging the desired behavior with a discount offer?
● Reactivation likelihood: What is the likelihood of reactivation for a given customer?
● Google AdWords optimization and ad buying: Determine the optimal price for different search keywords and ad slots.

Sales
● Lead prioritization: Determine the likelihood that a given sales lead will close.
● Sales forecasting: Provide strategic planning and insight into the sales forecasting process.

Supply Chain
● Demand forecasting: Determine optimal inventory levels for different distribution centers, enabling a lean inventory and preventing out of stock situations.

Risk Management
● Fraud detection: Predict whether or not a transaction should be blocked because it involves some kind of fraud, e.g., credit card fraud or Medicare fraud.
● Accounts payable recovery: Predict the probability that a liability can be recovered, given the characteristics of the borrower and the loan.

Customer Support
● Call center management: Call center volume forecasting (i.e., predicting call volume for staffing purposes).
● Call routing: Determine wait times based on caller ID history, time of day, call volumes, products owned, churn risk, LTV, etc.

Human Resources
● Talent management: Establish objective measures of employee success.
● Employee churn: Predict which employees are most likely to leave.
● Resume screening: Score resumes based on the outcomes of past job interviews and hires.
● Training recommendation: Recommend a specific training program based on employee performance review data.

Ad products or features: multiple impressions, streaming fatigue, NSFW.
Brand Survey targeting algorithms
Performance bidding, auction simulator parameter tuning
Deep dive analysis for A/B experiments
Diagnosis analysis of CTR prediction model and logging diagnostic

User Journey

Historically, the most sophisticated marketers have relied on top-down ad campaign planning. They develop econometric models by looking at the distribution of the whole advertising budget. They analyze changes in allocation and one-time promotions and see how those changes affect the key performance indicators (KPIs), which may be opening a new account, or making an in-store purchase, etc.

With the advent of big data, marketers worldwide are attempting to use data and analytics to solve problems previously out of their reach. Marketers now can apply big data and analytics to create competitive advantage within their markets, often focusing on building a thorough understanding of their customer base.

High-priority big data and analytics projects often target customer-centric outcomes such as improving customer loyalty or improving up-selling.

The promise of big data analytics is that marketers can analyze information about the digital activity of the purchaser and combine it with the knowledge of television, radio, billboard, and print campaigns to tailor marketing messages and, ultimately, improve return on investment.

But as marketers begin to execute on big data and analytics projects, many quickly run into a roadblock.

Traditional data sources: addresses and contact details, identifiers, contracts and accounts, relationships and support history.

Marketers rely on digital scoring of actions, starting from the bottom up with the KPI and working backwards through the touch points along the consumer's digital journey: identifying exactly which ads a specific buyer saw, and how long the buyer watched a video or lingered on a page carrying the ad. It is all found by tracking backwards from the point of sale of the product ultimately bought.

A key challenge for any marketer is deciding what mix of media will best promote a product or service. By media-mix modeling using big data and machine learning, there are a lot of micro-efficiencies we can tap into.
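As a toy illustration of media-mix modeling, the snippet below fits a linear regression of a weekly KPI on per-channel spend. The channels, effect sizes, and data are invented; real media-mix models also include carry-over, saturation, and seasonality terms.

```python
# Toy media-mix regression: weekly KPI as a linear function of
# per-channel spend. Channels, coefficients, and data are invented.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
weeks = 104
spend = rng.uniform(0, 100, size=(weeks, 3))   # TV, search, display
true_effect = np.array([0.8, 1.5, 0.4])        # unknown in practice
kpi = 200 + spend @ true_effect + rng.normal(0, 10, weeks)

mmm = LinearRegression().fit(spend, kpi)
for name, coef in zip(["TV", "search", "display"], mmm.coef_):
    print(f"{name:8s} estimated KPI lift per $ of spend: {coef:.2f}")
```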

To understand the impact of temporal spacing between different touch points on a consumer's journey toward purchasing a product, we create an individual-level data set with exogenous variation in the spacing and intensity of ads. The data show that at a purchase occasion, the likelihood of a product's purchase increases if its past ads are spread apart rather than bunched together, even if spreading the ads apart involves shifting some ads away from the purchase occasion.

In the behavioral literature, repeated exposure to advertising for a product strengthens the memories associated with it, thereby increasing the likelihood of recall at a purchase occasion, so short-term effects are important for the carry-over of ad effects to future occasions. To quantify the long-term effects of advertising on consumers' decisions, on the other hand, researchers have used models of advertising carry-over that accumulates over advertising exposures. Central to estimating the impact of carry-over is the ability to measure the causal effect of advertising on consumer decisions.
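A standard way to formalize carry-over is the geometric adstock transformation, in which each period retains a fraction of the previous period's accumulated ad effect. In the sketch below (the retention factor and exposure patterns are assumptions), spreading the same three exposures apart leaves more accumulated effect at the purchase occasion than bunching them, consistent with the spacing result above.

```python
# Geometric adstock: advertising carry-over where each exposure's
# effect decays by a retention factor per period. The decay rate
# and exposure patterns are illustrative.
def adstock(exposures, retention=0.7):
    carried, out = 0.0, []
    for x in exposures:
        carried = x + retention * carried  # today's ads + decayed memory
        out.append(carried)
    return out

bunched = [3, 0, 0, 0, 0, 0]   # three ads together, then nothing
spread  = [1, 0, 1, 0, 1, 0]   # same three ads spaced apart

print("bunched adstock at purchase occasion:", round(adstock(bunched)[-1], 2))
print("spread  adstock at purchase occasion:", round(adstock(spread)[-1], 2))
```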

Challenges: individual level data, targeting of ads, selection problem.

Because online marketplaces aggregate products from a wide array of providers, selection is usually wider, availability is higher, and prices are more competitive than in vendor-specific online retail stores.

For marketers trying to maximize return on investment, predictive analytics based on big data is an exciting new tool. It holds the promise of creating a detailed view of what works, providing guidance that has never been available before for the fine-tuning of advertising campaigns.

Build the Y Ad Manager Platform, a large-scale ad-buying platform with powerful optimization and actionable insights based on deep science and big data, across display, video, and mobile channels, on Y and third-party real-time bidding (RTB) programmatic inventory.

Why do data-driven marketers continue to assign credit for campaign conversions in such a myopic way?

The problem is that last-touch and last-click attribution have been a pretty consistent standard. Collectively evolving to something better is one of the hardest changes to make in advertising. It is also one of the most important next milestones for the industry.

For media agencies, measuring performance at every level in digital media, especially with the ability to adjust in real time, means that view-through credit beyond last-touch is an integral part of the marketing plan and not an exception. It is becoming the new standard. View-through credit maps the user’s journey from first impression to conversion, whether they converted from a click-through or visited the advertiser’s site independently after being served enough ads to sway them into buying online. It’s what offline advertising always struggled to understand: Did my customer buy my product because of that billboard or my newspaper ads? Or did both contribute?
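To make the contrast with last-touch attribution concrete, here is a small sketch of three credit-assignment schemes applied to a single made-up user journey. The 40/20/40 position-based split is one common convention, not a standard prescribed here.

```python
# Three simple multi-touch attribution schemes applied to one
# made-up user journey that ends in a conversion.
from collections import defaultdict

journey = ["display", "social", "search", "email"]

def attribute(path, scheme="linear"):
    credit = defaultdict(float)
    if scheme == "last_touch":
        credit[path[-1]] = 1.0
    elif scheme == "linear":                 # equal credit to every touch
        for ch in path:
            credit[ch] += 1.0 / len(path)
    elif scheme == "position_based":         # 40/20/40 U-shaped credit
        credit[path[0]] += 0.4
        credit[path[-1]] += 0.4
        for ch in path[1:-1]:
            credit[ch] += 0.2 / (len(path) - 2)
    return dict(credit)

for scheme in ["last_touch", "linear", "position_based"]:
    print(scheme, attribute(journey, scheme))
```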

In theory, the algorithms should be able to allocate budget to advertising networks that police their inventory to avoid phony ads, and so generate more key performance indicators. By the same token, ads that aren't viewable won't drive KPIs. It isn't clear whether the promise is being fulfilled.

Predictive analytics may discover correlations among categories of potential buyers that would be unlikely to occur to human marketers. Advertising is all about correlation.
