Saturday, December 31, 2016

Mac, Excel and Jupyter Hotkeys






Excel

Paste cmd+v
Copy cmd+c
Cut cmd+x
Paste special cmd+ctrl+v

Clear delete
Undo cmd+z
Redo cmd+y

New blank workbook cmd+n
Print cmd+p
Save cmd+s
Close window cmd + w
Quit Excel cmd + q

Underline cmd+u
Italic cmd+i
Bold cmd+b

Select all cmd+a
Add or remove a filter ctrl+shift+l

Fill down cmd+shift+down, then ctrl+d
Fill right cmd+shift+right, then ctrl+r

Screen right fn+option+right arrow
Screen left fn+option+left arrow
Move to last cell fn+control+right arrow
Move to first cell fn+control+left arrow

Display the Go To dialog box ctrl+g
Display the Format Cells dialog box cmd+1
Display the Replace dialog box ctrl+h
Display the Save As dialog box cmd+shift+s
Display the Open dialog box cmd+o

Jupyter

Run cell ctrl+enter
Run cell, select below shift+enter
Insert cell above a (command mode)
Insert cell below b (command mode)
Delete cell d,d (command mode)
Change cell to Markdown m (command mode)
Change cell to code y (command mode)
Open command palette cmd+shift+p


Tuesday, December 27, 2016

Hello TensorFlow

In November 2015, Google open-sourced TensorFlow, its library for numerical computation using data flow graphs. Its flexible implementation and architecture let you focus on building the computation graph, then deploy the model with little effort to heterogeneous platforms such as mobile devices, hundreds of machines, or thousands of computational devices.


#########################################################
## Installations of TensorFlow
#########################################################
Anaconda is a Python distribution that includes a large number of standard numeric and scientific computing packages. Anaconda uses a package manager called conda that has its own environment system, similar to Virtualenv.

- Install Anaconda

- Create a conda environment

conda create -n tensorflow python=3.6

conda install -c conda-forge tensorflow

conda install ipython

conda install jupyter

which python

which ipython

which jupyter


- Activate the conda environment and install TensorFlow in it.

source activate tensorflow

- After the install, you will activate the conda environment each time you want to use TensorFlow.

export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-0.12.0rc0-cp27-none-linux_x86_64.whl

- Optionally install IPython and other packages into the conda environment.

source activate tensorflow

source deactivate

Install Python


### Python commands
Quickly install Python command-line tools with pipx:
``` 
python3 -m pip install --user pipx
python3 -m pipx ensurepath
```

Pipenv automatically creates and manages a virtualenv for your project, and adds/removes packages from the Pipfile as you install/uninstall them. It also generates the all-important Pipfile.lock file, which is used for deterministic builds.
``` 
pipx install pipenv
```

Black is a code formatter; it produces minimal diffs, which speeds up code review.
isort sorts imports alphabetically and automatically separates them into sections.
```
pipenv install black isort --dev
```
setup.cfg config
```
[isort]
multi_line_output=3
include_trailing_comma=True
force_grid_wrap=0
use_parentheses=True
line_length=88
```
use black and isort
```
pipenv run black .
pipenv run isort .
```
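With the isort settings above (multi_line_output=3 plus trailing commas and line_length=88), long imports are wrapped in Black-compatible "vertical hanging indent" style. A small illustration of the resulting style (the imported names are arbitrary):

```python
# A long import line such as:
#   from typing import Dict, List, Optional, Sequence, Tuple, Union
# is rewritten by isort (with the setup.cfg above) to:
from typing import (
    Dict,
    List,
    Optional,
    Sequence,
    Tuple,
    Union,
)

names = [Dict, List, Optional, Sequence, Tuple, Union]
```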

Generate a project from a cookiecutter template:
```
pipx run cookiecutter gh:sourceryai/python-best-practices-cookiecutter
```



#############################
# Test the TensorFlow installation
#############################
python
...
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))
Hello, TensorFlow!
>>> a = tf.constant(10)
>>> b = tf.constant(32)
>>> print(sess.run(a + b))
42

###################################
# Run TensorFlow from the Command Line
###################################
>>> import os
>>> import inspect
>>> import tensorflow
>>> print(os.path.dirname(inspect.getfile(tensorflow)))
/Users/tkmaemd/anaconda/envs/tensorflow/lib/python2.7/site-packages/tensorflow

(tensorflow) NY-C02MW0YGFD58:~ tkmaemd$ python -c 'import os; import inspect; import tensorflow; print(os.path.dirname(inspect.getfile(tensorflow)))'
/Users/tkmaemd/anaconda/envs/tensorflow/lib/python2.7/site-packages/tensorflow

###################################
# Basic Usage
###################################
TensorFlow programs are usually structured into a construction phase, that assembles a graph, and an execution phase that uses a session to execute ops in the graph.

For example, it is common to create a graph to represent and train a neural network in the construction phase, and then repeatedly execute a set of training ops in the graph in the execution phase.

# Building the graph

import tensorflow as tf
matrix1 = tf.constant([[3., 3.]])
matrix2 = tf.constant([[2.],[2.]])
product = tf.matmul(matrix1, matrix2)

# Launch the default graph

sess = tf.Session()
result = sess.run(product)
print(result)
sess.close()
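Here `result` is the 1x1 matrix `[[12.]]`, since 3*2 + 3*2 = 12. The same product can be checked in plain Python without TensorFlow:

```python
# Multiply a 1x2 matrix by a 2x1 matrix by hand.
matrix1 = [[3.0, 3.0]]    # shape (1, 2)
matrix2 = [[2.0], [2.0]]  # shape (2, 1)

# Each output entry is the dot product of a row of matrix1
# with a column of matrix2 (zip(*matrix2) iterates columns).
product = [[sum(a * b for a, b in zip(row, col)) for col in zip(*matrix2)]
           for row in matrix1]
print(product)  # [[12.0]]
```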

# Interactive Usage

import tensorflow as tf
sess = tf.InteractiveSession()

x = tf.Variable([1.0, 2.0])
a = tf.constant([3.0, 3.0])
x.initializer.run()
sub = tf.sub(x, a)
print(sub.eval())
# ==> [-2. -1.]
sess.close()


# Variables

state = tf.Variable(0, name="counter")
one = tf.constant(1)
new_value = tf.add(state, one)
update = tf.assign(state, new_value)

init_op = tf.global_variables_initializer()
with tf.Session() as sess:
    sess.run(init_op)
    print(sess.run(state))

    for _ in range(3):
        sess.run(update)
        print(sess.run(state))

# Fetches

input1 = tf.constant([3.0])
input2 = tf.constant([2.0])
input3 = tf.constant([5.0])
intermed = tf.add(input2, input3)
mul = tf.mul(input1, intermed)

with tf.Session() as sess:
    result = sess.run([mul, intermed])
    print(result)

# Feeds

input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)
output = tf.mul(input1, input2)
with tf.Session() as sess:
    print(sess.run([output], feed_dict={input1: [7.], input2: [2.]}))

###################################
# Hello World
###################################
import tensorflow as tf
h = tf.constant("Hello")
w = tf.constant(" World!")
hw = h + w

with tf.Session() as sess:
    ans = sess.run(hw)

print(ans)

###################################
# Run a TensorFlow demo model
###################################
cd /Users/tkmaemd/anaconda/envs/tensorflow/lib/python2.7/site-packages/tensorflow/models/image/mnist


###################################
# Introduction
###################################
source activate py35
source activate tensorflow
ipython
source deactivate

import tensorflow as tf
import numpy as np

# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Before starting, initialize the variables.  We will 'run' this first.
init = tf.global_variables_initializer()

# Launch the graph.
sess = tf.Session()
sess.run(init)

# Fit the line.
for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(W), sess.run(b))

# Learns best fit is W: [0.1], b: [0.3]
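Since the data are generated noiselessly from y = 0.1x + 0.3, the fit can be sanity-checked in closed form with ordinary least squares (a pure-Python sketch, no TensorFlow required):

```python
import random

random.seed(0)
x = [random.random() for _ in range(100)]
y = [0.1 * xi + 0.3 for xi in x]  # same phony data, noise-free

# OLS: slope = cov(x, y) / var(x); intercept = mean_y - slope * mean_x
n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
W = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
b = mean_y - W * mean_x
print(round(W, 6), round(b, 6))  # 0.1 0.3
```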

Friday, December 23, 2016

Executive Briefings

Up-to-the-minute news, executive briefings, global retail and VM trends, and trend confirmations are all covered here to help hone your strategies for retail, customer communications and business.
Uncover up-and-coming trends and in-depth reports on marketing strategies and experience design.
An annual calendar to help you plan for the most important industry events.

Kroger CEO, Rodney McMullen.

Win over not only the high end, but also the low end.

The economy continues to slowly improve, and customers continue to feel more optimistic, but the bifurcation in the economy remains. Some consumers are willing to spend more while others are worried about their job or next paycheck, or more focused on saving. We find all customers want quality products and a great shopping experience. For the customer who is more focused on natural and organic products we have our own Simple Truth products and a great shopping experience. We also have many entry-level price point items of excellent quality. For customers looking for incredibly high quality products, like Boar’s Head or Murray’s Cheese, just to name a couple, we have that, too.

Our job is to understand and deliver for our diverse set of customers so they can save where they want to save and splurge where they want to splurge.

Kohl's CEO, Kevin Mansell

Even more importantly, traffic in our stores was extremely strong. Our stores enjoyed a solid increase in sales for the three days combined. On an annual basis, about 80% of our business is done in stores, so it was exciting to see so many customers enjoying the experience of shopping together with family and friends this weekend.

Solid activity in store, combined with increased online business, is how we will succeed as an omni-channel retailer. At the core of this omnichannel transformation is the need to evolve the way our Merchandise organization works, combining our separate E-commerce and Store Merchandise and Planning organizations into one unified omnichannel team. We are seeing our online and in-store experience work very well together, as nearly half of our online orders were fulfilled either through Buy Online Pick Up in Store or Ship from Store. We are driving online shoppers to our stores, and our stores are making a better online experience by cutting shipping time down and increasing available inventory.

From an Incredible Savings standpoint, many of our competitors can offer great savings on items at any point. What makes us stand out is our loyalty efforts that are ongoing — from K's marketing program to the incredible value of the K's app and mobile wallet. At no time was that more evident than this past weekend. I believe that rewarding our customers through loyalty is how we will drive positive customer behaviors and differentiate ourselves for the rest of this season and for the future.

Airbnb CEO, Brian Chesky

Put Customers First. Here is an elegantly simple but powerful viewpoint. If you want to create a great product, just focus on one person. Make that person have the most amazing experience ever.

Berkshire-Hathaway CEO, Warren Buffett

Act with Integrity. It takes 20 years to build a reputation and five minutes to ruin it. If you think about that, you'll do things differently.

Facebook CEO, Mark Zuckerberg

Build Great Teams. How does Facebook stay relevant in a space where things change in a nanosecond? If you’re in an environment where you’re not learning as much as you think you should be, if you don’t have the people around you who you think are going to inspire you to do the best work that you can, then think about changing something. Because that’s a big deal.

Pepsi CEO, Indra Nooyi

Drive Results. You’ve got to look at the investments you make in the company as a portfolio. There’s a bunch of stuff that delivers in the short term. That gives you the breathing room and the fodder to invest in the long term.

If you're not prepared to be wrong, you'll never come up with anything original.
Sir Ken Robinson, TED 2006 (#1 TED talk)

WeChat Case

In September 2016, WeChat launched geo-location-based in-stream Moments advertising to offer more segmented targeting. The app has also launched a banner ad format that lets advertisers pick their preferred official accounts. Rates are based on page views.

Popular topics include astrology, humour, self-improvement, health and wellness, cooking, fitness, travel, news, entertainment, and parenthood.






Thursday, December 8, 2016

Open Data Sources

DataHub (http://datahub.io/dataset)

World Health Organization (http://www.who.int/research/en/)

Data.gov (http://data.gov)

European Union Open Data Portal (http://open-data.europa.eu/en/data/)

Amazon Web Service public datasets (http://aws.amazon.com/datasets)

Healthdata.gov (http://www.healthdata.gov)

Machine Learning Repository (http://archive.ics.uci.edu/ml/)


Other Popular open data repositories:


UC Irvine Machine Learning Repository

Kaggle datasets

Amazon’s AWS datasets

Meta portals (they list open data repositories):

http://dataportals.org/

http://opendatamonitor.eu/

http://quandl.com/

Other pages listing many popular open data repositories:

Wikipedia’s list of Machine Learning datasets (https://goo.gl/SJHN2k)


Quora.com question (http://goo.gl/zDR78y)

Visual 13 - Create Pie Chart

dat1 <- dbGetQuery(conn,"select channel, sum(a.sld_qty) sld_qty
                  from eipdb_sandbox.ling_sls_brnd_demog a
                  where a.gma_nbr in (2,5)
                  and a.trn_sls_dte between '2016-11-14' AND '2016-12-16'
                  group by 1
                  order by 1;")
dat1$perc <- dat1$sld_qty/sum(dat1$sld_qty)
p1 <- ggplot(dat1, aes(x = factor(1), y =sld_qty, fill = channel)) +
  geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values = c("red", "blue")) +
  coord_polar(theta="y", start = pi / 3) +
  ##labs(title = "Kohl's Sold Items by Channel") +
  geom_text_repel(aes(label=scales::percent(perc)), size=4.5) + ylab("") + xlab("") +
  theme_void()


dat2 <- dbGetQuery(conn,"select new_ind, sum(a.sld_qty) sld_qty
                  from eipdb_sandbox.ling_sls_brnd_demog a
                  where a.gma_nbr in (2,5)
                  and a.trn_sls_dte between '2016-11-14' AND '2016-12-16'
                  group by 1
                  order by 1;")
dat2$new_ind <- factor(dat2$new_ind)
dat2$perc <- dat2$sld_qty/sum(dat2$sld_qty)
p2 <- ggplot(dat2, aes(x = "", y =sld_qty, fill = new_ind)) +
  geom_bar(width = 1, stat = "identity") +
  scale_fill_manual(values = c("darkgreen", "orangered", "red")) +
  coord_polar("y", start = pi / 3) +
  ##labs(title = "Kohl's Sold Items by New/Existed Customer") +
  geom_text_repel(aes(label=scales::percent(perc)), size=4.5) + ylab("") + xlab("") +
  theme_void()


dat3 <- dbGetQuery(conn,"select sku_stat_desc, sum(a.sld_qty) sld_qty
                  from eipdb_sandbox.ling_sls_brnd_demog a
                 where a.gma_nbr in (2,5)
                 and a.trn_sls_dte between '2016-11-14' AND '2016-12-16'
                 group by 1
                 order by 1;")
dat3$perc <- dat3$sld_qty/sum(dat3$sld_qty)
dat3 <- dat3[order(dat3$sld_qty),]
p3 <- ggplot(dat3, aes(x = "", y =sld_qty, fill = sku_stat_desc)) +
  geom_bar(width = 1, stat = "identity") +
  scale_fill_brewer(palette = "Spectral") +
  coord_polar("y", start = pi / 3) +
  ##labs(title = "Kohl's Sold Items by SKU Status") +
  geom_text_repel(aes(label=scales::percent(perc)), size=4.5) + ylab("") + xlab("") +
  theme_void()


grid.arrange(p1,p2,p3, nrow=3, ncol=1)

Wednesday, December 7, 2016

Visual12 - Create Maps


http://bcb.dfci.harvard.edu/~aedin/courses/R/CDC/maps.html

#########################################################
## Geographic Information of Customers
#########################################################
# Returns centroids
getLabelPoint <- function(county) {
  Polygon(county[c('long', 'lat')])@labpt}
df <- map_data("state")              
centroids <- by(df, df$region, getLabelPoint)     # Returns list
centroids <- do.call("rbind.data.frame", centroids)  # Convert to Data Frame
names(centroids) <- c('long', 'lat')                 # Appropriate Header
centroids$states <- rownames(centroids)

dat8<-dbGetQuery(conn,"select demand_state,
                count(distinct mstr_persona_key) cust
                 from eipdb_sandbox.ling_sls_brnd_demog
                 where new_ind=1 and trn_sls_dte between '2014-11-01' and '2016-10-31'
                 group by 1
                 order by 1;
                 ")
## Join with States
dat8$states <- tolower(state.name[match(dat8$demand_state,  state.abb)])

states <- map_data("state")
head(states)
dat9 <- merge(dat8, centroids, by="states")
dat9$statelabel <- paste(dat9$demand_state, "\n", format(dat9$cust, big.mark = ",", scientific = F),  sep="")

# ggplot(data = Total) +
#   geom_polygon(aes(x = long, y = lat, fill = region, group = group), color = "white") +
#   coord_fixed(1.3) +
#   guides(fill=FALSE) +
#   geom_text(data=statelable, aes(x=long, y=lat, label = demand_state), size=2)

ggplot() +
  geom_map(data=states, map=states,
           aes(x=long, y=lat, map_id=region),
           fill="#ffffff", color="#ffffff", size=0.15) +
  geom_map(data=dat8, map=states,
           aes(fill=cust, map_id=states),
           color="#ffffff", size=0.15) +
  coord_fixed(1.3) +
  scale_fill_continuous(low = "thistle2", high = "darkred", guide="colorbar") +
  #scale_fill_distiller(name="Customers", palette = "YlGn", breaks=pretty_breaks(n=5)) +
  #geom_text(data=dat9, hjust=0.5, vjust=-0.5, aes(x=long, y=lat, label=statelabel), colour="black", size=4 ) +
  geom_text(data=dat9, aes(x=long, y=lat, label=statelabel), colour="black", size=4 ) +
  ggtitle("Customers from 11/1/2014 to 10/31/2016") + ylab("") + xlab("") +
  theme(plot.title = element_text(face = "bold", size = 20)) +
  theme(axis.text.x = element_text(face = "bold", size = 14)) +
  theme(axis.text.y = element_text(face = "bold", size = 14)) +
  theme(axis.title.x = element_text(face = "bold", size = 16)) +
  theme(strip.text.x = element_text(face = "bold", size = 16)) +
  theme(axis.title.y = element_text(face = "bold", size = 16, angle=90)) +
  guides(fill=FALSE)

## Plot2
## American Community Survey (ACS) Data
## Join with States
## access population estimates for US States in 2012
?df_pop_state
data(df_pop_state)
head(df_pop_state)
dat10 <- merge(dat9, df_pop_state, by.x="states", by.y="region")
dat10$perc <- dat10$cust/dat10$value
percent <- function(x, digits = 2, format = "f", ...) {
  paste0(formatC(100 * x, format = format, digits = digits, ...), "%")
}
dat10$statelabel <- paste(dat10$demand_state, "\n", percent(dat10$perc,2,"f"),  sep="")
head(dat10)

p9 <- ggplot() +
  geom_map(data=states, map=states,
           aes(x=long, y=lat, map_id=region),
           fill="#ffffff", color="#ffffff", size=0.15) +
  geom_map(data=dat10, map=states,
           aes(fill=perc, map_id=states),
           color="#ffffff", size=0.15) +
  coord_fixed(1.3) +
  scale_fill_continuous(low = "thistle2", high = "darkred", guide="colorbar") +
  geom_text(data=dat10, aes(x=long, y=lat, label=statelabel), colour="black", size=4 ) +
  ggtitle("Customers from 11/1/2015 to 10/31/2016") + ylab("") + xlab("") +
  theme(plot.title = element_text(face = "bold", size = 20, hjust = 0.5)) +
  theme(axis.text.x = element_text(face = "bold", size = 14)) +
  theme(axis.text.y = element_text(face = "bold", size = 14)) +
  theme(axis.title.x = element_text(face = "bold", size = 16)) +
  theme(strip.text.x = element_text(face = "bold", size = 16)) +
  theme(axis.title.y = element_text(face = "bold", size = 16, angle=90)) +
  guides(fill=FALSE)


#########################################################
## World Maps
#########################################################
## regi_geo
data <- ddply(user_regi, .(regi_geo), summarise, tot=length(user_id))
summary(data$tot)
tmp=joinCountryData2Map(data, joinCode = "ISO2"
                    , nameJoinColumn = "regi_geo"
                    , verbose='TRUE'
)
tmp$tot[is.na(tmp$tot)]=0
# catMethod='categorical'
mapCountryData(tmp, nameColumnToPlot="tot",catMethod="fixedWidth")

#getting class intervals
classInt <- classIntervals(tmp[["tot"]], n=5, style = "jenks")
catMethod = classInt[["brks"]]
#getting colours
colourPalette <- brewer.pal(5,'RdPu')
#plot map
mapParams <- mapCountryData(tmp
                            ,nameColumnToPlot="tot"
                            ,addLegend=FALSE
                            ,catMethod = catMethod
                            ,colourPalette=colourPalette )
#adding legend
do.call(addMapLegend
        ,c(mapParams
           ,legendLabels="all"
           ,legendWidth=0.5
           ,legendIntervals="data"
           ,legendMar = 2))

tmp2 <- data.frame(regi_geo = tmp[['regi_geo']], tot = tmp[['tot']], name = tmp[['NAME']])
tmp2 <- tmp2[order(tmp2$tot, decreasing = TRUE), ]
write.csv(tmp2, "junk.csv")