Store Attribute Model
Provide recommendations to field management, senior planning, and store operations management for changes to store profiles.
Conduct store segmentation based on regional, lifestyle, and customer-type attributes.
Recommend product assortment and merchandising based on in-store and online sales, with respect to demographic information across different store segments.
Spatial-Oriented and Social-Aware Business
Location Optimization
To find the best locations for starting a new business and placing the corresponding advertisements, considering social, spatial, and financial factors simultaneously.
Budget Allocation for Hub and Broadcasting Media
To choose the locations for placing ads, as well as to select broadcasting media, under a budget constraint.
Algorithm: Advertisements Selection with Dynamic Programming (ASDP)
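The ASDP details are not given here; as a rough, hypothetical sketch, selecting ad placements under a budget constraint can be framed as a 0/1 knapsack problem solved with dynamic programming (the costs and values below are made-up illustrations, not from the paper):

```r
# Hypothetical sketch: pick a subset of ad placements maximizing value under a
# budget. This is a generic 0/1 knapsack DP, not the paper's ASDP algorithm.
select_ads <- function(cost, value, budget) {
  n <- length(cost)
  # V[i+1, b+1] = best total value using the first i ads with budget b
  V <- matrix(0, nrow = n + 1, ncol = budget + 1)
  for (i in seq_len(n)) {
    for (b in 0:budget) {
      V[i + 1, b + 1] <- V[i, b + 1]                 # skip ad i
      if (cost[i] <= b)                              # or take ad i if affordable
        V[i + 1, b + 1] <- max(V[i + 1, b + 1], V[i, b - cost[i] + 1] + value[i])
    }
  }
  # Trace back which ads were chosen
  chosen <- integer(0); b <- budget
  for (i in n:1) {
    if (V[i + 1, b + 1] != V[i, b + 1]) {
      chosen <- c(i, chosen); b <- b - cost[i]
    }
  }
  list(value = V[n + 1, budget + 1], chosen = chosen)
}
select_ads(cost = c(3, 4, 5), value = c(4, 5, 6), budget = 7)
```

With these toy numbers the DP picks ads 1 and 2 for a total value of 9.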
K/LAB brand highlights
- style-minded millennials
- data-informed
- real-time fashion
- trend-driven product
- testing-ground environment
- customer engagement
- stay chic and save
- in-store user experience
- integrated, simplistic fixture package that does not feel 'over-designed'
Strike a pose
Think out loud
Sunday, November 20, 2016
Product Affinity Segmentation
Product Affinity Segmentation Application
Email is the most common channel to have sent people into stores, followed by direct mail and coupons.
Successful retailers manage customer relationships by knowing customers, getting their attention, building trust, tailoring messages, and conveying benefits to make a deal happen.
Product affinity segmentation can be the foundation for a retailer’s marketing strategies in many areas, such as web product recommendation, co-purchase promotion, direct-mail and opt-in email advertising programs.
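As a minimal illustration of the affinity idea (with made-up baskets and product names), co-purchase strength between two products can be scored as lift, P(A and B) / (P(A) P(B)):

```r
# Toy example: score product affinity as lift over market baskets.
# Baskets and product names are invented for illustration.
baskets <- list(c("shirt", "tie"), c("shirt", "tie"), c("belt"),
                c("shirt", "tie", "belt"), c("belt"))
p <- function(items) mean(sapply(baskets, function(b) all(items %in% b)))
lift <- function(a, b) p(c(a, b)) / (p(a) * p(b))
lift("shirt", "tie")  # lift > 1: the pair co-occurs more often than chance
```

Pairs with high lift are natural candidates for co-purchase promotions and recommendations.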
Marketing Mix Modeling
In many businesses, the marketing fund is allocated based on the marketing manager’s experience, departmental budget allocation rules, and sometimes the ‘gut feelings’ of business leaders. These traditional ways of budget allocation yield suboptimal results and in many cases lead to money wasted on irrelevant marketing efforts. Marketing Mix Models can be used to understand the effects of marketing activities and identify the key efforts that drive the most sales among a group of competing marketing activities. The results can be used in marketing budget allocation and take the guesswork out of the process.
How do firms allocate their budget today?
How do they evaluate the effectiveness of those investments?
How do they plan to invest their limited marketing budget in the future?
How to ensure the highest return on investment while building brand image and reputation?
Apply predictive analytics to guide the company’s marketing budget allocation by using Marketing Mix Modeling. MMM is the use of statistical and analytical methods to quantify the effectiveness of various marketing activities (the marketing mix), systematically optimize marketing budgets, and allocate resources to profitable marketing efforts.
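A minimal MMM sketch on simulated weekly data (channel names, decay, and effect sizes are all assumptions, not real spend data): regress sales on an adstock-transformed TV spend plus digital spend, then read effect sizes off the fitted coefficients:

```r
# Minimal marketing-mix sketch on simulated data; the decay rate, effect
# sizes, and channel names are illustrative assumptions.
set.seed(1)
adstock <- function(x, decay = 0.5) {
  # Carry-over: effective spend this week = spend + decay * last week's effective spend
  out <- numeric(length(x))
  for (t in seq_along(x)) out[t] <- x[t] + if (t > 1) decay * out[t - 1] else 0
  out
}
tv      <- runif(52, 0, 100)   # weekly TV spend
digital <- runif(52, 0, 50)    # weekly digital spend
sales   <- 200 + 1.5 * adstock(tv) + 2 * digital + rnorm(52, sd = 10)
fit <- lm(sales ~ adstock(tv) + digital)
coef(fit)  # recovered coefficients approximate the simulated effects
```

In practice the coefficients (scaled by spend) give each channel's sales contribution, which is the input to budget reallocation.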
Innovative Approaches to Measuring Advertising Effectiveness
A popular saying illustrating how difficult it was to reach potential customers using traditional advertising is attributed to John Wanamaker: "Half the money I spend on advertising is wasted; the trouble is I don't know which half."
Probably the most famous quote in the field of marketing is the apocryphal line attributed to John Wanamaker about the difficulty of assessing the impact of advertising spending. Here we are, roughly 100 years since that phrase was first popularized, and Wanamaker’s words continue to resonate with today’s marketing executives just as much as ever before. The development of customer tracking technologies, measurable media, sophisticated attribution models, and platforms that facilitate controlled experiments all promise great advances in our ability to make more precise statements about the effectiveness of advertising spending, but many lingering questions remain.
With this vexing backdrop in mind, the Wharton Customer Analytics Initiative and the Future of Advertising Program are joining forces to commission leading-edge research projects to develop state-of-the-art methods and insights in this important area. Specifically, we are looking to support high-quality, data-oriented research that will demonstrate just how far we’ve come since Wanamaker first expressed his frustrations with measuring advertising impact. We have in mind three broad sub-categories of research:
(1) New modeling approaches that leverage customer-level data on behavioral responses to advertising
(2) Advances in experimental methods that can isolate and quantify different advertising effects
(3) Other innovative approaches that represent a significant step beyond traditional advertising effectiveness measurement methods
Things to remember
1. There's always time. Time is priorities.
2. Days fill up quickly. Only plan for 4-5 hours of real work per day.
3. Work more when you're in the zone. Relax when you're not. It's normal to have days where you just can't work and days where you'll work 12 hours straight.
4. Respect your time and make it respected. Your time is $1000/hour, and you need to act accordingly.
5. Stop multi-tasking. It merely kills your focus.
6. Set up a work routine and stick to it. Your body will adapt.
7. We're always more focused and productive with limited time.
8. Work is the best way to get working. Start with short tasks to get the ball rolling.
9. Done is better than perfect. Work iteratively. Expectations of doing things perfectly are stifling.
10. More work hours doesn't mean more productivity. Use constraints as opportunities.
11. Separate brainless and strategic tasks to become more productive. Separate thinking and execution to execute faster and think better.
12. Organize meetings early during the day. Time leading up to an event is often wasted.
13. Group meetings and communication (email or phone) to create blocks of uninterrupted work. A single meeting can blow a whole afternoon by breaking it into two pieces, each too small to do anything hard in.
14. Keep the same context throughout the day. Switching between projects/clients is unproductive.
15. Work around procrastination. Procrastinate between intense sprints of work.
16. Break the unreasonable down into little reasonable chunks. A big goal is only achieved when every little thing you do every day gets you closer to that goal.
17. No two tasks ever hold the same importance. Always prioritize. Be really careful with to-do lists.
18. Always know the one thing you really need to get done during the day. Only ever work on the thing that will have the biggest impact.
19.
20.
21. Turn the page on yesterday. Only ever think about today and tomorrow. Yesterday's home runs don't win today's games.
22. Set deadlines for everything. Don't let tasks go on indefinitely.
23. Set end dates for intense or stressful activities. Everything ends at some point.
24. Always take notes. Get a reminder app for everything. Do not trust your own brain for your memory.
25. Write down anything that distracts you - Google searches, random thoughts, new ideas, whatever. The point is, if you write them down, they'll stop bubbling up when you're in the zone.
If I have been able to see further, it was only because I stood on the shoulders of giants.
– Isaac Newton
Keep away from people who try to belittle your ambitions. Small people always do that, but the really great make you feel that you, too, can become great.
– Mark Twain
Insanity: doing the same thing over and over again and expecting different results.
– Albert Einstein
Give a man a fish and you feed him for a day. Teach a man to fish and you feed him for a lifetime.
– Chinese proverb
Absorb what is useful, discard what is useless, and add what is specifically your own.
– Bruce Lee
If we have data, let's look at data. If all we have are opinions, let's go with mine.
– Jim Barksdale, former Netscape CEO
How about 'reciprocity'! Never impose on others what you would not choose for yourself.
– Confucius
Do not dwell in the past, do not dream of the future, concentrate the mind on the present moment.
– The Shakyamuni Buddha
Dream no small dreams for they have no power to move the hearts of men.
– Johann Wolfgang von Goethe
You can never figure out life. What you thought you owned is actually leaving; what you thought you had lost is actually on its way to you.
La La Land 2016
Don’t fix something not broken.
Gain trust before change.
Where there is a will, there is a way.
Do what you believe is right.
Great minds think alike.
It's our choices, Harry, that show what we truly are, far more than our abilities.
THANKS WGSN @wgsn FOR THE FEATURE IM SO EXTREMELY FLATTERED TO BE PART OF YOUR MUST-FOLLOW
Learning from mistakes is the best education.
Check Bivariate Distribution in R
data$index=1+(data$rankP-mean(data$rankP))/sd(data$rankP)
ggplot(data, aes(y=data$index, x=data$Avg...Viewed)) +
geom_density_2d() +
ggtitle("Contour Plot of Original APV and APV Index") +
ylab("APV Index") +
xlab("Original APV") +
theme(plot.title = element_text(face = "bold", size = 20)) +
theme(axis.text.x = element_text(face = "bold", size = 14)) +
theme(axis.text.y = element_text(face = "bold", size = 14)) +
theme(axis.title.x = element_text(face = "bold", size = 16)) +
theme(strip.text.x = element_text(face = "bold", size = 16)) +
theme(axis.title.y = element_text(face = "bold", size = 16, angle=90))
## Type
p1=ggplot(data, aes(y=data$Avg...Viewed, x=data$Type)) +
geom_boxplot(aes(fill = data$Type), outlier.colour = "red", outlier.size = 2) +
ggtitle("Boxplot of Original APV by Type") +
ylab("Original APV") +
xlab("Type") +
theme(plot.title = element_text(face = "bold", size = 20)) +
theme(axis.text.x = element_text(face = "bold", size = 14)) +
theme(axis.text.y = element_text(face = "bold", size = 14)) +
theme(axis.title.x = element_text(face = "bold", size = 16)) +
theme(strip.text.x = element_text(face = "bold", size = 16)) +
theme(axis.title.y = element_text(face = "bold", size = 16, angle=90)) +
guides(fill=FALSE)
p2=ggplot(data, aes(y=data$index, x=data$Type)) +
geom_boxplot(aes(fill = data$Type), outlier.colour = "red", outlier.size = 2) +
ggtitle("Boxplot of APV Index by Type") +
ylab("APV Index") +
xlab("Type") +
theme(plot.title = element_text(face = "bold", size = 20)) +
theme(axis.text.x = element_text(face = "bold", size = 14)) +
theme(axis.text.y = element_text(face = "bold", size = 14)) +
theme(axis.title.x = element_text(face = "bold", size = 16)) +
theme(strip.text.x = element_text(face = "bold", size = 16)) +
theme(axis.title.y = element_text(face = "bold", size = 16, angle=90)) +
guides(fill=FALSE)
#X..Eps
p3=ggplot(data, aes(y=data$Avg...Viewed, x=data$X..Eps)) +
geom_density_2d() +
ggtitle("Contour Plot of Original APV by Number of Episodes") +
ylab("Original APV") +
xlab("Number of Episodes") +
scale_x_continuous(limits=c(0,25)) +
theme(plot.title = element_text(face = "bold", size = 20)) +
theme(axis.text.x = element_text(face = "bold", size = 14)) +
theme(axis.text.y = element_text(face = "bold", size = 14)) +
theme(axis.title.x = element_text(face = "bold", size = 16)) +
theme(strip.text.x = element_text(face = "bold", size = 16)) +
theme(axis.title.y = element_text(face = "bold", size = 16, angle=90))
p4=ggplot(data, aes(y=data$index, x=data$X..Eps)) +
geom_density_2d() +
ggtitle("Contour Plot of APV Index by Number of Episodes") +
ylab("APV Index") +
xlab("Number of Episodes") +
scale_x_continuous(limits=c(0,30)) +
theme(plot.title = element_text(face = "bold", size = 20)) +
theme(axis.text.x = element_text(face = "bold", size = 14)) +
theme(axis.text.y = element_text(face = "bold", size = 14)) +
theme(axis.title.x = element_text(face = "bold", size = 16)) +
theme(strip.text.x = element_text(face = "bold", size = 16)) +
theme(axis.title.y = element_text(face = "bold", size = 16, angle=90))
## daypart
p5=ggplot(data, aes(y=data$Avg...Viewed, x=data$daypart)) +
geom_boxplot(aes(fill = data$daypart), outlier.colour = "red", outlier.size = 2) +
ggtitle("Boxplot of Original APV by Daypart") +
ylab("Original APV") +
xlab("Daypart") +
theme(plot.title = element_text(face = "bold", size = 20)) +
theme(axis.text.x = element_text(face = "bold", size = 14)) +
theme(axis.text.y = element_text(face = "bold", size = 14)) +
theme(axis.title.x = element_text(face = "bold", size = 16)) +
theme(strip.text.x = element_text(face = "bold", size = 16)) +
theme(axis.title.y = element_text(face = "bold", size = 16, angle=90)) +
guides(fill=FALSE)
p6=ggplot(data, aes(y=data$index, x=data$daypart)) +
geom_boxplot(aes(fill = data$daypart), outlier.colour = "red", outlier.size = 2) +
ggtitle("Boxplot of APV Index by Daypart") +
ylab("APV Index") +
xlab("Daypart") +
theme(plot.title = element_text(face = "bold", size = 20)) +
theme(axis.text.x = element_text(face = "bold", size = 14)) +
theme(axis.text.y = element_text(face = "bold", size = 14)) +
theme(axis.title.x = element_text(face = "bold", size = 16)) +
theme(strip.text.x = element_text(face = "bold", size = 16)) +
theme(axis.title.y = element_text(face = "bold", size = 16, angle=90)) +
guides(fill=FALSE)
pdf("result.pdf", width=15, height=8.5)
grid.arrange(p1,p2, ncol=2)
grid.arrange(p3,p4, ncol=2)
grid.arrange(p5,p6, ncol=2)
dev.off()
Saturday, November 19, 2016
Check Univariate Distribution in R
memory.limit()
memory.size(max = TRUE)
rm(list=ls(all=TRUE))
sessionInfo()
require(data.table)
require(stringr)
require(lubridate)
require(scales)
require(tigerstats)
require(ggplot2)
require(gridExtra)
require(ggthemes)
##################################################################
## Check Distribution
##################################################################
## Change data type
# series - actual TV series such as American Idol or Glee (names are masked for this exercise)
data$series = as.character(data$series)
sort(xtabs(~series,data), decreasing = T)[1:10]
# network - Networks such as ABC, HBO, FOX, etc. Names are masked.
data$network = as.character(data$network)
sort(xtabs(~network,data), decreasing = T)[1:10]
# Type - Type of TV network (broadcast or cable)
data$Type = as.character(data$Type)
sort(xtabs(~Type,data), decreasing = T)
rowPerc(xtabs(~Type,data))
# Eps 4 - Number of episodes in the given timeframe (assume a broadcast month = 4 weeks)
data$X..Eps = as.integer(data$X..Eps)
summary(data$X..Eps)
# Air Day - Day of episode airing (M, T, W, R, F, S, U)
data$Air.Day_M=unlist(lapply(data$Air.Day, function(x) grepl('M',x)))
data$Air.Day_T=unlist(lapply(data$Air.Day, function(x) grepl('T',x)))
data$Air.Day_W=unlist(lapply(data$Air.Day, function(x) grepl('W',x)))
data$Air.Day_R=unlist(lapply(data$Air.Day, function(x) grepl('R',x)))
data$Air.Day_F=unlist(lapply(data$Air.Day, function(x) grepl('F',x)))
data$Air.Day_S=unlist(lapply(data$Air.Day, function(x) grepl('S',x)))
data$Air.Day_U=unlist(lapply(data$Air.Day, function(x) grepl('U',x)))
rowPerc(xtabs(~data$Air.Day_M,data))
rowPerc(xtabs(~data$Air.Day_T,data))
rowPerc(xtabs(~data$Air.Day_W,data))
rowPerc(xtabs(~data$Air.Day_R,data))
rowPerc(xtabs(~data$Air.Day_F,data))
rowPerc(xtabs(~data$Air.Day_S,data))
rowPerc(xtabs(~data$Air.Day_U,data))
# National Time - 9:00 PM Airing start time
# tmp=as.POSIXct(as.character(data$National.Time), format="%H:%M %r")
# class(data$National.Time)
# rowPerc(xtabs(~data$National.Time,data))
# daypart prime Industry-standard time block (see side panel for details)
data$daypart=as.character(data$daypart)
xtabs(~daypart,data)
rowPerc(xtabs(~daypart,data))
#Run_time (min) 60 Series run time in minutes
summary(data$Run_time..min.)
data=data[order(data$Run_time..min., decreasing = T), ]
#Unique HHs 2,636,448 Number of unique Households tuned in to a given series within given time interval*
data$Unique.HHs=as.integer(data$Unique.HHs)
summary(data$Unique.HHs)
#Total Hrs Viewed 1,534,543 Sum of hours 'logged in' to the given program by all viewers in given time frame
data$Total.Hrs.Viewed=as.integer(data$Total.Hrs.Viewed)
summary(data$Total.Hrs.Viewed)
#Avg % Viewed 53.6% Average % of the program viewed**
tmp=str_replace_all(data$Avg...Viewed, "%", "")
data$Avg...Viewed=as.numeric(tmp)/100
summary(data$Avg...Viewed)
##################################################################
## Check Time Series Distribution
##################################################################
train$Date = as.Date(train$Date, "%Y-%m-%d")
ggplot(train, aes(x=Date, y=embroidered_top)) +
geom_point(aes( x= Date, y=embroidered_top), col = "blue", size = 1) +
scale_x_date(labels=date_format("%b %y")) +
stat_smooth(color="red")
Grep Strings in R
Identify specific characters in columns
> data[1:20,]
series network Type X..Eps Air.Day National.Time daypart Run_time..min. Unique.HHs Total.Hrs.Viewed
...
14 series314 network5 Broadcast 1 T 8:00 PM prime 60 991,953 469,083
15 series838 network4 Broadcast 4 T 9:00 PM prime 60 2,636,448 1,534,543
16 series230 network5 Broadcast 3 M 10:01 PM prime 59 2,080,862 1,112,973
17 series39 network4 Broadcast 3 U 7:00 PM primeaccess 60 2,133,072 1,123,172
18 series193 network5 Broadcast 5 T U 10:01 PM prime 59 3,351,520 1,795,463
# Air Day - Day of episode airing (M, T, W, R, F, S, U)
> unlist(lapply(data$Air.Day, function(x) grepl('M',x)))
[1] TRUE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
...
Extract specific strings
tmp=str_replace_all(topstr, " ", "_")   # replace spaces with underscores
tmp=paste("^",tmp,"$", sep="")          # anchor the pattern for an exact match
tmp2=grep(tmp, totalSampleName)         # indices of exact matches in totalSampleName