
Vincent Granville is writing a book on Data Science... performance art


http://www.datasciencecentral.com/profiles/blogs/my-data-science-book?xg_source=activity

Vincent Granville is writing a book...

... and is publishing the contents as he goes...

Table of Contents

Chapter 1: What is Data Science?

  • Fake Data Science
  • Fake Data Science: Two Examples
  • The Face of the New University
  • Thirteen Problems
  • DUI Arrests Decrease After State Monopoly on Liquor Sales Ends
  • Data Science Defeats Intuition
  • Data Glitch Turns Data into Gibberish
  • Regression in Unusual Spaces
  • Analytics versus Seduction to Boost Sales
  • About Hidden Data
  • High Crime Rates Caused by Gasoline Lead. Really?
  • Boeing’s Dreamliner Problems
  • Seven Tricky Sentences for NLP
  • Data Scientists Dictate What We Eat
  • Increasing Amazon.com Sales with Better Relevancy
  • Detecting Fake Profiles or Likes on Facebook
  • Analytics for Restaurants
  • History and Milestones
  • Statistics Will Experience a Renaissance
  • A Few Random Thoughts
  • History And Milestones
  • Modern Trends
  • Data Scientist Versus Data Architect
  • Summary

Chapter 2: Big Data is Different

  • Two Big Data Issues
  • The Curse of Big Data
  • When Data Flows Faster Than it Can Be Processed
  • Examples of Big Data Techniques
  • Excel for Big Data
  • Clustering and Taxonomy Creation for Massive Data Sets
  • Source Code for Keyword Correlations API
  • Big Data Problem That Epitomizes The Challenges of Data Science
  • What Map Reduce Can’t Do
  • Data Science: The End of Statistics?
  • Eight Worst Statistical Techniques
  • Marrying Computer Science, Statistics And Domain Expertise
  • The Big Data Ecosystem
  • Summary

Chapter 3: Becoming a Data Scientist

  • Types of Data Scientists
  • A Domain Expert, Analyst and Management Consultant
  • Horizontal Versus Vertical Data Scientist
  • Types of Data Scientists
  • Example of Amateur Data Science
  • Example of Extreme Data Science
  • Training
  • University Programs
  • Certifications And Other Training
  • Data Science Apprenticeship
  • Online Training: The Basics
  • Special Tutorials
  • Data Sets
  • Projects
  • Source Code
  • The Independent Consultant
  • Finding Clients
  • Managing Your Finances
  • Salary Surveys
  • Sample Proposals
  • CRM Tools
  • The Entrepreneur
  • Our Story: Data Science Publisher
  • Startup Ideas For Data Scientists
  • Summary

Chapter 4: Data Science Craftsmanship - Part I

  • The Data Scientist
  • Data Scientist Versus Data Engineer
  • Data Scientist Versus Statistician
  • Data Scientist Versus Business Analyst
  • New Types of Metrics
  • Metrics To Optimize Digital Marketing Campaigns
  • Metrics For Fraud Detection
  • Choosing an Analytic Tool
  • Questions to Ask When Choosing Analytic Software
  • Questions to Ask When Considering Visualization Tools
  • Questions to Ask Regarding Real-Time Products
  • Programming Languages For Data Science
  • Visualization
  • Producing Data Videos With R
  • More Sophisticated Videos
  • Statistical Modeling Without Models
  • Perl Code To Produce Data Sets
  • R Code To Produce Data Sets
  • New Types of Infographics
  • Venn Diagrams
  • Adding Dimensions To a Chart
  • Three Classes of Metrics: Centrality, Volatility, Bumpiness
  • How Can Bumpiness Be Defined?
  • About The Excel Spreadsheet
  • Uses of the Bumpiness Coefficient
  • Statistical Clustering For Big Data
  • New Correlation and R Squared For Big Data
  • A New Family of Rank Correlations
  • Asymptotic Distribution, Normalization
  • Computer Science
  • Computing q(n)
  • A Theoretical Solution
  • Structuredness Coefficient
  • Identifying The Number of Clusters
  • Internet Topology Mapping
  • 11 Features Any Database, SQL or NoSQL, Should Have
  • Additional Topics
  • Belly Dancing Mathematics
  • Securing Communications: Data Encoding
  • 10 Unusual Ways Analytics Are Used To Make Our Lives Better
  • Summary

Chapter 5: Data Science Craftsmanship - Part II

  • Data Dictionary
  • What is a Data Dictionary
  • How To Build a Dictionary
  • Hidden Decision Trees
  • Implementation
  • Example: Scoring Internet Traffic
  • Conclusions
  • Model-Free Confidence Intervals
  • Methodology
  • The AnalyticBridge First Theorem
  • Application
  • Source Code
  • Random Numbers
  • Four Ways To Solve a Problem
  • Causation Versus Correlation
  • So How Do We Detect Causes?
  • Lifecycle of Data Science Projects
  • Predictive Modeling Mistakes
  • Logistic-Related Regressions
  • Interactions Between Variables
  • First Order Approximation
  • Second Order Approximation
  • Regression With Excel
  • Experimental Design
  • Interesting Metrics
  • Segmenting The Patient Population
  • Customized Treatments
  • Analytics as a Service And APIs
  • Example of Implementation
  • Miscellaneous Topics
  • Preserving Scores When Datasets Change
  • Optimizing Web Crawlers
  • Hash Joins
  • Simple Source Code To Simulate Clusters
  • Summary

Chapter 6: Data Science Applications - Case Studies

  • Stock Market
  • Pattern To Boost Return By 500 Percent
  • Optimizing Statistical Trading Strategies
  • Stock Trading API: Statistical Model
  • Stock Trading API: Implementation
  • Stock Market Simulations
  • Some Mathematics
  • New Trends
  • Encryption
  • Data Science Application: Steganography
  • Solid Email Encryption
  • Captcha Hack
  • Fraud Detection
  • Click Fraud
  • Continuous Click Scores Versus Binary Fraud / Non Fraud
  • Mathematical Model, Benchmarking
  • Bias Due To Bogus Conversions
  • A Few Misconceptions
  • Statistical Challenges
  • Click Scoring to Optimize Keyword Bids
  • Automated, Fast Feature Selection With Combinatorial Optimization
  • Predictive Power of a Feature, Cross-Validation
  • Association Rules To Detect Collusion and Botnets
  • Extreme Value Theory For Pattern Detection
  • Digital Analytics
  • Online Advertising: Formula For Reach And Frequency
  • Email Marketing: Boosting Performance by 300 Percent
  • Optimizing Keyword Advertising Campaigns in 7 Days
  • Automated News Feed Optimization
  • Competitive Intelligence with Bit.ly
  • Measuring Return on Twitter Hashtags
  • Improving Google Searches with Three Fixes
  • Improving Relevancy Algorithms
  • Ad Rotation Problem
  • Miscellaneous
  • Better Sales Forecasts with Simpler Models
  • Better Detection of Healthcare Fraud
  • Attribution Modeling
  • Forecasting Meteorite Hits
  • Data Collection at Trailhead Parking
  • Other Applications of Data Science
  • Summary

Chapter 7: Launching Your New Data Science Career
