
[EVENTBRITE] Predictive Data Science in R & How to be a Successful Consultant

SF Bay ACM Chapter
Sep 16 8:30AM - 9:30AM

TICKETS $[masked], SIGN UP THROUGH EVENTBRITE

https://www.eventbrite.com/e/predictive-data-science-in-r-tickets-35366885306?aff=MeetupSFbayACM


Tickets $130-195. (For single tickets, the price is $150 from 7/26 to 9/15 at 3pm. After 9/15 at 3pm, the price for any ticket is $195. Before 9/15, if you sign up 2-6 people at a time, the price is $130 / person.)

We are seeking TAs who know R to help the audience. TA applicants should contact the instructor in advance. Use the [contact] button on the left and send your email, phone, LinkedIn and R experience.


PARKING AND ENTRANCE:

Parking: Enter Intel main entrance at 2200 Mission College Blvd. Turn right immediately upon entering and park in visitor parking. (You may also park in Garage B which you can enter from near the corner of Mission College Blvd and Juliette Lane.)
Entrance: We are working to have direct entrance close to the classroom. Once you park, go to the Star on the attached map - look for the ACM signs.

Otherwise, if you cannot find that entrance, go to the second floor of Garage B and walk over the short concrete skybridge to the Employee Entrance of building SC-9 and ask for the ACM Workshop.
Here's a map you might find useful. https://www.flickr.com/photos/joshb/320803384

PRE-LOADING: BEFORE THE CLASS, PREPARATIONS:


The class uses RStudio, the IDE you would use for typical R data mining projects at work.

This UCLA R Studio Tutorial link documents the following steps, which will be helpful before you come to the class. It is recommended to go over both the Installation and the short Basic Tutorial (if you don't already have this knowledge).

Install R 3.3.3 or later: https://cran.r-project.org/

Install RStudio, Desktop IDE (free)

If you install on Windows, it is strongly recommended that you use this link to enable R to use your available memory, with --max-mem-size=xxxxMB. Install the devtools package.

Install R libraries: data.table, Hmisc, gmodels, e1071, doMC (if you are on a Mac or Unix), doParallel (if on Windows), caret, rpart, randomForest, partykit, pROC, nnet, xgboost, ggplot2, zoo. (Check a week before the class; the list may get updated.)
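
For convenience, the package list above can be installed in one pass from the R console. This is a minimal sketch rather than an official setup script: the variable names pkgs and to_install are illustrative, and it assumes a working internet connection and a default CRAN mirror.

```r
# Package list copied from the announcement above (plus devtools).
pkgs <- c("data.table", "Hmisc", "gmodels", "e1071", "caret", "rpart",
          "randomForest", "partykit", "pROC", "nnet", "xgboost",
          "ggplot2", "zoo", "devtools")

# The parallel backend depends on the operating system:
# doParallel on Windows, doMC on Mac or Unix.
backend <- if (.Platform$OS.type == "windows") "doParallel" else "doMC"
pkgs <- c(pkgs, backend)

# Install only the packages that are not already present.
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) install.packages(to_install)
```

Rerunning the snippet a week before the class is harmless, since packages that are already installed are skipped.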


8 HR CLASS - SUMMARY (detailed outline follows) Go through a sprint of a predictive data mining project, introducing R as we go. Review the training process for regression, backpropagation neural nets, decision trees and XGBoost. Introduce R data.tables and the caret interface to 233 predictive algorithms. Focus on strategies to structure a successful project design and data pull. Review a variety of preprocessing and knowledge representation techniques. Provide questions you can take away and apply to the design of your future projects, to describe models to clients (sensitivity analysis code included) and to manage models over their natural lifecycle. Introduce R + Spark integrations, and show an example R Shiny web GUI.

TARGET AUDIENCE would include people who ...

are comfortable programming

may already work on consulting projects or in some technical business problem solving role.


It helps if you have tried R or had some basic exposure to it before the class. The focus is much more on "being successful with deploying Data Mining".

COURSE DESIGN: The instructor does not want to repeat "R in a Nutshell" or training that goes "sequential and broad" (i.e. everything about data structure X, then everything about feature Y). That material is great for a larger training time frame. For students to get the most out of a one-day class, the instructor is focusing on a "narrow" path, like a project sprint, going through a complete set of steps in a data mining project. Many pointers will be provided to invite you to broaden your skills after the class.

The instructor likes the Covey quote "If the ladder is not leaning against the right wall, every step we take just gets us to the wrong place faster." A successful data mining project is not just coding and executing a function. Design is crucial. There is a gap that is not covered by Kaggle experience or by starting with a ready-made data set. The instructor focuses on general strategies that you can take away as questions to ask about your upcoming project, such as how to identify projects and how to structure a project for success.


CLASS DETAILED OUTLINE

Part 1: Get started and play with your data

Overview (and Lab 1a) of RStudio, basics of variables, lists, read a CSV file into a data table, explore ways to view and manipulate the table. Discuss the HMEQ (Home Equity) data. The problem is to predict whether a person would be a good or bad loan risk.

Discuss a comparison / contrast for a few data mining algorithms: Regression, neural nets, decision trees, XGBoost and ensemble models. Train a first decision tree on an existing training set (Lab 1b). Go to TensorFlow Playground to try setting some neural net parameters and training them on different data sets.
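
As a rough illustration of the "train a first decision tree" step, here is a minimal sketch using the rpart package from the install list. The HMEQ file is not included with this announcement, so R's built-in iris data stands in for the training set. The names fit and pred are illustrative, and this is not the lab's actual code.

```r
library(rpart)

# iris ships with R and stands in for the class training set (assumption).
fit <- rpart(Species ~ ., data = iris, method = "class")

# Print the learned splits, then score the training rows.
print(fit)
pred <- predict(fit, newdata = iris, type = "class")

# Confusion table of actual vs. predicted class on the training data.
print(table(actual = iris$Species, predicted = pred))
```

Scoring the training rows only shows how the tree fits data it has already seen. A held-out set is needed to judge how well it generalizes.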

Part 2: Data Science Project Design

Model Evaluation Fundamentals, DS Model Loops or sprints

Selling Data Mining to Executive Check Writers - assessing the upside of opportunities or the problem size

Finding candidate projects with a Knowledge Discovery Workshop

Data Mining Project Design and Objectives (accurate, general, understandable)

Designing the training data to represent the production scoring data in the future.

Retraining Frequency (daily or re-evaluate monthly)

Reference Dates (separate analysis past from future)

Target and Weight Variable Variations

Business Metrics to Optimize, lift tables

Big Data Production, Lambda and Kappa Architecture

R data.tables lecture/Lab 2 on the HMEQ data table. Show the analogy with SQL: selecting rows, creating columns, aggregation. Writing small functions and R macros gets you unstuck and helps scale in complexity.
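
To make the SQL analogy concrete, here is a minimal sketch of the three operations named above (selecting rows, creating columns, aggregation) in data.table syntax. The toy table and its column names (bad, loan, reason) are made up for illustration and are not the actual HMEQ file.

```r
library(data.table)

# Toy stand-in for the HMEQ table; names and values are hypothetical.
dt <- data.table(
  bad    = c(0, 1, 0, 1, 0, 0),
  loan   = c(1100, 1300, 1500, 1500, 1700, 1800),
  reason = c("HomeImp", "HomeImp", "DebtCon", "DebtCon", "HomeImp", "DebtCon")
)

# Selecting rows. SQL: SELECT * FROM dt WHERE loan > 1400
big_loans <- dt[loan > 1400]

# Creating a column. := adds it to dt in place, without copying the table.
dt[, loan_k := loan / 1000]

# Aggregation. SQL: SELECT reason, COUNT(*) AS n, AVG(bad) AS bad_rate
#                   FROM dt GROUP BY reason
by_reason <- dt[, .(n = .N, bad_rate = mean(bad)), by = reason]
print(by_reason)
```

The general form is dt[i, j, by]: i filters rows (WHERE), j computes columns (SELECT), and by groups (GROUP BY).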

Part 3: Preprocessing Design - Simple to Complex

Review math requirements on input data - by algorithms. Focus on preparing a data set that can get loaded in most any algorithm.

Missing data handling (simple to sophisticated)

Convert rules, queries or functions to detector fields [0/1] to capture use cases of behavior

Convert observed frequency of normal to rareness detectors - for fraud detection.

Lab 3: preprocessing your HMEQ data

Fit linear models to time series within a record to extrapolate.

Time series: detect individual past behavior to adapt future estimate

Don't ignore input variables with 20+ categories, use DBC (Dependent by Category)

Variable interactions: not A*B, DBC tables, clusters
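
A few of the simpler preprocessing ideas above can be sketched in base R. This is one possible reading of the outline, not the instructor's code: the toy columns (income, n_late, job), the "3 or more late payments" rule, and the choice of "1 minus relative frequency" as a rareness score are all assumptions made for illustration.

```r
# Toy records; all column names and values are hypothetical.
df <- data.frame(
  income = c(50, NA, 70, 65, NA, 40),
  n_late = c(0, 3, 0, 1, 5, 0),
  job    = c("Office", "Office", "Sales", "Office", "Self", "Office"),
  stringsAsFactors = FALSE
)

# Simple missing-data handling: keep a 0/1 flag, then fill with the median.
df$income_missing <- as.integer(is.na(df$income))
df$income[is.na(df$income)] <- median(df$income, na.rm = TRUE)

# A rule turned into a 0/1 detector field.
df$det_many_late <- as.integer(df$n_late >= 3)

# Rareness detector: 1 minus the observed relative frequency of the category.
freq <- table(df$job) / nrow(df)
df$job_rareness <- 1 - as.numeric(freq[df$job])

print(df)
```

In a real project the median and the category frequencies would be computed on the training data only and then reused when scoring new records.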

Part 4: Modeling Design, Spark

Model Notebook to track, plan design of experiments and to automate

Sensitivity Analysis: describe models or model ensembles overall. Provide record level reasons. Explain how to detect model drift over time, and describe why.

Lab 4: training models, evaluating, running sensitivity analysis with the provided sensitivity code.

Discuss additional topics: Review available R + Spark combinations (Apache SparkR, RStudio's sparklyr, IBM's R4ML). Time permitting, discuss R web GUIs with Shiny & Shiny Dashboards, RStudio's TensorFlow for R.


BEFORE THE CLASS, PREPARATIONS:

For fun, play around with some neural nets at the TensorFlow Playground. This will be covered in the class as well.


You are invited to submit a description of your upcoming predictive projects or vertical. The instructor will review and may try to incorporate some ideas in the class. Through the meetup site, on the left margin, use the [contact] button.

SCHEDULE

8:00 - 8:30 arrive, register, coffee, network

8:30 - 10:30 lecture / lab

15 min break, coffee

10:45 - 12:45 lecture / lab

45 min break for lunch

1:30 - 3:30 lecture / lab


15 min break, coffee, small snacks

3:45 - 6:00 lecture / lab

15 min Q&A

ABOUT THE SPEAKER:

Greg Makowski has been deploying data mining models for 25 years (before the terms Data Science or Data Mining) as the "neural net guy" at American Express/Epsilon. He likes to "begin with the end" with the business decisions and values to be made by the analytic system, the job function to be complemented and by the deployment constraints. He has developed the analytic internals and automation for 6+ enterprise software systems or SaaS systems. His first convolutional neural net was trained in 1991, a Time Delay Neural Net for speech recognition. Vertical experience includes financial services (credit card, retail banking, bond pricing, ACH payments, fraud detection), customer relationship management (mail, phone, email, banner) and retail supply chain, among others. He always has something to learn from everybody.

How to find us: Parking and Entrance: see instructions above. PRE-LOADING SOFTWARE: see instructions above.

Upcoming Events

Envisioning the Customer Experience of Autonomous Vehicles, 5-20 Years Out


Sep 20 6:30PM - 7:30PM
Distinguished ACM Speaker: Beverly May, Oxford Tech and Executive Director of UX Awards. Agenda: 6:30 Doors Open, Food & Networking; 7:00 Presentation. *** Please arrive by 7 PM due to Security ***…
 
Challenges in Data Science with Hadley Wickham


Sep 21 6:00PM - 7:00PM
Join us for Hadley Wickham's vision of Data Science as a field, focusing on the corner he's most familiar with: designing tools for Data Scientist-Programmers. *** Please arrive before 6:30 PM due to…
 
Safety and Ethics for Advanced Autonomous Artificial Intelligences


Sep 25 6:30PM - 7:30PM
Speaker Richard Mallah, Director of AI Projects at the Future of Life Institute. Richard is flying out from Boston. Agenda 6:30 - 7:00 audience arrives, registers, network, pizza 7:00 - 7:10 ACM…