Friday Half-Day Class Kevin Lee

Machine Learning Training for Programmers

Presented: Friday, September 7, 2018, 1:30pm-5:00pm

Presented by:
Kevin Lee is Director of Data Science at Clindata Insight. Kevin has been supporting the pharmaceutical industry for nearly 20 years as a programmer, statistician, data scientist and CDISC SME. Kevin is active in data-driven solution architecture, including CDISC standards data development, Big Data and machine learning. He is also a current member of the data standards team at CDISC and leads the machine learning team at PhUSE. Kevin has presented more than 60 papers at various conferences and taught many corporate trainings on oncology, CDISC, submission and machine learning. Kevin earned an M.S. in Applied Statistics at Villanova University following a B.S. from the University of Pennsylvania. Kevin is a lifelong learner who loves to learn and share.

The most popular buzzword in the technology world today is “machine learning”. Machine learning is the computer science technology that gives systems the ability to learn without being explicitly programmed. Many organizations and start-up companies take advantage of machine learning to solve business problems. Right now, machine learning can help with the following:

  • Self-driving vehicles
  • Online recommendations on Netflix and Amazon
  • Fraud detection in banking
  • Image and video recognition
  • Security monitoring
  • Natural language processing
  • Question answering machines (e.g., IBM Watson)

Most economists and business experts expect machine learning to change every aspect of our lives in the next 10 to 20 years, just as the internet and the iPhone did over the last 20 years. Machine learning is expected to automate and optimize many processes, providing tremendous value to organizations and societies, so many pharmaceutical companies are also looking to implement machine learning in their organizations.

Statistical programmers and statisticians are in a very interesting position. They already have the technical skill set needed to apply machine learning in their own organizations: programming, statistics and data knowledge. This class is intended for statistical programmers and statisticians who are interested in learning about machine learning and applying it to lead innovation in their organizations. The class will start with an introduction to machine learning, then discuss its basic concepts, applications, algorithms and types. The class will also introduce the most powerful machine learning algorithm, deep neural networks. The class will show how programmers can use Python Scikit-Learn and TensorFlow, two of the most widely used and powerful machine learning packages. The class will also introduce some popular SAS machine learning procedures and the SAS Visual Data Mining and Machine Learning package.

Through this training, programmers and statisticians will gain the following:

  • An understanding of machine learning
  • Knowledge of the concepts, theory and program code of machine learning
  • An appreciation of the potential and impact of machine learning
  • Interest in, and excitement about, the opportunities for programmers and statisticians
  • Insight into current, future and practical implementations of machine learning in Pharma R&D
  • Confidence in working with machine learning

Intended Audience: All levels

Tools Discussed: Python Scikit-Learn and TensorFlow, SAS® machine learning procedures, and the SAS Visual Data Mining and Machine Learning package

Prerequisite: None

Class Outline:

  • Introduction to machine learning
    • What is machine learning?
    • Difference from conventional programming
    • How does a machine learn?
    • Current applications
    • Future applications
  • Machine learning concepts
    • Hypothesis
    • Cost function
    • Gradient descent
    • Learning rate
  • Machine learning types
    • Supervised machine learning
    • Unsupervised machine learning
  • Machine learning algorithms
    • Classification
    • Regression
    • Linear regression
    • Decision tree
    • Clustering
  • Artificial neural networks
    • Definition
    • Structures: input layers, hidden layers and output layers
    • Parameters: weights, activation function, learning rate
    • Google TensorFlow
    • Deep neural networks, convolutional neural networks and recurrent neural networks
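
To give a flavor of the "Machine learning concepts" items in the outline (hypothesis, cost function, gradient descent and learning rate), here is a minimal sketch in plain Python. It is not taken from the class materials. The function names, the toy data, and the learning rate and step count are all illustrative assumptions, chosen only so that the example converges.

```python
# Minimal sketch (illustrative only): fit the line y = w*x + b
# with batch gradient descent, using no external libraries.

def hypothesis(w, b, x):
    """Hypothesis: the model's prediction for input x."""
    return w * x + b

def cost(w, b, xs, ys):
    """Cost function: half the mean squared error over the data."""
    n = len(xs)
    return sum((hypothesis(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * n)

def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly move w and b against the gradient of the cost."""
    w, b = 0.0, 0.0  # arbitrary starting point
    n = len(xs)
    for _ in range(steps):
        errors = [hypothesis(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n  # d(cost)/dw
        grad_b = sum(errors) / n                             # d(cost)/db
        # The learning rate scales the size of each update step.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data generated from y = 2x + 1 (an assumption for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = gradient_descent(xs, ys)
print(f"w={w:.4f}, b={b:.4f}, cost={cost(w, b, xs, ys):.2e}")
```

With this toy data the fitted parameters should approach w = 2 and b = 1. A learning rate that is too large makes the updates diverge, and one that is too small makes convergence slow, which is why the learning rate appears as its own outline item.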