# Data Science Masters Program

The Data Science Masters Program at Goals InfoCloud Technologies is taught by experienced data scientists. The course modules are designed around analyzing data with R programming and with Python programming. The Data Science course certification will help you become a professional data scientist. If you are interested in learning data science, Goals InfoCloud Technologies is the right place.

This course prepares you for the role of data scientist by building your expertise in statistics, data science, big data, R programming, and Python. Demand for skilled data scientists is increasing across all industries, and this data science certification course is suited to participants at all levels of experience.

## Learning Path Curriculum

The term “data scientist” is an industry-recognized designation for a professional with deep analytics experience, industry knowledge, and skills. Our Data Science Masters Training gives you the hands-on experience needed to meet industry demands.

## Batch Schedule for Data Science Masters Program

Goals InfoCloud Technologies provides flexible timings to all our students. Here is the Data Science Masters Program schedule for our branch. If this schedule doesn’t suit you, please let us know. We will try to arrange timings that fit your availability.

#### Statistics Essentials for Analytics

Each topic in the following section covers what it is, which scenarios call for it, the math behind it, how to implement it with an analytics tool, and what inferences you can draw from the final result.

• Understanding the Data
• Probability and its Uses
• Statistical Inference
• Data Clustering
• Testing the Data
• Regression Modelling

#### Module 1: Introduction to R (Duration: 2Hrs)

• What is R?
• Why R?
• Installing R
• R environment
• How to get help in R
• R Studio Overview

#### Module 2: R Basics (Duration: 5Hrs)

• Environment setup
• Data Types
• Variables
• Vectors
• Lists
• Matrix
• Array
• Factors
• Data Frames
• Loops
• Packages
• Functions
• In-Built Data sets

#### Module 3: R Packages (Duration: 3Hrs)

• DMwR
• dplyr/plyr
• caret
• lubridate
• e1071
• cluster/fpc
• table
• stats/utils
• ggplot/ggplot2
• glmnet

#### Module 4: Machine Learning using R (Duration: 10Hrs)

• Linear Regression
• Logistic Regression
• K-Means
• K-Means++
• Hierarchical Clustering – Agglomerative
• CART
• C5.0
• Random forest
• Naïve Bayes

#### Module 1: Introduction to Data Science (Duration: 1Hr)

• What is Data Science?
• What is Machine Learning?
• What is Deep Learning?
• What is AI?
• Data Analytics & its types

#### Module 2: Introduction to Python (Duration: 1Hr)

• What is Python?
• Why Python?
• Installing Python
• Python IDEs
• Jupyter Notebook Overview

#### Module 3: Python Basics (Duration: 5Hrs)

• Python Basic Data types
• Lists
• Slicing
• IF statements
• Loops
• Dictionaries
• Tuples
• Functions
• Array
• Selection by position & Labels

#### Module 4: Python Packages (Duration: 2Hrs)

• pandas
• NumPy
• scikit-learn
• Matplotlib

#### Module 5: Importing data (Duration: 1Hr)

• Saving data in Python
• Writing data to a CSV file

#### Module 6: Manipulating Data (Duration: 1Hr)

• Selecting rows/observations
• Rounding numbers
• Selecting columns/fields
• Merging data
• Data aggregation
• Data munging techniques

#### Module 7: Statistics Basics (Duration: 11Hrs)

• Central Tendency
• Mean
• Median
• Mode
• Skewness
• Normal Distribution
• Probability Basics
• What is meant by probability?
• Types of Probability
• Odds ratio
• Standard Deviation
• Data deviation & distribution
• Variance
• Underfitting
• Overfitting
• Distance metrics
• Euclidean Distance
• Manhattan Distance
• Outlier analysis
• What is an Outlier?
• Inter Quartile Range
• Box & whisker plot
• Upper Whisker
• Lower Whisker
• Scatter plot
• Cook’s Distance
• Missing Value treatments
• What is an NA?
• Central Imputation
• KNN imputation
• Dummification
• Correlation
• Pearson correlation
• Positive & Negative correlation

#### Module 8: Error Metrics (Duration: 3Hrs)

• Classification
• Confusion Matrix
• Precision
• Recall
• Specificity
• F1 Score
• Regression
• MSE
• RMSE
• MAPE

#### Module 9: Supervised Learning (Duration: 6Hrs)

• Linear Regression
• Linear Equation
• Slope
• Intercept
• R-squared value
• Logistic regression
• Odds ratio
• Probability of success
• Probability of failure
• ROC curve

#### Module 10: Unsupervised Learning (Duration: 4Hrs)

• K-Means
• K-Means ++
• Hierarchical Clustering

#### Module 11: Other Machine Learning algorithms (Duration: 10Hrs)

• K-Nearest Neighbour
• Naïve Bayes Classifier
• Decision Tree – CART
• Decision Tree – C5.0
• Random Forest

#### Module 1: Tableau Course Material (Duration: 5Hrs)

• Start Page
• Show Me
• Connecting to Excel Files
• Connecting to Text Files
• Connect to Microsoft SQL Server
• Connecting to Microsoft Analysis Services
• Creating and Removing Hierarchies
• Bins
• Joining Tables
• Data Blending

#### Module 2: Learn Tableau Basic Reports (Duration: 5Hrs)

• Parameters
• Grouping Example 1
• Grouping Example 2
• Edit Groups
• Set
• Combined Sets
• Creating a First Report
• Data Labels
• Create Folders
• Sorting Data
• Add Totals, Sub Totals and Grand Totals to Report

#### Module 3: Learn Tableau Charts (Duration: 4Hrs)

• Area Chart
• Bar Chart
• Box Plot
• Bubble Chart
• Bump Chart
• Bullet Graph
• Circle Views
• Dual Combination Chart
• Dual Lines Chart
• Funnel Chart
• Gantt Chart
• Grouped Bar or Side by Side Bars Chart
• Heatmap
• Highlight Table
• Histogram
• Cumulative Histogram
• Line Chart
• Lollipop Chart
• Pareto Chart
• Pie Chart
• Scatter Plot
• Stacked Bar Chart
• Text Label
• Tree Map
• Word Cloud
• Waterfall Chart

#### Module 4: Learn Tableau Advanced Reports (Duration: 6Hrs)

• Dual Axis Reports
• Blended Axis
• Individual Axis
• Reference Bands
• Reference Distributions
• Basic Maps
• Symbol Map
• Mapbox Maps as a Background Map
• WMS Server Map as a Background Map

#### Module 5: Learn Tableau Calculations & Filters (Duration: 6Hrs)

• Calculated Fields
• Basic Approach to Calculate Rank
• Advanced Approach to Calculate Rank
• Calculating Running Total
• Filters Introduction
• Quick Filters
• Filters on Dimensions
• Conditional Filters
• Top and Bottom Filters
• Filters on Measures
• Context Filters
• Slicing Filters
• Data Source Filters
• Extract Filters

#### Module 6: Learn Tableau Dashboards (Duration: 4Hrs)

• Create a Dashboard
• Format Dashboard Layout
• Create a Device Preview of a Dashboard
• Create Filters on Dashboard
• Dashboard Objects
• Create a Story

## Data Science Jobs Outlook

A shortage of data scientists means the employment outlook for professionals with the required knowledge and technical skills is extremely positive. Demand for data scientists and data engineers is predicted to grow by 39 percent between now and 2020.

## FAQs

Cleaning data from multiple sources to transform it into a format that data analysts or data scientists can work with is a cumbersome process: as the number of data sources increases, the time taken to clean the data rises steeply with both the number of sources and the volume of data they generate. Cleaning alone can take up to 80% of the time, which makes it a critical part of the analysis task.

Logistic regression, often referred to as the logit model, is a technique for predicting a binary outcome from a linear combination of predictor variables. For example, suppose you want to predict whether a particular political leader will win an election. The outcome of the prediction is binary, i.e. 1 or 0 (win/lose). The predictor variables here would be the amount of money spent on the candidate's election campaign, the amount of time spent campaigning, etc.
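The election example can be sketched in a few lines of Python. This is a minimal illustration, not a fitted model: the function names and the coefficient values are hypothetical, chosen only to show how a linear combination of predictors is turned into a probability and then into a 0/1 prediction.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def win_probability(money_spent: float, hours_campaigning: float) -> float:
    # Hypothetical coefficients for illustration only; a real model would
    # estimate them from historical election data.
    intercept = -4.0
    coef_money = 0.5    # effect per unit of money spent
    coef_hours = 0.02   # effect per hour of campaigning
    # Linear combination of the predictor variables.
    z = intercept + coef_money * money_spent + coef_hours * hours_campaigning
    return sigmoid(z)


def predict_win(money_spent: float, hours_campaigning: float,
                threshold: float = 0.5) -> int:
    """Return 1 (win) if the probability reaches the threshold, else 0 (lose)."""
    return 1 if win_probability(money_spent, hours_campaigning) >= threshold else 0


print(predict_win(10.0, 100.0))
print(predict_win(2.0, 50.0))
```

The threshold of 0.5 is a common default, but it can be moved when one kind of error is more costly than the other.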

Recommender systems are a subclass of information filtering systems that predict the preferences or ratings a user would give to a product. They are widely used for movies, news, research articles, products, social tags, music, etc.
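One common way to predict a rating is user-based collaborative filtering: weight other users' ratings by how similar their tastes are to the target user's. The sketch below assumes that approach with cosine similarity; the users, items, and ratings are made up for illustration, and production systems use far larger data and more robust methods.

```python
import math

# Hypothetical ratings on a 1-5 scale; users and items are invented.
ratings = {
    "alice": {"A": 5, "B": 3},
    "bob": {"A": 4, "B": 2, "C": 5},
    "carol": {"A": 1, "B": 5, "C": 2},
}


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_rating(user: str, item: str):
    """Similarity-weighted average of other users' ratings for the item.

    Returns None when no other user has rated the item.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        weight = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += weight * other_ratings[item]
        total_weight += weight
    return weighted_sum / total_weight if total_weight else None


predicted = predict_rating("alice", "C")
print(predicted)
```

Because alice's ratings resemble bob's more closely than carol's, the prediction for item C leans toward bob's rating.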

Linear regression is a statistical technique where the score of a variable Y is predicted from the score of a second variable X. X is referred to as the predictor variable and Y as the criterion variable.
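As a rough illustration, the line relating X to Y can be fitted with ordinary least squares. The sketch below uses plain Python and made-up data points; the function names are hypothetical, and in practice a library such as scikit-learn would usually do the fitting.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict the criterion variable Y from the predictor variable X."""
    return slope * x + intercept


# Made-up data that lies exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, predict(5.0, slope, intercept))
```

Real data rarely falls exactly on a line, so the fitted slope and intercept are the values that minimize the sum of squared differences between predicted and observed Y.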

Interpolation is estimating a value that lies between two known values in a list of values. Extrapolation is approximating a value by extending a known set of values or facts beyond its range.
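The difference can be shown with a straight line through two known points. This is a minimal sketch assuming a linear relationship; the function names and the sample points are hypothetical.

```python
def linear_estimate(x, x0, y0, x1, y1):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def is_interpolation(x, x0, x1):
    """True when x lies between the two known x values, False otherwise."""
    return min(x0, x1) <= x <= max(x0, x1)


# Two made-up known points: (1, 10) and (3, 30).
inside = linear_estimate(2.0, 1.0, 10.0, 3.0, 30.0)   # x = 2 is between 1 and 3
outside = linear_estimate(5.0, 1.0, 10.0, 3.0, 30.0)  # x = 5 is beyond 3

print(inside, is_interpolation(2.0, 1.0, 3.0))
print(outside, is_interpolation(5.0, 1.0, 3.0))
```

The same formula serves both cases; what changes is the risk. An extrapolated value relies on the trend continuing outside the observed range, so it is generally less reliable than an interpolated one.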