1. Fundamentals
    1. Matrices & Linear Algebra Fundamentals
    2. Hash Functions, Binary Tree, O(n)
    3. Relational Algebra, DB Basics
    4. Inner, Outer, Cross, Theta Join
    5. CAP Theorem
    6. Tabular Data
    7. Entropy
    8. Data Frames & Series
    9. Sharding
    10. OLAP (Online analytical processing)
    11. Multidimensional Data Model
    12. ETL (Extract, Transform, Load)
    13. Reporting Vs BI Vs Analytics
    14. JSON & XML
    15. NoSQL
    16. Regex
    17. Vendor Landscape
    18. Env Setup
  2. Statistics
    1. Pick a Dataset (UCI Repo)
    2. Descriptive Statistics (mean, median, range, SD, Var)
    3. Exploratory Data Analysis
    4. Histograms
    5. Percentiles & Outliers
    6. Probability Theory
    7. Bayes Theorem
    8. Random Variables
    9. CDF
    10. Probability Distributions (Normal/Gaussian, Poisson)
    11. Skewness
    12. ANOVA
    13. PDF
    14. CLT (Central Limit Theorem)
    15. Monte Carlo Method
    16. Hypothesis Testing
    17. p-Value
    18. Chi-Square Test
    19. Estimation
    20. Confidence Interval
    21. MLE
    22. Kernel Density Estimation
    23. Regression
    24. Covariance
    25. Correlation
    26. Pearson Coeff
    27. Causation
    28. Least Squares Fit
    29. Euclidean Distance
  3. Programming
    1. python basics
    2. working in excel
    3. r setup & rstudio
    4. r basics
    5. expressions
    6. variables
    7. vectors
    8. matrices
    9. arrays
    10. factors
    11. lists
    12. data frames
    13. reading csv data
    14. reading raw data
    15. subsetting data
    16. manipulate data frames
    17. functions
    18. factor analysis
    19. install pkgs
    20. ibm spss
    21. rapid miner
  4. Machine Learning
    1. what is ml?
    2. numerical var
    3. categorical var
    4. supervised learning
    5. unsupervised learning
    6. concepts, inputs & attributes
    7. training & test data
    8. classifier
    9. prediction
    10. lift
    11. overfitting
    12. bias & variance
    13. trees & classification
    14. classification rate
    15. decision trees
    16. boosting
    17. naive bayes classifiers
    18. k-nearest neighbor classifiers
    19. logistic regression
    20. ranking
    21. linear regression
    22. perceptron
    23. hierarchical clustering
    24. k-means clustering
    25. neural networks
    26. sentiment analysis
    27. collaborative filtering
    28. tagging
  5. Text Mining / Natural Language Processing
    1. vocabulary mapping
    2. classify text
    3. using nltk
    4. using weka
    5. using mahout
    6. feature extraction
    7. market basket analysis
    8. association rules
    9. support vector machines
    10. term frequency & weight
    11. term document matrix
    12. uima
    13. text analysis
    14. named entity recognition
    15. corpus
  6. Data Visualization
    1. data exploration in R
  7. Big Data
  8. Data Ingestion
  9. Data Munging
  10. Toolbox

citation

Becoming a Data Scientist – Curriculum via Metromap