Data Analysis with Lean Six Sigma
We use many statistical concepts in data science projects, and researchers are now focusing on applying Lean Six Sigma concepts to data analysis and interpretation. Some of the important topics in Lean Six Sigma analytics are as follows:
Data manipulation
Descriptive Statistics
Histogram, distribution curve, and confidence level
Box plot, stem and leaf plot, scatter plot, and heat map
Pearson's correlation and inter-correlation
Hypothesis testing: ANOVA, z-test, Student's t-test, and chi-square test
Statistical Process Control (SPC)
Probability Distribution
Probability Plot
Sampling
To solve these problems, we can use the Excel Solver add-in or the Python SciPy and scikit-learn libraries.
Some Python implementations of these models are available at this link: data analysis with Lean Six Sigma models in Python, click here
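As a minimal sketch of a few of the topics above in Python (descriptive statistics, hypothesis testing, and SPC), the example below uses SciPy on hypothetical cycle-time data for two process lines. The individuals-chart limits use the common moving-range estimate of sigma (average moving range divided by the constant d2 = 1.128); this is one standard choice, not the only way to set SPC limits.

```python
import numpy as np
from scipy import stats

# Hypothetical cycle-time measurements (minutes) from two process lines
line_a = np.array([10.1, 10.3, 9.9, 10.2, 10.0])
line_b = np.array([11.0, 11.2, 10.9, 11.1, 11.3])

# Descriptive statistics
print("line A mean:", line_a.mean(), "sample std:", line_a.std(ddof=1))

# Hypothesis testing: two-sample Student's t-test and one-way ANOVA
t_result = stats.ttest_ind(line_a, line_b)
anova_result = stats.f_oneway(line_a, line_b)
print("t-test p-value:", t_result.pvalue)
print("ANOVA p-value:", anova_result.pvalue)


def individuals_chart_limits(x):
    """SPC individuals (I) chart limits estimated from the average moving range."""
    x = np.asarray(x, dtype=float)
    center = x.mean()
    mr_bar = np.abs(np.diff(x)).mean()  # average moving range of consecutive points
    sigma_hat = mr_bar / 1.128          # d2 constant for moving ranges of size 2
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


lcl, cl, ucl = individuals_chart_limits([10, 12, 11, 13, 12])
print(f"LCL={lcl:.3f}, CL={cl:.3f}, UCL={ucl:.3f}")
```

With only two groups, the ANOVA F statistic equals the square of the t statistic, so both tests give the same p-value; ANOVA becomes useful when comparing three or more groups.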
Prescriptive Analytics (Optimization)
Optimization is one of the core topics in data science, where we try to solve problems and recommend decisions using present data rather than only historical data. Techniques such as gradient descent and Monte Carlo simulation play an important role in machine learning and deep learning algorithms. Some of the important topics in prescriptive analytics are as follows:
Linear Programming (LP)
Integer Programming (IP)
Multi-Criteria Decision Making (MCDM) techniques such as the Analytic Hierarchy Process (AHP)
Non-linear Programming
Stochastic models such as Markov Models and Monte Carlo Simulation
To solve these problems, we can use the Excel Solver add-in or the Python SciPy library.
Some Python implementations of these models are available at this link: prescriptive analytics models in Python, click here
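For example, a small linear program can be solved with SciPy's `linprog`. The product-mix numbers below are a hypothetical textbook-style problem, not data from this article. Since `linprog` minimizes, the profit coefficients are negated.

```python
from scipy.optimize import linprog

# Hypothetical product-mix problem:
#   maximize   3x + 5y          (profit)
#   subject to  x        <= 4
#                    2y  <= 12
#              3x + 2y   <= 18
#              x, y >= 0
# linprog minimizes, so the objective coefficients are negated.
c = [-3, -5]
A_ub = [[1, 0],
        [0, 2],
        [3, 2]]
b_ub = [4, 12, 18]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("status:", res.message)
print("optimal x, y:", res.x)
print("maximum profit:", -res.fun)
```

For integer programming, recent SciPy versions (1.9 and later) also provide `scipy.optimize.milp` and an `integrality` argument to `linprog`; check what your installed version supports.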
Your Learning Path in Data Science
As of now, the hottest jobs in the industry are machine learning engineer, deep learning engineer, business analyst, AI engineer, and digital marketer. Here I am trying to outline a learning path for freshers, students, and job seekers looking for a career transition into a data scientist or business analyst role.
AI (Artificial Intelligence) is a domain in which we try to create a replica of the human brain; in other words, we try to create machines that behave like intelligent human beings. AI involves both hardware and software, and the software side can be divided into two parts: rule-based models and machine learning models.
In rule-based models, the logic is coded by a developer to solve a problem statement. In machine learning and deep learning models, the logic is inferred from historical data or experience.
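To make the difference concrete, here is a hypothetical temperature-alarm example: the rule-based version uses a threshold the developer picks by hand, while the machine learning version lets a scikit-learn decision tree infer the threshold from labelled examples. The data and the 28-degree rule are made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier


# Rule-based model: the developer hard-codes the logic (threshold chosen by hand)
def rule_based_alarm(temperature):
    return 1 if temperature > 28 else 0


# Machine learning model: the threshold is inferred from labelled historical data
temperatures = [[20], [22], [25], [30], [33], [35]]  # hypothetical readings
alarm_raised = [0, 0, 0, 1, 1, 1]                     # hypothetical labels

tree = DecisionTreeClassifier(max_depth=1).fit(temperatures, alarm_raised)
learned_threshold = tree.tree_.threshold[0]  # split value at the root node

print("rule-based:", rule_based_alarm(31))
print("learned:", tree.predict([[31]])[0], "threshold:", learned_threshold)
```

With these labels the tree splits at 27.5, the midpoint between the highest "no alarm" reading and the lowest "alarm" reading, which is how scikit-learn places split thresholds.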
In machine learning and deep learning, data scientists have developed various algorithms to improve model accuracy and precision. Some of them are Random Forest, Support Vector Machines, DBSCAN, ANN, CNN, RNN, etc.
Deep learning is the more advanced form of machine learning that uses neural networks with many layers and filters.
If you are planning to become a business analyst, machine learning engineer, deep learning engineer, or data scientist, follow this path.
1. Foundation
a. Advanced Excel
Many people wonder how Excel can help in data analysis. Here is the answer: many use cases available on kaggle.com come in CSV or Excel file format, so if you are good at Microsoft Excel, you can easily handle the same data in other programming languages.
If you have not learned Excel yet, click this link and learn it now for free.
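As a rough illustration of how Excel skills carry over, the sketch below does an Excel-style filter and pivot table with pandas. The CSV content is hypothetical and is read from a string so the example is self-contained; with a real Kaggle download you would pass the file path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# A small CSV, as you might download from kaggle.com (hypothetical data)
csv_text = """region,product,sales
North,A,100
North,B,150
South,A,200
South,B,50
North,A,50
"""

# pd.read_excel("file.xlsx") works similarly for Excel files (requires openpyxl)
df = pd.read_csv(io.StringIO(csv_text))

# Excel-style filter: rows where sales > 90
high_sales = df[df["sales"] > 90]

# Excel-style pivot table: total sales by region and product
pivot = pd.pivot_table(df, index="region", columns="product",
                       values="sales", aggfunc="sum")
print(pivot)
```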
b. Basic Statistics
If you would like to master data analysis, learn basic statistics. There is no need to memorize formulas, but you should be thorough with the concepts and their application to real-world problems. Some of the important statistical topics to learn are: mean, median, mode, variance, standard deviation, the central limit theorem, probability and distributions, hypothesis testing, and covariance and correlation.
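The basic measures listed above can be computed with Python's built-in `statistics` module and NumPy. The numbers are a small made-up sample chosen so the results are easy to check by hand.

```python
import statistics

import numpy as np

data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)            # 5
median = statistics.median(data)        # 4.5 (average of the two middle values)
mode = statistics.mode(data)            # 4 (most frequent value)
pop_var = statistics.pvariance(data)    # population variance: 32 / 8 = 4
sample_var = statistics.variance(data)  # sample variance: 32 / 7
pop_std = statistics.pstdev(data)       # population standard deviation: 2.0

# Covariance and Pearson correlation between two variables
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
cov_xy = np.cov(x, y)[0, 1]        # sample covariance (ddof=1)
corr_xy = np.corrcoef(x, y)[0, 1]  # Pearson's r; 1.0 because y = 2x

print(mean, median, mode, pop_var, sample_var, pop_std, cov_xy, corr_xy)
```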
c. Data Visualization
Data visualization helps you understand data clearly and guides the analyst in feature selection and model building. Some data visualization tools with free versions are Tableau, Power BI, and Google Data Studio.
Learn them now by clicking the following link:
d. Python
There are many no-code tools for building machine learning models, such as Google Teachable Machine. But if you are planning to become a data scientist, learn Python; its libraries help you analyze data and build machine learning and deep learning models.
e. Data Cleaning and Analysis
Data analysts spend 75-80% of their time on data cleaning and exploratory data analysis to improve model accuracy. Especially in natural language processing (NLP), cleaning steps such as removing stopwords and special characters, lemmatization, and stemming are important for building a reliable model.
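A minimal text-cleaning sketch is shown below: it lowercases text, replaces special characters, and removes stopwords. The stopword set is a tiny hypothetical list for illustration only.

```python
import re

# Hypothetical, tiny stopword list for illustration; libraries such as NLTK
# and spaCy ship much larger lists.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}


def clean_text(text):
    text = text.lower()                       # normalize case
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # replace special characters with spaces
    tokens = text.split()                     # simple whitespace tokenization
    return [t for t in tokens if t not in STOPWORDS]


print(clean_text("The model is GREAT!!! Accuracy: 95%"))
# ['model', 'great', 'accuracy', '95']
```

For stemming and lemmatization, libraries such as NLTK (`PorterStemmer`, `WordNetLemmatizer`) or spaCy are the usual choices rather than hand-written rules.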
2. Start with Machine Learning
Machine learning models are categorized into three types: supervised, unsupervised, and reinforcement learning. Follow the roadmap below to become a machine learning engineer.
To become a machine learning engineer, as discussed in the Foundation section, you should learn statistics, Excel, and Python, and practice exploratory data analysis (EDA) with Python libraries such as pandas, NumPy, Matplotlib, seaborn, D-Tale, and Sweetviz.
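A first EDA pass in pandas often looks like the sketch below: check the shape and types, summary statistics, missing values, and correlations. The customer data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical customer data with a few missing values
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38],
    "income": [40, 52, 48, np.nan, 61],
    "purchased": [0, 1, 0, 1, 1],
})

print(df.shape)            # (rows, columns)
print(df.dtypes)           # column types
summary = df.describe()    # count, mean, std, min, quartiles, max
print(summary)
missing = df.isna().sum()  # missing values per column
print(missing)
corr = df.corr()           # pairwise Pearson correlation (NaNs excluded pairwise)
print(corr)
```

For a visual version of the correlation table, `seaborn.heatmap(corr, annot=True)` draws it as a heat map; tools such as D-Tale and Sweetviz generate similar summaries automatically.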
Once you are done with EDA, move on to feature engineering, where you should learn:
a. Handling Missing Values, Outliers, and Imbalanced Datasets
b. Categorical Encoding
c. Normalization and Standardization
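The sketch below walks through a, b, and c with pandas on a hypothetical dataset: median imputation, capping outliers with the 1.5 x IQR rule, one-hot encoding, and z-score standardization versus min-max normalization. These are common choices, not the only ones.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 38, 120],  # hypothetical; 120 is a deliberate outlier
    "city": ["Pune", "Delhi", "Pune", "Mumbai", "Delhi", "Pune"],
})

# a. Missing values: impute with the median, which the outlier barely affects
df["age"] = df["age"].fillna(df["age"].median())

# a. Outliers: cap values outside 1.5 * IQR (interquartile range)
q1, q3 = df["age"].quantile([0.25, 0.75])
iqr = q3 - q1
df["age"] = df["age"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# b. Categorical encoding: one-hot encode the city column
encoded = pd.get_dummies(df, columns=["city"])

# c. Standardization (z-score) and min-max normalization
age = encoded["age"]
encoded["age_standardized"] = (age - age.mean()) / age.std()
encoded["age_normalized"] = (age - age.min()) / (age.max() - age.min())
print(encoded)
```

For imbalanced datasets, one common option in scikit-learn is the `class_weight="balanced"` parameter available on estimators such as `LogisticRegression` and `RandomForestClassifier`; resampling with the separate imbalanced-learn library (for example SMOTE) is another.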
The next step in ML modeling is feature selection, where we use methods such as correlation analysis, forward selection, backward elimination, univariate selection, and principal component analysis (PCA).
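Below is a sketch of three of these methods with scikit-learn on the built-in Iris dataset: univariate selection with `SelectKBest`, backward elimination with `SequentialFeatureSelector` (available in scikit-learn 0.24 and later), and PCA. Keeping 2 features and using a k-nearest-neighbors model inside the selector are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, SequentialFeatureSelector, f_classif
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Univariate selection: keep the 2 features with the highest ANOVA F-score
univariate = SelectKBest(score_func=f_classif, k=2)
X_univariate = univariate.fit_transform(X, y)

# Backward elimination: start from all features and drop one at a time
backward = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3), n_features_to_select=2, direction="backward"
)
X_backward = backward.fit_transform(X, y)

# PCA: project onto 2 new components (feature extraction rather than selection)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print(pca.explained_variance_ratio_)
```

Setting `direction="forward"` gives forward selection instead. Note that PCA creates new combined features rather than picking a subset of the original ones.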
Once you are through with these topics, move on to building models with the scikit-learn library in Python. For this, you should be thorough with concepts such as regression, classification, and clustering, and with models such as linear models, logistic regression, decision trees, random forest, XGBoost, k-means, KNN, DBSCAN, AdaBoost, and many more.
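As a minimal sketch of the scikit-learn workflow, the example below trains a random forest classifier (supervised) and a k-means clustering model (unsupervised) on the built-in Iris dataset. The split ratio and random seeds are arbitrary choices.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Supervised: classification with a random forest
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)  # mean accuracy on held-out data
print("test accuracy:", accuracy)

# Unsupervised: clustering the same features with k-means (labels are not used)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
cluster_labels = kmeans.fit_predict(X)
```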
Before finalizing a model, check its performance using evaluation metrics such as R-squared (for regression), accuracy, precision, and the confusion matrix (for classification), among others. GridSearchCV and RandomizedSearchCV are hyperparameter tuning techniques, and cross_val_score estimates performance with cross-validation; together they help improve and validate model performance.
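The sketch below tunes a decision tree with `GridSearchCV`, estimates its performance with `cross_val_score`, and evaluates it on held-out data with accuracy, macro-averaged precision, and a confusion matrix. The parameter grid is a hypothetical example.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Hyperparameter tuning: try every combination in the grid with 5-fold CV
param_grid = {"max_depth": [2, 3, 4, None], "min_samples_leaf": [1, 3, 5]}
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X_train, y_train)
print("best parameters:", grid.best_params_)

# Cross-validated estimate of the tuned model's performance
scores = cross_val_score(grid.best_estimator_, X_train, y_train, cv=5)
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Final check on held-out test data
y_pred = grid.predict(X_test)
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", accuracy_score(y_test, y_pred))
print("macro precision:", precision_score(y_test, y_pred, average="macro"))
print(cm)
```

`RandomizedSearchCV` has the same interface but samples `n_iter` combinations from `param_distributions` instead of trying all of them, and for regression models `sklearn.metrics.r2_score` gives the R-squared value.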
Now your model is ready, but how do you share it with others, for example through a web app or a GUI that anyone can use? For that, we need to learn a few model deployment tools, such as Docker, Kubernetes, Heroku, Flask, HTML, and GitHub repositories.
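As one possible starting point, the sketch below wraps a model in a small Flask web API. It assumes Flask and scikit-learn are installed; the `/predict` route name and the JSON format (`{"features": [...]}`) are hypothetical choices, and a logistic regression trained at startup stands in for your own saved model.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Stand-in model (assumption): in a real project, load your saved model instead
iris = load_iris()
model = LogisticRegression(max_iter=1000).fit(iris.data, iris.target)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])  # hypothetical endpoint name
def predict():
    payload = request.get_json()
    features = [payload["features"]]  # one sample: a list of 4 measurements
    label = int(model.predict(features)[0])
    return jsonify({"prediction": label, "species": str(iris.target_names[label])})
```

Assuming the file is saved as `app.py`, recent Flask versions can serve it with `flask --app app run`; you can then send a POST request with the four Iris measurements. In a real project you would usually load a model saved with `joblib` or `pickle` instead of training at startup, then package the app with Docker for deployment.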
Once you have learned all of these, it is time to build end-to-end machine learning projects, which will help you in interviews and deepen your understanding of the topics. I suggest completing at least 3 end-to-end projects, including deployment.
Click here to start machine learning
3. Start with Deep Learning
Once you have finished machine learning, it is time to move ahead and learn deep learning concepts. You can follow this roadmap:
In deep learning, always start by understanding neural networks; loss functions such as squared error, absolute error, Huber loss, and cross-entropy; optimizers such as gradient descent, stochastic gradient descent, Adagrad, RMSprop, Adam, and Adadelta; and activation functions such as ReLU, sigmoid, Leaky ReLU, PReLU, and softmax.
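To make these terms concrete, here is a NumPy sketch of several of the loss and activation functions named above, plus plain gradient descent on a one-variable function. These are illustrative implementations; in practice, frameworks such as TensorFlow/Keras and PyTorch provide them along with optimizers like Adam and RMSprop.

```python
import numpy as np


# Loss functions (y = true values, p = predictions)
def mse(y, p):
    return np.mean((y - p) ** 2)


def mae(y, p):
    return np.mean(np.abs(y - p))


def huber(y, p, delta=1.0):
    err = np.abs(y - p)
    quadratic = 0.5 * err ** 2            # used for small errors
    linear = delta * (err - 0.5 * delta)  # used for large errors
    return np.mean(np.where(err <= delta, quadratic, linear))


def binary_cross_entropy(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


# Activation functions
def relu(z):
    return np.maximum(0, z)


def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def softmax(z):
    e = np.exp(z - np.max(z))             # subtract the max for numerical stability
    return e / e.sum()


# Plain gradient descent on f(w) = (w - 3)^2, whose gradient is 2(w - 3)
w, learning_rate = 0.0, 0.1
for _ in range(100):
    w -= learning_rate * 2 * (w - 3)
print(w)  # very close to 3
```

PReLU has the same form as Leaky ReLU, except that the slope `alpha` is learned during training instead of being fixed.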
Once you understand these, move on to building an artificial neural network (ANN) with multiple layers.
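The sketch below builds a small ANN from scratch in NumPy: a hidden layer with tanh activation and a sigmoid output, trained with full-batch gradient descent and binary cross-entropy on the XOR problem. The layer size, learning rate, and number of epochs are arbitrary choices for illustration.

```python
import numpy as np


def train_xor_network(hidden_units=4, learning_rate=0.5, epochs=5000, seed=0):
    """Two-layer neural network (tanh hidden layer, sigmoid output) trained on XOR."""
    rng = np.random.default_rng(seed)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    W1 = rng.normal(0, 1, (2, hidden_units))
    b1 = np.zeros((1, hidden_units))
    W2 = rng.normal(0, 1, (hidden_units, 1))
    b2 = np.zeros((1, 1))
    losses = []

    for _ in range(epochs):
        # Forward pass
        a1 = np.tanh(X @ W1 + b1)
        p = 1 / (1 + np.exp(-(a1 @ W2 + b2)))
        p_safe = np.clip(p, 1e-12, 1 - 1e-12)
        loss = -np.mean(y * np.log(p_safe) + (1 - y) * np.log(1 - p_safe))
        losses.append(loss)

        # Backward pass: gradients of the mean binary cross-entropy
        dz2 = (p - y) / len(X)
        dW2 = a1.T @ dz2
        db2 = dz2.sum(axis=0, keepdims=True)
        dz1 = (dz2 @ W2.T) * (1 - a1 ** 2)  # derivative of tanh
        dW1 = X.T @ dz1
        db1 = dz1.sum(axis=0, keepdims=True)

        # Gradient descent update
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2

    return losses, (W1, b1, W2, b2)


losses, params = train_xor_network()
print("initial loss: %.3f, final loss: %.3f" % (losses[0], losses[-1]))
```

In Keras or PyTorch, the forward pass is written with dense/linear layers and the backward pass is computed automatically, but the steps are the same.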
The next topic to learn in deep learning is the convolutional neural network (CNN), which is widely used for image classification and object detection. While building CNN models, use Google Colab, where a GPU can train the model in much less time than a CPU in a local environment.
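The core operation of a CNN layer can be sketched in NumPy: slide a small kernel over the image, multiply and sum at each position, then downsample with max pooling. This is a single-channel, stride-1, no-padding version for illustration; real frameworks also handle multiple channels, padding, and strides.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Single-channel 2D 'valid' convolution as used in CNN layers
    (technically cross-correlation: the kernel is not flipped)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Element-wise multiply the patch under the kernel, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; rows/columns that don't fit a full window are dropped."""
    h = feature_map.shape[0] // size
    w = feature_map.shape[1] // size
    trimmed = feature_map[:h * size, :w * size]
    return trimmed.reshape(h, size, w, size).max(axis=(1, 3))


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[-1.0, 1.0]])  # hypothetical horizontal-difference filter
feature_map = np.maximum(0, conv2d_valid(image, kernel))  # convolution + ReLU
pooled = max_pool2d(feature_map)
print(feature_map.shape, pooled.shape)  # (4, 3) (2, 1)
```

`scipy.signal.correlate2d(image, kernel, mode="valid")` computes the same result as `conv2d_valid`.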
Last but not least, learn recurrent neural networks (RNN) and natural language processing (NLP), where you need to understand LSTM, bidirectional LSTM, stemming, word2vec, bag of words, stopwords, encoders and decoders, attention models, transformers, BERT, etc.
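The idea behind an RNN can be sketched in a few lines of NumPy: the same weights are applied at every time step, and a hidden state carries information from earlier steps forward. The sizes and random weights below are hypothetical; LSTMs add gates to this basic cell to handle longer dependencies.

```python
import numpy as np


def rnn_forward(inputs, W_x, W_h, b):
    """Vanilla RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), applied over a sequence."""
    h = np.zeros(W_h.shape[0])  # initial hidden state h_0 = 0
    hidden_states = []
    for x_t in inputs:          # one time step at a time, same weights each step
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        hidden_states.append(h)
    return np.array(hidden_states)


rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 5, 4  # hypothetical sizes
W_x = rng.normal(0, 0.5, (hidden_size, input_size))
W_h = rng.normal(0, 0.5, (hidden_size, hidden_size))
b = np.zeros(hidden_size)
sequence = rng.normal(0, 1, (seq_len, input_size))  # e.g. 4 word vectors of size 3

states = rnn_forward(sequence, W_x, W_h, b)
print(states.shape)  # (4, 5): one hidden state per time step
```

In Keras, `tf.keras.layers.LSTM` and `tf.keras.layers.Bidirectional` provide the LSTM and bidirectional versions.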
Click here to start deep learning
4. Start with Reinforcement Learning
Reinforcement learning is a big topic in which you need to learn game theory, machine learning models, and deep learning models.
You should learn about environments such as OpenAI's Gym and libraries such as TF-Agents, and study techniques such as behavior cloning, Markov decision processes, Q-learning, deep Q-learning, and more.
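As a minimal sketch of Q-learning, the example below trains a tabular agent on a hypothetical five-state corridor (a tiny Markov decision process) using only the standard library. The environment, rewards, and hyperparameters are made up for illustration; Gym environments expose a similar reset/step loop for larger problems.

```python
import random


def train_q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                     epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a hypothetical 1-D corridor: start at state 0,
    reward 1 for reaching the last state. Actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]

    def choose_action(state):
        if rng.random() < epsilon:  # explore
            return rng.choice([0, 1])
        best = max(Q[state])        # exploit, breaking ties randomly
        return rng.choice([a for a in (0, 1) if Q[state][a] == best])

    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = choose_action(state)
            next_state = max(0, state - 1) if action == 0 else min(goal, state + 1)
            reward = 1.0 if next_state == goal else 0.0
            done = next_state == goal
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
            if done:
                break
    return Q


Q = train_q_learning()
policy = [0 if q[0] > q[1] else 1 for q in Q[:-1]]  # greedy action per non-goal state
print(policy)  # expected: move right (1) in every state
```

Deep Q-learning replaces the Q table with a neural network, which is what makes it practical for large state spaces.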
Finally, build 5 to 6 ML and DL models using different datasets available on kaggle.com.
Now you can call yourself a data scientist.
Bonus: the author's data science projects are listed below; check them out if you are interested.
Machine Learning:
1. https://github.com/rajendra99999/machine_learning_models
2. https://github.com/rajendra99999/heart-prediction
3. https://github.com/rajendra99999/Air-Quality-Index-heroku
Deep Learning:
1. https://github.com/rajendra99999/Deep_learning_Notes
2. https://github.com/rajendra99999/customer_DL
3. https://colab.research.google.com/drive/1Chv_7XBrnuH81adZUgHywrnL_dBYeNda?usp=sharing
Financial Analytics:
1. https://github.com/rajendra99999/Financial_analytics
2. https://github.com/rajendra99999/Share_price_prediction
3. https://github.com/rajendra99999/goldprice
If you found this article helpful, share it with others and kindly subscribe to our YouTube channel.