A comprehensive collection of end-to-end data analysis projects, showcasing expertise in Data Gathering, Advanced Cleaning Pipelines, Exploratory Data Analysis (EDA), and Feature Engineering.
Tools Used: pandas, numpy, regex
- Engineered a robust data wrangling pipeline to transform raw, unstructured smartphone specifications into a clean, analytical dataset.
- Handled missing values, standardized formats, extracted complex nested string patterns, and detected outliers to prepare the data for machine learning modeling.
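
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual pipeline: the `spec` column, the sample strings, the regex patterns, and the `clean_specs` helper are all assumptions made up for the example, and median imputation with an IQR rule is just one common choice.

```python
import pandas as pd

# Hypothetical raw data: one free-text spec string per phone (not the real dataset).
raw = pd.DataFrame({
    "spec": [
        "8 GB RAM, 128 GB inbuilt, 5000 mAh Battery",
        "6 GB RAM, 64 GB inbuilt, 4500 mAh Battery",
        "12 GB RAM, 256 GB inbuilt",
        None,
    ]
})

def clean_specs(df: pd.DataFrame) -> pd.DataFrame:
    """Extract numeric fields from the spec string, impute gaps, flag outliers."""
    out = df.copy()
    # Assumed patterns; a real spec column would likely need more cases.
    patterns = {
        "ram_gb": r"(\d+)\s*GB RAM",
        "storage_gb": r"(\d+)\s*GB inbuilt",
        "battery_mah": r"(\d+)\s*mAh",
    }
    for col, pattern in patterns.items():
        extracted = out["spec"].str.extract(pattern, expand=False)
        out[col] = pd.to_numeric(extracted, errors="coerce")

    # Impute missing numeric values with the column median.
    num_cols = list(patterns)
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())

    # Flag battery values outside 1.5 * IQR of the quartiles.
    q1, q3 = out["battery_mah"].quantile([0.25, 0.75])
    iqr = q3 - q1
    out["battery_outlier"] = ~out["battery_mah"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out

clean = clean_specs(raw)
print(clean.drop(columns="spec"))
```

Flagging outliers in a separate boolean column, rather than dropping rows, keeps the decision about what to do with them visible to whoever builds the model later.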
Tools Used: pandas, matplotlib, seaborn, scikit-learn
- Conducted an exhaustive Exploratory Data Analysis on the Ames Housing dataset to identify key predictors of house prices.
- Implemented rigorous feature engineering, correlation analysis, and data scaling (Standardization/Normalization) strategies.
- Analyzed skewness and performed log transformations on the target variable.
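
As a rough sketch of the skewness check, log transform, and scaling described above: the snippet below uses synthetic, randomly generated values in place of the Ames data, so the numbers it prints are illustrative only. `SalePrice` and `GrLivArea` are Ames column names; the distribution parameters are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic stand-in for the Ames data: a right-skewed target and one feature.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "GrLivArea": rng.normal(1500, 400, 500).clip(min=400),
    "SalePrice": rng.lognormal(mean=12, sigma=0.4, size=500),
})

# Measure skewness of the target before and after a log transform.
skew_before = df["SalePrice"].skew()
df["LogSalePrice"] = np.log1p(df["SalePrice"])
skew_after = df["LogSalePrice"].skew()
print(f"skew before: {skew_before:.2f}, after log1p: {skew_after:.2f}")

# Standardization: rescale to zero mean and unit variance.
standardized = StandardScaler().fit_transform(df[["GrLivArea"]])
# Normalization: rescale to the [0, 1] range.
normalized = MinMaxScaler().fit_transform(df[["GrLivArea"]])
```

In a modeling workflow the scalers would normally be fit on the training split only and then applied to the test split, to avoid leaking test statistics into training.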
Tools Used: pandas, seaborn
- Extracted actionable insights from the iconic Titanic dataset through deep univariate and bivariate analysis.
- Evaluated survival rates across socioeconomic classes, genders, and age groups using statistical visualizations.
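
A minimal sketch of the survival-rate breakdowns described above. The eight rows here are invented for illustration and are not real Titanic records; the column names follow the common Kaggle layout (`Survived`, `Pclass`, `Sex`, `Age`), and the age bins are an assumption.

```python
import pandas as pd

# Made-up sample rows in the Kaggle Titanic column layout (not real passengers).
passengers = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0, 0, 1],
    "Pclass":   [1, 3, 2, 3, 1, 3, 2, 1],
    "Sex":      ["female", "male", "female", "male", "female", "male", "male", "male"],
    "Age":      [29, 35, 8, 22, 54, 40, 17, 61],
})

# The mean of a 0/1 column is the survival rate within each group.
by_class = passengers.groupby("Pclass")["Survived"].mean()
by_sex = passengers.groupby("Sex")["Survived"].mean()

# Bucket ages into assumed groups: (0, 18], (18, 40], (40, 100].
passengers["AgeGroup"] = pd.cut(
    passengers["Age"], bins=[0, 18, 40, 100], labels=["child", "adult", "senior"]
)
by_age = passengers.groupby("AgeGroup", observed=True)["Survived"].mean()

# Bivariate view: survival rate for each class and sex combination.
by_class_sex = passengers.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)
print(by_class, by_sex, by_age, by_class_sex, sep="\n\n")
```

The same comparison can be drawn with seaborn, for example `sns.barplot(data=passengers, x="Pclass", y="Survived")`, which plots the group mean by default.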
A curated collection of theoretical implementations and best practices, combined into unified modules. Located in /4_Data_Science_Handbook.
| Module | Description | Key Concepts |
|---|---|---|
| 01_Statistical_Data_Analysis | Univariate and Bivariate analysis | Categorical/Numerical distributions, Correlation matrices, Probability distributions |
| 02_Data_Wrangling_Techniques | Data assessment and cleaning | Missing data imputation, Iterative assessment (Define-Code-Test) |
| 03_Data_Visualization_Masterclass | Advanced visualization strategies | Matplotlib Object-Oriented API, Seaborn Relational/Matrix/Joint plots |
| 04_Data_Acquisition_and_Scraping | Gathering raw data | Pandas I/O (CSV, JSON, SQL, Excel), REST APIs, BeautifulSoup web scraping |
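
To give a feel for the Define-Code-Test loop listed under the data wrangling module, here is a small sketch applied to missing-data imputation. It is an assumed illustration, not code taken from the handbook: the `price` and `brand` columns and their values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dirty data with one missing numeric and one missing categorical value.
df = pd.DataFrame({
    "price": [100.0, np.nan, 300.0, 200.0],
    "brand": ["a", None, "b", "a"],
})

# Define: fill missing `price` with the median and missing `brand` with the mode.

# Code: work on a copy so the raw data stays available for re-assessment.
clean = df.copy()
clean["price"] = clean["price"].fillna(clean["price"].median())
clean["brand"] = clean["brand"].fillna(clean["brand"].mode().iloc[0])

# Test: confirm that the issue named in the Define step is gone.
remaining_missing = int(clean.isna().sum().sum())
assert remaining_missing == 0
print(clean)
```

Writing the Test step as an assertion means a failed cleaning step stops the notebook immediately instead of surfacing later as a confusing modeling error.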
- Clone the repository:

      git clone https://github.com/Ashwin14101/Data-Analysis-Portfolio.git
      cd Data-Analysis-Portfolio

- Install dependencies:

      pip install pandas numpy matplotlib seaborn scikit-learn jupyterlab

- Launch the environment:

      jupyter lab

Looking for a passionate Data Analyst/Scientist? Feel free to reach out via GitHub or LinkedIn!
MIT License © Ashwin14101