AI Engineering graduate (Mansoura University, GPA 3.83/4.0) with internship experience at Samsung and InTimeTec building LLM-powered pipelines, RAG systems, and predictive models on datasets exceeding 700K records. Fine-tuned Llama 3.1 to outperform GPT-4o mini on price prediction. AWS Certified Solutions Architect and Deep Learning specialist. UAE-based on a family visa with no sponsorship requirement.
- LLMs & AI Engineering: Fine-tuned Llama 3.1 (8B) with QLoRA on 400K samples; built RAG systems with sub-2-second retrieval and 87% accuracy across 4,000+ documents.
- Machine Learning: Classification and regression models built with XGBoost, SVM, and scikit-learn, achieving up to 92% accuracy on datasets of 700K+ records.
- Data Engineering & BI: Python ETL pipelines, SQL optimisation, Power BI dashboards with DAX measures, and KPI development across large-scale operational datasets.
InTimeTec — Data Scientist Intern
- Fine-tuned Llama 3.1 (8B parameters) using 4-bit quantization with QLoRA (rank=32, alpha=64) on 400K product samples, reducing memory footprint from 32GB to under 8GB.
- Achieved 75% accuracy with 0.36 RMSLE for price prediction, outperforming GPT-4o mini with RAG (69.2% accuracy).
- Developed a conversational AI system for product feature discussion and dynamic pricing prediction.
- Built Power BI dashboards with 10+ features and 18 DAX measures analysing trip duration, delivery status, and regional delay patterns across 140K+ records.
- Developed regression models (R²=0.84) forecasting warehouse processing times, cutting average processing time by 10 minutes and delivery delays by 20%.
Samsung — Data Scientist Intern
- Automated Python ETL workflows on 700K+ row datasets, delivering insights to 50+ stakeholders.
- Designed SQL pipelines that reduced query execution time to under 200 ms for live KPI tracking across 15K+ dashboards.
- Built XGBoost and Random Forest models on 50K+ defect records achieving 92% accuracy, accelerating quality approvals by 27%.
| Domain | Technologies |
|---|---|
| LLMs & AI | |
| ML & Data Science | |
| BI & Analysis | |
| Core & Cloud | |
RAG-based knowledge retrieval system across 4,000+ HR documents using FastAPI, Qdrant, MongoDB, and LangChain.
- Engineered dual-database architecture — Qdrant vector store (384-dim embeddings) + MongoDB for metadata — achieving sub-2-second query responses with 87% retrieval accuracy.
- Implemented intelligent chunking with LangChain's recursive text splitter (256–1024 token chunks, 20–100 token overlap) for context-preserving semantic retrieval.
- Built multi-endpoint AI system with semantic re-ranking, RAG Q&A, HR email generation, and web scraping via OpenRouter API (Grok-4.1), handling 50+ concurrent requests.
End-to-end attrition analysis and classification on 68K HR records across 35 attributes.
- Built SVM, XGBoost, and Naive Bayes models achieving 91% accuracy and 0.90 F1-score.
- Applied three feature selection methods (RF importance, RFECV, Chi-squared) with SMOTE for class imbalance handling.
- Identified through EDA and Power BI visualisations that 60% of attrition was concentrated in the lowest income bracket.
End-to-end analysis of US residential real estate sales using BigQuery SQL and Power BI.
- Cleaned 113K+ property records in BigQuery — deduplication via window functions, null resolution, and city name standardisation.
- Built a 3-page Power BI dashboard across 12 states and 23 years covering pricing trends, seasonal sales volume, and property characteristics.
- Identified New Jersey as the top market by volume (20K+ properties) and New York by market size ($21bn).




