Project Overview
- Role: ML Intern
- Location: San Jose
- Duration: 08/2024 - 03/2025
Key Contributions
- Latency Optimization: Engineered a high-performance resume-generation pipeline by migrating from the GPT-4o API to a self-hosted Llama 3 model served via vLLM.
- Performance Impact: Achieved a 67% reduction in generation latency (from 15s to 5s) and slashed operational costs by 30%.
- Prompt Engineering: Designed a 5-step prompt chain incorporating Chain-of-Thought (CoT) to improve rewriting accuracy.
- Semantic Search: Leveraged fine-tuned LLMs (DeepSeek, GPT-4) to power semantic search across 40+ dimensions.
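The latency figures above are internally consistent; a quick arithmetic check using only the stated numbers:

```python
# Claimed latency reduction: 15 s (GPT-4o API) -> 5 s (self-hosted Llama 3 on vLLM)
before_s, after_s = 15.0, 5.0
reduction = (before_s - after_s) / before_s
print(f"{reduction:.0%}")  # -> 67%
```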
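A multi-step prompt chain like the one described can be sketched as a sequence of templated prompts, each consuming the previous step's output. This is a minimal sketch: the step names, templates, and the `call_llm` stub are illustrative assumptions, not the production prompts.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call (e.g. a vLLM endpoint)."""
    return f"[model output for: {prompt[:40]}...]"

# Five chained steps; each template receives the prior step's output as {input}.
# The "reason" step carries the Chain-of-Thought instruction.
CHAIN = [
    ("extract",  "List the key skills and achievements in this text:\n{input}"),
    ("reason",   "Think step by step about which items best fit the target role:\n{input}"),
    ("draft",    "Rewrite the selected items as concise resume bullets:\n{input}"),
    ("critique", "Identify weak verbs or vague claims in these bullets:\n{input}"),
    ("finalize", "Apply the critique and output the final bullets:\n{input}"),
]

def run_chain(resume_text: str) -> str:
    output = resume_text
    for _name, template in CHAIN:
        output = call_llm(template.format(input=output))
    return output

print(run_chain("Built an ML pipeline."))
```

Chaining keeps each prompt focused on one subtask, which is typically easier to debug and evaluate than a single monolithic prompt.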
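Semantic search of this kind typically scores candidates by embedding similarity. The sketch below uses toy vectors and cosine similarity; the hard-coded `EMBEDDINGS` table is a stand-in for a fine-tuned model's encoder output, not the actual system.

```python
import math

# Toy embeddings standing in for a fine-tuned model's encoder output.
EMBEDDINGS = {
    "python developer": [0.9, 0.1, 0.3],
    "ml engineer":      [0.8, 0.2, 0.5],
    "sales associate":  [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, top_k=2):
    """Return the top_k entries ranked by similarity to the query vector."""
    scored = sorted(EMBEDDINGS.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

print(search([0.85, 0.15, 0.4]))  # most similar roles first
```

In a production system each "dimension" (skills, seniority, location, and so on) would contribute its own embedding or score, with results merged across dimensions.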
Link to Site