Project Overview
- Role: ML Intern
- Location: San Jose
- Duration: 08/2024 - 03/2025
Key Contributions
- Latency Optimization: Engineered a high-performance resume-generation pipeline by migrating from the GPT-4o API to a self-hosted Llama 3 model served via vLLM.
- Performance Impact: Achieved a 67% reduction in generation latency (from 15s to 5s) and slashed operational costs by 30%.
- Prompt Engineering: Designed a 5-step prompt chain incorporating Chain-of-Thought (CoT) to improve rewriting accuracy.
- Semantic Search: Leveraged fine-tuned LLMs (DeepSeek, GPT-4) to power semantic search across 40+ dimensions.
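The latency figures above are internally consistent; a quick arithmetic check using only the stated numbers:

```python
# Claimed latency reduction: 15 s (GPT-4o API) -> 5 s (self-hosted Llama 3 on vLLM)
before_s, after_s = 15.0, 5.0
reduction = (before_s - after_s) / before_s
print(f"{reduction:.0%}")  # -> 67%
```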
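A multi-step prompt chain like the one described can be sketched as a sequence of templated prompts, each consuming the previous step's output. This is a minimal sketch: the step names, templates, and the `call_llm` stub are illustrative assumptions, not the production prompts.

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call (e.g. a vLLM endpoint)."""
    return f"[model output for: {prompt[:40]}...]"

# Five chained steps; each template receives the prior step's output as {input}.
# The "reason" step carries the Chain-of-Thought instruction.
CHAIN = [
    ("extract",  "List the key skills and achievements in this text:\n{input}"),
    ("reason",   "Think step by step about which items best fit the target role:\n{input}"),
    ("draft",    "Rewrite the selected items as concise resume bullets:\n{input}"),
    ("critique", "Identify weak verbs or vague claims in these bullets:\n{input}"),
    ("finalize", "Apply the critique and output the final bullets:\n{input}"),
]

def run_chain(resume_text: str) -> str:
    output = resume_text
    for _name, template in CHAIN:
        output = call_llm(template.format(input=output))
    return output

print(run_chain("Built an ML pipeline."))
```

Chaining keeps each prompt focused on one subtask, which is typically easier to debug and evaluate than a single monolithic prompt.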
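Semantic search of this kind typically scores candidates by embedding similarity. The sketch below uses toy vectors and cosine similarity; the hard-coded `EMBEDDINGS` table is a stand-in for a fine-tuned model's encoder output, not the actual system.

```python
import math

# Toy embeddings standing in for a fine-tuned model's encoder output.
EMBEDDINGS = {
    "python developer": [0.9, 0.1, 0.3],
    "ml engineer":      [0.8, 0.2, 0.5],
    "sales associate":  [0.1, 0.9, 0.2],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, top_k=2):
    """Return the top_k entries ranked by similarity to the query vector."""
    scored = sorted(EMBEDDINGS.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in scored[:top_k]]

print(search([0.85, 0.15, 0.4]))  # most similar roles first
```

In a production system each "dimension" (skills, seniority, location, and so on) would contribute its own embedding or score, with results merged across dimensions.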
Link to Site