ML Intern: OpenJobs AI

Project Overview

  • Role: ML Intern
  • Location: San Jose
  • Duration: 08/2024 - 03/2025

Key Contributions

  • Latency Optimization: Engineered a high-performance resume generation pipeline, migrating from the GPT-4o API to a self-hosted Llama 3 model served via vLLM.
  • Performance Impact: Achieved a 67% reduction in generation latency (from 15s to 5s) and slashed operational costs by 30%.
  • Prompt Engineering: Designed a 5-step prompt chain incorporating Chain-of-Thought (CoT) to improve rewriting accuracy.
  • Semantic Search: Leveraged fine-tuned LLMs (DeepSeek, GPT-4) to power semantic search across 40+ dimensions.
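
Illustrative sketch of the prompt chain idea: the snippet below shows one way a 5-step chain could be wired so that each step's output becomes the next step's input. It is a minimal sketch, not the production pipeline. The step templates, the names STEPS, run_chain, and fake_generate, and the stub model are all hypothetical, since the actual prompts are not listed here.

```python
from typing import Callable, List

# Hypothetical step templates; the real five prompts are not given in this document.
STEPS: List[str] = [
    "List the key facts in this resume text:\n{input}",
    "Reason step by step about how to strengthen these facts:\n{input}",
    "Write resume bullets based on this reasoning:\n{input}",
    "Tighten the wording of these bullets:\n{input}",
    "Return the final bullets, one per line:\n{input}",
]


def run_chain(
    text: str,
    generate: Callable[[str], str],
    steps: List[str] = STEPS,
) -> List[str]:
    """Run each prompt in order, feeding each output into the next step."""
    outputs: List[str] = []
    current = text
    for template in steps:
        prompt = template.format(input=current)
        current = generate(prompt)  # one model call per step
        outputs.append(current)
    return outputs


def fake_generate(prompt: str) -> str:
    # Stand-in for a real model call; returns a short deterministic string.
    return f"[{len(prompt)} chars processed]"


if __name__ == "__main__":
    for i, out in enumerate(run_chain("Built a data pipeline.", fake_generate), 1):
        print(f"step {i}: {out}")
```

In a real deployment, the generate argument would wrap a request to the model server instead of the stub. vLLM can expose an OpenAI-compatible HTTP endpoint, so a thin client function is one plausible fit, though the exact integration used in this project is not described here.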
