Professional Summary
AI/ML Cloud Engineer with 10+ years of experience supporting and optimizing cloud infrastructure solutions. Specialized in GPU-accelerated AI/ML workloads with proven expertise in Kubernetes (CKA/CKAD certified), SLURM for HPC job management, and MLOps pipeline optimization.
Demonstrated success supporting large-scale distributed training and inference workloads including Megatron-LM implementations. Strong customer-facing experience providing technical guidance, conducting workshops, and developing solutions that directly impact business outcomes. Deep understanding of GPU computing, ML frameworks, and cloud infrastructure optimization. Published technical author on Kubernetes networking and container orchestration.
Professional Experience
AI/ML Cloud Engineer
- Provide support and architecture guidance for GPU-accelerated AI/ML workloads on Kubernetes infrastructure, serving as a trusted advisor for customers implementing distributed training and inference pipelines
- Support SLURM implementations for HPC job management and scheduling, enabling efficient resource allocation for large-scale distributed computing workloads
- Configure and optimize Megatron-LM deployments for large-scale distributed training on multi-GPU clusters, ensuring optimal performance and scalability
- Deploy and integrate Weights & Biases (WandB) for ML experiment tracking and visualization, enabling data-driven model optimization for customers
- Conduct technical workshops and provide hands-on guidance to customers on GPU cloud best practices, ML frameworks, and pipeline optimization
- Support both AMD64 and ARM architectures for inference and training jobs, providing expertise across diverse hardware platforms
- Maintain proficiency with Prometheus, Grafana, and Loki for infrastructure observability and troubleshooting
Senior Linux Programmer / Administrator
- Designed and implemented a highly efficient real-time data integration system using PHP, Python, and Bash, improving data accuracy and operational efficiency
- Developed custom API integrations (Smartsheet APIs) and automation scripts to streamline workflows and support business requirements
- Managed VMware vSphere infrastructure for virtualized workloads, creating standardized processes and documentation for rapid deployment
- Supported enterprise-scale Linux point-of-sale infrastructure across 1,000+ locations with 2,000+ systems, ensuring high availability and reliability
- Completed IBM training courses in DevOps, Python for Data Science and AI, and Continuous Integration and Delivery (CI/CD)
Senior Linux Developer / Administrator
- Designed and deployed high-availability Linux clusters using Kubernetes, implementing comprehensive monitoring with Prometheus and Grafana
- Managed hybrid and multi-cloud infrastructure spanning on-premises, Google Cloud, Azure, and AWS environments
- Developed and maintained web applications using PHP, Perl, JavaScript, and HTML with a focus on performance and scalability
- Implemented CI/CD pipelines using Jenkins and GitHub for automated testing and deployment
- Deployed containerized applications using Docker and Kubernetes with persistent storage solutions
- Administered 2,000+ domains including DNS, Apache/Nginx web servers, SSL certificates, and email services
Technical Expertise
Cloud & Infrastructure
AI/ML Technologies
Programming & Scripting
Observability
Systems & Platforms
GPU Stack
Certifications
Publications & Technical Writing
Education
Bachelor of Business Administration (BBA) - GPA: 3.90
Accounting and Computer Science
Research & Projects
AI Document Search: Google File Search API Review
Started as a deeper comparison of RAG architectures (custom pipelines vs. managed services), but Google's File Search API made the managed approach trivially simple. Built a working AI-powered document search demo in two hours with 50 lines of code. Indexed 12 classic novels with full semantic search for 35 cents. Research concluded early: the managed approach just works.