Greg Abrams

AI/ML Cloud Engineer

Professional Summary

AI/ML Cloud Engineer with 10+ years of experience supporting and optimizing cloud infrastructure solutions. Specialized in GPU-accelerated AI/ML workloads with proven expertise in Kubernetes (CKA/CKAD certified), SLURM for HPC job management, and MLOps pipeline optimization.

Demonstrated success supporting large-scale distributed training and inference workloads including Megatron-LM implementations. Strong customer-facing experience providing technical guidance, conducting workshops, and developing solutions that directly impact business outcomes. Deep understanding of GPU computing, ML frameworks, and cloud infrastructure optimization. Published technical author on Kubernetes networking and container orchestration.

Professional Experience

AI/ML Cloud Engineer

CoreWeave | Livingston, New Jersey (Remote)
2023 - 2025
  • Provide support and architecture guidance for GPU-accelerated AI/ML workloads on Kubernetes infrastructure, serving as a trusted advisor to customers implementing distributed training and inference pipelines
  • Support SLURM implementations for HPC job management and scheduling, enabling efficient resource allocation for large-scale distributed computing workloads
  • Configure and optimize Megatron-LM deployments for large-scale distributed training on multi-GPU clusters, ensuring optimal performance and scalability
  • Deploy and integrate Weights & Biases (WandB) for ML experiment tracking and visualization, enabling data-driven model optimization for customers
  • Conduct technical workshops and provide hands-on guidance to customers on GPU cloud best practices, ML frameworks, and pipeline optimization
  • Support both AMD64 and ARM architectures for inference and training jobs, providing expertise across diverse hardware platforms
  • Maintain proficiency with Prometheus, Grafana, and Loki for infrastructure observability and troubleshooting

Senior Linux Programmer / Administrator

Genesco | Nashville, Tennessee
2022 - 2023
  • Designed and implemented a highly efficient real-time data integration system using PHP, Python, and Bash, improving data accuracy and operational efficiency
  • Developed custom API integrations (Smartsheet APIs) and automation scripts to streamline workflows and support business requirements
  • Managed VMware vSphere infrastructure for virtualized workloads, creating standardized processes and documentation for rapid deployment
  • Supported enterprise-scale Linux point-of-sale infrastructure across 1,000+ locations with 2,000+ systems, ensuring high availability and reliability
  • Completed IBM training courses in DevOps, Python for Data Science and AI, and Continuous Integration and Delivery (CI/CD)

Senior Linux Developer / Administrator

Consulting First, Inc. | Nashville, Tennessee
2011 - 2022
  • Designed and deployed high-availability Linux clusters using Kubernetes, implementing comprehensive monitoring with Prometheus and Grafana
  • Managed hybrid and multi-cloud infrastructure spanning on-premises, Google Cloud, Azure, and AWS environments
  • Developed and maintained web applications using PHP, Perl, JavaScript, and HTML with a focus on performance and scalability
  • Implemented CI/CD pipelines using Jenkins and GitHub for automated testing and deployment
  • Deployed containerized applications using Docker and Kubernetes with persistent storage solutions
  • Administered 2,000+ domains including DNS, Apache/Nginx web servers, SSL certificates, and email services

Technical Expertise

Cloud & Infrastructure

Kubernetes (CKA/CKAD), Docker, AWS, Azure, Google Cloud, Linux, Cilium, Flannel

AI/ML Technologies

GPU Computing, SLURM, Megatron-LM, PyTorch, TensorFlow, Weights & Biases, NVIDIA AI

Programming & Scripting

Python, Bash, PHP, JavaScript, VS Code

Observability

Prometheus, Grafana, Loki

Systems & Platforms

Red Hat Linux, Ubuntu, Debian, VMware vSphere, VAST Data Storage

GPU Stack

CUDA, GPU Drivers, AMD64, ARM

Certifications

CKA: Certified Kubernetes Administrator
CKAD: Certified Kubernetes Application Developer
NVIDIA AI Infrastructure and Operations Fundamentals
Observability with Grafana, Prometheus, Loki, Alloy and Tempo
VAST 101 - VAST Fundamentals
VAST 201 - Networking Overview & Best Practices
Continuous Integration and Continuous Delivery (CI/CD) - IBM
DevOps Essentials - IBM
Python for Data Science and AI - IBM

Publications & Technical Writing

Education

Bachelor of Business Administration (BBA) - GPA: 3.90

Columbus State University - Dean's List

Accounting and Computer Science

Research & Projects

AI Document Search: Google File Search API Review

December 2025

Started as a deeper comparison of RAG architectures (custom pipelines vs. managed services), but Google's File Search API made the managed route trivially simple. Built a working AI-powered document search demo in two hours with 50 lines of code. Indexed 12 classic novels with full semantic search for 35 cents. Research concluded early: the managed approach just works.
