Artificial Intelligence Engineer

at ApTask
Published January 18, 2024
Location San Francisco, CA
Job Type Full-time  


Position: Artificial Intelligence Engineer
Location: San Francisco, CA (Day 1 Onsite)
Duration: Full-time/Permanent

Job Details:

Technical/Functional Skills:
Proficiency in RoCEv2, K8s, KVM, Ubuntu, Python, Shell, Go, Rust, GPU drivers, and cluster interconnects with 200G/400G networking.
Experience managing GPU clusters and optimizing GPU-based services, tools, and software.

Roles & Responsibilities:
• Develop, implement, and maintain GPU-based clusters of 10 to 1000 nodes, ensuring optimal performance and availability.
• Administer ML/AI platforms (distributed ML services, LLMs, vector databases, and AI inference) by managing deployments, resource allocation, monitoring, and security.
• Collaborate with cross-functional teams to address AI infrastructure requirements, support AI-related projects, and provide technical expertise.
• Monitor and evaluate the performance of AI systems and clusters, ensuring that they adhere to industry best practices and meet company standards.
• Compile reports, document procedures, and publish recommendations for improving AI infrastructure and solutions.
• Use AI/ML to continuously improve the internal processes and tools used in the end-to-end delivery of the team's services.