Job Information
Honeywell HPC Technical Leader in Tempe, Arizona
The future is what you make it. When you join Honeywell, you become a member of our global team of thinkers, innovators, dreamers and doers who make the things that make the future. That means changing the way we fly, fueling jets in an eco-friendly way, keeping buildings smart and safe and even making it possible to breathe on Mars. Working at Honeywell isn’t just about developing cool things. That’s why all of our employees enjoy access to dynamic career opportunities across different fields and industries. Are you ready to help us make the future?
The Enterprise Datacenter and Networks Organization designs, implements and operates a state-of-the-art Information Systems Infrastructure serving around 110,000 employees in over 700 locations across more than 65 countries. We work with the latest and emerging technologies to deliver scalable, high-performing, always-available infrastructure services around the world. We deliver services that connect, host, virtualize, store, collaborate, integrate, compute and transact business solutions that utilize cloud-first, automation, analytics, “as a Service”, big data and software-defined technologies.
The HPC Global Leader will be responsible for designing, implementing and operating the Global HPC/GPU Compute Infrastructure with 200,000+ cores at Honeywell. You will lead a team of HPC engineers responsible for managing the networks, storage and distributed DGX GPU compute resources; establishing and automating the AI/Deep Learning software stack; accelerating CFD applications; integrating on-premises hardware and operations with cloud-based solutions; working with IoT-based applications; and assisting with quantum computing efforts and implementation. This role includes application rationalization components to best assist business teams with the migration, upgrade or decommissioning of legacy infrastructure. This role will be responsible for the entire lifecycle of the HPC infrastructure and applications, assisting with automation and optimization activities while modernizing the different supported clusters.
Key Responsibilities:
Overall leadership and operational support of Global HPC and GPU Cluster Infrastructure in on-call shifts
Work with HPC SGI-based and NVIDIA clusters, networks and storage
Work with AI/Deep Learning Software stack
Troubleshoot HPC performance issues related to on-premises and virtualized workloads
Facilitate integration of HPC workloads to/from on-prem and off-prem cloud
Liaise with multiple business units to align the HPC service with their needs and optimize the environment
Consolidate and manage HPC TCO and overall financials
Develop HPC skills within the organization to build a standard and skilled workforce
10% travel
YOU MUST HAVE
A Bachelor’s Degree
US Citizen due to contractual requirements
8+ years of experience in operational support of the enterprise HPC cluster ecosystem, involving:
HPC Linux and Windows operating systems
Enterprise-level HPC clusters
HPC networks: enhanced hypercube single or dual rail, dragonfly topologies, InfiniBand HDR/EDR
Cluster management tools: HPCM, SMC, BCM, xCAT
HPC storage (petabyte-scale): Lustre, pNFS, GPFS, block-based RDMA, WekaIO
HPC storage architecture: DDN, Panasas, WekaIO
Schedulers: Moab/Torque, Slurm, SGE, PBS Pro
Disaster Recovery, redundant, distributed backups
Network optimization for local and WAN big data migration
Computational Fluid Dynamics (CFD) optimization experience
HPC IT security
HPCaaS Cloud Integration
Virtualization technology and hypervisors: VMware, VirtualBox, QEMU, Xen, MS Hyper-V
3+ years of experience in operational support of GPU compute environment
NVIDIA DGX-1, DGX-2 and DGX A100 GPU clusters and Pods
Building AI/Deep Learning Software stack at production level
Experience with Docker, Kubernetes, OpenStack, TensorFlow
GPU Cloud integration with HPCaaS
Excellent oral, written and collaborative communication skills, including executive level communications
The ability to partner effectively across IT teams, suppliers and business customers on cross-functional projects and process improvements
Strong interpersonal skills - effective listening and teaming
Self-motivated, demonstrated bias for action
Skilled in partnering with internal customers at all levels to define problems, identify solutions, and facilitate change
WE VALUE
Bachelor’s Degree preferably in Information Technology, Computer Science, Engineering or Business-Related Discipline
Technical certifications in RHEL, Ubuntu
Excellent leadership communication and executive presence
Strong influencing, program and change management skills
Strong business acumen and customer focus
Creation of a results-oriented Management Operating System
Creative and collaborative problem-solving capability
Attracting, motivating and developing diverse talent
The ability to translate business issues / requirements and objectives into technical solutions
Strong knowledge of IT business processes and practices including ITIL methodology
Experience with data protection, business continuity and disaster recovery options, configuration and execution
Honeywell is an equal opportunity employer. Qualified applicants will be considered without regard to age, race, creed, color, national origin, ancestry, marital status, affectional or sexual orientation, gender identity or expression, disability, nationality, sex, religion, or veteran status.