WNTD
Job Specification: Solution Architect - NVIDIA Cluster (End-to-End Design & Validation)

Location: London (1 day per week onsite)
Travel: Occasional travel to datacenter sites outside the UK
Engagement: Contract Inside IR35
Department: Engineering/Advanced Compute

Role Overview

We are seeking a highly skilled Solution Architect with deep experience in designing, validating, and delivering end-to-end NVIDIA GPU clusters in enterprise and hyperscale environments. This individual will own the full life cycle of architectural design, from requirements gathering through implementation oversight and performance validation. They will work closely with engineering, networking, DevOps, security, and datacenter operations teams to ensure high-performance, scalable, and resilient GPU infrastructure for AI, HPC, and ML workloads. The role requires one day per week onsite in London, with occasional international travel to support datacenter design reviews, deployment validation, or site acceptance testing.

Key Responsibilities

Architecture & Design
- Lead the architecture of NVIDIA GPU clusters leveraging technologies such as H100/H200, NVLink, NVSwitch, DGX, HGX, or SuperPod-class designs.
- Produce high-level and low-level designs (HLD/LLD), including compute, network, storage, and power/cooling considerations.
- Validate hardware and platform selections, ensuring architectural alignment with customer requirements and scalability goals.
- Design fabric architectures including InfiniBand (200/400Gb), RoCE, and high-performance east-west traffic patterns.
- Ensure designs adhere to NVIDIA reference architectures (NVAIE, Base Command, DGX SuperPod specs, etc.).

Cluster Integration & Validation
- Define and execute validation test plans for GPU cluster performance, resilience, networking throughput, and workload behaviour.
- Oversee integration of GPU nodes, networking, and storage systems into the existing datacenter environment.
- Collaborate with DevOps/Platform teams to validate cluster orchestration (Kubernetes, Slurm, Bright Cluster Manager, or equivalents).
- Validate firmware, drivers, NCCL, CUDA libraries, and container environments for production readiness.

Deployment & Delivery Oversight
- Provide technical leadership across the full deployment life cycle.
- Partner with datacenter operations to ensure correct rack layouts, cabling, airflow, and power design.
- Support delivery teams during build-out phases, ensuring the design is executed correctly.
- Participate in factory acceptance tests (FAT), site acceptance tests (SAT), and operational readiness reviews.

Stakeholder Collaboration
- Work closely with internal and external teams including network engineering, platform engineering, procurement, and vendors such as NVIDIA, Mellanox, Supermicro, Dell, or HPE.
- Provide technical guidance to customers, partners, and cross-functional engineering teams.
- Communicate complex architectural concepts clearly to both technical and non-technical audiences.

Documentation & Governance
- Produce detailed architecture documents, diagrams, acceptance criteria, and operational runbooks.
- Ensure security, compliance, and governance standards are built into the design.
- Provide knowledge transfer (KT) and training sessions to internal teams where required.

Required Skills & Experience

Technical Expertise
- Proven experience architecting and delivering NVIDIA GPU clusters at scale (AI/ML/HPC environments).
- Strong hands-on understanding of GPU interconnects (NVLink/NVSwitch) and DGX/HGX/SuperPod architectures.
- Deep knowledge of InfiniBand and high-performance networking architectures.
- Experience with cluster orchestration: Kubernetes, Slurm, PBS, or similar.
- Familiarity with AI/ML workload requirements, CUDA, Docker/OCI containers, and NVIDIA software stacks (NCCL, CUDA Toolkit).
- Comfort with Linux systems engineering, hardware validation, and troubleshooting across compute/network layers.

Soft Skills
- Strong communication skills, with the ability to bridge engineering and business discussions.
- Comfortable owning architecture decisions and delivering executive-ready documentation.
- Ability to work autonomously while coordinating with multi-disciplinary teams.
- Problem-solver with strong critical-thinking abilities and a delivery-focused mindset.

Desirable Experience
- Experience with hyperscaler-class deployments or multi-megawatt datacenter environments.
- Work with NVIDIA Base Command Manager or similar cluster management tooling.
- Exposure to data pipelines, storage systems (Lustre, GPUDirect Storage, Ceph), or AI workflow platforms.
- Certifications such as NVIDIA Certified Associate/Expert, Kubernetes certifications (CKA/CKS), or related vendor accreditations.

What We Offer
- Hybrid working: 1 day per week in London
- Opportunity to design next-generation high-performance GPU infrastructure
- Exposure to cutting-edge AI compute at scale
WNTD
Derby, Derbyshire
3rd Line Analyst (Wintel/AD Specialist)

Location: Remote with occasional visits to Derby
Contract: 12-month FTC
Clearance: Current SC clearance (or above) required - no dual nationality
Day Rate: up to £500 - £550

Role Purpose

We are seeking an experienced 3rd Line Analyst to join our team, specialising in Wintel and Active Directory environments. You will be responsible for resolving complex technical incidents, supporting a large-scale hybrid infrastructure, and ensuring service excellence across authentication, authorisation, and access services. This role combines business-as-usual support with the opportunity to contribute to solution design and implementation.

Key Responsibilities

Technical (80%)
- Respond to incidents and service calls, ensuring SLA targets are consistently met.
- Deliver higher First Time Fix rates and resolve escalated technical issues.
- Monitor call queues, liaise with specialist teams, and drive timely resolution.
- Provide operational support for Microsoft Windows Server, Active Directory, Entra ID, VMware/Hyper-V, and Azure.
- Manage identity services including Authentication, Federation, and Access Control.
- Support Windows networking services (DNS, DHCP) and anti-virus solutions.
- Troubleshoot hybrid infrastructure environments.
- Create and maintain accurate operational documentation.

Administration (20%)
- Maintain incident records and reporting in ITSM systems.
- Contribute to SLA reporting and performance metrics.
- Participate in team/sector meetings and knowledge sharing sessions.
- Ensure compliance with customer and security policies.

Required Skills & Experience
- 5+ years' experience supporting large hybrid IT environments.
- 3+ years working with Microsoft Windows Server (all current versions), Active Directory, Entra ID, VMware/Hyper-V.
- Current SC clearance (or higher).
- Strong understanding of ITIL Service Operations (Incident, Request, Problem, Change).
- Proficiency in PowerShell for automation and reporting.
- Relevant technical certifications (Microsoft, VMware, Azure).
- Excellent communication, documentation, and stakeholder management skills.
- Ability to work under pressure and manage multiple priorities effectively.

Desired Skills (Advantageous)
- In-depth 3rd Line Support experience resolving complex incidents.
- Modern device management (Intune, Workspace One).
- Azure PIM & Identity Governance.
- Microsoft Certifications (e.g., AZ-800/801, SC-300).
- Strong Active Directory deployment, configuration, and troubleshooting experience.
- Knowledge of enterprise app deployment and conditional access policies.
- Familiarity with converged technologies (Simplivity, UCS).

Personal Attributes
- Strong analytical and problem-solving skills.
- Organised, systematic, and detail-oriented.
- Effective interpersonal skills, with the ability to influence at all levels.
- Self-driven, adaptable, and collaborative in approach.