By Do IT Now

HPC support and maintenance

Introduction

Do IT Now offers a complete portfolio of services for HPC clusters, whether on-premises, hybrid or in the cloud. We deliver a true end-to-end support experience for HPC environments. Our aim is to handle all support needs so that our customers can get back up and running as soon as possible when a problem arises.

In both HPC and High Availability (HA) clusters, state-of-the-art technical support is critical to getting the best ROI from your HPC infrastructure. Do IT Now professionals bring 30 years of experience in managing and supporting HPC systems and their users. This knowledge enables us to provide the best user support, give advice and solve any technical difficulties that may arise when running the many types of software applications used on a supercomputer.

Our technical support skills include, among others*:

Batch scheduler: Slurm, LSF, PBS, Torque, SGE, Moab
Cluster manager: Bright Computing, xCAT, Rocks, Warewulf
Parallel file system: BeeGFS, Lustre, GPFS, HDFS, Ceph
User environments: user libraries, Modules, EasyBuild, Spack, NVIDIA Toolkit
Development tools: compilers (GNU, Intel®, PGI, IBM XL); debuggers and profilers (VTune, DDT, GDB)
Monitoring & alert tools: Ganglia, Nagios, Icinga, Grafana, Elasticsearch, Zabbix
Virtualization: OpenNebula, OpenStack, VMware, XenSource
Containers: Singularity, Docker, Docker Swarm, LXD
Remote visualization: TurboVNC, VirtualGL, WebSocket, DCV, X2Go
HPC portal: EnginFrame, Open OnDemand
HPC data science: Analyze-IT, Predict-IT, CloudShaper powered by OKA
Scientific and engineering applications: more than 100 references. Contact us to learn more.

*All trademarks, logos and brand names are property of their respective owners.

Levels of Support

1st Level Support

1st level support is provided by Do IT Now staff with general system administration skills. 1st level support includes:

  • Problem definition, problem replication and prediction of the expected behaviour.
  • Description and analysis of the customer's hardware and software setup.
  • Information gathering.
  • Problem solving based on similar cases.

2nd Level Support

2nd level support is provided by Do IT Now senior staff. Our senior team has a deep understanding of HPC software and tools. 2nd level support includes:

  • Problem and root cause analysis.
  • Hardware/software checks.
  • Problem solving (where possible).
  • Problem analysis and system setup definition to replicate the problem before escalating to a higher support level.

3rd Level Support

3rd level support is provided by Do IT Now in conjunction with the ISVs. Our senior team has an extensive understanding of HPC software and will prioritize incoming support tickets according to issue severity. 3rd level support includes:

  • Problem and root cause analysis.
  • Infrastructure check.
  • Installation of patches and early update releases (only where available from the open-source community, or from the ISV under a valid support contract).
  • System optimization and tuning.
  • Escalation to third parties (ISVs and/or Cloud Vendor).
