ifw-core  5.0.0-pre2
stooUtils.nomad Namespace Reference

Functions

def is_running (nomad_client, job_id)
 
def load_job (job_file)
 This procedure loads a Nomad job file.
 
def start_job (nomad_client, job_file)
 
def wait_until_healthy (nomad_client, job_id)
 This procedure is reused from ETR (Calle's code).
 

Function Documentation

◆ is_running()

def stooUtils.nomad.is_running (   nomad_client,
  job_id 
)

◆ load_job()

def stooUtils.nomad.load_job (   job_file)

This procedure loads a Nomad job file.

Parameters
job_file  Nomad job file to load

◆ start_job()

def stooUtils.nomad.start_job (   nomad_client,
  job_file 
)

◆ wait_until_healthy()

def stooUtils.nomad.wait_until_healthy (   nomad_client,
  job_id 
)

This procedure is reused from ETR (Calle's code)

Parameters
nomad_client  Nomad client
job_id  job ID

A running job is not necessarily healthy. While a job is running, its allocations may still be under way or failing. Allocation statuses:

- Queued
- Running
- Starting
- Failed
- Complete
- Lost

Allocations can be considered "run attempts", so the allocation counts will not necessarily add up to the number of tasks.

The statuses are currently understood as follows:

- Starting: an allocation is under way. This seems to hold for the duration of the retry attempts, but is not confirmed.
- Failed: an allocation has failed (restart attempts exceeded). The task group might still end up running because of rescheduling, so this number cannot be relied upon to determine health.
- Running: a task allocation is running (but not necessarily healthy).
- Complete, Queued, Lost: meaning not yet determined.
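The status counts above can be tallied across task groups with a few lines of Python. The following is only a minimal sketch, not the stooUtils implementation: it assumes the counts arrive as a dictionary shaped like Nomad's job summary (a `Summary` mapping from task group name to per-status counts), and the helper name `tally_statuses` and the example data are hypothetical.

```python
# Status names as listed in this documentation.
STATUSES = ("Queued", "Starting", "Running", "Failed", "Complete", "Lost")


def tally_statuses(job_summary):
    """Sum the per-task-group allocation counts of a job summary dict.

    Assumption: job_summary looks like {"Summary": {group_name: {status: n}}}.
    """
    totals = {status: 0 for status in STATUSES}
    for group_counts in job_summary.get("Summary", {}).values():
        for status in STATUSES:
            totals[status] += group_counts.get(status, 0)
    return totals


# Hypothetical summary for a job with two task groups.
example_summary = {
    "JobID": "example",
    "Summary": {
        "web": {"Queued": 0, "Starting": 1, "Running": 2,
                "Failed": 3, "Complete": 0, "Lost": 0},
        "db": {"Queued": 0, "Starting": 0, "Running": 1,
               "Failed": 0, "Complete": 0, "Lost": 0},
    },
}

totals = tally_statuses(example_summary)
print(totals)
```

Because allocations are run attempts, the totals (here three Failed next to three Running) say nothing on their own about how many tasks the job defines.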

Services

For services, Failed can increment indefinitely, depending on the limits configured in the restart and reschedule stanzas.

Q: How do I determine when a job has been fully deployed?
A: At this point, a healthy job deployment means having as many Running tasks as the sum of all task group counts (to be verified). Note that a task may be Running for a short period before exiting, which causes false positives.

Q: How do I know when to give up waiting for a Nomad job to become healthy?
A: There is no obvious way to know when to give up other than to monitor the allocations. If an allocation is `dead`, Nomad has given up on it for that scheduling attempt. Nomad may then try to reschedule it, at which point it creates a new allocation, which may or may not fail again.

Q: How do I know whether Nomad has given up rescheduling an allocation?
A: There seems to be no easy way to see this.
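The health criterion from the first answer (as many Running tasks as the sum of all task group counts) could be polled roughly as sketched below. This is an illustration under stated assumptions, not the actual `wait_until_healthy` code: `fetch_summary` is a hypothetical callable returning a summary dictionary, the `TaskGroups`/`Count` keys assume a JSON-style job definition, and the `stable_polls` parameter is an added guard against the short-lived Running false positives mentioned above. The timeout is the only give-up condition, since no better signal is known.

```python
import time


def expected_task_count(job):
    """Sum of all task group counts (assumes a JSON-style job dict)."""
    return sum(group.get("Count", 1) for group in job.get("TaskGroups", []))


def running_count(job_summary):
    """Total number of Running allocations across all task groups."""
    groups = job_summary.get("Summary", {}).values()
    return sum(counts.get("Running", 0) for counts in groups)


def wait_until_running(fetch_summary, expected, timeout=60.0, interval=1.0,
                       stable_polls=3, sleep=time.sleep, clock=time.monotonic):
    """Poll until `expected` tasks are Running for `stable_polls` polls in a row.

    Returns True on success and False once `timeout` seconds have elapsed.
    """
    deadline = clock() + timeout
    consecutive = 0
    while True:
        if running_count(fetch_summary()) >= expected:
            consecutive += 1
            if consecutive >= stable_polls:
                return True
        else:
            consecutive = 0  # a task dropped out of Running; start over
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical job definition and a canned sequence of summaries.
job = {"TaskGroups": [{"Name": "web", "Count": 2}]}
canned = iter({"Summary": {"web": {"Running": n}}} for n in (1, 2, 2, 2))
healthy = wait_until_running(lambda: next(canned), expected_task_count(job),
                             sleep=lambda seconds: None)
print(healthy)
```

In real use, `fetch_summary` would wrap whatever call the Nomad client offers for reading a job's summary, and `timeout` would be chosen from the restart and reschedule limits of the job.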