Feature #7815

Manage OOM score of frontend processes

Added by Brian Bockelman almost 6 years ago. Updated 3 months ago.


We had a fire in CMS where an increase in the number of jobs in the pool increased the memory used by each worker sub-process of the frontend.

This pushed the node into swap and invoked the OOM killer. Unfortunately, after a few rounds, we got unlucky and the OOM killer selected the frontend top-level process as a victim.

No frontend process = no glideins = angry users.

Can we set the oom_adj (on newer kernels, oom_score_adj) explicitly for the parent and child processes so the child process is always selected?
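
A minimal sketch of what this request could look like in Python, assuming a Linux kernel that exposes /proc/<pid>/oom_score_adj (valid range -1000 to 1000; higher values make a process a more likely OOM victim). The helper names, the `proc_root` parameter, and the default of 500 are hypothetical and are not taken from the frontend code. Raising the value needs no special privileges; lowering it generally requires CAP_SYS_RESOURCE. The kernel also weighs actual memory use, so this biases the selection toward the children rather than guaranteeing it.

```python
import os

# Valid range for /proc/<pid>/oom_score_adj on Linux.
OOM_SCORE_ADJ_MIN = -1000
OOM_SCORE_ADJ_MAX = 1000


def oom_score_adj_path(pid="self", proc_root="/proc"):
    """Build the path of the oom_score_adj file for a PID (or "self")."""
    return os.path.join(proc_root, str(pid), "oom_score_adj")


def set_oom_score_adj(value, pid="self", proc_root="/proc"):
    """Write an OOM score adjustment for the given process.

    proc_root is a hypothetical parameter that only exists so the
    function can be pointed at a fake directory in tests.
    """
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError("oom_score_adj must be between -1000 and 1000")
    with open(oom_score_adj_path(pid, proc_root), "w") as f:
        f.write(str(value))


def spawn_worker(work, child_adj=500):
    """Fork a worker that marks itself as a preferred OOM victim.

    The parent keeps its own adjustment unchanged, so the child ends up
    with the higher adjustment of the two. Returns the child PID.
    """
    pid = os.fork()
    if pid == 0:
        try:
            set_oom_score_adj(child_adj)
        except OSError:
            # Not Linux, or /proc is not writable: run without adjustment.
            pass
        try:
            work()
        finally:
            os._exit(0)
    return pid
```

On kernels that only provide the older oom_adj file, the same idea would apply with its narrower range (-17 to 15).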


#1 Updated by Burt Holzman almost 6 years ago

Is this the right thing to do? Killing the children leads to harder-to-understand behavior (the service is running, but not always requesting glideins since the child keeps dying).
I'd actually consider the opposite: always kill the parent, so it's much clearer what's going on.

Not really convinced either way.
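
If the children were made the preferred victims, the diagnosability concern above could be reduced by having the parent report how each child ended. The sketch below is only an illustration: `describe_worker_exit` is a hypothetical helper, not an existing frontend function. A wait status alone cannot prove the OOM killer was involved, since any SIGKILL looks the same, so the message is worded accordingly.

```python
import os
import signal


def describe_worker_exit(pid, status):
    """Turn a wait status from os.waitpid() into a log-friendly message."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig == signal.SIGKILL:
            # The OOM killer sends SIGKILL, but so can an operator.
            return "worker %d was killed by SIGKILL (possibly the OOM killer)" % pid
        return "worker %d was terminated by signal %d" % (pid, sig)
    if os.WIFEXITED(status):
        return "worker %d exited with code %d" % (pid, os.WEXITSTATUS(status))
    return "worker %d ended with raw wait status %d" % (pid, status)
```

The parent would call this with the `(pid, status)` pair returned by `os.waitpid()` and write the result to its log, so repeated SIGKILLs of workers show up explicitly instead of as silently missing glidein requests.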

#2 Updated by Parag Mhashilkar almost 6 years ago

  • Assignee set to Marco Mambelli
  • Target version set to v3_3

#3 Updated by Parag Mhashilkar over 5 years ago

  • Priority changed from High to Normal

I agree with Burt: if there are system-wide issues, killing the main frontend process is better than killing a randomly selected child.

#4 Updated by Parag Mhashilkar over 5 years ago

  • Target version changed from v3_3 to v3_2_x

#5 Updated by Marco Mambelli over 2 years ago

  • Target version changed from v3_2_x to v3_4_x

#6 Updated by Marco Mambelli about 2 years ago

  • Target version changed from v3_4_x to v3_5_x

#7 Updated by Marco Mambelli about 1 year ago

  • Target version changed from v3_5_x to v3_6_x

#8 Updated by Marco Mascheroni 3 months ago

  • Status changed from New to Closed
