Hardening available cores auto detection
Auto detection of available cores did not work correctly.
Brian suggested submitting test glideins to Tusker, Crane, and other sites to troubleshoot and verify the detection.
Here is the email from Brian, sent after the default auto detection caused problems:
There were a few problematic EU hosts. If you want to debug, try submitting pilots to Tusker and Crane at Nebraska - both were affected by incorrect auto-detection. I believe you should be able to submit pilots there?
I think condor is already auto detecting all the memory it has available.
Do you mean, when running inside a HTCondor batch system, the pilot is detecting the memory allocated to it by the HTCondor batch system (good!)? Or do you mean it is literally detecting all available host memory (bad!)?
I do not understand well the second point. Auto detection is not affecting the request to the host system. That is affected through the RSL or condor attributes. The glidein is receiving anyway many cores that would not be used. Is your point that since the slots requested (in the factory RSL/condor attributes) have too many cores compared to the memory available and the requirements of the jobs, then the default equal splitting for condor static slots creates all unusable slots (instead of having at least a few usable ones)?
We recommend partitionable slots and would like to know if there are drawbacks against them.
Well, the p-slots should also know how much memory they should utilize.
Should we change something in the splitting in static slots? We could add a minimum memory requirement and split using the min between the available cores and available memory slots. Any suggestion?
I would suggest looking at $_CONDOR_MACHINE_AD and, when in "auto-detect" mode, set the memory used by glidein to be equal to the Memory allocated in $_CONDOR_MACHINE_AD.
Long-term, I would _love_ to allocate the entire host at Nebraska to CMS, as we have 2.0-5.0 GB RAM / core. As workflows diversify, I believe providing access to the large-memory-per-core hosts will be a significant service to CMS. However, we currently have to set up a new entry point per hardware type - not scalable at all!
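The two ideas raised in the email (reading the allocation from $_CONDOR_MACHINE_AD, and splitting static slots by the minimum of cores and memory) could look roughly like the Python sketch below. It is an illustration only, not the glidein implementation. The function names `parse_machine_ad`, `detect_allocation` and `usable_static_slots` are hypothetical. The sketch assumes the file named by $_CONDOR_MACHINE_AD contains plain `Attribute = value` lines and that the `Cpus` and `Memory` (in MB) attributes are integer literals.

```python
import os


def parse_machine_ad(text):
    """Parse 'Attribute = value' lines into a dict keyed by lowercase name.

    Assumption: one attribute per line; values are kept as raw strings
    (surrounding double quotes stripped), expressions are not evaluated.
    """
    ad = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            # ClassAd attribute names are case-insensitive.
            ad[key.strip().lower()] = value.strip().strip('"')
    return ad


def detect_allocation(env=os.environ):
    """Return (cpus, memory_mb) allocated to this slot, or None.

    None means the pilot is not inside an HTCondor slot (or the ad is
    unreadable), so the caller should fall back to another method.
    """
    path = env.get("_CONDOR_MACHINE_AD")
    if not path or not os.path.isfile(path):
        return None
    with open(path) as ad_file:
        ad = parse_machine_ad(ad_file.read())
    try:
        return int(ad["cpus"]), int(ad["memory"])
    except (KeyError, ValueError):
        return None


def usable_static_slots(cores, memory_mb, min_memory_mb):
    """Number of static slots when each needs at least min_memory_mb.

    Takes the minimum of the available cores and the number of
    min_memory_mb-sized pieces that fit in the available memory.
    """
    if min_memory_mb <= 0:
        raise ValueError("min_memory_mb must be positive")
    return min(cores, memory_mb // min_memory_mb)


if __name__ == "__main__":
    allocation = detect_allocation()
    if allocation is None:
        print("No usable machine ad; keep the configured values")
    else:
        cpus, memory_mb = allocation
        # 2500 MB per slot is an illustrative minimum, not a recommendation.
        print(cpus, memory_mb, usable_static_slots(cpus, memory_mb, 2500))
```

With 8 cores, 16000 MB and a 2500 MB minimum, `usable_static_slots` yields 6 slots instead of 8 slots of 2000 MB each, so fewer slots are created but every one of them can actually run a job.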