Feature #25549

Add support for site-specific allocations

Added by Marco Mambelli about 2 months ago. Updated about 2 months ago.

Status: New
Priority: Normal
Assignee: -
Category: -
Target version: -
Start date: 02/23/2021
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

We need to support a new attribute whose string value specifies the allocation to use on an Entry:
1. The job attribute ProjectName is overloaded and cannot be shared between XSEDE and OSG, so a new attribute is needed. Let's leave ProjectName for OSG accounting.
2. Support for multiple allocations might be needed. (I say "might" because, AFAIK, we do not have that use case right now.)
3. Guarantee that pilots running in a user's allocation will only start jobs from the users of that allocation, without hardcoding the user allocation in the START expression.

Full email from Mats

Marco,

During the AHM, we will have a session about using XSEDE allocation from OSG submit points. We had a call yesterday to discuss the state of things, and identified some issues I wanted to provide back to you.

We are talking about single-PI allocations here, that is, smaller users than the big projects we have supported so far. In an ideal world, the PI would get an allocation from XSEDE and then be able to submit jobs with some attribute that makes the frontend/factory submit pilots under that allocation. It would be great if this happened with just generic configs in the frontend, factory and hostedce components.

Currently, we can get halfway there with:

https://indico.fnal.gov/event/10571/contributions/3618/attachments/2432/2907/LigoOnStampede.pdf

https://github.com/glideinWMS/glideinwms/blob/fdb898406e81937c8c9d8484cbac50378119bf88/creation/lib/cgWCreate.py#L215
This assumes that the OSG ProjectName is the same as the XSEDE project name and that is the one to use on the XSEDE site in question. Turns out, these do not always line up. For example, my values would be:

OSG ProjectName: OSG-Staff
XSEDE allocation: TG-DDM160003
SDSC allocation name: nca118

So in this case, the OSG submit host would enforce +ProjectName="OSG-Staff" but SDSC would want to see nca118 in the Slurm submit script.

So my first request would be to not overload the ProjectName attribute for this. ProjectName should be used by OSG accounting, and if we need to specify a target allocation, we should at least have it as a separate attribute such as "AllocationName" or something like that.

But this could also be extended to being able to target multiple sites with different allocations, for example one XSEDE site and NERSC at the same time. One solution could be a mapping like:

+AllocationNames = "SDSC-Expanse:nca118 NERSC-Cori:osg_backfill" 

We can discuss this use case later, but please start tracking this as a feature request.

A second issue is that when these pilots are submitted, they need START expressions that only run jobs targeting that allocation (that is, not running other users' jobs under this metered allocation). It was not clear if such a feature already exists. Do you know?

So in summary:

1. Job attribute ProjectName is overloaded and cannot be shared between XSEDE and OSG
2. Support for multiple allocations might be needed. (I say "might" because, AFAIK, we do not have that use case right now.)
3. Guarantee that pilots in a user's allocation will only start jobs from the users of that allocation, without hardcoding the user allocation in the START expression.

Thanks,

-- 
Mats Rynge
USC/ISI - Pegasus Team <https://pegasus.isi.edu>

Some notes about implementation ideas:
  • singularity_lib.sh already has a dictionary represented as a string that could be used, e.g. "default:ncd118,NERSC-Cori:ncd118c"
  • If some processing is needed, it should be done in the job wrapper. What can be done for non-Singularity jobs? Other possibilities: using job attributes? Doing it directly in HTCondor, using the map?
  • Dynamic START expression: 2 options. Can the job attribute be a map? The "no hardcoding" part needs to be clarified.
Clarification questions about point 3 (start jobs only from the user, without hardcoding the user in the START expression):
  • How do you plan to submit glideins, i.e. how will you specify which allocation they use?
    • Will you have different groups in the Frontend?
    • Do you want to use information from the credentials (token, certificate)?
    • Do you want the Frontend to extract information from condor_q and use it in the glidein request? This affects clustering; it may be possible to use the queried attribute both in the request and in the START expression.
  • Do you want glideins to start in a generic allocation and then change allocation depending on the one specified in the job?
History

#1 Updated by Marco Mambelli about 2 months ago

  • Description updated (diff)
