Weekly Meeting Notes

Jump to the current Weekly Meeting Notes
Jump to the old Weekly Meeting Notes 2016
Jump to the old Weekly Meeting Notes 2017
Jump to the old Weekly Meeting Notes 2018


December 4, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Dave Dykstra, Bruno Coimbra

  • Singularity (Updates from Dave)
    • 3.5.1 will be released tomorrow, 3.5.0 had too many bugs
  • CVMFS
    • CVMFS 2.7 out in testing, released on Tuesday, will have FUSE3 support (able to use fuse-mount)
  • CMS: email thread from Stephen Lemmel. A site is both TIER1 and TIER2 and wants to switch on the fly depending on where the job is coming from
  • Releases
    • 3.6.1 in OSG testing
      • ITB factories are 3.6.1
        TODO: Add tests to the testing matrix
    • 3.7 possibly in 1 week
      • No updates on 3.7
      • Dennis for token-auth
      • Marco for Leonardo's log and glidein_startup changes
  • Developers
    • Marco Mascheroni
      • Worked on 2 3.6.2 tickets
      • Fixed condor_q format
      • Fixed javascript for the monitoring page (the fixes on the glideinDowntime caused a different condor_q XML and an exception in the javascript)
      • Will work next on the handling of held jobs
    • Dennis
      • Being sidetracked on Jobsub issue
      • Working on 3.7 features [#23092]. The frontend is generating and forwarding it decrypted to the factory
    • Bruno
      • Back working on pychirp, will be completely done by the end of the week
      • htchirp pull request accepted
    • Marco Mambelli
      • Upgrading 4 worker nodes of fermicloud121, fixed CVMFS on 025 and 121
      • Frontend sending output to alternate collector
      • Testing 3.6.1.rc1
      • Packaging GlideinMonitor
  • Proposal for code review:
    • Frontend structure, and discussion of what we want to do to modularize it

TODO: Bring up at the SI meeting about switching to the GWMS Singularity wrapper
TODO: make ticket about shipping it w/ the Glidein
TODO: 3.6.2 before the end of the year. Start moving tickets for 3.6.3
SCHEDULE: Next week email only meeting


November 27, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Bruno Coimbra

  • Releases
    • 3.6.1 released
      • Marco Mambelli tested
      • Marco Mascheroni, upgrade ITB dev to 3.6.1.rc1 and testing w/ CMS frontends. Will upgrade to 3.6.1. All OK so far
      • Dennis, tests went fine, will update the testing matrix page
      • check systemctl journals
    • 3.7 possibly in 1 week
      • Dennis for token-auth
      • Marco for Leonardo's log and glidein_startup changes
  • Developers
    • Dennis
      • Working on 3.7 features [#23092]. The frontend is generating and forwarding it decrypted to the factory
    • Marco Mascheroni
      • working on the broken monitoring #203441: Entries in Downtime cause a problem and there is an exception in the factory status now xml code.
    • Marco Mambelli
    • Bruno
      • PyChirp work

TODO: Marco Mmb: add email reminder for this meeting and update participant list
SCHEDULE: CERN closed 12/23 to 1/6


November 20, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Dave Dykstra

  • Singularity (Updates from Dave)
    • 3.5.0 released upstream, OSG found a problem in a regression test, there will be 3.5.1
  • CVMFS
    • CVMFS 2.7 out soon, will have FUSE3 support (able to use fuse-mount)
  • Releases
    • 3.6.1 RC still waiting for RC
      • Marco Mambelli tested
      • Marco Mascheroni, will upgrade ITB dev to 3.6.1.rc1 and test w/ CMS frontends. Will test also a CMS Singularity job.
      • Dennis, regular smoke test on SL6/7 on 3.4 and SL7 on 3.5 completed and fine.
      • Marco Mascheroni,
    • 3.7 possibly in 1 week
      • Dennis for token-auth
      • Marco for Leonardo's log and glidein_startup changes
  • Developers
    • Marco Mascheroni
      • Worked on three 3.6.1 tickets
      • Next thing he'll work on is [#23340]: reverse the logic, glideins are killed when not recoverable (MMB: branch off master, will decide whether to merge in 3.7 or 3.6)
    • Dennis
      • Working on 3.7 features [#23092]. The frontend is generating and forwarding it decrypted to the factory
    • Marco Mambelli
      • Upgrading 4 worker nodes of fermicloud121, fixed CVMFS on 025 and 121
      • Frontend sending output to alternate collector
      • Testing 3.6.1.rc1
      • Packaging GlideinMonitor

TODO: Marco Mascheroni will give [#23340] for review and Marco Mambelli will release 3.6.1 (final) w/ this included


November 6, 2019

Marco Mambelli, Lorena Lobato, Bruno Coimbra, Dennis Box, Dave Dykstra

  • Singularity
    • 3.4.2 out with some bug fixes
    • 3.5 in pre-release
  • CVMFS
    • CVMFS-exec would allow running on sites w/o singularity or CVMFS
    • 2.7 out soon, will have FUSE3 support (able to use fuse-mount)
  • Releases
    • 3.6.1 RC still waiting for RC
      • Removal of held jobs is OK,
      • Marco Mascheroni would like to include also the not so gentle draining ticket
    • 3.7 in 2 weeks
      • Dennis for token-auth
      • Marco for Leonardo's log and glidein_startup changes
  • Developers
    • Marco Mascheroni
      • Worked on modified Factory Ops scripts
      • Singularity CVMFS restrictions
      • Improvement of draining mechanism
      • CHEP paper presented tomorrow by James
    • Lorena
      • Working on tickets feedback
      • https support
      • knowledge transfer to Bruno, working on notes for us, especially related to tickets she was working on
    • Dennis
      • 23372 Condor 8.8 not working in OSG 3.5. There was also
      • Diving back in token-auth
    • Bruno
      • Working on Frontend and Factory configuration/installation (will complete tomorrow w/ Lorena)
      • pychirp - Found a bug in htchirp (HTCondor python implementation of Chirp), 2 functions working
    • Marco Mambelli
      • Installing new HTCondor CEs on 025 and 121
      • Frontend sending output to alternate collector
      • Feedback and troubleshooting (e.g. GWMS w/ HTCondor 8.8 in OSG 3.5)
TODO:
  • Marco Mascheroni received an email about a Lithuanian Intern program for next Spring. Marco Mambelli will send a project proposal
  • Prepare slides for stakeholders meeting

October 30, 2019

Marco Mambelli, Lorena Lobato, Bruno Coimbra, Joe Boyd

  • Releases
    • 3.6.1 RC later today or tomorrow
      • https and Factory tools tickets are the blockers
    • 3.7 also for the end of the month
  • Developers
    • Lorena
      • Clean up tickets assigned, 11/15 will be the last day
      • Will work on Feedback
    • Bruno
      • Working on Frontend and Factory configuration/installation (will complete tomorrow w/ Lorena)
      • 2 tickets, will start working on it.
    • Marco
      • Working on Singularity
      • Troubleshooting singularity problem at UNL and Syracuse
      • Troubleshooting HEPCloud factory monitoring problem
TODO:
  • Prepare slide for stakeholders meeting

October 16, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato, Bruno Coimbra

  • Releases
    • 3.6 in OSG testing
    • 3.6.1 for the end of the month
    • 3.7 also for the end of the month
  • Developers
    • Marco Mascheroni
      • Ticket to remove the CVMFS singularity requirement, testing it
      • Not considering held limits
      • Having the frontend not consider constant parameters (3.7)
      • New ticket from 3.6 update (Jeff's feedback) - high priority
      • Will send CHEP's CRIC slides for feedback
    • Bruno
      • Ramp-up
    • Dennis
      • Token-auth changes - progressing
    • Lorena
      • Clean up tickets assigned
      • Coordinating w/ the new group about her availability
      • https support
      • Factory reload problem
    • Marco
      • Working on roadmap
      • Troubleshooting singularity problem at UNL and Syracuse
      • Troubleshooting HEPCloud factory monitoring problem
  • TODO
    • Send an email to Zach and Todd about condor tickets

October 9, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena

  • Releases
    • 3.6 in OSG development, soon OSG testing
      • There seem to be problems w/ OSG 3.5 and HTCondor 8.8, Frontend and Factory do not communicate correctly
    • 3.6.1 end of October
    • 3.7 end of October
  • Developers
    • Marco Mascheroni
      • Working on Glidein Singularity requirements [#23370]
      • [#23342]
      • To include in 3.6.1 improvements in draining of the sites
    • Lorena
      • Last week in Grace Hopper conference interviewing people
      • https support ticket
      • Frontend reload failing [#23062]
      • Dropping tarball installation [#23185]
      • Feedback for tag file ticket [#23304]
      • Is there anything to change in the CS switch? Not for now
    • Dennis
      • Default condor configuration seems not to work in OSG [#23372] - TODO:sen
      • Reviewing [#22579], https ticket
      • Security default auth [#2531
      • Condor token auth #23092
    • Marco Mambelli
      • revision of 3.4 and 3.5 tickets, resolution of some and assignment of >3.6 releases
      • Work on tag and history generation
      • Feedback sent for Marco Mascheroni's tickets
      • Started working on Factory monitoring problem
TODO:
  • email to Brian about hackathon
  • schedule code review
  • email to FIFE stakeholders about monitoring job restarts
  • email about branches and process

October 2, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box

  • Releases
    • 3.6 in OSG development, soon OSG testing
      • There seem to be problems w/ OSG 3.5 and HTCondor 8.8, Frontend and Factory do not communicate correctly
    • 3.6.1 end of October
    • 3.7 in 2 weeks?
      • The token has a string w/ the collector info and this is causing problems w/ secondary collectors
  • Developers
    • Marco Mascheroni
      • Catching up from last week
      • Working on removing the held glideins from the total limit: patched production, being tested []
      • Plan to work on improving Glidein draining []
      • Marco interested in reviewing Leonardo's work
    • Dennis
      • Working on token-auth
      • Review of https token [#22579]
      • Tested 3.6 on OSG 3.5 w/ automatic tests and found some communication problems
      • Jobsub ITB troubleshooting. Problems w/ dev factory (fermicloud062) and the certificate not being accepted (jobs held)
    • Marco Mambelli
      • Working w/ M.Mascheroni and J.Dost on Factory Ops priorities
      • Plan roadmap
      • Singularity having problems w/ LD_LIBRARY_PATH [23350]
      • Work w/ Thomas on glideinmonitor

TODO: Check the status of a service certificate whitelisted w/ most sites


September 25, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Dave Dykstra

  • Dave
    • Singularity 3.4 released. Problem w/ sandbox building. Same problem
    • 3.2.1 is still the best release out
    • The OSG problem was unrelated: Some images did not completely unpack, crashed and were published incompletely. The problem was that the python2 library did not handle extended attributes. They worked around that and will switch to python3
    • CVMFS 2.6.3 fixed the problems with /etc/hosts containing blanks
    • Unprivileged CVMFS
      • Requires Fuse mounts
      • Unprivileged user spaces (had to be enabled by the admin) - to put it in /cvmfs
      • Singularity is not relocatable, has hardcoded paths, so the one in /cvmfs cannot be used if you mount cvmfs in a different place
      • You need to manage also unmounting in EL7
      • With EL8 when you remove the user namespace also the mounts are removed, so it is safer
  • Releases
    • 3.4.6 in OSG
    • 3.6 will be released
      • OSG ITB will be upgraded to 3.6
      • Internal testing of 3.6
      • Reminder
        • Condor 8.8 in OSG 3.4 does support CREAM
        • Condor 8.8 in OSG 3.5 does not support CREAM
        • The only HTCondor supporting tokens is > 8.9.2
        • It will be problematic to support both sci-token and CREAM
  • Developers
    • Lorena
      • Blackhole
      • Https
      • Onboarding w/ production
      • Deleting the RSA key in workdir fixed
    • Dennis
      • Documented a way to use token auth w/ 3.4.6 HTCondor 8.9.2 in Frontend, in the tarball, and user collector. W/ some configuration things are working [#23092]
      • Will test RC
    • Leonardo
      • Token system works, pass the token from the Factory to the Glidein. Will be possible also
      • Plan to finish shellcheck work and do some code fix/test
    • Marco Mambelli
      • Release 3.5.1
    • Marco Mascheroni
      • Plan to work on improving Glidein draining []
      • Draining of constant parameter [#23052]

September 4, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Dave Dykstra

  • Dave
    • XSEDE people interested in CVMFS and installing it natively
    • Small set of scripts to mount CVMFS on systems that don't have it, using FUSE mounts and user-namespaces (EL7 mounts have to happen outside unprivileged user namespace so it's tricky). Will allow custom CVMFS changing depending on the VO requirements (2 processes per repository, plus 2 for the cache)
    • Singularity 3.4.0 in testing in EPEL and OSG build
    • New option in Frontier squid, automatic register with web-proxy autodiscovery (these could be dynamically provisioned)
  • Releases
    • 3.4.6 in osg-testing
      • ITB Factory, Edgar tested on a Frontend.
    • 3.5.1 later today
      • Lorena tested SL6 and SL7 w/ 3.5.1_rc1
      • Marco Mambelli tested SL6
  • Developers
    • Marco Mascheroni
      • Documentation of manual_glidein_startup
      • Other issues
      • Travel plans 24-26 at Fermilab, Jeff will be here as well. Operations discussions
    • Dennis
      • Restarted working on token-auth, building a new site. Working on getting the startd on the CE talking to the collector
      • Trouble testing 3.5.1_rc1, no more in the repo
    • Lorena
      • Troubleshooting, testing 3.5.1_rc1
      • Policy for ITB frontend
      • Splitting the blackhole ticket letting out logging
    • Leonardo
      • Finish the exporting of glidein utility functions, including in glidein_startup.sh
      • Script that runs shellcheck on the branch
      • Security mechanism for glidein logging: https server and JWT system to authenticate the messages
    • Marco Mambelli
      • Discussion on policy
      • Feedback
      • Work w/ Thomas on glideinmonitor
      • Test 3.5.1_rc1
      • Presented at the DUNE computing meeting
  • Stakeholders meeting is next week, please send slides for feedback
    • Feedback provided to Lorena

September 4, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Antonio Pérez-Calero

  • Releases
    • 3.4.6 out
      • OSG would like a production/ITB Frontend upgraded before promoting to production
    • 3.5.1 later today
  • Developers
    • Marco Mascheroni
      • Feedback from FactoryOps: every Monday a process is run to do a defrag (huge IO, sometimes unresponsive). The RSA key got corrupted, the operator could not manage to restart the Factory and there were no messages
    • Dennis
      • working on using token-auth, will update base release to 3.5
    • Lorena
      • Was on shift
      • Testing 3.5.1
      • Blackhole detection
      • Testing a ticket when there are no entries
      • It would be nice to have a policy for notifying the others about what we are touching on the machines
      • Worked on links that were not working
      • Requested access to the new ITB Factory
    • Leonardo
      • Refactoring of glidein_startup.sh, clean from shellcheck errors and warnings
      • Mechanism to share functions between scripts without including them in heredoc segments (disables syntax highlighting but still allows to be all in one file). Working on a way to make it more automatic
    • Marco Mambelli

Person on shift until next meeting: Dennis


August 28, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Joe Boyd, Dave Dykstra

  • Dave Dykstra
    • Singularity 3.4 RC1 out, includes a fuse mount option to mount CVMFS inside singularity privileged containers.
    • Would like to have all sites w/ unprivileged singularity: they could uninstall singularity if unprivileged namespaces is enabled and have cvmfs
    • singularity is in go and compiled and distributed as a single binary: compiled in el6, runs fine in el6 el7
    • CVMFS 2.6.3 will be out soon (2.6.2 has a bug: it goes into an infinite loop on some lines in /etc/hosts)
  • TODO: Marco will check w/ Dave that the default path for singularity in GWMS is correct
  • Releases
    • 3.4.6 out
      • OSG ITB factory is 3.4.6, used w/ production frontend 3.4.5
      • OSG would like a production/ITB Frontend upgraded before promoting to production
    • 3.5.1 RC for later today
      • Moved the last tickets that will not be completed today
      • Marco Mascheroni will continue to test the condor incompatibility [#22245]
      • Tickets in Feedback will be closed
  • Shift
    • Testing of Singularity w/ GPUs, service-now tickets w/ Alex Himmel
    • Fermilab Frontend singularity script with CRLF
  • Developers
    • Marco Mascheroni
      • testing condor compatibility issues
      • ticket to check if the Factory. fact_chown_check returns 0 if all is alright, >0 if problems, it checks if the current user is the owner of logs and condor directory. Called by initd script, it is possible to disable
    • Leonardo
      • Working on new logging mechanism for glideins: more robust scripts, more metadata
      • Found and reported a bug in unit tests
      • Refactoring code of glidein_startup.sh
    • Dennis
      • working on using token-auth (reading documentation and source code)
      • help Lorena figuring out the errors w/ unit tests for python bindings (in RH6 they do not run)
    • Marco
      • Working ticket to publish Singularity mode [#22875]
      • Added editorconfig
    • Lorena
      • Continue the blackhole detection work, focusing on logs and testing. Checked with Kevin about monitoring for blackhole detection
      • Finished updating all the obsolete links to HTCondor manual in GlideinWMS website
      • Fixed CI errors related to pylint for review20190820 and discussed and investigated with Dennis possible issues and unittests with python bindings. Updated #22846 and followed up with him #23176: Unit test failures after changing to python htcondor bindings. Created code review meeting notes section in the GlideinWMS wiki
      • Detected several problems with the “with statement” and some of the functions when reviewing #22470, mostly for tarballs and obsolete libraries. It was fixed for sl7 but keeps complaining on sl6, since we have python 2.6 there and some of the with statement compatibilities were introduced in 2.7. Reverted the changes and added a TODO comment about modernising it when we finally get rid of SL6. Created Support #23166: Apply the TODOs from #22470 related to “with statement” compatibility.
      • Kept working and investigating with FIFE and GlideinWMS team about ITB Frontend workloads over singularity and problems with DUNE submissions to Factory production
      • Had discussions with TJ and Greg from HTCondor team about STARTD stats and logs
      • Had discussions with Krista and Factory operators about IT Operations and the new ITB Factory
  • TODO Next:
    • Marco Mmb: add page w/ shift tasks

Person on shift until next meeting: Lorena
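
The behavior described for fact_chown_check in Marco Mascheroni's notes above (return 0 if all is alright, >0 if there are problems, checking that the current user owns the logs and condor directories) can be sketched as follows. This is not the actual script: the function name count_ownership_problems is hypothetical, and counting a missing directory as a problem is an assumption.

```python
import os
import tempfile


def count_ownership_problems(paths, uid=None):
    """Return 0 if every path is owned by uid, otherwise the number of problems found."""
    if uid is None:
        uid = os.geteuid()  # default: the user running the check
    problems = 0
    for path in paths:
        try:
            if os.stat(path).st_uid != uid:
                problems += 1
        except OSError:
            # Assumption: a missing or unreadable directory also counts as a problem
            problems += 1
    return problems


# Demo on directories created by the current user, plus one that does not exist
with tempfile.TemporaryDirectory() as base:
    log_dir = os.path.join(base, "log")
    condor_dir = os.path.join(base, "condor")
    os.mkdir(log_dir)
    os.mkdir(condor_dir)
    ok_status = count_ownership_problems([log_dir, condor_dir])
    bad_status = count_ownership_problems([log_dir, os.path.join(base, "missing")])
```

An init script could pass the result to sys.exit() so that a non-zero status blocks the start, with a switch to skip the check as mentioned in the notes.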


August 21, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Jeff Dost, Joe Boyd, James Letts, Frank Wuerthwein

  • Special topic discussion:
    1. Customizable start expressions and mechanisms to affect jobs matching: the start expression normally comes from the frontend, what else is desirable and sound? Factory attributes, Site or node attributes (Environment variables, Files at the node, ...)?
    2. (If there is no time we'll postpone this) Would be OK to publish the Glidein Logs? Should access be restricted?
  • Discussion details on ST190821_Glidein_custom_start_and_Log_publishing
  • Summary:
    1. Customizable start expressions are used by CMS via GLIDEIN_Custom_Start. Sites and the site description in CVMFS can affect job matching at sites. This mechanism is hidden from Factories and Frontends and can bring inconsistencies. Resource characteristics can be better expressed via queues and attributes. This is a faster mechanism to solve emergencies (network problems, job definition errors, ...). Workflow management may not be capable of dealing with increasingly specialized resources: there is a need to schedule on a specific resource instead of specifying the job requirements. There is worry about the proliferation of Frontend groups and Factory entries.
    2. Publishing Glidein Logs would be extremely useful, especially for ITB. There are requirements for data coming from servers in the EU, GDPR (PII data should be scrubbed, username, DN, IP). VO can already rsync from the Factory especially the new ITB at UCSD. It is not public, requests are evaluated individually and there are restrictions on exporting
  • Developers
    • Lorena
      • working on Blackhole detection ticket
      • ITB frontend
      • updated all links to the HTCondor manual
      • Cut 4.3.6 RC1, testing compatibility test
    • Dennis
      • ticket to review
      • token auth
      • jobsub-gwms-singularity integration
    • Leonardo
      • improved logging mechanism for glideins
      • will extend the metadata
    • Marco Mambelli
      • jobs/singularity testing
      • installed production factory replica on fermicloud062
  • Person on shift last week and until next meeting: Marco Mambelli
  • TODO:
    • Another special topic meeting in 2 weeks. Invite Frank, Sakib, James, Antonio, Joe B.
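
The scrubbing requirement from the log publishing summary above (PII such as username, DN and IP should be removed before Glidein logs are published) can be illustrated with a minimal sketch. The patterns and the placeholder strings are assumptions made for illustration: real Glidein logs may format these fields differently, and the DN pattern assumes the DN is quoted or runs to the end of the line.

```python
import re

# Assumed formats, for illustration only
DN_RE = re.compile(r"(?:/[A-Za-z]+=[^/\"'\n]+)+")    # e.g. /DC=org/CN=Jane Doe
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # dotted-quad IPv4 addresses
USER_RE = re.compile(r"\b(user(?:name)?=)\S+")        # user=... or username=...


def scrub_line(line):
    """Replace DNs, IPv4 addresses and user names with fixed placeholders."""
    line = DN_RE.sub("<DN>", line)
    line = IPV4_RE.sub("<IP>", line)
    line = USER_RE.sub(r"\1<USER>", line)
    return line


sample = 'auth DN="/DC=org/DC=example/CN=Jane Doe" from 192.0.2.10 user=jdoe'
scrubbed = scrub_line(sample)
```

Pattern-based scrubbing can miss PII in free-form messages, so a check of the scrubbed output would still be needed before anything is made public.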

August 14, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Jeff Dost

  • Releases
    • 3.4.6 out
      • 3.4.6 RC1 testing was confusing. A problem came out during Lorena's testing. Instructions and a note in the factory install document would have solved it. Marco Mmb was out and Dennis and Marco Msc thought as well there was a new bug in the release, it stopped the testing and kept people busy for a whole day.
      • This highlights problems both in our testing procedure (we tend to follow our notes and not go back to the instructions) and in the documentation that is confusing (too many documents, information in different places). We should act on both.
    • 3.5.1 end of the month
      • Please move the tickets that require more than 2-3 weeks
      • Blocker is having all that is needed for a single user factory (including reliable documentation)
  • Guests
    • Jeff Dost:
      • working on a new ITB Factory

Frontend and Factory configuration are more complex.

  • Developers
    • Marco Mascheroni
      • Doing tests of the release candidate
      • Continue the work for attributes
      • Constant attributes not published in the frontend (will do a regression test or git-blame to understand whether it was introduced recently or was the regular behavior)
    • Lorena
      • Cut 4.3.6 RC1, testing compatibility test
      • Blackhole detection
      • Testing singularity on ITB frontend
    • Dennis
      • jobsub server hooked up to ITB frontend
      • merged the DN with commas ticket
    • Leonardo
      • Working on new logging mechanism for glideins
      • 2 functions, to log shards and
    • Marco Mambelli
      • singularity testing
      • website paperwork
  • TODO Next:
    • Marco Mmb explained the email he sent:
      • reviewed the 3 development streams (branch_v3_4, master, branch_v3_7)
      • proposal for shifts for monitoring the mailing lists and GWMS requests; will prepare a ShiftChecklist document, will learn as we go
    • Marco Mmb will send an email about special topic discussion
    • Lorena will write a document about what went wrong in the testing of 3.4.6 RC1

August 7, 2019

Dave Dykstra, Marco Mambelli, Marco Mascheroni, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Jeff Dost

  • Dave Dykstra
    • Singularity fuse support moving forward
    • CVMFS plans to have 2.7 with only some of the features to get fuse support and other completed features, timed w/ centos8
    • epel8 is out.
    • Discussion about uniform behavior wrt Singularity for jobs w and w/o glideins. Could be interesting to separate functionalities in glideins (testing+setup+singul_invocation vs starting condor)
      Glow and HCC may do direct submission as well. And could be interested in this separate tests-setup functionality. Would bring uniformity and consistency
  • Releases
    • Branch off the correct branch depending on the release! Marco Mmb will send an email about 3 development streams
    • 3.4.6 will go out ASAP, probably a RC later today
      • Lorena will rebase and merge
      • Dennis will merge his ticket and work on the documentation, additional checks
      • Marco Msc branched off master, will have to rebase his branches to be able to merge in branch_v3_4
      • Marco Mmb will state the incompatibility policy in the GWMS documentation [#23080]
    • 3.5.1 end of the month
      • Please move the tickets that require more than 2-3 weeks
      • Blocker is having all that is needed for a single user factory (including reliable documentation)
      • Marco Msc will open a ticket about steps to complete migration: Adding a check that the script to modify the user was run
      • 3.5 has been already tested in ITB factory at CERN. The Migration procedure is documented
  • Guests
    • Jeff Dost:
      • all Singularity patches seem to have fixed the issues
      • Question about GLIDEIN_SINGULARITY_REQUIRE: REQUIRE vs REQUIRE_GWMS. Marco Mmb explained and suggested to use REQUIRE not to interfere w/ custom scripts
  • Developers
    • Marco Mascheroni
      • Was on vacation
    • Lorena
      • testing version incompatibility got the string/boolean ticket (3.4.5, 3.4.2)
      • fixing ITB frontend: condor submission is working
      • meeting to review her tickets
      • working on blackhole detection
    • Dennis
      • jobsub server hooked up to ITB frontend
      • merged the DN with commas ticket
    • Leonardo
      • Frontend+Factory installed
      • Started working on his project: more reliable mechanism for logging the glideins information
    • Marco Mambelli
      • Work on Singularity tickets (fixed, provided patches for Factories and Frontends)
  • TODO Next:
    • Marco Mmb will send an email about special topic discussion

July 31, 2019

Dave Dykstra, Marco Mambelli, Dennis Box, Lorena Lobato Pardavila, Leonardo Lai, Jeff Dost

  • Dave Dykstra
    • Tie in info about job waiting for new sw published in CVMFS. How it is done, how to notify about it.
    • singularity 3.3.0 released, the main new feature is fake-root. Unprivileged users can see a root-like shell. Can work also w/ no setuid Singularity but will use shadow-utils RPM (setuid tools). Sysadmin have to add the permissions. Imitates a feature in podman.
    • Some RHEL8 features (podman, shadow-utils, unprivileged fuse) may come in RHEL7.7
    • CVMFS 2.6 soon out
  • Releases
    • There will be today or tomorrow Singularity patches for the Factory and Frontend (changes that will go in 3.4.6)
    • 3.4.6 will be out soon
      • Blockers: compatibility across all 3.4.x, certificates w/ comma, Fixing Singularity
    • 3.5.1 mid-August
      • Move the tickets that do not fit w/ the timeline
  • Guests
    • Jeff
      • Marco Mascheroni and Jeff @ Quilt conference 9/23 in Minneapolis. Possible visit to Fermilab before or after
  • Developers
    • Dennis
      • working on #22779 not escaped comma in the GSI daemon name
    • Lorena
      • back from vacation, catching up, continue the work w/ black-hole ticket
      • will focus on the 2 variables (compatibility across 3.4.x)
    • Fernando
      • mockup of configuration in different formats: Python and YAML
    • Marco Mambelli
      • Troubleshooting and working w/ Summer interns
      • Work on the Singularity tickets

July 17, 2019

Dave Dykstra, Marco Mambelli, Marco Mascheroni, Dennis Box, Kiana Mohammadian, Lorena Lobato Pardavila

  • Dave Dykstra
    • Singularity re-doing root capability, using newuid and newgid, using standard RH tools for privilege escalation (using sudo config files). Will be more standard. Will be in 3.3
    • Singularity will slow down new features and work more on bug fixing (point releases)
    • WLCG container group pushing for unprivileged Singularity off CVMFS
      • Marco will talk to Mats about changes in the OSG Singularity script (or is all already in the GWMS script)
  • Releases
    • Mascheroni checked and the Frontends connected are all 3.4.5 except the CMS ones.
    • Marco Mambelli will send an email to Factory ops
    • 3.4.6 end of July
      • Dennis to test w/ Mascheroni and Edgar
      • Marco working on Singularity
    • 3.5.1 mid-August
      • Move the tickets that do not fit w/ the timeline
  • Developers
    • Marco Mascheroni
      • Added the tickets he's been working on last week
    • Lorena
      • Proposing to review code improvement in the next code review
      • Follow up w/ Ken
      • Blackhole detection ticket, testing it
    • Kiana
      • Moved the files on a text editor. Working on translating/editing documents
    • Dennis
      • No other
    • Marco Mambelli
      • Troubleshooting and working w/ Summer interns
      • Limited work on the Singularity ticket

July 3, 2019

Marco Mambelli, Lorena Lobato Pardavila, Marco Mascheroni, Dennis Box, Javier Rodriguez, Kiana Mohammadian, James Letts, Jeff Dost

  • Vacation plans - no long vacations planned, OK for releases
    • Dennis - 1 week towards the end of July
    • Lorena - 10 days towards the end of July
    • Marco Mascheroni - 7/29 8/2
  • Release status
    • 3.5 in OSG upcoming-testing
    • 3.4.6 planned in 1 month, branch off branch_v3_4
    • 3.5.1, mid August, Everyone moves the tickets that you think will not fit in the timeline
    • 3.7 for python3
  • Developers
    • Mascheroni
      • Manual submit glidein
      • Started looking at constant attributes
    • Lorena
      • Blackhole ticket
      • Working w/ Kiana - working on new website
      • other minor tickets
      • w/ FIFE team, set new ITB Frontend
    • Kiana
      • working on website
      • evaluating if Jekyll could be
      • Mascheroni: would be nice if the documentation about attributes could be also machine parsable, so that it could be used in CRIC
    • Javier
      • python classes and GlideinWMS
      • Getting familiar with VMs, GitHub and installing Factory and FE
    • Mambelli
      • Work w/ TARGET program
      • Mentor students
      • Multi-glidein ticket
      • Singularity ticket
  • James
    • Raised a problem w/ the worker nodes starting to drain too soon. Opened [#22867]
  • Jeff
    • HTCondor python bindings are already installed at the Factories, OK to rely on them
    • There is a problem selecting the correct operating system for the python tar balls. They used to send SL6 for all nodes, but there are some problems on SL7 systems (the condor transfer plugins break). The option AUTO for CONDOR_OS, to select the correct OS and corresponding tar ball has different keywords from the ones used in the config file. So the correct tar ball is not selected. A workaround may be to use a list of OSes in the configuration (like the example below). Jeff will try. Marco will open a ticket [#22868]
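
The tarball selection problem Jeff described above (the auto-detected OS uses different keywords from the ones in the config file, so no tarball matches) can be sketched as follows. This is not GlideinWMS code: the alias table, the tarball entries, the file names and the function pick_tarball are all hypothetical. It shows the two ideas in the notes, translating detected names to configuration keywords and letting each tarball list several OSes.

```python
# Hypothetical alias table: detected OS name -> keyword used in the configuration
ALIASES = {"sl6": "rhel6", "centos6": "rhel6", "sl7": "rhel7", "centos7": "rhel7"}

# Hypothetical tarball entries, each advertising a comma-separated list of OS keywords
TARBALLS = [
    {"os": "rhel6,default", "file": "condor-rhel6.tar.gz"},
    {"os": "rhel7", "file": "condor-rhel7.tar.gz"},
]


def pick_tarball(detected_os, tarballs=TARBALLS, aliases=ALIASES):
    """Return the tarball matching the detected OS, or the 'default' one, or None."""
    keyword = aliases.get(detected_os.lower(), detected_os.lower())
    fallback = None
    for entry in tarballs:
        keywords = [k.strip() for k in entry["os"].split(",")]
        if keyword in keywords:
            return entry["file"]
        if "default" in keywords and fallback is None:
            # Remember the first tarball marked as default in case nothing matches
            fallback = entry["file"]
    return fallback
```

With these hypothetical entries an SL7 node gets the rhel7 tarball, while an unknown OS falls back to the entry marked default instead of failing to match.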

June 26, 2019

Marco Mambelli, Lorena Lobato Pardavila, Marco Mascheroni, Dave Dykstra, Javier Rodriguez, Kiana Mohammadian

  • Dave Dykstra
    • Unprivileged Singularity 3.2.1 fails to mount read only file systems. Will be fixed in 3.3.0. 2.6 is OK
    • Officially recommended is Singularity 3
    • Pull-req for Singularity 3.4 is being developed to be able to support a plug-in to mount fuse (this would allow CVMFS to run inside the container). RH8 allows fuse mount in a namespace (you'd be able to do it w/ a workaround)
    • Request to install by default unprivileged in RH8 (would require to add squash-fuse and fuse-overlay-fs). Essential to compete w/ Podman, included in RH8
    • There is a Monthly Singularity developers' call. Email Dave to be added to the calendar: https://groups.google.com/a/lbl.gov/forum/#!topic/singularity/-_PVDIV9cXk
  • Release status
    • 3.5 in OSG upcoming-testing
  • Developers
    • Lorena
      • Blackhole ticket
      • Working w/ Kiana
    • Kiana
      • Static site generators
    • Javier
      • Getting familiar w/ terminal
      • Working on Python classes
    • Mascheroni
      • Manual submission improvements. Using a different frontend group works. Some pilots were removed if the held reason was 1. Asking for the FE name instead of the security name.
      • Log files exposure. There is already a script exporting files to GRACC. Edgar, Mascheroni, Jeff, Mats, Javier, Thomas
    • Mambelli
      • Work w/ TARGET
      • Singularity ticket
  • Stakeholders meeting in 2 weeks, prepare presentations by next week

June 19, 2019

Marco Mambelli, Lorena Lobato Pardavila, Marco Mascheroni, Dennis Box, Javier Rodriguez, Kiana Mohammadian

  • Release status
    • 3.5 in OSG upcoming-development (approved for testing)
    • Sorting tickets for 3.5.1, in about one month
    • Considering 3.4.6
  • All developers are in favor of dropping tarball installation
  • Developers
    • Dennis
      • Working on CI [#22483] and unit test ticket. Has been very busy w/ Jobsub, should be more available
      • Will work on the DN ticket [#22779]
    • Lorena
      • Preparing and doing the presentations for Summer Students
      • Working on black hole ticket
    • Marco Mascheroni
      • Working on attributes/parameters (Marco Mambelli will send email)
      • Automatic config generation [#20799]
      • CMS testing a way to customize the pilot start expression at the site level: GLIDEIN_CUSTOM_START (will create a ticket)
      • Use absolute imports: from __future__ import absolute_import to maintain consistency [#22437]
      • Factories DN updates [#19744]
    • Marco Mambelli
      • Troubleshooting for [#22779] and Attributes/parameters
      • Getting summer interns started, clarifying projects
      • Singularity started by HTCondor
    • Kiana
      • Familiarizing w/ GlideinWMS
    • Javier
      • Working mainly on the Python classes for TARGET
  • GWMS group code review will be 6/25 9:30-12:30. Summer interns are invited

June 3, 2019

Marco Mambelli, Lorena Lobato Pardavila, Marco Mascheroni, Dennis Box

  • Release status
    • RC works fine
    • Merge tickets in feedback and release RC2
    • Dennis tests work fine
    • Ready for release
  • Developers
    • Mascheroni
      • Work on manual _submit_glidein
      • If a factory attribute is not constant, then it is not published in the Frontend, then it cannot be used in the Frontend
    • Lorena
      • Focusing on the presentations for Summer Students
      • 3.5.1 tickets, mainly Black-hole and Blacklist
    • Dennis
      • Fixed smoke test handling
    • Mambelli
      • Test 3.5
      • Prepare for Summer interns
        • Migrate documentation
        • Glidein monitoring and troubleshooting

May 29, 2019

Marco Mambelli, Parag Mhashilkar, Marco Mascheroni, Dennis Box

  • Release status
    • RC was cut last week. Mambelli is testing and running into a few auth problems; he thinks they are on his side
    • Dennis ran into problems too but he thinks it is his scripts
    • Worked for Mascheroni with HTCondor 8.6. He tested the script that changes the owner across the board. The first test was not successful because it was run with HTCondor still running, so the owner was still the old one. After stopping condor, running the script and doing a condor release, everything is fine. This should be well documented.
  • Mambelli
    • At HTCondor Week had discussions with Greg and TJ, with suggestions on how to go ahead with Singularity. condor ssh to job should be working but it is not. Did a test with root-installed condor, but with glideins condor ssh to job doesn't work when condor is not started as root
    • HTCondor supports custom user-defined map/dict structures

May 22, 2019

Marco Mambelli, Parag Mhashilkar, Lorena Lobato, Marco Mascheroni, Dave Dykstra, Dennis Box

  • Release status
    • pylint failures since the version was changed. Now we catch more SL7 errors. Need to confirm that it's not related to the pylint version
    • Lorena and Marco tested the single user factory. Marco found one issue with permissions, which was fixed with changes to the spec file. Also testing whether we can drop the per-frontend OS users since we are moving to a single factory user. Checking with the HTCondor users on how to do it without different OS ids
    • Mambelli will cut a release candidate later today
  • Dave Dykstra
    • Having several discussions related to singularity with Mambelli.
    • Mambelli: Everything works fine with system installed condor and tarball installed condor. Will need condor 8.8.2 for pilot

May 15, 2019

Marco Mambelli, Dennis Box, Parag Mhashilkar, Lorena Lobato, Dave Dykstra

  • Singularity
    • Security release announced yesterday. Building it. Released 3.2, which has major changes. Building a patched version 3.1.1.1-1. A few things should have been in EPEL testing but were not there; this release now has those changes. Impacts unprivileged mode.
    • Will go in osg 3.2
    • Next will put singularity 3.2 in production osg
  • v3.5 Release Status
    • working on singularity tickets and
    • Need to wrap up.
    • Parag: Need to get release out right away to give users chance to try them out.
    • Mambelli: HTCondor with no switchboard support will not go into OSG production until June or so because of the delay
    • Transition of file and job ownerships in factory for switchboard changes. HTCondor team helped with the migration scripts and steps that are needed
  • Developers
    • Mascheroni
      • Working on testing script
    • Lorena
      • Get everything done for blackhole detection
      • Working with Diego and fixed periodic script
    • Dennis
      • Mostly on vacation
      • While testing found small bugs
    • Mambelli
      • Working on feedback tickets and assigned them. Submission of singularity jobs in HTCondor
      • Working on coordinating planning for summer students
      • Thomas, 2 target students, 1 quark net student and Italian student in Aug-Sep

May 01, 2019

Marco Mambelli, Dennis Box, Parag Mhashilkar, Lorena Lobato, Dave Dykstra

  • Singularity
    • 3.2 RC is ready and should be released any time
    • Singularity dev is working on the fuse command option that Dave proposed, which should work with CVMFS provided it is linked with fuse3lib
    • working on fuse3 in epel. submitted pull request. got permission from fuse3 dev and gave permissions after several days.
    • singularity wrapper. cms is in process of discussions and switching to glideinwms provided wrapper instead of using their own.
  • Action items
    • Marco sent email to Edgar about students but the email thread died after that
    • Roadmap in Wiki
    • Working on moving artifacts to gitlab free account.
  • Stakeholder slides
    • Going through the slides

April 24, 2019

Marco Mambelli, Dennis Box, Parag Mhashilkar, Lorena Lobato, Marco Mascheroni

  • Release Status
    • 3.4.5 is out and Diego tested and will be in next OSG release 3.4.28. currently in OSG testing. We still support SL6.
    • Working on 3.5. Current list is long. Need to trim once single user is tested.
  • Marco Mascheroni
    • Nothing to report
  • Lorena Lobato
    • Talking with Krista on periodic scripts
    • Testing 3.4.5
  • Dennis
    • Not many cycles last week, closed
  • Marco Mambelli
    • Mainly on condor and singularity
  • Next week we need developers slides for stakeholders meeting

April 03, 2019

Marco Mambelli, Dennis Box, Dave Dykstra, Marco Mascheroni

  • Singularity report from Dave
    • Singularity 3.1.1 fully released in OSG-upcoming and epel testing, epel in 2 weeks
    • Singularity core team will add fuse3
    • Fuse3 will be supported in epel soon probably
    • From Dirk: @ TACC their worker nodes are running RH7 and allow fuse mounts. This means that CVMFS could be mounted as an unprivileged user. Mounted in a directory where you have write access and then bind mount in the right place when starting Singularity. Unprivileged namespaces would make it easier: we could start an unprivileged namespace, start CVMFS inside it and then run Singularity. Dave and Dirk will check if they can change the kernel option to enable that
  • Release:
    • GWMS 3.4.5 RC1 was released yesterday. Marco Mambelli's smoke tests are OK (SL6 and SL7 upgrades). Dennis will start his automated tests.
  • Developers
    • Marco Mascheroni
      • Fix for 3.4.5, boolean comparison more robust
      • Optimization of Frontend code for production; will push it to a branch. Also added the code that dumps the data. It can be enabled by uncommenting some lines in the code (will add a detailed explanation in the comments). Profiler code will be added to the unittest directory. Will add a comment with instructions to factor out the inner function to get detailed profiling, but will not integrate that in the code (it makes it less legible).
      • Since Krista added a new Frontend with a new DN, a script from Diego is not getting the status correctly: system control-status is reporting the frontend as inactive. This may be because of the behavior in SL7 (systemctl instead of system). Marco Mascheroni will investigate
    • Dennis Box
      • Finished 21940, unit testing
      • Will close the testing of incommon certificates ticket
    • Marco Mambelli
      • Released last week 3.4.4 and troubleshoot the problem reported by OSG integration
      • Released and tested 3.4.5 RC1
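
The "boolean comparison more robust" fix mentioned above concerns values that may arrive either as real booleans or as strings. A minimal sketch of that kind of normalization follows; it is not the actual 3.4.5 patch, and the helper names `to_bool` and `same_bool` as well as the accepted spellings are assumptions made for illustration.

```python
def to_bool(value):
    """Convert a bool or a 'True'/'False' string (any case) to a bool.

    Raises ValueError for anything else, so that a malformed value is
    reported instead of being silently treated as False.
    """
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        text = value.strip().lower()
        if text == "true":
            return True
        if text == "false":
            return False
    raise ValueError("not a boolean value: %r" % (value,))


def same_bool(left, right):
    """Compare two values after normalizing both to real booleans."""
    return to_bool(left) == to_bool(right)
```

A plain `True == "True"` is False in Python, which is the kind of mismatch that can make a match expression fail; `same_bool(True, "True")` compares the normalized values instead.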

April 10, 2019

Marco Mambelli, Lorena Lobato, Dennis Box, Parag Mhashilkar, Dave Dykstra, Marco Mascheroni

  • Release Status 3.4.4
    • In OSG testing
    • Edgar tested. Matches are not working and he confirmed that it's a 3.4.4 issue. Marco Mascheroni is looking into it. Last time it was caused by bool and string matching. Mascheroni tried Edgar's settings in testing and they worked fine. Needs more investigation.
  • Developers
    • Mascheroni
      • Disk filling up because of pilot stdout and stderr logging. Does glideinwms clean up the logs? If a frontend is not asking for an entry, will the logs get cleaned up for that entry?
        • Mambelli: Not sure about glidein logs.
      • Glidein off issue faced by FIFE. We don't have access to the credentials so we can't troubleshoot
    • Dennis
      • Working on #21940. Made progress on it last night
      • #21844 done
    • Lorena
      • Providing feedback on 3.4.4 and troubleshooting blacklist
    • Mambelli
      • Testing on pending issues on 3.4.4
      • Moving singularity wrapper
      • Get started container test for Nova
      • Python 3 migration: On hold until 3.5 finalized.
      • Need to fast track the glideinwms 3.5 if it needs to go in the upcoming.

April 03, 2019

Marco Mambelli, Lorena Lobato, Dennis Box, Parag Mhashilkar, Dave Dykstra, Marco Mascheroni

  • Singularity
    • 3.1.1 released in OSG upcoming-development, fedora and planning for EPEL as well
      • Fixes last known problem about incompatibility with 2.6
    • In the last few days figured out how to mount a fuse file system as privileged in an HPC system and run the fuse system inside the container, so CVMFS can run inside. This is a way to run CVMFS in HPC. It should avoid the need for huge containers with CVMFS inside. It depends on libfuse-3. CVMFS developer Jacob managed to get it working in development mode.
    • Submitted a request to update install singularity documentation on how to install it and set it unprivileged. Running of CVMFS is already in 3.4.4. Once experiments adopt it, we can start telling experiments to remove singularity installation.
    • CMS is thinking about moving to unprivileged singularity. Brian pushing for pilot sites first before asking other sites.
  • Release Status v3.4.4
    • rc4 test all positive. If everything is ok Marco will release later today or tomorrow morning
  • Mambelli
    • Started working on condor invoking singularity for 3.5. Created branch for 3.5. Master -> 3.4
  • Mascheroni
    • Heard a talk about glideinwms and submission infrastructure from CMS side
    • Currently working on tests of improvements for count match function. Apply hotfix for cms frontend and test it.
    • Fix for downtime entries. Adding an option in the frontend to ignore entries in downtime and consider them for un-matched.
    • With Edgar found some problems related to schedd 8.8.1 and frontend communication. Frontend is running 8.6. Couple of options in ticket with glideinwms.
  • Dennis
    • Will work on smoke tests later today
  • Lorena
    • Working on Couple of tickets, providing feedback and testing. Fixing review errors on branch used for code review. troubleshooting fife script.
    • Will talk to the condor team about any problem related to the blacklist script.
    • Working on configuration on black hole detection

March 27, 2019

Marco Mambelli, Lorena Lobato, Dennis Box, Parag Mhashilkar

  • v3.4.4 release update
    • Features are almost done. Will have release candidate later today. #21916
    • One of the unit test will be in next release. #21940
  • 3.5
    • Plan is to get changes during the review and release them
    • Single user factory
    • Use of condor to start singularity. Everything will be done through condor. condor ssh to job will work by doing this.
    • #20799 will be done for v3.5
    • Will go in osg upcoming
  • FIFE & GCO glidein_off is not working with the way the infrastructure is deployed and supported here. It's not a glideinwms problem but we should help them arrive at an agreement and then close the ticket.

March 6, 2019

Marco Mascheroni, Marco Mambelli, Parag Mhashilkar, Maria Zvada

Dave Dykstra, Dennis Box, Marco Mascheroni, Marco Mambelli, Lorena Lobato

  • Singularity (Dave Dykstra)
    • Singularity 3.1.0 released and built for Fedora. Incompatibility w/ 2.6 (if there is a duplicate bind path behaves differently: 2.6 accepted it, 3.1 gives a fatal error)
    • The show stopper is an issue w/ unprivileged Singularity pulling from Docker (not working in 3.1), will be fixed soon, high priority
    • Future feature: Potential to be able to do nested Singularity (outside would need setuid root, inside could be unprivileged), will require the most recent kernel from EL7. Would allow using Singularity in the node and run it from the glidein
    • Next week will be at the Singularity users meeting
  • Release 3.4.4
    • Tickets halting:
      • Factory monitoring for HEPCloud
      • Unit test for boolean values (Lorena will work on it since Dennis is taking some days off)
    • There may be a ticket about parsing metasite configuration
  • Developers
    • Marco Mascheroni
      • Discussed w/ Factory operation prototype for configuration generation
        • Mostly happy, some small changes requested
        • Will start testing it in production for a small set of entries
    • Lorena
      • Mostly training and sick leave
      • Provided feedback to some tickets
      • Troubleshooting w/ Shreyas the FIFE periodic script for black holes
      • Working on black hole ticket w/ condor team
    • Dennis
      • Checking CI infrastructure
    • Marco Mambelli
      • Working on Factory monitoring
      • Feedback to Brian Lin for a fix in the proxy renewal script
      • Troubleshooting a couple of factory issues
      • Meeting about containers in FIFE
  • Next week there is the stakeholders meeting
    • Marco gave feedback to Dennis's and Lorena's slides
    • Marco Mambelli and Marco Mascheroni will provide the slides to Parag within the day

February 27, 2019

Marco Mascheroni, Marco Mambelli, Parag Mhashilkar, Maria Zvada

  • Developers
    • Marco Mambelli
      • Made 3.4.4
      • Working on problem where monitoring stats going to 0 but was not able to reproduce it
      • Schedd downtime is reported incorrectly in xml as is updated only when there is work
      • Working on troubleshooting factory with Krista
      • Will adapt to singularity solution provided by HTCondor after 3.4.4
    • Marco Mascheroni
      • Manual glidein startup
      • Setting attr to constant = False prevents publishing to factory
      • Working on issues related to FIFE support and Shreyas about glidein shutdown
      • CRIC site config generation

February 20, 2019

Dave Dykstra, Dennis Box, Marco Mascheroni, Marco Mambelli, Parag Mhashilkar, Lorena Lobato

  • Singularity (Dave Dykstra)
    • 3.0.3 is released in Fedora now. Waiting on another fix before it will be in osg production. There are some features that Atlas need but do not work
    • It's currently in RC3. Only works for root users; support for unprivileged users is planned
  • v3.5
    • Couple tickets are in feedback mode
    • There are couple of values that are still string/boolean for True/False
    • #21884 Testing is looking good. Will also test with new format
  • Developers
    • Dennis
      • One of the tickets (#20215) from Lorena broke unittests. Resulted in the improvement of the unittests!
    • Marco Mascheroni
      • #19949 Got feedback from Lorena should be done quickly
      • #21898 Lorena provided the fix
      • Will give feedback for #20861
      • New project TODAS working with CMS to launch pilot. They start glidein_startup script and connect to another pool. There are validation scripts that need to be tackled.
      • In 3.4.3 we could not change parameters that were const if the attr is const in global and not in entry
    • Lorena

February 13, 2019

Marco Mascheroni, Dennis Box, Parag Mhashilkar, Lorena Lobato

  • Developers
    • Dennis Box
      • No progress on CI side
      • Testing new CAs on gws-dev factory and frontend and htcondor ce
      • Would like to handle monitoring
      • Gave feedback on #15176
    • Marco Mascheroni
      • Doing test of 3.4.3 and found couple of issues. Meta sites related issues.
      • Started working with factory operator who will have cycles for development on work related to CRIC.
      • Estimation of memory on sites where glidein CPUs are set to auto
    • Lorena Lobato
      • Testing 3.4.3 with htcondor 8.8 and working with TJ for enabling statistics
      • Handling classad from frontend to factory glidein job classad. Publish is not available in frontend side

February 06, 2019

Marco Mascheroni, Dennis Box, Parag Mhashilkar, Lorena Lobato, Marco Mambelli, Dave Dykstra

  • Dave Dykstra
    • Singularity 3.0.3 is ready for osg upcoming
      • Known issue with unprivileged node. When executing from docker requires privilege. Singularity dev team plan to fix it.
      • WLCG working group meeting. More testing before rolling out unprivileged mode. On the order of 6 months. Takes long because of the Singularity audit going and scheduled to be done by mid June. Some members want to point to audit before making recommendation
      • Atlas wants to be able to read from docker on worker nodes, downloading the docker containers on the WN. That's a lot of overhead and sounds crazy. They don't want to maintain an image repo.
      • Marco: CMS wants condor ssh to job to work, but that requires the startd to be started as root, which glideinwms cannot do.
        • Dave thinks he can provide some help in that direction
      • Travel to WLCG workgroup. They are asking SI lab and they may be able to pay for Dave's travel. CMS is already paying for the CVMFS workshop.
      • Dave and Marco to work together on providing solution for CMS
  • Dennis Box
    • Reviewing #21682. Will be done and go back to working on #2531
    • No progress on travis ci and getting artifacts
  • Marco Mascheroni
    • Couple of issues from CMS. Frontend crashing because one of the attributes in the schedd evaluated to error/undefined, causing the exception. We need to add more protection. Leak in fork.py. Changes may not be propagating to the frontend process.
    • Factory operator added an entry. She couldn't get logs from pilots because the pilots were removed based on the frontend's request. Mambelli added a disable to fix it. Getting the log when you kill the job depends on the batch system: if the kill is translated to kill -9 you don't get it back.
    • CPU = auto and memory set to zero
    • Operations team meeting
      • Session on auto generation of config. Address problem at abstract level, trying to identify category of items required for config.
      • Topics based on migration of services. Not focused on different factory/services etc
  • Lorena
    • Testing 3.4.3 glideinwms + htcondor 8.4.8 identify black hole
  • Marco Mambelli
    • Working mainly on troubleshooting issues about frontend crashing.
    • HTCondor survives the glidein. Made changes on glidein and condor startup. There is a trap in place to forward the signal. Glideins were killed right after starting. Making the script more responsive. Working with Diego and the sysadmin at Purdue to troubleshoot. Their PBS is sending SIGTERM and SIGKILL one after the other, so we don't get time to react. Working with the OSG team since their wrapper script is not forwarding signals correctly.
    • Release of 3.4.3 has been promoted to testing.
    • Started working on the multi node glidein ticket. Added an option as multi glidein.
    • glidein_off problem reported by Shreyas. Mascheroni to follow up.
  • Project News
    • There is a possibility of moving the project from Redmine to GitHub
    • Marco submitted 4 student requests.
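
The signal forwarding discussed under Marco Mambelli's items (a wrapper must relay termination signals to the process it started, otherwise the child survives) can be illustrated with a short sketch. The glidein uses a shell trap; the Python below only shows the same pattern, and the helper name `install_forwarders` is hypothetical. It cannot help when SIGKILL follows SIGTERM immediately, since SIGKILL cannot be caught.

```python
import os
import signal
import subprocess
import sys


def install_forwarders(child, signals=(signal.SIGTERM, signal.SIGINT)):
    """Relay each listed signal received by this process to the child.

    Returns the previous handlers so the caller can restore them.
    Must be called from the main thread (a requirement of signal.signal).
    """
    previous = {}

    def _forward(signum, _frame):
        # Only forward while the child is still running.
        if child.poll() is None:
            child.send_signal(signum)

    for sig in signals:
        previous[sig] = signal.signal(sig, _forward)
    return previous


def demo():
    """Start a sleeping child, send SIGTERM to ourselves, return the child's exit code."""
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    previous = install_forwarders(child)
    try:
        os.kill(os.getpid(), signal.SIGTERM)  # simulate the batch system
        return child.wait(timeout=10)
    finally:
        for sig, handler in previous.items():
            signal.signal(sig, handler)
        if child.poll() is None:
            child.kill()
```

On a POSIX system `demo()` should return the negative of `signal.SIGTERM`, showing that the child was terminated by the forwarded signal rather than left running.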

January 30, 2019

Marco Mambelli, Dennis Box, Parag Mhashilkar

  • Marco Mambelli
    • Move code review to Thursday and Friday during OSG All hands meeting.
    • Talk to OSG. They released osg release. They will release glideinwms in the coming release in 2-3 weeks
    • There are still issues about condor daemons surviving past the glidein startup script
    • Started working on Singularity to consider release distributed by OSG in CVMFS and consider it in the path.
    • Wrote possible projects for summer interns and there was some communication with Sandra
  • Dennis Box
    • Working on #2531: store the number of job restarts in the frontend.

January 23, 2019

Marco Mambelli, Marco Mascheroni, Dennis Box, Parag Mhashilkar, Dave Dykstra

  • Singularity
    • OSG releasing singularity 3.0.2 in upcoming (current release in EPEL)
    • The problem seen at OSC w/ Singularity 3 (Too many symbolic links, was giving a permission error from the kernel to Singularity, was working w/ 2.6) seemed more a site problem: updating to RHEL 7.5 fixed the problem
    • Singularity 3.0.3 released and will be soon in EPEL
  • v3.4.3 Release Status
    • Mambelli:
      • RC2 out in osg-development, tests are OK so far
      • Release expected for Thursday or Friday
      • Still investigating some worker nodes where glidein is killed but condor keeps running and accepting jobs, moved the ticket to 3.5
  • Developers
    • Mascheroni
      • Busy w/ operations this past week
      • Will work more on interfacing with CRIC
      • Will check w/ Frank about skipping Thursday at OSG all-hands to do GlideinWMS code review then
    • Dennis
      • kicked off automated tests, so far all OK
    • Mambelli
      • Completed 3.4.3 tickets
      • Prepared RC and started tests
      • Troubleshooting HTCondor surviving glidein. Possible race condition?
  • Tentative code review dates: April 1, 2 or March 21, 22 (after OSG all-hands)

January 16, 2019

Marco Mambelli, Marco Mascheroni, Lorena Lobato, Dennis Box, Parag Mhashilkar

  • v3.5 Release Status
    • Mambelli:
    • Waiting on feedback on couple of tickets. Cut RC but does not include those changes. It should be in the osg-development soon. It is in minefield
    • Need to check with Steve ticket resolves what he needs.
    • There might be some worker nodes where glidein is killed but condor keeps running and accepting jobs.
      • Singularity support added a process group and there is a condor warning that it may prevent condor from being killed.
  • Developers
    • Lorena
      • Mainly working feedback of tickets and getting ready for release candidate
    • Mambelli
      • Monitoring tickets and working with Thomas. Last week was his last week. Frontend was reporting and Factory had some problems.
      • Dennis interested in picking up the monitoring work from Thomas.
    • Dennis
      • One ticket for #21763. Parsing files into other config files. Not sure if it should go in this release? As per Marco some changes are necessary.
    • Mascheroni
      • Couple of fixes for the release
      • Looking at the process group issue on worker node
      • Working with the CRIC developers for interfacing with CRIC
  • Tentative code review dates: April 1, 2