Bug #3841

V2.7.1 RPM testing

Added by John Weigand over 6 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assignee: John Weigand
Category: test category
Target version:
Start date: 05/09/2013
Due date:
% Done: 0%
Estimated time:
First Occurred:
Occurs In:
Stakeholders:
Duration:

Description

SL5 - glideinwms-vofrontend (osg-development)
  • basic test of start/stop on gwms-frontend/condor successful
SL6 - glideinwms-vofrontend (osg-development)
  • condor is not getting installed
  • see fermicloud009:/opt/ferpm-install.2013-05-09.log for full details on install.
  • for SL5 comparison, see fermicloud002:/opt/ferpm-install.2013-05-09.log

John Weigand


Related issues

Related to GlideinWMS - Bug #3900: Starting a factory with all Entries set to enable=False should print out helpful message (Closed, 05/16/2013)

Related to GlideinWMS - Bug #3901: Need better error reporting when reconfiging the factory with wrong schedd_name in the entry (Closed, 05/16/2013)

History

#1 Updated by John Weigand over 6 years ago

SL5 and SL6 - glideinwms-vofrontend and glideinwms-frontend-standalone (osg-development)
  • when checking runlevels for the gwms-frontend init.d service,
    it is enabled for run levels 3 and 5 only. Is this correct? Why not
    run levels 2 and 4 as well, which is what condor uses?
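
One way to see which run levels an init script declares is to read its "# chkconfig:" header line. The sketch below is only an illustration: the helper name declared_runlevels and the sample header values are assumptions, not taken from the actual gwms-frontend script, and the code parses text only rather than querying the system.

```python
import re

# Matches a Red Hat style header such as "# chkconfig: 345 90 10".
# A "-" in the runlevel field means "no default runlevels".
CHKCONFIG_RE = re.compile(r"^#\s*chkconfig:\s*(\S+)\s+(\d+)\s+(\d+)", re.MULTILINE)


def declared_runlevels(script_text):
    """Return the sorted list of runlevels named in the chkconfig header."""
    match = CHKCONFIG_RE.search(script_text)
    if match is None or match.group(1) == "-":
        return []
    return sorted(int(ch) for ch in match.group(1) if ch.isdigit())


# Illustrative header only; the real script's values may differ.
SAMPLE = "#!/bin/sh\n# chkconfig: 35 90 10\n# description: example service\n"
```

Calling declared_runlevels on the text of two init scripts (for example the frontend and condor ones) would make the difference in declared run levels easy to compare.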

John Weigand

#2 Updated by John Weigand over 6 years ago

glideinwms-usercollector / glideinwms-userschedd
  • not showing or bringing down the osg-version rpm

And this is just a thought... should we have a
glideinwms-version module?

John Weigand

#3 Updated by Parag Mhashilkar over 6 years ago

We don't need a glideinwms version module. Our versioning is already in there and handled in a different way. If someone really needs it, we can provide a glideinwms-version tool in the future.

#4 Updated by John Weigand over 6 years ago

From a basic install perspective, all modes of glideinwms rpm installation
appear good. You can view the various nodes these are currently installed
on here: http://home.fnal.gov/~weigand/test_nodes/fermicloud.html#GLIDEINWMS
NOTE: The data on this page can change over time.

These are the types:
ferpm - single node frontend, submit, user collector
fe - just the frontend
fecol - just the user collector service
fesub - just the submit service

The numeric suffix indicates sl5/6.
These were installed from the osg-development repo.
The sl6 installations did require doing an independent
yum install due to the empty_condor issue in sl6.

The "package details" link shows the results of the yum
installs performed.

John Weigand

#5 Updated by Parag Mhashilkar over 6 years ago

This is cool.

One more question, did you try to get a test job running on any of the installs?

#6 Updated by John Weigand over 6 years ago

No

For this release, all you asked for was a basic
rpm install verification. To perform a test like you
are asking for will require significantly more (repeat
significantly more) effort due to all the manual configuration
that has to be done. I can shoot for the next release on
that or wait a week if needed on this one.

John Weigand

#7 Updated by Parag Mhashilkar over 6 years ago

  • Assignee changed from Parag Mhashilkar to John Weigand

Let's do the full test for at least one combo with the rpm. Can you please do it for this release? I am not holding off v2.7.1 for this since I do the tarball install+tests, which turned out ok. So a week is fine too.

Your rpm tests will validate running glideinwms services from rpms as well.

Reassigning the ticket to you since you are doing the work :)

#8 Updated by Parag Mhashilkar over 6 years ago

  • Subject changed from V2.7.1 testing to V2.7.1 RPM testing

#9 Updated by John Weigand over 6 years ago

SL5 - glideinwms-vofrontend (osg-development)
  • Successful test of submitting jobs.
SL6 - glideinwms-vofrontend (osg-development)
  • frontend/frontend.20130514.info.log / err.log
    [2013-05-14T10:10:58-05:00 26583] Checking groups ['main']
    [2013-05-14T10:10:58-05:00 26583] WARNING: [<subprocess.Popen object at 0x116d210>]: /usr/lib/python2.6/site-packages/glideinwms/frontend/glideinFrontendInterface.py:20: DeprecationWarning: the sets module is deprecated
    from sets import Set
  • Message appears only on the 1st iteration then never again.
    No classads are advertised to factory from there on out.
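
The DeprecationWarning above is triggered by "from sets import Set". A common way to avoid it is to prefer the built-in set type and fall back to the sets module only where the built-in is missing. This is a minimal sketch, assuming the module needs nothing beyond ordinary set operations; unique_groups is a hypothetical helper and not part of glideinFrontendInterface.py.

```python
# Prefer the built-in set type (available since Python 2.4); fall back to
# the deprecated sets module only on interpreters that lack the built-in.
try:
    set
except NameError:
    from sets import Set as set


def unique_groups(names):
    """Hypothetical helper: de-duplicate group names, returned sorted."""
    return sorted(set(names))
```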

John Weigand

#10 Updated by John Weigand over 6 years ago

SL6 - glideinwms-vofrontend (osg-development)
  • Now successfully processing jobs. The classad error previously
    reported was due to a bad configuration on my part.
  • The import error is still an issue.

John Weigand

#11 Updated by John Weigand over 6 years ago

Final test results, I think, for the 2.7.1-1.0 rpms.

sl5 all version
  • looks good
sl6 all version
  • looks good
  • import issue has been resolved
Remaining open issue (possibly non-critical)
  • run levels for init.d service gwms-frontend are still missing
    run level 4.

John Weigand

#12 Updated by John Weigand over 6 years ago

sl5 glideinwms-factory-2.7.1-1.0
  • when doing an initial reconfig (which is required), the following
    is shown
    [root@fermicloud320 init.d]$ service gwms-factory reconfig
    ~/work-dir /
    Warning: Cannot find /var/lib/gwms-factory/work-dir/glideinWMS.xml
    If this is the first reconfig, you can ignore this message.
    

    The warning message is ok, but why is it showing '~/work-dir /'? Actually,
    this line is output on all reconfigs. It appears to be a debugging line that
    never got removed or commented out.
  • A 2nd reconfig results in this error.
    Failed to create base clientlog dir (user frontend): 
    Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard mkdir 0 2'. 
    Details: Command '/usr/bin/../sbin/condor_root_switchboard mkdir 0 2' 
    returned non-zero exit status 1: invalid caller gid (5111)
    

    There are 2 questions:
    1. Why did I not get this error the 1st time?
    2. This was installed on a fermicloud node that apparently already had a gfactory
    user defined as uid=43680(gfactory) gid=5111(e875) groups=5111(e875),3302(condor)
    My guess is that when the rpm installs, it will create a gfactory user as gfactory.gfactory.
    But if the user already exists, it should either
    ... use a group already assigned and update the /etc/condor/privsep_config file accordingly,
    ... or create the new gfactory group if it does not exist.
    I would think a similar problem would occur with the frontend user. In this case,
    the frontend user did not already exist, so this was not a problem, but it likely
    would be if that user had already existed.
  • The initial gwms-factory.xml file comes down with
     <entry name="TEST_ENTRY" enabled="False" 
    

    1. The doc does not tell you to change this.
    2. When all entry elements are False,
    ... a reconfig works successfully
    ... however, it fails on start up with this (visible only if you remove the
    /dev/null redirect of stdout/err)
    Starting glideinWMS factory: Traceback (most recent call last):
      File "/usr/sbin/glideFactory.py", line 539, in ?
        main(sys.argv[1])
      File "/usr/sbin/glideFactory.py", line 431, in main
        write_descript(glideinDescript,frontendDescript,os.path.join(startup_dir, 'monitor/'))
      File "/usr/sbin/glideFactory.py", line 77, in write_descript
        entryDescript = glideFactoryConfig.JobDescript(entry)
      File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 250, in __init__
        repr) # convert everything in strings
      File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 79, in __init__
        ConfigFile.__init__(self,os.path.join("entry_"+entry_name,config_file),convert_function)
      File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 54, in __init__
        self.load(config_file,convert_function)
      File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 58, in load
        fd=open(fname,"r")
    IOError: [Errno 2] No such file or directory: 'entry_/job.descript'
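
The path 'entry_/job.descript' in the traceback suggests an empty entry name was used to build it. One plausible cause, stated here as an assumption rather than a confirmed diagnosis, is that an empty entry list string was split on commas, which yields [""] instead of []. The sketch below shows that pitfall and a guard that fails early with a readable message. Both function names are hypothetical and are not part of glideFactory.py.

```python
def parse_entry_list(raw):
    """Split a comma-separated entry list, dropping empty names.

    Note that "".split(",") returns [""], not [], which would yield a
    path such as "entry_/job.descript" if used unchecked.
    """
    return [name.strip() for name in raw.split(",") if name.strip()]


def require_enabled_entries(raw):
    """Return the entry names, or raise with a readable message if none."""
    entries = parse_entry_list(raw)
    if not entries:
        raise RuntimeError(
            "No enabled entries found; set enabled=\"True\" on at least one "
            "entry in the factory configuration."
        )
    return entries
```

A check of this kind at startup would turn the IOError above into a message that points at the configuration.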
    

#13 Updated by John Weigand over 6 years ago

sl5 glideinwms-factory-2.7.1-1.0
  • I copied and pasted an entry point element into my fermicloud320 rpm install
    from another factory I had. It referenced the wrong schedd_name.
    ... entry name="ress_ITB_INSTALL_TEST_2"... schedd_name=" cms-xen21.fnal.gov "
    a reconfig resulted in this stacktrace
    Traceback (most recent call last):
      File "/usr/sbin/reconfig_glidein", line 218, in ?
        main(params, old_params, update_scripts, update_def_cfg)
      File "/usr/sbin/reconfig_glidein", line 46, in main
        glidein_dicts_obj.populate()
      File "/usr/lib/python2.4/site-packages/glideinwms/creation/lib/cgWParamDict.py", line 430, in populate
        self.local_populate(params)
      File "/usr/lib/python2.4/site-packages/glideinwms/creation/lib/cgWParamDict.py", line 459, in local_populate
        global_schedd_count[params.entries[sub_name].schedd_name]+=1
    KeyError: u'cms-xen21.fnal.gov'
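
The KeyError comes from incrementing a per-schedd counter with a schedd_name that is not among the configured schedds. A clearer failure could be produced by checking the name before counting. This is a minimal sketch under that assumption; count_entries_per_schedd and its arguments are hypothetical and do not mirror the real cgWParamDict.py data structures.

```python
def count_entries_per_schedd(known_schedds, entries):
    """Count entries per schedd, rejecting unknown schedd names.

    known_schedds: iterable of configured schedd names.
    entries: dict mapping entry name -> schedd_name.
    """
    counts = dict((name, 0) for name in known_schedds)
    for entry_name, schedd_name in sorted(entries.items()):
        if schedd_name not in counts:
            raise ValueError(
                "Entry %r uses schedd_name %r, which is not one of the "
                "configured schedds: %s"
                % (entry_name, schedd_name, ", ".join(sorted(counts)))
            )
        counts[schedd_name] += 1
    return counts
```

With a check like this, the reconfig would name the offending entry and list the valid schedd names instead of printing a bare stack trace.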
    

John Weigand

#14 Updated by John Weigand over 6 years ago

sl5 and sl6 glideinwms-factory-2.7.1-1.0
  • no more issues found

John Weigand

#15 Updated by John Weigand over 6 years ago

sl5 and sl6 glideinwms-factory-2.7.1-1.0
  • I should note that for the factory, I failed to
    notice that the doc says to do an upgrade and
    not a reconfig. This caused a problem in one instance
    and was corrected by doing the upgrade.
  • Should one always do an upgrade?
    And maybe the reconfig be disabled for the factory?
    I don't know the real differences between the 2 at this
    point in time.

John Weigand

#16 Updated by Parag Mhashilkar over 6 years ago

  • Status changed from New to Closed

Upgrade updates the scripts in the glideinwms work dir, while reconfig does not. You need to do an upgrade at least once to start with. Closing this ticket since the issues reported were taken care of or new tickets were opened to track them.


