V2.7.1 RPM testing
- basic test of start/stop on gwms-frontend/condor successful
- condor is not getting installed
- see fermicloud009:/opt/ferpm-install.2013-05-09.log for full details on install.
- for SL5 comparison, see fermicloud002:/opt/ferpm-install.2013-05-09.log
#4 Updated by John Weigand over 6 years ago
From a basic install perspective, all modes of glideinwms rpm installation
appear good. You can view the various nodes these are currently installed
on here: http://home.fnal.gov/~weigand/test_nodes/fermicloud.html#GLIDEINWMS
NOTE: The data on this page can change over time.
These are the types:
ferpm - single node frontend, submit, user collector
fe - just the frontend
fecol - just the user collector service
fesub - just the submit service
The numeric suffix indicates sl5/6.
These were installed from the osg-development repo.
The sl6 installations did require doing an independent
yum install due to the empty_condor issue in sl6.
The "package details" link shows the results of the yum
#6 Updated by John Weigand over 6 years ago
For this release, all you asked for was a basic
rpm install verification. To perform a test like you
are asking for will require significantly more (repeat
significantly more) effort due to all the manual configuration
that has to be done. I can shoot for the next release on
that or wait a week if needed on this one.
#7 Updated by Parag Mhashilkar over 6 years ago
- Assignee changed from Parag Mhashilkar to John Weigand
Let's do the full test for at least one combo with the rpm. Can you please do it for this release? I am not holding off v2.7.1 for this, since I do the tarball install+tests, which turned out ok. So a week is fine too.
Your rpm tests will validate running glideinwms services from rpms as well.
Reassigning the ticket to you since you are doing the work :)
#9 Updated by John Weigand over 6 years ago
- Successful test of submitting jobs.
- frontend/frontend.20130514.info.log / err.log
[2013-05-14T10:10:58-05:00 26583] Checking groups ['main']
[2013-05-14T10:10:58-05:00 26583] WARNING: [<subprocess.Popen object at 0x116d21
ce.py:20: DeprecationWarning: the sets module is deprecated
from sets import Set
- Message appears only on the 1st iteration then never again.
No classads are advertised to factory from there on out.
#12 Updated by John Weigand over 6 years ago
- when doing an initial reconfig (which is required), the following
[root@fermicloud320 init.d]$ service gwms-factory reconfig
~/work-dir /
Warning: Cannot find /var/lib/gwms-factory/work-dir/glideinWMS.xml
If this is the first reconfig, you can ignore this message.
The warning message is ok, but why is it showing '~/work-dir /'? Actually,
this line is output on all reconfigs. It appears to be a debugging line that
was never removed or commented out.
- A 2nd reconfig results in this error.
Failed to create base clientlog dir (user frontend): Unexpected Error running '/usr/bin/../sbin/condor_root_switchboard mkdir 0 2'. Details: Command '/usr/bin/../sbin/condor_root_switchboard mkdir 0 2' returned non-zero exit status 1: invalid caller gid (5111)
There are 2 questions:
1. Why did I get this error the 1st time
2. This was installed on a fermicloud node that apparently already had a gfactory
user defined as uid=43680(gfactory) gid=5111(e875) groups=5111(e875),3302(condor)
My guess is that when the rpm installs, it will create a gfactory user as gfactory.gfactory.
But if the user already exists, it should either
... use a group already assigned and update the /etc/condor/privsep_config file accordingly.
... or create the new gfactory group if it does not exist
I would think a similar problem would occur with the frontend user. In this case,
the frontend user did not already exist, so this was not a problem... but would likely be.
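The mismatch described above (a pre-existing gfactory user whose primary group is not gfactory) can be checked before install. This is a minimal sketch, not what the rpm does: the function name primary_group_matches is hypothetical, and it assumes the expected group name is the same as the user name. It uses only the standard pwd and grp modules and is written for a current Python rather than the python2.4 on these nodes.

```python
import grp
import pwd


def primary_group_matches(user, expected_group,
                          getpwnam=pwd.getpwnam, getgrnam=grp.getgrnam):
    """Compare a user's primary gid with the gid of expected_group.

    Returns True or False when both the user and the group exist.
    Returns None when either one is missing, so the caller can tell
    "not there yet" apart from "there, but mismatched".
    The lookup functions are parameters only so they can be faked in tests.
    """
    try:
        user_gid = getpwnam(user).pw_gid
        group_gid = getgrnam(expected_group).gr_gid
    except KeyError:
        # pwd.getpwnam / grp.getgrnam raise KeyError for unknown names
        return None
    return user_gid == group_gid


# Hypothetical usage on a node before installing the factory rpm:
# if primary_group_matches("gfactory", "gfactory") is False:
#     print("gfactory exists but its primary group is not gfactory")
```

On the node above, where gfactory has gid=5111(e875), a check like this would return False if a gfactory group exists, or None if it does not.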
- The initial gwms-factory.xml file comes down with
<entry name="TEST_ENTRY" enabled="False"
1. The doc does not tell you to change this.
2. When all entry elements are False,
... a reconfig works successfully
... however, it fails on start up with this (only if you remove the /dev/null
Starting glideinWMS factory: Traceback (most recent call last):
  File "/usr/sbin/glideFactory.py", line 539, in ?
    main(sys.argv)
  File "/usr/sbin/glideFactory.py", line 431, in main
    write_descript(glideinDescript,frontendDescript,os.path.join(startup_dir, 'monitor/'))
  File "/usr/sbin/glideFactory.py", line 77, in write_descript
    entryDescript = glideFactoryConfig.JobDescript(entry)
  File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 250, in __init__
    repr) # convert everything in strings
  File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 79, in __init__
    ConfigFile.__init__(self,os.path.join("entry_"+entry_name,config_file),convert_function)
  File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 54, in __init__
    self.load(config_file,convert_function)
  File "/usr/lib/python2.4/site-packages/glideinwms/factory/glideFactoryConfig.py", line 58, in load
    fd=open(fname,"r")
IOError: [Errno 2] No such file or directory: 'entry_/job.descript'
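The path 'entry_/job.descript' suggests the directory name was built from an empty entry name. One plausible cause, which is an assumption and not confirmed against the glideinWMS source, is that the list of enabled entries is kept as a comma-separated string: in Python, "".split(",") yields [""] rather than [], so a loop over the result runs once with an empty name. A minimal sketch of a guard follows; parse_entry_names and require_entries are hypothetical helpers, not glideinWMS functions.

```python
def parse_entry_names(raw):
    """Split a comma-separated entry list, dropping empty names.

    A plain raw.split(",") would turn "" into [""], which is how an
    empty entry name could reach os.path.join("entry_" + name, ...).
    """
    return [name.strip() for name in raw.split(",") if name.strip()]


def require_entries(raw):
    """Return the entry names, or fail early with a readable message."""
    names = parse_entry_names(raw)
    if not names:
        raise ValueError(
            "no enabled entries found; enable at least one <entry> "
            "in the factory configuration"
        )
    return names
```

Failing at this point would replace the IOError above with a message that names the real problem, which is that every entry is enabled="False".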
#13 Updated by John Weigand over 6 years ago
- I copied and pasted an entry point element into my fermicloud320 rpm install
from another factory I had. It referenced the wrong schedd_name.
... entry name="ress_ITB_INSTALL_TEST_2"... schedd_name=" cms-xen21.fnal.gov "
a reconfig resulted in this stacktrace
Traceback (most recent call last):
  File "/usr/sbin/reconfig_glidein", line 218, in ?
    main(params, old_params, update_scripts, update_def_cfg)
  File "/usr/sbin/reconfig_glidein", line 46, in main
    glidein_dicts_obj.populate()
  File "/usr/lib/python2.4/site-packages/glideinwms/creation/lib/cgWParamDict.py", line 430, in populate
    self.local_populate(params)
  File "/usr/lib/python2.4/site-packages/glideinwms/creation/lib/cgWParamDict.py", line 459, in local_populate
    global_schedd_count[params.entries[sub_name].schedd_name]+=1
KeyError: u'cms-xen21.fnal.gov'
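The KeyError comes from incrementing a per-schedd counter for a schedd_name the factory does not know about. A minimal sketch of how such a count could report the bad entry instead of a bare stack trace is below. The function count_entries_per_schedd and its arguments are hypothetical; this is not the code in cgWParamDict.py, only an illustration of validating before counting.

```python
def count_entries_per_schedd(entry_schedds, known_schedds):
    """Count entries per schedd, rejecting unknown schedd names.

    entry_schedds: dict mapping entry name -> schedd_name from its config
    known_schedds: iterable of schedd names the factory is configured with
    """
    counts = dict((name, 0) for name in known_schedds)
    for entry_name, schedd_name in sorted(entry_schedds.items()):
        if schedd_name not in counts:
            # Name the offending entry so the operator knows what to fix
            raise ValueError(
                "entry %r references unknown schedd_name %r; known schedds: %s"
                % (entry_name, schedd_name, ", ".join(sorted(counts)))
            )
        counts[schedd_name] += 1
    return counts
```

With a check like this, the copied ress_ITB_INSTALL_TEST_2 entry would be reported by name together with the schedd names that are valid on this factory.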
#15 Updated by John Weigand over 6 years ago
- I should note that for the factory, I failed to
notice that the doc says to do an upgrade and
not a reconfig. This caused a problem in one instance
and was corrected by doing the upgrade.
- Should one always do an upgrade?
And maybe the reconfig be disabled for the factory?
I don't know the real differences between the 2 at this
point in time.
#16 Updated by Parag Mhashilkar over 6 years ago
- Status changed from New to Closed
Upgrade updates the scripts in the glideinwms work dir, while reconfig does not. You need to do an upgrade at least once to start with. Closing this ticket since the issues reported were taken care of or new tickets were opened to track them.