Feature #11900

Forwarding glidein information from the factory to a monitoring server

Added by Marco Mambelli over 4 years ago. Updated over 4 years ago.

Status: Closed
Priority: Normal
Category: -
Target version:
Start date: 03/04/2016
Due date:
% Done: 0%
Estimated time:
Stakeholders:
Duration:

Description

Information about all the jobs a glidein ran, along with its condor log files, is collected on the factory.

Brian expressed interest in collecting this information:

Hi Jeff et al,

I want to build an improved picture of our operations from the “factory and pilot” point of view.  Toward this end, I think there are a few pieces of information I want to index in elastic search:
1) Pilot ads (trivial - I can pull them remotely, you don’t need to be involved).
2) condor_startd history ads.  Harder - I need to get them from pilot’s logfiles.  What’s the best way to pull these?  Giant cron job on each factory?
3) Pilot summary.  Doesn’t each pilot have an XML summary, independent from the condor classad (i.e., in case validation failed?)?  Is this something I could vacuum up and put into ElasticSearch?

Do we have all pilot logs on a web server yet?

Thanks,

Brian

History

#1 Updated by Marco Mambelli over 4 years ago

  • Status changed from Assigned to Feedback
  • Assignee changed from Marco Mambelli to Parag Mhashilkar

The code is in the branch mmdev_logcat_batch and is a new version of gwms-logcat.

It has an option to forward information to a folder or to a URL (via POST). I tested only the file forwarding. If run periodically, e.g. via cron on the factory, it could send you the information needed.

./gwms-logcat.sh -f file:///tmp/folder startd
will copy the startd logs from all the pilots.
The script processes, in chronological order, all the glideins that still have logs in the log directory, and saves a timestamp so that the same files are not resent when it is restarted.
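
As a rough illustration of the timestamp idea (this is not the gwms-logcat code; the function name, the directories, and the stamp file are all hypothetical), a pass that forwards only files modified since the previous pass might look like this:

```shell
#!/bin/bash
# Illustrative sketch only: NOT the actual gwms-logcat implementation.
# forward_new_files SRC_DIR DEST_DIR STAMP_FILE
#   Copies the files under SRC_DIR that are newer than STAMP_FILE into
#   DEST_DIR, then records the start time of this pass in STAMP_FILE.
#   Files with the same basename overwrite each other in DEST_DIR.
forward_new_files() {
    local src_dir=$1 dest_dir=$2 stamp=$3
    local new_stamp

    # Take the new timestamp before scanning, so that files written
    # during the scan are picked up again by the next pass.
    new_stamp=$(mktemp) || return 1
    mkdir -p "$dest_dir" || return 1

    if [ -e "$stamp" ]; then
        # Only files modified after the previous pass started
        find "$src_dir" -type f -newer "$stamp" -exec cp -p {} "$dest_dir"/ \;
    else
        # First run: no stamp yet, forward everything
        find "$src_dir" -type f -exec cp -p {} "$dest_dir"/ \;
    fi

    # mv keeps the modification time of new_stamp (the start of this pass)
    mv "$new_stamp" "$stamp"
}
```

Keeping one stamp file per log type is what would let separate runs progress independently.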

You need both:
./gwms-logcat.sh -f file:///tmp/folder startd
and
./gwms-logcat.sh -f file:///tmp/folder xml

The two invocations keep separate timestamps, so they will not interfere with each other.
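
A periodic run of both invocations could be wrapped as sketched below. Only the two gwms-logcat.sh command lines come from this ticket; the wrapper, its function name, the LOGCAT and DEST variables, and the cron schedule are assumptions to adapt to the local installation.

```shell
#!/bin/bash
# Hypothetical wrapper for periodic forwarding from the factory.
# LOGCAT and DEST are placeholders, overridable from the environment.
LOGCAT=${LOGCAT:-./gwms-logcat.sh}
DEST=${DEST:-file:///tmp/folder}

# Run gwms-logcat once per log type; keep going if one type fails
# and report the failure through the return code.
forward_all() {
    local rc=0 log_type
    for log_type in startd xml; do
        "$LOGCAT" -f "$DEST" "$log_type" || rc=1
    done
    return "$rc"
}

# Example crontab line (hourly; the script path is a placeholder):
# 0 * * * * /path/to/forward_glidein_logs.sh
```

To use this as a standalone script, add a `forward_all` call as its last line.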

I see 2 options:
1. writing to a folder that is served via web
2. you provide a server running Logstash or something similar, to which the script could post via curl (not tested, but it should work)
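
For option 2, a POST of a single log file with curl could look roughly like the sketch below. It is untested against a real collector; the function name, the endpoint URL in the usage line, and the content type are assumptions, and the receiving server may expect a different format.

```shell
#!/bin/bash
# Illustrative sketch: POST one log file to an HTTP(S) collector.
# Not part of gwms-logcat; names and content type are assumptions.
CURL=${CURL:-curl}

# post_log FILE URL
post_log() {
    local file=$1 url=$2

    # Accept only http:// and https:// destinations
    case $url in
        http://*|https://*) ;;
        *) echo "unsupported URL: $url" >&2; return 2 ;;
    esac

    # --fail: non-zero exit status on HTTP errors (4xx/5xx)
    # --data-binary @file: send the file contents unmodified as the body
    "$CURL" --fail --silent --show-error \
        --header 'Content-Type: text/plain' \
        --data-binary "@$file" "$url"
}

# Usage (hypothetical endpoint):
# post_log /tmp/folder/some_startd.log https://collector.example.org/ingest
```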

You can also pull the glidein stderr files to your server and run gwms-logcat or your own script to extract the log files.

#2 Updated by Marco Mambelli over 4 years ago

  • Target version set to v3_2_13

#3 Updated by Parag Mhashilkar over 4 years ago

  • Assignee changed from Parag Mhashilkar to Marco Mambelli

We should also consider https:// along with http://. The rest looks OK to merge after these changes.

#4 Updated by Parag Mhashilkar over 4 years ago

  • Status changed from Feedback to Closed

