Feature #4505

PMI Lookup name failures at LNGS

Added by Kurt Biery about 6 years ago. Updated over 5 years ago.

Status: Closed
Priority: High
Assignee: -
Target version: -
Start date: 08/06/2013
Due date:
% Done: 0%
Estimated time:
Duration:

Description

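When I tried to run the latest ds50daq code at LNGS, the system failed to start: about two seconds after launch, two processes aborted with "PMI Lookup name failed", and a boardreader (dsfr1), an eventbuilder (dseb5), and an aggregator (dsag) exited, as shown below.

We should investigate and fix the source of these errors.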

[dsfr1:1003:0]~/system-multi-aggregator/profile$ startCommPhase2System.sh
Log file name: /daqlogs/pmt/pmt-7021.1-20130806155803.log
[2013-08-06 15:58:03] INFO WEBrick 1.3.1
[2013-08-06 15:58:03] INFO ruby 1.8.7 (2011-06-30) [x86_64-linux]
[2013-08-06 15:58:03] INFO WEBrick::HTTPServer#start: pid=7021 port=5467
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsfr1:boardreader:5441
boardreader on dsfr1 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsfr1:boardreader:5440
boardreader on dsfr1 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsfr1:boardreader:5442
boardreader on dsfr1 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsfr1:boardreader:5443
boardreader on dsfr1 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsfr2:boardreader:5445
boardreader on dsfr2 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsfr2:boardreader:5444
boardreader on dsfr2 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dseb4:eventbuilder:5452
eventbuilder on dseb4 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dseb5:eventbuilder:5453
eventbuilder on dseb5 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dseb1:eventbuilder:5450
eventbuilder on dseb1 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dseb2:eventbuilder:5451
eventbuilder on dseb2 is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsag:aggregator:5460
aggregator on dsag is starting.
Tue Aug 06 15:58:03 +0200 2013: STARTING:dsag:aggregator:5461
aggregator on dsag is starting.
Tue Aug 06 15:58:05 +0200 2013: [11] Abort: PMI Lookup name failed
Tue Aug 06 15:58:05 +0200 2013: at line 951 in file /var/tmp/OFED_topdir/BUILD/mvapich2-1.7-r5140/src/mpid/ch3/channels/common/src/rdma_cm/rdma_cm.c
Tue Aug 06 15:58:05 +0200 2013: [10] Abort: PMI Lookup name failed
Tue Aug 06 15:58:05 +0200 2013: at line 951 in file /var/tmp/OFED_topdir/BUILD/mvapich2-1.7-r5140/src/mpid/ch3/channels/common/src/rdma_cm/rdma_cm.c
Tue Aug 06 15:58:05 +0200 2013: [ring_startup.c:184]: PMI_KVS_Get error
Tue Aug 06 15:58:05 +0200 2013: [ring_startup.c:184]: PMI_KVS_Get error
Tue Aug 06 15:58:05 +0200 2013: EXITING:dsfr1:1:boardreader:5440
boardreader on dsfr1 is exiting.
Tue Aug 06 15:58:05 +0200 2013: EXITING:dsag:253:aggregator:5461
aggregator on dsag is exiting.
Tue Aug 06 15:58:05 +0200 2013: EXITING:dseb5:1:eventbuilder:5453
eventbuilder on dseb5 is exiting.

^CCleaning up. Please wait for PMT to exit...
[2013-08-06 15:58:09] INFO going to shutdown ...
[2013-08-06 15:58:09] INFO WEBrick::HTTPServer#start done.
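To see quickly which processes dropped out in a log like the one above, the STARTING, EXITING, and Abort lines can be tallied with a short script. This is only a sketch: the function name `summarize` is hypothetical, the line shapes are taken from the output pasted here, and reading the extra numeric field in the EXITING lines (1, 253) as an exit status is an assumption, not something PMT documents in this ticket.

```python
import re

# Line shapes copied from the console output in this ticket.
# Assumption: the numeric field after the host in EXITING lines is an exit status.
STARTING = re.compile(r"STARTING:(?P<host>[^:]+):(?P<role>[^:]+):(?P<port>\d+)\s*$")
EXITING = re.compile(
    r"EXITING:(?P<host>[^:]+):(?P<status>\d+):(?P<role>[^:]+):(?P<port>\d+)\s*$"
)
ABORT = re.compile(r"\[(?P<tag>\d+)\] Abort: (?P<msg>.+?)\s*$")


def summarize(lines):
    """Collect started processes, exited processes, and abort messages.

    Processes are keyed by (host, port), since the same host runs several
    processes of the same role on different ports.
    """
    started = {}
    exited = {}
    aborts = []
    for line in lines:
        m = STARTING.search(line)
        if m:
            started[(m["host"], int(m["port"]))] = m["role"]
            continue
        m = EXITING.search(line)
        if m:
            exited[(m["host"], int(m["port"]))] = (m["role"], int(m["status"]))
            continue
        m = ABORT.search(line)
        if m:
            # The bracketed number is kept as-is; it is not interpreted here.
            aborts.append((int(m["tag"]), m["msg"]))
    return started, exited, aborts


if __name__ == "__main__":
    import sys

    # Usage: python summarize_pmt.py < captured_console_output.txt
    started, exited, aborts = summarize(sys.stdin)
    for (host, port), (role, status) in sorted(exited.items()):
        print(f"{role} on {host}:{port} exited with {status}")
    for tag, msg in aborts:
        print(f"[{tag}] {msg}")
    still_up = sorted(set(started) - set(exited))
    print(f"{len(still_up)} of {len(started)} started processes did not report an exit")
```

Fed the console output above, this would list the three exited processes next to the two abort messages, which makes it easier to check whether the failures cluster on particular hosts such as dseb5 and dsag.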

History

#1 Updated by Kurt Biery about 6 years ago

Rebooting dseb5 may have helped, but I haven't yet rebooted dsag. That will be the definitive test. Hopefully, Alessandro will either reboot dsag or give me the OK to do that.

#2 Updated by Kurt Biery over 5 years ago

  • Status changed from New to Closed

I'm not sure what the resolution of this issue was, but it did get resolved...


