Pengfei, Yuri, Craig G., Adam, Leon, Alec, Zukai, Andrew and Martin (taking notes)
- The external products area is getting full, so we need to remove some old versions.
- v03_00_01 was the first one deployed on FD and it depends on novadaq v03_00_00.
- I sent out an email asking whether anyone in the DDT group still needs versions older than v03_00_00.
- According to a calculation Andrew and Leon did on their own, events with around 20,000 hits would cause memory issues.
- Andrew suggests a cut at 5,000 hits per slice for the Hough Tracker.
- Craig G. pointed out that we could regain some of our live time by having the Hough Tracker skip the large hit-multiplicity events entirely.
- With Craig's point in mind, Leon and Andrew settled on a cut of 1,200 hits per slice.
- Yuri will add and commit this cut, and then I will cut a new release.
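- The agreed-upon cut can be sketched as follows (a minimal illustration only; the function and variable names are hypothetical, and the real cut would live inside the Hough Tracker module):

```python
# Sketch of the proposed hit-multiplicity guard. The threshold is the
# 1,200-hits-per-slice value agreed on in the meeting; everything else
# here is an illustrative assumption, not the actual DDT code.
MAX_HITS_PER_SLICE = 1200

def should_process(slice_hits):
    """Return True if a slice is small enough to run the Hough Tracker on."""
    return len(slice_hits) <= MAX_HITS_PER_SLICE

# A 20,000-hit slice (the size Andrew and Leon flagged) would be skipped:
print(should_process(range(20000)))  # False
print(should_process(range(500)))    # True
```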
- Leon asked whether we can deploy this in stages.
- We discussed the possibility of rolling out the new release on a subset of the buffer nodes. However, with only a subset of nodes it would take a long time to run into one of these large memory-usage events.
- Andrew proposes that we deploy system wide.
- Leon then asks that we be vigilant and investigate any DDT crash immediately.
- We can then roll back if there are too many issues.
- Perhaps we need a Ganglia metric that tells us how many DDT processes are running.
- Perhaps the DDTManager could report this.
- Andrew thinks that this should be done on each host, so the DDTManager is not a good place for it, since it runs on only one host.
- Alec explains that gmond.conf needs to be changed to pick up such a new DDT process count metric.
- Alec volunteered to write such a monitoring script.
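- The core of such a script could look like the sketch below (Linux-specific, scanning `/proc`; the process-name substring `"ddt"` is an assumption, and the real script would publish the count to Ganglia, e.g. via a `gmetric` call or a gmond module, which is not shown here):

```python
import os

def count_processes(name_substring="ddt"):
    """Count running processes whose command name contains the substring.

    Linux-specific sketch: walks /proc and reads each process's comm file.
    The "ddt" substring is an assumed placeholder for the real DDT
    process name; a per-host cron job or gmond module could publish the
    returned count as the proposed Ganglia metric.
    """
    count = 0
    for pid in os.listdir("/proc"):
        if not pid.isdigit():
            continue
        try:
            with open(f"/proc/{pid}/comm") as f:
                comm = f.read().strip()
        except OSError:
            continue  # process exited while we were scanning
        if name_substring in comm.lower():
            count += 1
    return count

print(count_processes())
```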
- Leon has been looking through the documentation and will send comments around about things that could improve.
- I reported on the processing time with 1 process versus 13 processes on one host.
- Andrew is not convinced that there is actually an increase there; he notes that the apparent difference could simply be a statistical fluctuation.
- My main point here is that we are too slow to keep up.