Notes from May 4, 2011 planning meeting
Present: Jason A., Stu F., Steve K., Michael H-D, Art K., Nelly S., Svetlana L., Lee L.
There are several areas we need to address wrt databases for the Intensity Frontier.
1. We discussed the problems we are having serving MINOS conditions data to both MINOS and Minerva. Art explained the plan, for May 19th, to move the "main" MINOS conditions DB from minos54 back to its original server minos-mysql2. Then they will move the "farm replica" to minos54, a machine that is at least 10x more powerful. This, with software fixes as well, should resolve the MINOS/MINERVA conditions DB contention.
2. MINOS is using a custom-built replication procedure that has some issues. In general, we need to adopt "standard" replication solutions supported by the Database Group. The requirement is one-way replication; the data in the master copy is generally not updated continuously, but only once every few hours.
3. We have several new experiments asking for DB services, and we would like to have a supportable solution to meet these needs. Although we plan to encourage the use of PostgreSQL for conditions DB applications, there may be other applications that are better suited to MySQL.
Q1. Can we have multiple, independent, instances of databases on a single machine? What are the hardware and software specifications for such a machine?
A1. Yes, DSG has experience running multiple "clusters" of databases on common hardware. They have a baseline spec for the hardware and how it is configured.
Q2. Can we run both MySQL and PostgreSQL on the same machine?
A2. This has not been done in the past, but seems reasonable. DSG is willing to test installing both, side-by-side, and understand any configuration issues or other contentions. Alternatively, if we were to consider MySQL to be the minority platform, it could be possible to use the existing general-purpose CD MySQL servers.
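One simple thing the side-by-side test could check is that no two database instances on the shared machine claim the same listener port or data directory. The following is only a minimal sketch under stated assumptions: the instance names and data directory paths are hypothetical, and the only facts used are the well-known default ports (3306 for MySQL, 5432 for PostgreSQL).

```python
from collections import defaultdict

# Well-known default listener ports for the two servers.
DEFAULT_PORTS = {"mysql": 3306, "postgresql": 5432}


def find_conflicts(instances):
    """Return (field, value, names) for each port or data directory
    that is claimed by more than one instance."""
    conflicts = []
    for field in ("port", "datadir"):
        seen = defaultdict(list)
        for inst in instances:
            seen[inst[field]].append(inst["name"])
        for value, names in seen.items():
            if len(names) > 1:
                conflicts.append((field, value, names))
    return conflicts


# Hypothetical layout: instance names and paths are illustrative only.
planned = [
    {"name": "minos_mysql", "port": DEFAULT_PORTS["mysql"],
     "datadir": "/data/mysql/minos"},
    {"name": "if_postgres", "port": DEFAULT_PORTS["postgresql"],
     "datadir": "/data/pgsql/if"},
    {"name": "minerva_mysql", "port": 3306,
     "datadir": "/data/mysql/minerva"},
]

for field, value, names in find_conflicts(planned):
    print(f"conflict on {field}={value}: {', '.join(names)}")
```

In this made-up layout the MySQL and PostgreSQL instances coexist on their defaults, but the second MySQL instance would need a non-default port.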
Q3. How many servers would meet the IF needs, and how large? How will they be sized?
A3. Three machines seem appropriate: 1) int/dev, 2) prod, and 3) replica. These should be large servers (e.g. 16 core) with as much memory as we can put on them (e.g. 64GB). We expect each experiment to need ~100GB for its disk storage. The disks should be physically on the machines.
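As a rough illustration of the disk sizing, the ~100GB per experiment figure can be scaled by the number of experiments hosted on a server. This is a sketch only: the number of experiments and the headroom factor are assumptions, not figures from the meeting.

```python
# From the notes: each experiment is expected to need ~100GB of disk.
PER_EXPERIMENT_DISK_GB = 100


def disk_needed_gb(n_experiments, headroom=1.5):
    """Estimate local disk (GB) for n_experiments.

    headroom is an assumed multiplier for growth, logs and backups;
    it was not discussed in the meeting.
    """
    return n_experiments * PER_EXPERIMENT_DISK_GB * headroom


# Hypothetical example: five experiments sharing one server.
print(disk_needed_gb(5))
```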
Q4. What Service Level is required for each server?
A4. At least prod should have 24 x 7 hardware support and 12 x 7 operational support. Some fail-over plan should be in place for replica to prod and prod to dev (or something like that). Dev only needs 8 x 5 support. Replica SL depends on the fail-over arrangement, etc., but is probably the same as prod.
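For comparison, the service levels above translate into weekly coverage hours as follows. This is a small sketch; the helper name is made up for illustration.

```python
def weekly_hours(level):
    """Convert a service level such as '24 x 7' or '8x5'
    (hours per day x days per week) into covered hours per week."""
    hours_per_day, days_per_week = (
        int(part) for part in level.lower().replace(" ", "").split("x")
    )
    return hours_per_day * days_per_week


for level in ("24 x 7", "12 x 7", "8 x 5"):
    print(level, "=", weekly_hours(level), "hours/week")
```

So 12 x 7 operational support covers half the hours of 24 x 7, and a little over twice the hours of 8 x 5.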
Q5. Which CD Department would be responsible for the OS and admin?
A5. Normally, database servers are managed by the database-side sys admins (USS - Tom's group). However, they require RHEL on all their platforms to make them more manageable. RHEL is required for Oracle servers, but not for MySQL/PostgreSQL servers. FEF admins can manage SLF machines, and there are many instances of critical services running on SLF at Fermilab. Using SLF for three servers also saves a few hundred dollars a year in RHEL licenses.
Agreement and Action Items.
1. Purchase 3 server machines with identical specs: 16 core, 64GB memory, disk and other specs to be worked out. These need to be spec'd and the requisitions ready in the next 2 weeks.
2. The system support will be in FEF (Jason's Department).
3. Database instances will be supported by DSG (Nelly's group).
4. A test will be done to load both MySQL and PostgreSQL and confirm there are no problems. A platform for this will be identified.
5. Svetlana will provide hardware requirements needed to succeed with the two platforms (MySQL/PostgreSQL) on the same box.
6. DSG will make a recommendation for replication solutions for both MySQL and PostgreSQL. The time frame for implementing this is 6 months.
Intensity Frontier Mtg - May 5, 2011 9am
Replace some old Minos Servers
Resolve issue with Minerva Monte Carlo Access
Define home for Minos Condition DB
Define O/S support team
Master DB - currently on a temporary, modern server (minos54); relocating to minos-mysql2 on May 19
Minos Farm Replica (minos-mysql1) is currently overloaded.
Client doesn't detect load
Load average of 50
Tuned to support limited # of connections
Want to separate out Minos and Minerva Access
Need Minerva Replica vs sharing Minos Farm Replica. NOTE: Once the Master DB and the replica are relocated to their final locations, the load will be revisited. The contingency plan is to purchase hardware in case the performance/load is unacceptable.
Need - Discussed cluster configurations for databases. Requirements are dictating different clusters and 1 server. Intent is to promote Postgres for new databases and standardize the MySQL Replica deployment.
Discussion on D1 Support: it's either 8x5 or 24x7. The Experiment requires 12x7, which is equivalent to 24x7.
100G size How many databases? Local drives will be used.
MEM - DBAs will provide hardware requirements. Discussed SL vs RH. Discussed baseline hardware disk needs
Eventual standardization of Replication
Hardware Support - The intent is to have a similar server around to use as spare parts or use the dev server
OPTION1: DBAs will provide specifications on hardware. Specifications will be defined to support 1 server running two types of services, i.e. Postgres & MySQL. This configuration will be supported contingent on testing.
OPTION2: Have a General Purpose IF MySQL server and a General Purpose IF Postgres server
Need to define backup strategy to support a ?what? level turn around. Discussed backing up on the dev server, more on this later.
Dev Server (promoting Postgres for new DB)
Replica Server (to support Minos MySQL Replica)
DBAs provide hardware requirements
Test Postgres and MySQL services running on one server
Define backup strategy that supports ?what? level of support (spare parts, copying to dev), etc.
Migration to Master DB scheduled for May 19, which results in vacating the current Master DB server to be redeployed as the Farm Replica - Check load average and performance there to help decide the # of replicas needed.