Support #5787: FNAL CVMFS Stratum 1 support
Give generate_replicas an option to add repository configuration only when a full snapshot exists
NOTE: the purpose of the ticket has changed. See the full comment history.
When reinstalling one half of a CVMFS Stratum 1 HA pair from scratch, the /storage data is expected to persist. Change the cvmfsha-add-repository -h command to recognize that and reuse the data, similar to the way add_osg_repository was updated in https://jira.opensciencegrid.org/browse/OO-107.
#1 Updated by Dave Dykstra over 4 years ago
Once this is done, also add an option to generate_replicas to detect the situation where full repository snapshot data is available (with a .cvmfs_last_snapshot file) yet there is no corresponding /etc/cvmfs/repositories.d config. This could then be run on the backup machine from cron with the cvmfs-add-repository -h command (normally it runs only on the primary machine, without -h). It might be nicer to run it not from cron but from puppet, only after a reinstall, but I'm not sure how to do that.
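The detection condition described above could be sketched roughly as follows. This is an illustrative sketch, not the real generate_replicas code: the function name needs_local_add and the /srv/cvmfs storage root are assumptions; only the .cvmfs_last_snapshot marker and the /etc/cvmfs/repositories.d config directory come from the ticket.

```shell
#!/bin/bash
# Sketch: a repository qualifies for local re-adding (add-repository -h)
# only when a completed snapshot exists on disk but no repository
# configuration is present.
STORAGE=${STORAGE:-/srv/cvmfs}                  # assumed storage root
CONFDIR=${CONFDIR:-/etc/cvmfs/repositories.d}   # per-repo config dir

needs_local_add() {
    local repo="$1"
    # full snapshot data is present on /storage...
    [ -f "$STORAGE/$repo/.cvmfs_last_snapshot" ] || return 1
    # ...but there is no corresponding repository configuration
    [ ! -d "$CONFDIR/$repo" ]
}
```

A cron job on the backup machine could loop over the storage directories and run the local add command for each repository that satisfies this test.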
#2 Updated by Dave Dykstra about 4 years ago
I glanced at the add-repository command today, and I believe the -h option already does this, and has ever since the option was added. The corresponding generate_replicas option does not yet exist, however. I will change the title of the ticket to reflect that.
#5 Updated by Dave Dykstra almost 4 years ago
So the main reason for the generate_replicas -h feature is to prevent clashes at runtime if a new repo is in the middle of being added by the master. That is, if a new large repository was just registered in OIM around the same time that a backup machine is being reinstalled, I do not want the backup machine to add it. It should add the repository only when a full snapshot is already available, that is, when .cvmfs_last_snapshot exists.
It just occurred to me that since generate_replicas works with two parameters, one to add and one to clean up after failures, a remove-repository -h option will also have to be added to remove only the local half. This -h should not remove the data, only the config.
Don't change generate_replicas to add -h to the add/remove commands; assume those will be passed in on the command line with the commands.
#6 Updated by Dave Dykstra almost 4 years ago
When I went to implement this I gave it some more thought and decided to change it up a bit.
First, I realized that it doesn't matter how large a repository the production generate_replicas might be adding, because the first thing that add-repository does is create the configuration files on both machines, so a generate_replicas running on the backup machine will very quickly see that there's nothing for it to do.
There's still a small race condition, but I convinced myself that if an add-repository is running on the master simultaneously with add-repository -h on a freshly installed backup, one of them will get an error code and nothing will be left in a bad state.
So, I decided that generate_replicas itself doesn't need to change for this purpose. Instead, I am adding another option to add-repository, a -H that can be used in place of -h and is almost the same, except that if there is no old storage data present it does nothing and returns a success code. The regular generate_replicas can then be called from puppet, passing in add-repository -H and remove-repository -fh as parameters. If there is any data there, whether a complete snapshot or not, it will save the data and add the repository configuration. If there is no data there, it will instead do nothing and wait for the master's generate_replicas cron to add it.
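The -H behavior described here can be sketched as a small shell function. This is a hypothetical illustration, not the real add-repository script: the function name add_repository_H and the /srv/cvmfs storage root are assumptions, and the final step is represented by an echo rather than the real -h code path.

```shell
#!/bin/bash
# Sketch of the proposed -H option: absence of old storage data is
# treated as success with no work done, so a puppet-driven run on a
# freshly installed backup defers to the master's generate_replicas cron.
STORAGE=${STORAGE:-/srv/cvmfs}   # assumed storage root

add_repository_H() {
    local repo="$1"
    if [ ! -d "$STORAGE/$repo" ]; then
        # no old data present: do nothing and report success
        return 0
    fi
    # data is present (complete snapshot or not): reuse it and recreate
    # the local configuration, as the existing -h option does
    echo "would run: add-repository -h $repo"
}
```

Returning success rather than an error in the no-data case matters because generate_replicas treats a nonzero exit as a failure to clean up; with -H, a fresh backup simply skips repositories it has no data for.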