
Task #14090

OnDemand: Make load-balanced, autoscaled managed instance group of squid servers

Added by Steven Timm almost 4 years ago. Updated almost 4 years ago.

Status:
Closed
Priority:
High
Assignee:
Start date:
10/25/2016
Due date:
11/02/2016
% Done:
100%

Estimated time:
24.00 h
Duration: 9

History

#1 Updated by Steven Timm almost 4 years ago

  • Due date set to 10/17/2016

#2 Updated by Steven Timm almost 4 years ago

  • Estimated time set to 40.00 h

#3 Updated by Neha Sharma almost 4 years ago

  • Status changed from New to Work in progress
  • % Done changed from 0 to 20

Between yesterday and today, I:

0. Played around in the currently launched squid VM to get familiar with the environment
1. Read documentation on instance groups (managed vs. unmanaged) - https://cloud.google.com/compute/docs/instance-groups
2. Read documentation on autoscaling managed groups - https://cloud.google.com/compute/docs/autoscaler/
3. Read documentation on load balancing - https://cloud.google.com/compute/docs/load-balancing-and-autoscaling
4. Created an instance template for the managed group - test-squid-managed-group-template-1
5. Tried to create a managed group (test-squid-managed-group-1) that would use the template from step 4 to launch VMs (squid servers)
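
The template and group creation in steps 4-5 can be sketched with gcloud roughly as follows; the machine type and image name below are placeholders for illustration, not the actual values used:

```shell
# Create the instance template the managed group will launch VMs from.
# Machine type and image are hypothetical placeholders.
gcloud compute instance-templates create test-squid-managed-group-template-1 \
    --machine-type n1-standard-2 \
    --image my-squid-image

# Create a single-zone managed instance group from that template.
gcloud compute instance-groups managed create test-squid-managed-group-1 \
    --zone us-central1-b \
    --base-instance-name test-squid \
    --template test-squid-managed-group-template-1 \
    --size 1
```

These commands require an authenticated gcloud setup against the target project, so they are shown only as a sketch of the steps described above.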

What I am currently stuck on is configuring autoscaling. My understanding is that autoscaling will need to be driven by a 'Monitoring Metric' (in this case, EC2 metrics imported via Stackdriver):
https://cloud.google.com/compute/docs/autoscaler/scaling-stackdriver-monitoring-metrics
https://cloud.google.com/monitoring/api/metrics#agent-network

Based on my brief understanding of the autoscaling constraint used for the AWS test, several metrics come to mind: NetworkIn/Average, NetworkIn/Maximum, NetworkIn/Minimum, NetworkOut/Maximum, NetworkOut/Minimum, NetworkOut/Average.

I'll sit with Hyunwoo tomorrow to understand which metric makes the most sense and what threshold should be configured for it.

Once I have this info, I'll be able to create a (multi-zone) managed group and make further progress.

#4 Updated by Neha Sharma almost 4 years ago

Based on info provided by Steve (over IM) and a brief discussion with Hyunwoo today, I ended up configuring autoscaling based on the Stackdriver metric named 'aws.googleapis.com/EC2/NetworkOut/Maximum', with the scale-up threshold set to 1.1 GBytes/minute.
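
In gcloud terms, attaching that policy to the group looks roughly like this; the max-replica count and the exact utilization-target value are placeholders (the target must be expressed in the metric's native units, which should be checked against the metric's definition):

```shell
# Attach an autoscaling policy to the managed group, scaling on the
# imported Stackdriver/EC2 metric. Max replicas and the precise
# utilization-target value are placeholders for illustration.
gcloud compute instance-groups managed set-autoscaling test-squid-managed-group-1 \
    --zone us-central1-b \
    --max-num-replicas 10 \
    --custom-metric-utilization \
        metric=aws.googleapis.com/EC2/NetworkOut/Maximum,utilization-target=1100000000,utilization-target-type=GAUGE
```

Note that set-autoscaling replaces the group's whole autoscaling policy, so later changes (e.g. to the minimum replica count) must re-specify the full configuration.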

There is now a test managed group (test-squid-managed-group-1) that uses autoscaling. As soon as I created it, three VMs were launched, since I had (initially, by mistake) configured the minimum instance count to 3. They look similar to the VM that was already running (launched by Steve, using the new 56 GB image). I tested access and setup by logging in to one of the three new VMs.

New instances are running in the following zones:

us-central1-b
us-central1-c
us-central1-f

Next, since GCE lets you specify health checks for VMs in a managed group, I asked Hyunwoo whether anything similar was done for AWS. He remembered this being done at the load-balancer level, but will get back to me with exact details.

In the meantime, I configured a simple HTTP check for the VMs in the GCE managed group.
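
For reference, a basic check of that sort can be created like this; the check name and request path here are assumed for illustration:

```shell
# Legacy HTTP health check aimed at squid's listening port.
# Name and request path are assumed placeholders.
gcloud compute http-health-checks create squid-http-check \
    --port 3128 \
    --request-path /
```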

Hyunwoo also made me realize that there was no point in setting the minimum instance count to 3. As per his suggestion (and I agree), 1 is the right number to begin with. I made that change (3 → 1); however, GCE doesn't seem to have picked it up - as of this update, it still ensures that 3 VMs are running.

More as I find out.

Will discuss the progress with Steve tomorrow.

#5 Updated by Steven Timm almost 4 years ago

  • Subject changed from Make load-balanced, autoscaled managed instance group of squid servers to OnDemand: Make load-balanced, autoscaled managed instance group of squid servers

#6 Updated by Steven Timm almost 4 years ago

  • Parent task set to #13986

#7 Updated by Steven Timm almost 4 years ago

As of Thursday afternoon the health check was failing. Investigation showed that the health checkers run only from the 130.211 block of internet IP addresses. We opened the squid server's firewall to accept this IP range, but we still have to change the squid server configuration to allow queries from that range. More will be done when Neha is back on Monday.
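
The squid-side change would be a small ACL addition along these lines; the /22 mask is an assumption here, and the exact health-checker source ranges should be taken from Google's load-balancing documentation:

```
# squid.conf fragment - answer GCP health checkers instead of
# denying them. The exact source range is an assumption.
acl gce_healthcheck src 130.211.0.0/22
http_access allow gce_healthcheck
```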

#8 Updated by Steven Timm almost 4 years ago

I sent the following query to

This is a followup to last week's questions re. load balancing.

First a background of what we are trying to do:

a) Our code is served via an HTTP-backed FUSE file system called CVMFS. We fetch it from a remote server and use a squid proxy or proxies local to the Google cloud to cache it, since all jobs are fetching mostly the same stuff. This is approximately 2.5 GB per VM launched. There is caching on each instance and also on the central squid server.

b) Our database is served via FronTier, which is a web service. We cache this on a central squid server and again once on each node. There is about 500 MB of action on any one job.

c) Our goal was to create a Managed Instance Group in each zone, which would be scaled via the load balancer's count of incoming requests.

d) We would then use cross-region load balancing to define one name, so that no matter which zone we are running in, we can have our on-VM configuration point to one fixed location for the squid server and then have the cross-region load balancer sort out where to go.

e) In attempting to set up the cross-region load balancing, the problem we are having is that the HTTP health check does not appear to be well-formed enough to get a good response from a squid server listening on port 3128. We can see the health check coming in from the load balancers, but the request is denied with a 400 error:

130.211.1.179 - - [01/Nov/2016:15:32:48.807 0500] "GET NONE:// HTTP/0.0" 400 1740 TCP_DENIED:NONE 0 " " "" "-"

f) The only kind of HTTP test that a squid server would respond to is one where we requested a URL that is not on the machine in question, i.e. trying wget http://my-squid-server-private-hostname:3128/ is never going to work. There does not appear to be any option to customize the health check to do that, as far as we can tell.

g) Solomon said that global load balancing works only for HTTPS, but all the documentation we have seen indicates otherwise - that HTTP load balancing works too.

So the questions:

1) Do you have any worked examples of someone who has used cross-region load balancing to have a set of squid servers served behind a load balancer? If so, what health check did they use?

And then various questions related to fall back positions:

2) if we are not doing load balancing, but just one squid server per zone, which would almost be enough, would there be any way to have a forwarder point to just the single squid server in the closest zone?

3) If we just had a managed instance group scaled via network load, which works, is there any way to access all members of that managed instance group with a single alias that doesn't involve load balancing?

This issue is on the critical path; we can't reasonably scale to more zones until we have this worked out. Any help is appreciated.

Steve Timm
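
The mismatch described in (e) and (f) comes down to HTTP request forms: the load balancer's health checker issues an origin-form request aimed at the instance itself, while a forward proxy like squid expects an absolute-form request line naming some other host. A minimal illustration of the two shapes (addresses and URLs are just examples):

```shell
# Origin-form request, the shape a GCE HTTP health check sends.
# Squid, acting as a forward proxy, has no origin content of its
# own to serve, so this gets denied (the TCP_DENIED / 400 in the
# access log above).
printf 'GET /index.html HTTP/1.1\r\nHost: 10.128.0.9\r\n\r\n'

# Absolute-form request, the shape a forward proxy expects: the
# request line carries the full URL of a remote origin server.
printf 'GET http://www.cnn.com/ HTTP/1.1\r\nHost: www.cnn.com\r\n\r\n'
```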

#9 Updated by Steven Timm almost 4 years ago

  • Priority changed from Low to High
  • Parent task deleted (#13986)

Solomon responded,

Sounds good! IIUC, you're just blocked on the HealthCheck honestly (see inline).

On Wed, Nov 2, 2016 at 11:34 AM, Steven C Timm <> wrote:
This is a followup to last week's questions re. load balancing.

First a background of what we are trying to do:

a) our code is served via an http-backed FUSE file system called CVMFS. We fetch it from a remote
server and use a squid proxy or proxies local to the google cloud to cache it, since all jobs are fetching
mostly the same stuff. This is approximately 2.5GB per VM launched. there is caching on each instance
and also on the central squid server.

b) our database is served via FronTier which is a web service. We cache this on a central squid server and
again once on each node. There is about 500MB of action on any one job.

c) Our goal was to create a Managed Instance Group in each zone, which would be scaled via the load balancer
number of incoming requests.

Yep, just enable Autoscaling on the group (Note: Don't use the multi-zone / "regional" option on the Managed Instance Group, that forces it to be uniform across zones, which is not what you want).

d) we would then use Cross-region Load balancing to define one name so that no matter which zone we are
running in, we can have our on-vms configuration point to one fixed location for the squid server and then
have the cross-region load balancer sort out where to go.

Yep!

e) In attempting to set up the cross-region load balancing, the problem we are having is that the http
health-check does not appear to be well-formed to get a good response from a squid server listening on port 3128.
We can see the health check coming in from the load balancers but the request is denied with a 400 error

130.211.1.179 - - [01/Nov/2016:15:32:48.807 0500] "GET NONE:// HTTP/0.0" 400 1740 TCP_DENIED:NONE 0 " " "" "-"
f) The only kind of http test that a squid server would respond to is one where we requested a URL that is not on the machine in question. i.e. trying wget http://my-squid-server-private-hostname:3128/ is never going to work. There does not appear to be any option to customize the health check to do that as far as we can tell.

You can set the port and the path (https://cloud.google.com/compute/docs/reference/latest/httpHealthChecks). Do you need to do more?

Now that I have project viewer access, can you send me the URL to your console view of the load balancer? (Did you set it up via the UI or via gcloud / the API?).

g) Solomon said that global load balancing works only for https but all the documentation we have
seen indicates otherwise, that http load balancing works too.
I meant as opposed to raw TCP/UDP ;). Our L3 product (raw TCP/UDP) isn't global, while our L7 one for HTTP is.

So the questions:

1) Do you have any worked examples of someone who has used cross-region load balancing to have a set of squid servers served behind a load balancer? If so, what health check did they use?

Yes, the HTTP health check ;).

And then various questions related to fall back positions:

2) if we are not doing load balancing, but just one squid server per zone, which would almost be enough, would there be any way to have a forwarder point to just the single squid server in the closest zone?

You can name them the same thing in each zone (instance names are only unique per zone). So you can use the built-in DNS to hit: my-zone-squid (or even concat the known zone to the name). There isn't any sort of "choose the closest" built in.

3) If we just had a managed instance group scaled via network load, which works, is there any way to access all members of that managed instance group with a single alias that doesn't involve load balancing?
What do you mean by "access"? You can get the list of instances via the API, but you can't just forward

This issue is on the critical path, we can't reasonably scale to more zones until we have this worked out. Any help is appreciated.

Steve Timm

#10 Updated by Steven Timm almost 4 years ago

And my response to him:

Hi Solomon--

You ask:

"You can set the port and the path (https://cloud.google.com/compute/docs/reference/latest/httpHealthChecks). Do you need to do more?"

I believe we do, because whatever path we set, it tries to fetch that file from the instance it is checking. For instance, if the squid server is 10.128.0.9 and the path is /index.html, then it will try to

wget http://10.128.0.9:3128/index.html

which will fail

but

export http_proxy=http://10.128.0.9:3128/
wget http://www.cnn.com

will work.

i.e. it appears to me that there's no way to tell the health check that it is checking a proxy, or to give an alternate host name in the URL.

Our health checks can be seen here:

https://console.cloud.google.com/compute/healthChecks?project=fermilab-poc

The fall-back answers you gave look straightforward, but we would really like to get the load balancer working if we can.

Steve Timm

#11 Updated by Steven Timm almost 4 years ago

  • Due date set to 11/02/2016
  • Start date set to 10/25/2016
  • % Done changed from 0 to 70
  • Estimated time set to 24.00 h

#12 Updated by Steven Timm almost 4 years ago

  • Status changed from Work in progress to Resolved
  • % Done changed from 70 to 100

This got split among several tasks, but now we can say that it is up and we have a working internal load balancer. Marking as resolved.

#13 Updated by Steven Timm almost 4 years ago

  • Status changed from Resolved to Closed

