I think it must be a fairly common use case to load a model and invoke an endpoint that calls R's predict(object, newdata, ...)
function. I wanted to do this with a custom AWS SageMaker container, using plumber
on the R side. This example gives all the details, I think, and this bit of documentation also explains how the container should be built and how it should behave.
I followed the steps in these documents, but after a couple of long minutes the SageMaker console shows
The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.
and the endpoint creation fails.
This is my container:
# --- Dockerfile
FROM rocker/r-base
RUN apt-get -y update && apt-get install -y \
    libsodium-dev libcurl4-openssl-dev ca-certificates
RUN R -e "install.packages(c('lme4', 'plumber'))"
ADD ./plumber.R /
ENTRYPOINT ["R", "-e", "plumber::pr_run(plumber::pr('plumber.R'), port=8080)", "--no-save"]
# --- plumber.R
library(plumber)
library(lme4)

prefix <- '/opt/ml'
print(dir(prefix, recursive = TRUE))
model <- readRDS(file.path(prefix, 'model', 'model.RDS'))

#* @apiTitle Guess the likelihood of something

#' Ping to show the server is there
#' @get /ping
function() {
  print(paste('successfully pinged at', Sys.time()))
  return('')
}

#' Parse the input and return a prediction from the model
#' @param req The HTTP request sent
#' @post /invocations
function(req) {
  print(paste('invocation triggered at', Sys.time()))
  # the body arrives with literal '\n' sequences, so turn them into real newlines
  conn <- textConnection(gsub('\\n', '\n', req$postBody))
  data <- read.csv(conn)
  close(conn)
  print(data)
  predict(model, data,
          allow.new.levels = TRUE,
          type = 'response')
}
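For completeness, the model.RDS that plumber.R reads was produced and packaged roughly like this (a sketch: the glmer() formula and the data frame df are placeholders, and SageMaker unpacks the model.tar.gz referenced by ModelDataUrl into /opt/ml/model/ inside the container):
# --- make_model.R (sketch; the formula and data frame are placeholders)
library(lme4)

# fit some mixed model; the real formula and data don't matter for the question
model <- glmer(y ~ x + (1 | group), data = df, family = binomial)

saveRDS(model, 'model.RDS')
# SageMaker extracts this tarball into /opt/ml/model/ inside the container
tar('model.tar.gz', files = 'model.RDS', compression = 'gzip')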
The model and endpoint are then created with this code:
# run_on_sagemaker.py
# [...]
create_model_response = sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'Image': image_uri,
        'ModelDataUrl': s3_model_location
    }
)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        'InitialInstanceCount': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])
print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

print('Endpoint Response:')
create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print(create_endpoint_response['EndpointArn'])

resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Status: " + status)

try:
    sm.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
finally:
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp['EndpointStatus']
    print("Arn: " + resp['EndpointArn'])
    print("Status: " + status)
    if status != 'InService':
        raise Exception('Endpoint creation did not succeed')

print(create_model_response['ModelArn'])
Most of this code is actually copied from the above-mentioned example; the most significant difference I can see is that my container loads the model right away, while the example loads the model object on every invocation (which must slow responses down, so I wonder why it does that).
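If I read the example right, its handler re-opens the artifact on every request, roughly like this (a sketch of the example's approach, not its literal code; it reuses the CSV handling from my plumber.R above):
# sketch of per-invocation loading, as the AWS example seems to do it
library(lme4)

prefix <- '/opt/ml'

#' @post /invocations
function(req) {
  # the model is re-read from disk on every single request
  model <- readRDS(file.path(prefix, 'model', 'model.RDS'))
  conn <- textConnection(gsub('\\n', '\n', req$postBody))
  data <- read.csv(conn)
  close(conn)
  predict(model, data, allow.new.levels = TRUE, type = 'response')
}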
The CloudWatch logs match the output of the container when it is run locally and indicate no failure. Locally I can query the container with
curl -d "data\nin\ncsv\nformat" -i localhost:8080/invocations
and it works fine, returning a prediction for every row in the POST data. Also, curl localhost:8080/ping
returns [""]
, as it should, I think. And it shows no signs of being slow; the model object is 4.4 MiB in size (although this is to be extended greatly once this simple version runs).
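An R-side equivalent of those local checks would be something like this (a sketch using httr, assuming the container is listening on localhost:8080):
# --- smoke_test.R (sketch; assumes the container runs locally on port 8080)
library(httr)

# the health check only needs HTTP 200 back from GET /ping
stopifnot(status_code(GET('http://localhost:8080/ping')) == 200)

# same payload as the curl call above: literal backslash-n separators,
# which the handler's gsub() turns into real newlines
r <- POST('http://localhost:8080/invocations',
          content_type('text/csv'),
          body = 'data\\nin\\ncsv\\nformat')
cat(content(r, as = 'text', encoding = 'UTF-8'))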
The error on the terminal is
Traceback (most recent call last):
  File "run_on_sagemaker.py", line 57, in <module>
    sm.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
  File "[...]/lib/python3.8/site-packages/botocore/waiter.py", line 53, in wait
    Waiter.wait(self, **kwargs)
  File "[...]/lib/python3.8/site-packages/botocore/waiter.py", line 320, in wait
    raise WaiterError(
botocore.exceptions.WaiterError: Waiter EndpointInService failed: Waiter encountered a terminal failure state

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_on_sagemaker.py", line 64, in <module>
    raise Exception('Endpoint creation did not succeed')
So, why is this failing in the SageMaker console? Is this a good approach, are there better ones, and how can I run further diagnostics? More generally, I also could not get the AWS example (see above) for bringing your own R container to run, so I wonder what the best way to serve R predictions from a SageMaker model is.