I think it must be a fairly common use case to load a model and invoke an endpoint that calls R's predict(object, newdata, ...)
function. I wanted to do this with a custom AWS SageMaker container, using plumber
on the R side. This example gives all the details, I think, and this bit of documentation also explains how the container should be built and how it should behave.
I followed the steps in these documents, but after a couple of long minutes the SageMaker console shows
The primary container for production variant AllTraffic did not pass the ping health check. Please check CloudWatch logs for this endpoint.
and the endpoint creation fails.
This is my container:
# --- Dockerfile
FROM rocker/r-base
RUN apt-get -y update && apt-get install -y \
    libsodium-dev libcurl4-openssl-dev ca-certificates
RUN R -e "install.packages(c('lme4', 'plumber'))"
ADD ./plumber.R /
ENTRYPOINT ["R", "-e", "plumber::pr_run(plumber::pr('plumber.R'), port=8080)", "--no-save"]
# --- plumber.R
library(plumber)
library(lme4)

prefix <- '/opt/ml'
print(dir(prefix, recursive = TRUE))
model <- readRDS(file.path(prefix, 'model', 'model.RDS'))

#* @apiTitle Guess the likelihood of something

#' Ping to show the server is there
#' @get /ping
function() {
  print(paste('successfully pinged at', Sys.time()))
  return('')
}

#' Parse the input and return a prediction from the model
#' @param req The HTTP request sent
#' @post /invocations
function(req) {
  print(paste('invocation triggered at', Sys.time()))
  # the body arrives with literal '\n' sequences, so turn them into real newlines
  conn <- textConnection(gsub('\\n', '\n', req$postBody))
  data <- read.csv(conn)
  close(conn)
  print(data)
  predict(model, data,
          allow.new.levels = TRUE,
          type = 'response')
}
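For completeness, the model.RDS that plumber.R reads was produced and packaged roughly like this (a sketch: the glmer() formula and the data frame df are placeholders, and SageMaker unpacks the model.tar.gz referenced by ModelDataUrl into /opt/ml/model/ inside the container):
# --- make_model.R (sketch; the formula and data frame are placeholders)
library(lme4)

# fit some mixed model; the real formula and data don't matter for the question
model <- glmer(y ~ x + (1 | group), data = df, family = binomial)

saveRDS(model, 'model.RDS')
# SageMaker extracts this tarball into /opt/ml/model/ inside the container
tar('model.tar.gz', files = 'model.RDS', compression = 'gzip')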
The model and endpoint are then created with this code:
# run_on_sagemaker.py
# [...]
create_model_response = sm.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={
        'Image': image_uri,
        'ModelDataUrl': s3_model_location
    }
)

create_endpoint_config_response = sm.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'InstanceType': instance_type,
        'InitialInstanceCount': 1,
        'ModelName': model_name,
        'VariantName': 'AllTraffic'}])
print("Endpoint Config Arn: " + create_endpoint_config_response['EndpointConfigArn'])

print('Endpoint Response:')
create_endpoint_response = sm.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)
print(create_endpoint_response['EndpointArn'])

resp = sm.describe_endpoint(EndpointName=endpoint_name)
status = resp['EndpointStatus']
print("Status: " + status)

try:
    sm.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
finally:
    resp = sm.describe_endpoint(EndpointName=endpoint_name)
    status = resp['EndpointStatus']
    print("Arn: " + resp['EndpointArn'])
    print("Status: " + status)
    if status != 'InService':
        raise Exception('Endpoint creation did not succeed')

print(create_model_response['ModelArn'])
Most of this code is actually copied from the above-mentioned example; the most significant difference I can see is that my container loads the model right away, while the example loads the model object on every invocation (which must slow responses down, so I wonder why it does that).
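If I read the example right, its handler re-opens the artifact on every request, roughly like this (a sketch of the example's approach, not its literal code; it reuses the CSV handling from my plumber.R above):
# sketch of per-invocation loading, as the AWS example seems to do it
library(lme4)

prefix <- '/opt/ml'

#' @post /invocations
function(req) {
  # the model is re-read from disk on every single request
  model <- readRDS(file.path(prefix, 'model', 'model.RDS'))
  conn <- textConnection(gsub('\\n', '\n', req$postBody))
  data <- read.csv(conn)
  close(conn)
  predict(model, data, allow.new.levels = TRUE, type = 'response')
}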
The CloudWatch logs match the output of the container when it is run locally and indicate no failure. Locally I can query the container with
curl -d "data\nin\ncsv\nformat" -i localhost:8080/invocations
and it works fine, returning a prediction for every row in the POST data. Also, curl localhost:8080/ping
returns [""]
, as it should, I think. And it shows no signs of being slow; the model object is 4.4 MiB in size (although this is to be extended greatly once this simple version runs).
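An R-side equivalent of those local checks would be something like this (a sketch using httr, assuming the container is listening on localhost:8080):
# --- smoke_test.R (sketch; assumes the container runs locally on port 8080)
library(httr)

# the health check only needs HTTP 200 back from GET /ping
stopifnot(status_code(GET('http://localhost:8080/ping')) == 200)

# same payload as the curl call above: literal backslash-n separators,
# which the handler's gsub() turns into real newlines
r <- POST('http://localhost:8080/invocations',
          content_type('text/csv'),
          body = 'data\\nin\\ncsv\\nformat')
cat(content(r, as = 'text', encoding = 'UTF-8'))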
The error on the terminal is
Traceback (most recent call last):
  File "run_on_sagemaker.py", line 57, in <module>
    sm.get_waiter('endpoint_in_service').wait(EndpointName=endpoint_name)
  File "[...]/lib/python3.8/site-packages/botocore/waiter.py", line 53, in wait
    Waiter.wait(self, **kwargs)
  File "[...]/lib/python3.8/site-packages/botocore/waiter.py", line 320, in wait
    raise WaiterError(
botocore.exceptions.WaiterError: Waiter EndpointInService failed: Waiter encountered a terminal failure state

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_on_sagemaker.py", line 64, in <module>
    raise Exception('Endpoint creation did not succeed')
So, why is this failing in the SageMaker console? Is this a good approach, are there better ones, and how can I run further diagnostics? More generally, I also could not get the AWS example (see above) for bringing your own R container to run, so I wonder what the best way to serve R predictions from a SageMaker model is.