Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

C - Distinguishing the header from the content of an HTTP server response (sockets)

I want to know: is there a way to find out where in the response stream the header ends?

The background of the question is as follows: I am using sockets in C to get content from a website, and the content is encoded in gzip. I would like to read the content directly from the stream and decode the gzip content with zlib. But how do I know where the HTTP header ends and the gzip content starts?

I roughly tried two ways, which give me what are, in my opinion, strange results. First, I read in the whole stream and print it out in the terminal; my HTTP header ends with "\r\n\r\n" as I expected. The second time, I call recv() just once to get the header and then read the content in a while loop; here the header ends without "\r\n\r\n".

Why? And which way is the right way to read in the content?

I'll just give you the code so you could see how i'm getting the response from server.

//first way (gives \r\n\r\n)
char *output, *output_header, *output_content, **output_result;
size_t size;
FILE *stream;
stream = open_memstream (&output, &size);
char BUF[BUFSIZ];
while(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
{
    fprintf (stream, "%s", BUF);
}
fflush(stream);
fclose(stream);

output_result = str_split(output, "\r\n\r\n");
output_header = output_result[0];
output_content = output_result[1];

printf("Header:\n%s\n", output_header);
printf("Content:\n%s\n", output_content);

.

//second way (doesn't give \r\n\r\n)
char *content, *output_header;
size_t size;
FILE *stream;
stream = open_memstream (&content, &size);
char BUF[BUFSIZ];

if(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
{
    output_header = BUF;
}

while(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
{
    fprintf (stream, "%s", BUF); //i would just use this as input stream to zlib
}
fflush(stream);
fclose(stream);

printf("Header:\n%s\n", output_header);
printf("Content:\n%s\n", content);

Both give the same result when printed to the terminal, but I expected the second one to print some more line breaks, because in the first one they get lost when splitting the string.

I am new to C, so I might just be overlooking something simple.


1 Answer


You are calling recv() in a loop until the socket disconnects or fails (and writing the received data to your stream the wrong way), storing all of the raw data into your char* buffer. That is not the correct way to read an HTTP response, especially if HTTP keep-alives are used (in which case no disconnect will occur at the end of the response). You must follow the rules outlined in RFC 2616. Namely:

  1. Read until the "\r\n\r\n" sequence is encountered. This terminates the response headers. Do not read any more bytes past that yet.

  2. Analyze the received headers, per the rules in RFC 2616 Section 4.4. They tell you the actual format of the remaining response data.

  3. Read the remaining data, if any, per the format discovered in #2.

  4. Check the received headers for the presence of a Connection: close header if the response is using HTTP 1.1, or the lack of a Connection: keep-alive header if the response is using HTTP 0.9 or 1.0. If detected, close your end of the socket connection because the server is closing its end. Otherwise, keep the connection open and re-use it for subsequent requests (unless you are done using the connection, in which case do close it).

  5. Process the received data as needed.
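As a rough sketch of step 1 in C (one possible approach, not the only one): the helpers `byte_buf_append` and `http_header_end` below are hypothetical names made up for illustration. The first stores received bytes by count instead of with "%s", because recv() does not NUL-terminate the buffer and gzip data can contain NUL bytes. The second searches for the "\r\n\r\n" terminator. The sketch assumes you read in blocks and treat any bytes received past the terminator as the start of the body.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer (hypothetical helper type, not from the original post). */
struct byte_buf {
    char  *data;
    size_t len;
    size_t cap;
};

/* Append exactly n bytes, which may include NUL bytes, to the buffer.
 * Returns 0 on success, -1 if memory could not be allocated. */
int byte_buf_append(struct byte_buf *b, const char *src, size_t n)
{
    assert(b != NULL && (src != NULL || n == 0));
    if (n > b->cap - b->len) {
        size_t new_cap = b->cap ? b->cap : 4096;
        while (new_cap - b->len < n)
            new_cap *= 2;
        char *p = realloc(b->data, new_cap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = new_cap;
    }
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Return the offset of the first body byte (just past "\r\n\r\n"),
 * or 0 if the terminator has not been received yet. */
size_t http_header_end(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return i + 4;
    }
    return 0;
}

/* Intended use, assuming a connected socket_desc as in the question:
 *
 *   ssize_t n;
 *   while ((n = recv(socket_desc, BUF, sizeof BUF, 0)) > 0) {
 *       byte_buf_append(&buf, BUF, (size_t)n);
 *       if (http_header_end(buf.data, buf.len) != 0)
 *           break;   // headers complete, stop and inspect them
 *   }
 */
```

Because the whole accumulated buffer is searched after each append, a terminator that is split across two recv() calls is still found. Once `http_header_end` returns a nonzero offset `off`, the headers are the first `off` bytes and anything after that already belongs to the body.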

In short, you need to do something more like this instead (pseudo code):

string headers[];
byte data[];

string statusLine = read a CRLF-delimited line;
int statusCode = extract from status line;
string responseVersion = extract from status line;

do
{
    string header = read a CRLF-delimited line;
    if (header == "") break;
    add header to headers list;
}
while (true);

if ( !((statusCode in [1xx, 204, 304]) || (request was "HEAD")) )
{
    if (headers["Transfer-Encoding"] ends with "chunked")
    {
        do
        {
            string chunk = read a CRLF delimited line;
            int chunkSize = extract from chunk line;
            if (chunkSize == 0) break;

            read exactly chunkSize number of bytes into data storage;

            read and discard until a CRLF has been read;
        }
        while (true);

        do
        {
            string header = read a CRLF-delimited line;
            if (header == "") break;
            add header to headers list;
        }
        while (true);
    }
    else if (headers["Content-Length"] is present)
    {
        read exactly Content-Length number of bytes into data storage;
    }
    else if (headers["Content-Type"] begins with "multipart/")
    {
        string boundary = extract from Content-Type header;
        read into data storage until terminating boundary has been read;
    }
    else
    {
        read bytes into data storage until disconnected;
    }
}

if (!disconnected)
{
    if (responseVersion == "HTTP/1.1")
    {
        if (headers["Connection"] == "close")
            close connection;
    }
    else
    {
        if (headers["Connection"] != "keep-alive")
            close connection;
    }
}

check statusCode for errors;
process data contents, per info in headers list;
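To make a few of the "extract from ..." steps in the pseudo code more concrete, here is a minimal C sketch. The function names `parse_status_line`, `http_header_value` and `parse_chunk_size` are hypothetical, chosen only for this example, and the parsing is deliberately lenient rather than a full implementation of the RFC grammar. It assumes the lines have already been read from the socket.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Parse a status line such as "HTTP/1.1 200 OK".
 * Stores the minor version and the status code; returns 0 on success.
 * Lenient: sscanf tolerates extra whitespace and signs. */
int parse_status_line(const char *line, int *minor, int *status)
{
    assert(line != NULL && minor != NULL && status != NULL);
    if (sscanf(line, "HTTP/1.%d %d", minor, status) != 2)
        return -1;
    return (*status >= 100 && *status <= 599) ? 0 : -1;
}

/* Case-insensitive comparison of the first n bytes of a and b. */
static int ieq(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Find header `name` in the CRLF-delimited block hdrs[0..len).
 * Returns a pointer to the value (not NUL-terminated) and stores its
 * length in *value_len, or returns NULL if the header is absent. */
const char *http_header_value(const char *hdrs, size_t len,
                              const char *name, size_t *value_len)
{
    size_t name_len = strlen(name);
    size_t pos = 0;
    while (pos < len) {
        size_t eol = pos;   /* scan forward to the CRLF ending this line */
        while (eol + 1 < len && !(hdrs[eol] == '\r' && hdrs[eol + 1] == '\n'))
            eol++;
        if (eol + 1 >= len)
            break;          /* no complete line left */
        if (eol - pos > name_len && hdrs[pos + name_len] == ':' &&
            ieq(hdrs + pos, name, name_len)) {
            size_t v = pos + name_len + 1;
            while (v < eol && (hdrs[v] == ' ' || hdrs[v] == '\t'))
                v++;        /* skip whitespace after the colon */
            *value_len = eol - v;
            return hdrs + v;
        }
        pos = eol + 2;
    }
    return NULL;
}

/* Parse the hexadecimal size at the start of a chunk line, e.g. "1a3f"
 * or "1a3f;ext=1". Returns 0 on success, -1 on bad input or overflow. */
int parse_chunk_size(const char *line, unsigned long *size)
{
    unsigned long v = 0;
    const char *p = line;
    assert(line != NULL && size != NULL);
    if (!isxdigit((unsigned char)*p))
        return -1;
    for (; isxdigit((unsigned char)*p); p++) {
        int d = isdigit((unsigned char)*p)
                    ? *p - '0'
                    : tolower((unsigned char)*p) - 'a' + 10;
        if (v > (ULONG_MAX - (unsigned long)d) / 16)
            return -1;      /* value would overflow */
        v = v * 16 + (unsigned long)d;
    }
    if (*p != '\0' && *p != '\r' && *p != ';')
        return -1;          /* only end of line or a chunk extension may follow */
    *size = v;
    return 0;
}
```

With these, the branch on headers["Content-Length"] becomes a call to `http_header_value` followed by a numeric conversion of the returned value, and the chunked branch calls `parse_chunk_size` on each chunk line until it yields 0.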
