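The background is as follows: I am using sockets in C to fetch content from a website, and the content is gzip-encoded. I want to read the content directly from the stream and decode the gzip data with zlib. But how do I know where the HTTP headers end and the gzip content begins?

I have tried roughly two approaches, and both give me some strange results. In the first, I read the entire stream and print it to the terminal; my HTTP headers end with "\r\n\r\n", as I expected. In the second, I call recv just once to get the headers and then read the content in a while loop; here the headers do not end with "\r\n\r\n".

Why? And which is the correct way to read the content?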

    //first way (gives rnrn)
    char *output, *output_header, *output_content, **output_result;
    size_t size;
    FILE *stream;
    stream = open_memstream (&output, &size);
    char BUF[BUFSIZ];
    while(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
    {
        fprintf (stream, "%s", BUF);
    }
    fflush(stream);
    fclose(stream);
    output_result = str_split(output, "\r\n\r\n");
    output_header = output_result[0];
    output_content = output_result[1];
    printf("Header:\n%s\n", output_header);
    printf("Content:\n%s\n", output_content);

    //second way (doesnt give rnrn)
    char *content, *output_header;
    size_t size;
    FILE *stream;
    stream = open_memstream (&content, &size);
    char BUF[BUFSIZ];
    if(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
    {
        output_header = BUF;
    }
    while(recv(socket_desc, BUF, (BUFSIZ - 1), 0) > 0)
    {
        fprintf (stream, "%s", BUF); //i would just use this as input stream to zlib
    }
    fflush(stream);
    fclose(stream);
    printf("Header:\n%s\n", output_header);
    printf("Content:\n%s\n", content);

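... into the buffer. That is not the correct way to read an HTTP response, especially when HTTP keep-alive is in use (in which case no disconnect occurs at the end of the response). You have to follow the rules laid out in RFC 2616. Namely:

- Read until you reach the "\r\n\r\n" sequence. Do not read any more bytes past it.
- Analyze the received headers according to the rules in RFC 2616 Section 4.4. They tell you the actual format of the remaining response data.
- If the response uses HTTP 1.1, check the received headers for a Connection: close header; if it uses HTTP 0.9 or 1.0, check for the absence of a Connection: keep-alive header. If detected, close your end of the socket connection, because the server is closing its end. Otherwise, keep the connection open and reuse it for subsequent requests (unless you are done with the connection, in which case do close it).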
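
As a rough illustration of the first step, here is a minimal sketch in C. The names find_body_offset and read_headers are hypothetical helpers invented for this example, and the caller is assumed to supply a buffer large enough for the headers. Instead of reading one byte at a time, the sketch reads in blocks and reports where the body starts, so any body bytes that arrived in the same recv call are kept rather than lost. It tracks explicit byte counts, because gzip data can contain NUL bytes and recv does not NUL-terminate what it returns.

```c
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Returns the offset of the first body byte (just past "\r\n\r\n"),
 * or -1 if the terminator is not in buf[0..len) yet.
 * Works on an explicit length, so NUL bytes in the data are harmless. */
long find_body_offset(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}

/* Hypothetical helper: reads from fd until the header terminator is seen.
 * On success returns the body offset and stores the number of bytes
 * received in *total; bytes in [offset, *total) are body data that
 * arrived together with the headers. Returns -1 on error, on early
 * disconnect, or if the headers do not fit in cap bytes. */
long read_headers(int fd, char *buf, size_t cap, size_t *total)
{
    *total = 0;
    while (*total < cap) {
        ssize_t n = recv(fd, buf + *total, cap - *total, 0);
        if (n <= 0)
            return -1;              /* error or peer closed early */
        *total += (size_t)n;
        long off = find_body_offset(buf, *total);
        if (off >= 0)
            return off;
    }
    return -1;                      /* headers larger than cap */
}
```

The leftover bytes after the returned offset would be passed to whatever reads the body next (and eventually to zlib) before calling recv again. Rescanning the whole buffer after every recv is simple rather than fast; a real client might resume the search near the previous end.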

Put together, the reading logic looks roughly like this (pseudocode):

    string headers[];
    byte data[];

    string statusLine = read a CRLF-delimited line;
    int statusCode = extract from status line;
    string responseVersion = extract from status line;

    do
    {
        string header = read a CRLF-delimited line;
        if (header == "") break;
        add header to headers list;
    }
    while (true);

    if ( !((statusCode in [1xx, 204, 304]) || (request was "HEAD")) )
    {
        if (headers["Transfer-Encoding"] ends with "chunked")
        {
            do
            {
                string chunk = read a CRLF delimited line;
                int chunkSize = extract from chunk line;
                if (chunkSize == 0) break;

                read exactly chunkSize number of bytes into data storage;
                read and discard until a CRLF has been read;
            }
            while (true);

            do
            {
                string header = read a CRLF-delimited line;
                if (header == "") break;
                add header to headers list;
            }
            while (true);
        }
        else if (headers["Content-Length"] is present)
        {
            read exactly Content-Length number of bytes into data storage;
        }
        else if (headers["Content-Type"] begins with "multipart/")
        {
            string boundary = extract from Content-Type header;
            read into data storage until terminating boundary has been read;
        }
        else
        {
            read bytes into data storage until disconnected;
        }
    }

    if (!disconnected)
    {
        if (responseVersion == "HTTP/1.1")
        {
            if (headers["Connection"] == "close")
                close connection;
        }
        else
        {
            if (headers["Connection"] != "keep-alive")
                close connection;
        }
    }

    check statusCode for errors;
    process data contents, per info in headers list;
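
The chunked branch of the pseudocode is the one most likely to matter for gzip content, since servers often send compressed bodies with Transfer-Encoding: chunked. Below is a minimal sketch in C of that branch, under some simplifying assumptions: the function name decode_chunked is hypothetical, the whole chunked body is assumed to be in memory already, and trailer headers after the last chunk are ignored. A real client would decode incrementally while reading from the socket.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: decodes a chunked body held in in[0..in_len).
 * Returns the number of decoded bytes written to out, or -1 if the
 * input is malformed or truncated, or if out is too small. */
long decode_chunked(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    for (;;) {
        /* Locate the CRLF that ends the chunk-size line. */
        size_t eol = pos;
        while (eol + 1 < in_len && !(in[eol] == '\r' && in[eol + 1] == '\n'))
            eol++;
        if (eol + 1 >= in_len)
            return -1;                      /* no CRLF found */

        /* Copy the line so strtoul sees a NUL-terminated string. */
        char line[32];
        size_t line_len = eol - pos;
        if (line_len == 0 || line_len >= sizeof line)
            return -1;
        memcpy(line, in + pos, line_len);
        line[line_len] = '\0';

        /* The size is hexadecimal; parsing stops at any ";extension". */
        char *end;
        unsigned long size = strtoul(line, &end, 16);
        if (end == line)
            return -1;                      /* no hex digits at all */
        pos = eol + 2;

        if (size == 0)
            return (long)written;           /* last chunk reached */

        if (size > in_len - pos || size > out_cap - written)
            return -1;                      /* truncated input or out full */
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        /* Every chunk's data is followed by its own CRLF. */
        if (pos + 2 > in_len || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

The decoded bytes, not the raw socket bytes, are what should be handed to zlib when the headers say Content-Encoding: gzip. Note that strtoul is more lenient than the chunk grammar (it accepts leading whitespace, for example), so stricter validation would be needed for untrusted servers.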