Using libcurl to Get Cookies and HTTP Headers

May 7, 2014, by Creater

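Lately I have had the urge to translate libcurl's official English documentation, and I also want to learn this library in depth. I know that reading the source code of a library with this many options is painful, so I do not plan to read its source. My near-term plan is to translate the official documentation and to write a spider.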

 

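curl is a great tool (I linked to it in an earlier post), libcurl gives us a rich programming library to build on (also linked earlier), and the official site provides detailed examples and documentation (official link).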

 

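Since this is a learning exercise, I should keep notes. Before posting the source code, here is a small tool of mine. It runs console commands or Win32 console programs and makes it easy to copy their results, provided the program writes all of its output to standard output (STDOUT). The black Command Prompt window that ships with the system is truly awful for copying, and when there is too much output you cannot scroll back up.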

[Image: QQ图片20140507194257]

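As for using libcurl, I am still translating and revising the manual and will post it once it is finished. For now, here is a quick demonstration of how to view the response headers and cookies.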

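#include "stdafx.h"   // Visual Studio precompiled header; drop it on other toolchains
#include <curl/curl.h>
#include <cstdio>
#include <cstdlib>
#include <iostream>
using namespace std;
#pragma comment(lib, "curllib.lib")   // link against libcurl (MSVC only)

// Write callback: libcurl passes size * nmemb bytes of the response body
// and expects the number of bytes actually handled as the return value.
size_t process_data(void *buffer, size_t size, size_t nmemb, void *user_p)
{
	FILE *fp = (FILE *)user_p;
	return fwrite(buffer, 1, size * nmemb, fp);
}

int main(int argc, char *argv[])
{
	CURLcode return_code;
	return_code = curl_global_init(CURL_GLOBAL_WIN32);
	if (CURLE_OK != return_code)
	{
		cerr << "Failed to initialize libcurl" << endl;
		return 1;
	}

	CURL *easy_handle = curl_easy_init();
	if (NULL == easy_handle)
	{
		cerr << "Failed to get a CURL handle" << endl;
		curl_global_cleanup();
		return 1;
	}

	// The response body is appended to data.html.
	FILE *fp = fopen("data.html", "ab+");
	if (NULL == fp)
	{
		cerr << "Failed to open data.html" << endl;
		curl_easy_cleanup(easy_handle);
		curl_global_cleanup();
		return 1;
	}
	const char *url = "http://www.baidu.com";

	curl_easy_setopt(easy_handle, CURLOPT_URL, url);
	curl_easy_setopt(easy_handle, CURLOPT_WRITEFUNCTION, &process_data);
	curl_easy_setopt(easy_handle, CURLOPT_WRITEDATA, fp);

	curl_easy_setopt(easy_handle, CURLOPT_VERBOSE, 1L); /* print connection info and headers to stderr */
	curl_easy_setopt(easy_handle, CURLOPT_COOKIEFILE, ""); /* just to start the cookie engine */

	return_code = curl_easy_perform(easy_handle);
	if (return_code != CURLE_OK)
	{
		cerr << "Request failed: " << curl_easy_strerror(return_code) << endl;
		fclose(fp);
		curl_easy_cleanup(easy_handle);
		curl_global_cleanup();
		return 1;
	}

	cout << "Starting ..." << endl;
	struct curl_slist *cookies = NULL;
	return_code = curl_easy_getinfo(easy_handle, CURLINFO_COOKIELIST, &cookies);
	if (return_code != CURLE_OK)
	{
		fclose(fp);
		curl_easy_cleanup(easy_handle);
		curl_global_cleanup();
		return 1;
	}

	// Each list node holds one cookie as a single tab-separated line.
	struct curl_slist *nc = cookies;
	int i = 1;
	while (nc) {
		cout << "[" << i << "]:" << nc->data << endl;
		nc = nc->next;
		i++;
	}
	if (i == 1) {
		cout << "none" << endl;
	}

	// Release everything in reverse order of acquisition.
	curl_slist_free_all(cookies);
	fclose(fp);
	curl_easy_cleanup(easy_handle);
	curl_global_cleanup();
	cout << "End ..." << endl;
	system("pause");
	return 0;
}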
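
The program above only shows the headers through CURLOPT_VERBOSE. If you want them inside the program, libcurl also has CURLOPT_HEADERFUNCTION and CURLOPT_HEADERDATA. Below is a minimal sketch of such a callback, not something I have wired into the program above. The name collect_header and the choice of a std::vector<std::string> as storage are my own assumptions, and the registration calls are left as comments so that the block compiles without libcurl.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Header callback in the shape libcurl expects for CURLOPT_HEADERFUNCTION.
// It is called once per header line with size * nitems bytes; the data is
// not NUL-terminated. userdata is assumed to point at a vector of strings.
size_t collect_header(char *buffer, size_t size, size_t nitems, void *userdata)
{
    std::vector<std::string> *headers =
        static_cast<std::vector<std::string> *>(userdata);
    std::string line(buffer, size * nitems);

    // Strip the trailing CR/LF that ends every header line.
    while (!line.empty() &&
           (line[line.size() - 1] == '\r' || line[line.size() - 1] == '\n')) {
        line.erase(line.size() - 1);
    }
    // The blank line that closes the header block is skipped.
    if (!line.empty()) {
        headers->push_back(line);
    }
    return size * nitems;   // tell libcurl that every byte was handled
}

// Registration (hypothetical placement, before curl_easy_perform):
//   std::vector<std::string> headers;
//   curl_easy_setopt(easy_handle, CURLOPT_HEADERFUNCTION, collect_header);
//   curl_easy_setopt(easy_handle, CURLOPT_HEADERDATA, &headers);
```

After the transfer, headers would hold the status line followed by one entry per header, which is easier to search than the verbose text.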

I am also providing a handout on the HTTP protocol:

  HTTP协议详解.pdf

The output starts with libcurl's informational messages:

* About to connect() to www.baidu.com port 80 (#0)
*   Trying 119.75.217.56... * connected
* Connected to www.baidu.com (119.75.217.56) port 80 (#0)

Next comes the GET request sent to the Baidu server:

> GET / HTTP/1.1
Host: www.baidu.com
Accept: */*
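
To make the shape of that request concrete, here is a small sketch that builds the same three lines as a string. On the wire each line ends with CRLF and an empty line closes the header block. The function name build_get_request is hypothetical, and this is only an illustration of the text format, not how libcurl assembles its requests internally.

```cpp
#include <string>

// Builds the text of a minimal HTTP/1.1 GET request.
// Every line ends with CRLF; the final empty line ends the headers.
std::string build_get_request(const std::string &host, const std::string &path)
{
    std::string request;
    request += "GET " + path + " HTTP/1.1\r\n";
    request += "Host: " + host + "\r\n";
    request += "Accept: */*\r\n";
    request += "\r\n";
    return request;
}
```

Calling build_get_request("www.baidu.com", "/") gives the request shown above.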

Then come the headers returned by the Baidu server, with status code 200:

< HTTP/1.1 200 OK
< Date: Wed, 07 May 2014 11:45:29 GMT
< Content-Type: text/html; charset=utf-8
< Transfer-Encoding: chunked
< Connection: Keep-Alive
< Vary: Accept-Encoding
* Added cookie BAIDUID="03009B14A2FACCA77E057B525306E577:FG=1" for domain baidu.com, path /, expire -748020531
< Set-Cookie: BAIDUID=03009B14A2FACCA77E057B525306E577:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
* Added cookie BDSVRTM="0" for domain www.baidu.com, path /, expire 0
< Set-Cookie: BDSVRTM=0; path=/
* Added cookie H_PS_PSSID="4392_1445_5223_4760_6017_6382_6400" for domain baidu.com, path /, expire 0
< Set-Cookie: H_PS_PSSID=4392_1445_5223_4760_6017_6382_6400; path=/; domain=.baidu.com
< P3P: CP=" OTI DSP COR IVA OUR IND COM "
< Cache-Control: private
< Expires: Wed, 07 May 2014 11:45:21 GMT
< X-Powered-By: HPHP
< Server: BWS/1.1
< BDPAGETYPE: 1
< BDQID: 0xf02f6c2900035fb1
< BDUSERID: 0
<
* Connection #0 to host www.baidu.com left intact
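
The Set-Cookie lines in this response all follow the same pattern: a name=value pair first, then attributes separated by semicolons. As a rough sketch of how one such header value could be split up by hand, see below. The names SetCookie and parse_set_cookie are made up for this post, and the parsing is deliberately simplified (no quoted values, no validation), so treat it as an illustration rather than a replacement for libcurl's cookie engine.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical container for one parsed Set-Cookie header value.
struct SetCookie {
    std::string name;
    std::string value;
    std::map<std::string, std::string> attributes;  // keys are lowercased
};

// Removes leading and trailing spaces and tabs.
static std::string trim(const std::string &s)
{
    size_t begin = s.find_first_not_of(" \t");
    if (begin == std::string::npos) return "";
    size_t end = s.find_last_not_of(" \t");
    return s.substr(begin, end - begin + 1);
}

// Splits "name=value; attr=val; flag" at ';', then each part at its first '='.
SetCookie parse_set_cookie(const std::string &header_value)
{
    SetCookie cookie;
    bool first = true;
    size_t start = 0;
    while (start <= header_value.size()) {
        size_t end = header_value.find(';', start);
        if (end == std::string::npos) end = header_value.size();
        std::string part = trim(header_value.substr(start, end - start));
        size_t eq = part.find('=');
        std::string key = trim(part.substr(0, eq));
        std::string val = (eq == std::string::npos) ? "" : trim(part.substr(eq + 1));
        if (first) {
            cookie.name = key;      // the first pair is the cookie itself
            cookie.value = val;
            first = false;
        } else if (!key.empty()) {
            for (size_t i = 0; i < key.size(); ++i) {
                key[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(key[i])));
            }
            cookie.attributes[key] = val;
        }
        start = end + 1;
    }
    return cookie;
}
```

Splitting at the first '=' matters here, because the BAIDUID value itself contains one (the ":FG=1" suffix).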

Finally, the cookies are printed:

Starting ...
[1]:.baidu.com  TRUE    /       FALSE   -748020531      BAIDUID 03009B14A2FACCA77E057B525306E577:FG=1
[2]:www.baidu.com       FALSE   /       FALSE   0       BDSVRTM 0
[3]:.baidu.com  TRUE    /       FALSE   0       H_PS_PSSID      4392_1445_5223_4760_6017_6382_6400
End ...
Press any key to continue . . .
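
Each entry that CURLINFO_COOKIELIST returns is one line in the Netscape cookie-file format: seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry in epoch seconds, name, value). The console shows the tabs as spaces. Here is a small sketch that splits such a line into fields. CookieRecord and parse_cookie_line are names I invented for illustration, and the sketch assumes plain lines without any prefix.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical record for one line of the cookie list.
struct CookieRecord {
    std::string domain;
    bool include_subdomains;
    std::string path;
    bool secure;
    long long expires;      // epoch seconds as printed in the line
    std::string name;
    std::string value;
};

// Splits one tab-separated cookie line; returns false unless it has 7 fields.
bool parse_cookie_line(const std::string &line, CookieRecord &out)
{
    std::vector<std::string> fields;
    size_t start = 0;
    while (true) {
        size_t tab = line.find('\t', start);
        if (tab == std::string::npos) {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
    if (fields.size() != 7) {
        return false;
    }
    out.domain = fields[0];
    out.include_subdomains = (fields[1] == "TRUE");
    out.path = fields[2];
    out.secure = (fields[3] == "TRUE");
    out.expires = std::strtoll(fields[4].c_str(), NULL, 10);
    out.name = fields[5];
    out.value = fields[6];
    return true;
}
```

In the loop of the program you would pass nc->data to parse_cookie_line. As for the negative expiry of BAIDUID, my guess is that max-age=2147483647 was added to the current time and wrapped around in a 32-bit signed integer; that is only my reading of the numbers, the output itself does not say so.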