June 13, 2013, by Creater

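Fetching web pages and accessing FTP servers are very common application needs today. Search-engine crawlers, analysis programs, resource-fetching tools, web services and the like all depend on them. Writing your own fetching library is of course ideal, but development takes time. Using an existing open-source library is often the better choice: first, others have already written it well; second, you can put it to use very quickly; and third, you can learn from the strengths of someone else's code.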


Libwww is a highly modular, client-side web access API written in C.


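libcurl is a free, open-source, client-side URL transfer library supporting FTP, FTPS, TFTP, HTTP, HTTPS, GOPHER, TELNET, DICT, FILE and LDAP. It is cross-platform (Windows, Unix, Linux, etc.), thread-safe, supports IPv6, and is easy to use.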


Free Software and Open Source projects have a long tradition of forks and duplicate efforts. We enjoy “doing it ourselves”, no matter if someone else has done something very similar already. Free/open libraries that cover parts of libcurl’s features:
libcurl (MIT)
a highly portable and easy-to-use client-side URL transfer library, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TELNET, DICT, FILE, TFTP and LDAP. libcurl also supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading, Kerberos, HTTP form-based upload, proxies, cookies, user+password authentication, file transfer resume, HTTP proxy tunnelling and more!
libghttp (LGPL)
Having a glance at libghttp (a GNOME HTTP library), it looks as if it works rather similarly to libcurl (for HTTP). There’s no web page for it, and the person whose email is mentioned in the README of the latest release I found claims he has passed the leadership of the project to “eazel”. Popular choice among GNOME projects.
libwww (W3C license) comparison with libcurl
More complex and harder to use than libcurl. Includes everything from multi-threading to HTML parsing. The most notable transfer-related feature that libwww offers but libcurl does not is caching.
libferit (GPL)
C++ library “for transferring files via http, ftp, gopher, proxy server”. Based on ‘snarf’ 2.0.9-code (formerly known as libsnarf). Quote from freshmeat: “As the author of snarf, I have to say this frightens me. Snarf’s networking system is far from robust and complete. It’s probably full of bugs, and although it works for maybe 85% of all current situations, I wouldn’t base a library on it.”
neon (LGPL)
An HTTP and WebDAV client library, with a C interface. I’ve mainly heard and seen people use this with WebDAV as their main interest.
(LGPL) comparison with libcurl
Part of glib (GNOME). Supports: HTTP 1.1, Persistent connections, Asynchronous DNS and transfers, Connection cache, Redirects, Basic, Digest, NTLM authentication, SSL with OpenSSL or Mozilla NSS, Proxy support including SSL, SOCKS support, POST data. Probably not very portable. Lacks: cookie support, NTLM for proxies, GSS, gzip encoding, trailers in chunked responses and more.
mozilla netlib (MPL)
Handles URLs, protocols, transports for the Mozilla browser.
mozilla libxpnet (MPL)
Minimal download library targeted to be much smaller than the above mentioned netlib. HTTP and FTP support.
wget (GPL)
While not a library at all, I’ve been told that people sometimes extract the network code from it and base their own hacks on it.
libfetch (BSD)
Does HTTP and FTP transfers (both ways), supports file: URLs, and an API for URL parsing. The utility fetch that is built on libfetch is an integral part of the FreeBSD operating system.
HTTP Fetcher (LGPL)
“A small, robust, flexible library for downloading files via HTTP using the GET method.”
http-tiny (Artistic License)
“A very small C library to make HTTP queries (GET, HEAD, PUT, DELETE, etc.), easily portable and embeddable.”
XMLHTTP Object also known as IXMLHTTPRequest (part of MSXML 3.0)
(Windows) Provides client-side protocol support for communication with HTTP servers. A client computer can use the XMLHTTP object to send an arbitrary HTTP request, receive the response, and have the Microsoft® XML Document Object Model (DOM) parse that response.
QHttp (GPL)
QHttp is a class in the Qt library from Trolltech. It seems to be restricted to plain HTTP. Supports GET, POST and proxies. Asynchronous.
ftplib (GPL)
“A set of routines that implement the FTP protocol. They allow applications to create and access remote files through function calls instead of needing to fork and exec an interactive ftp client program.”
ftplibpp (GPL)
A C++ library for “easy FTP client functionality. It features resuming of up- and downloads, FXP support, SSL/TLS encryption, and logging functionality.”
GNU Common C++ library
Has a URLStream class. This C++ class allows you to download a file using HTTP. See demo/urlfetch.cpp in commoncpp2-1.3.19.tar.gz.
Java HTTP client library.
Jakarta Commons HttpClient (Apache License)
A Java HTTP client library written by the Jakarta project.
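Small libraries such as HTTP Fetcher and http-tiny mostly open a TCP connection and write a plain-text HTTP request. The sketch below covers only that formatting step, assuming a plain `http://host[:port]/path` URL. `parse_http_url` and `build_get_request` are hypothetical helpers written for this illustration and are not taken from either library. Socket I/O, HTTPS, redirects, IPv6 literals and response parsing are all left out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parsed pieces of an "http://host[:port]/path" URL (hypothetical type). */
struct http_url {
    char host[256];
    int  port;
    char path[1024];
};

/* Split a plain http:// URL. Returns 0 on success, -1 if the URL is
   unsupported or too long. Userinfo and IPv6 literals are not handled. */
int parse_http_url(const char *url, struct http_url *out)
{
    const char *prefix = "http://";
    size_t plen = strlen(prefix);
    if (strncmp(url, prefix, plen) != 0)
        return -1;

    const char *host = url + plen;
    const char *slash = strchr(host, '/');
    const char *host_end = slash ? slash : host + strlen(host);
    const char *colon = memchr(host, ':', (size_t)(host_end - host));
    const char *name_end = colon ? colon : host_end;

    size_t hlen = (size_t)(name_end - host);
    if (hlen == 0 || hlen >= sizeof out->host)
        return -1;
    memcpy(out->host, host, hlen);
    out->host[hlen] = '\0';

    out->port = 80;                      /* default HTTP port */
    if (colon) {
        char *end;
        long p = strtol(colon + 1, &end, 10);
        if (end != host_end || p <= 0 || p > 65535)
            return -1;
        out->port = (int)p;
    }

    const char *path = slash ? slash : "/";
    if (strlen(path) >= sizeof out->path)
        return -1;
    strcpy(out->path, path);
    return 0;
}

/* Format an HTTP/1.1 GET request into buf.
   Returns its length, or -1 if it does not fit. */
int build_get_request(const struct http_url *u, char *buf, size_t size)
{
    int n;
    if (u->port == 80)
        n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n",
                     u->path, u->host);
    else
        n = snprintf(buf, size,
                     "GET %s HTTP/1.1\r\nHost: %s:%d\r\nConnection: close\r\n\r\n",
                     u->path, u->host, u->port);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

Sending `Connection: close` is a deliberate simplification. The server then closes the connection after one response, so a minimal client can read until EOF instead of handling persistent connections or chunked bodies.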
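FTP libraries such as ftplib replace an interactive ftp client with function calls, and much of that work is parsing server replies. One small example is extracting the data-connection address from the reply to `PASV`. This sketch assumes the server uses the common `227 ... (h1,h2,h3,h4,p1,p2)` form. Per RFC 959, the data port is `p1 * 256 + p2`. `parse_pasv_reply` is a hypothetical helper, not ftplib's actual API, and IPv6 passive mode (`EPSV`) is not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)".
   Writes the dotted IPv4 address into ip and returns the data port
   (p1 * 256 + p2), or -1 if the reply is malformed. Assumes the
   parenthesised form, which is common but not guaranteed by every server. */
int parse_pasv_reply(const char *reply, char *ip, size_t ip_size)
{
    unsigned h[4], p[2];

    if (strncmp(reply, "227", 3) != 0)   /* 227 = entering passive mode */
        return -1;
    const char *open = strchr(reply, '(');
    if (open == NULL)
        return -1;
    if (sscanf(open + 1, "%u,%u,%u,%u,%u,%u",
               &h[0], &h[1], &h[2], &h[3], &p[0], &p[1]) != 6)
        return -1;

    for (int i = 0; i < 4; i++)          /* each field is a single byte */
        if (h[i] > 255)
            return -1;
    if (p[0] > 255 || p[1] > 255)
        return -1;

    int n = snprintf(ip, ip_size, "%u.%u.%u.%u", h[0], h[1], h[2], h[3]);
    if (n < 0 || (size_t)n >= ip_size)
        return -1;
    return (int)(p[0] * 256 + p[1]);     /* high byte * 256 + low byte */
}
```

A client would then open a second TCP connection to that address and port before sending a transfer command such as `RETR`.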


