Summary: [robot] Sending HttpWebRequest (GET, POST)
Using IE as the example: the code below was written after observing the browser's traffic in Fiddler.
GET:
Raw request as captured in Fiddler:
C#:
HttpWebRequest request;
CookieContainer cookies = new CookieContainer();
string url = "http://w2.land.taipei.gov.tw/land4/loina.asp";
string html = "";
request = WebRequest.Create(url) as HttpWebRequest;
//if you need to go through a proxy...
WebProxy _proxy = new WebProxy("http://myproxy.com.tw:8888", true);
_proxy.Credentials = CredentialCache.DefaultCredentials;
request.Proxy = _proxy;
//end of proxy
request.Method = "GET";
request.Accept = "text/html, application/xhtml+xml, */*";
request.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
//tell the server which compression methods the client accepts (gzip, deflate); the server decides from its own settings whether to use one
request.Headers.Set("Accept-Encoding", "gzip, deflate");
//if the response body is gzip- or deflate-compressed, decompress it automatically; without this line the body may come back as unreadable bytes
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.Host = "w2.land.taipei.gov.tw";
request.CookieContainer = cookies;
//true is already the default; setting it to false on purpose has sometimes caused the HTML not to come back
request.KeepAlive = true;
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream, Encoding.Default))
{
html = reader.ReadToEnd();
}
}
}
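To see what `AutomaticDecompression` saves you from, here is a minimal sketch of the manual equivalent using `GZipStream`. It assumes you already hold the raw response bytes; to keep it self-contained, the gzip bytes are produced locally instead of fetched from a server, and the class name `GzipDemo` is hypothetical.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class GzipDemo
{
    // Stand-in for a server that honours "Accept-Encoding: gzip":
    // compress the text into a gzip byte array.
    public static byte[] Compress(string text, Encoding encoding)
    {
        byte[] raw = encoding.GetBytes(text);
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(raw, 0, raw.Length);
            } // disposing the GZipStream writes the gzip footer
            return output.ToArray(); // ToArray still works after the stream is closed
        }
    }

    // Roughly what AutomaticDecompression = GZip does for you:
    // wrap the compressed stream in a GZipStream before reading text from it.
    public static string Decompress(byte[] compressed, Encoding encoding)
    {
        using (var input = new MemoryStream(compressed))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gzip, encoding))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Reading the compressed bytes directly with a plain `StreamReader` would produce the garbage the comment above warns about; the `GZipStream` wrapper is the step that `AutomaticDecompression` adds.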
P.S. Addendum 2016-10-09: reference code for Chrome:
string result = "";
HttpWebRequest request;
CookieContainer cookies = new CookieContainer();
string url = "http://www.yoururl.com";//WebRequest.Create needs an absolute URL including the scheme
request = WebRequest.Create(url) as HttpWebRequest;
request.Method = "GET";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
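request.Headers.Set("Accept-Encoding", "gzip, deflate, sdch");
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;//without this, a gzip response would be read as garbage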
request.Headers.Set("Accept-Language", "zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4,zh-CN;q=0.2");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";
request.CookieContainer = cookies;
request.Headers.Set("Upgrade-Insecure-Requests", "1");
//true is already the default; setting it to false on purpose has sometimes caused the HTML not to come back
request.KeepAlive = true;
using (var response = (HttpWebResponse)request.GetResponse())
{
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream, Encoding.UTF8))
{
result = reader.ReadToEnd();
}
}
}
return result;
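The IE example decodes with `Encoding.Default`, which is the machine's ANSI code page under .NET Framework (Big5 on Traditional Chinese Windows) but UTF-8 on .NET Core, while the Chrome example hard-codes UTF-8. If a site declares its charset in the `Content-Type` header, one option is to decode with that instead, via `HttpWebResponse.CharacterSet`. A minimal sketch, where `CharsetHelper.PickEncoding` is a hypothetical helper and the fallback encoding is your own choice:

```csharp
using System;
using System.Text;

static class CharsetHelper
{
    // Hypothetical helper: map a charset name (e.g. from
    // HttpWebResponse.CharacterSet) to an Encoding, falling back
    // when the name is missing or not recognised by this runtime.
    public static Encoding PickEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
            return fallback;
        try
        {
            // Some servers quote the value: charset="utf-8"
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown name. On .NET Core, Big5 also needs the
            // System.Text.Encoding.CodePages provider registered first.
            return fallback;
        }
    }
}
```

For example: `new StreamReader(responseStream, CharsetHelper.PickEncoding(response.CharacterSet, Encoding.Default))`. The declared charset can still be wrong or absent, which is why the fallback stays in your hands.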
POST:
Raw request as captured in Fiddler:
C#:
HttpWebRequest requestPost;
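CookieContainer cookiesPost = new CookieContainer();//a new container starts a new session; to keep the cookies from an earlier GET, reuse that request's CookieContainer instead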
requestPost = WebRequest.Create(url) as HttpWebRequest;//url = the POST target seen in Fiddler
string html = "";
string postData = "destrict=03&section=&land_mom=&land_son=";//destrict=03 selects Zhongzheng District; if a value contains special characters, URL-encode it first (e.g. HttpUtility.UrlEncode)
requestPost.Method = "POST";
requestPost.Accept = "text/html, application/xhtml+xml, */*";
requestPost.Referer = "http://w2.land.taipei.gov.tw/land4/loina.asp";
requestPost.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
requestPost.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
requestPost.ContentType = "application/x-www-form-urlencoded";
requestPost.Headers.Set("Accept-Encoding", "gzip, deflate");
requestPost.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
requestPost.ContentLength = Encoding.UTF8.GetByteCount(postData);//Content-Length counts bytes, not characters (StreamWriter writes UTF-8)
requestPost.Host = "w2.land.taipei.gov.tw";
requestPost.Headers.Set("Pragma", "no-cache");
requestPost.CookieContainer = cookiesPost;
//if you get a (417) Expectation Failed error, enable the line below (set it early, before the first request is created)
//System.Net.ServicePointManager.Expect100Continue = false;
using (var stream = requestPost.GetRequestStream())
{
using (var writer = new StreamWriter(stream))//StreamWriter writes UTF-8 without a BOM
{
writer.Write(postData);
}//disposing the writer flushes it and closes the request stream
}
using (var response = (HttpWebResponse)requestPost.GetResponse())
{
using (var responseStream = response.GetResponseStream())
{
using (var reader = new StreamReader(responseStream, Encoding.Default))
{
html = reader.ReadToEnd();
}
}
}
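The POST body has to be `application/x-www-form-urlencoded`, and `ContentLength` must be the body's length in bytes. A minimal sketch of building the body from name/value pairs, assuming the server accepts UTF-8 percent-encoding (`Uri.EscapeDataString` always encodes as UTF-8 and writes spaces as `%20`; for a Big5 form you would need `HttpUtility.UrlEncode(value, Encoding.GetEncoding("big5"))` instead). `FormBody` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class FormBody
{
    // Join name=value pairs with '&', percent-encoding both sides.
    // Uri.EscapeDataString encodes as UTF-8 and turns spaces into %20.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? "")));
    }

    // After escaping, the body is pure ASCII, so this byte array's length
    // is the value ContentLength must be set to.
    public static byte[] ToBytes(string body)
    {
        return Encoding.ASCII.GetBytes(body);
    }
}
```

With the four fields from the example above, `Build` returns `destrict=03&section=&land_mom=&land_son=`; set `ContentLength` to `FormBody.ToBytes(body).Length` and write those same bytes to `GetRequestStream()`.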
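P.S. If you reuse the request variable, remember to set Method, Accept, Referer, Accept-Language and so on again every time,
because each WebRequest.Create call returns a brand-new HttpWebRequest, and none of the headers from the previous request carry over.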
Addendum 2015-11-16:
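In general, scraping a site takes several consecutive request/response round trips before you reach the data you want, and every single request needs its headers set again. For example, you can't get away with setting request.Accept = "text/html, application/xhtml+xml, */*"; only on the first request.
That simply won't return the data, because each new HttpWebRequest starts from .NET's defaults instead of inheriting the previous request's headers.
(The debugger may make it look as if nothing was cleared, but if you set the headers only on the first request, the later queries really do come back empty.)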
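Besides that, cookies must be passed along on every request too (reuse the same CookieContainer), because the state that links consecutive requests is sometimes carried in ViewState and sometimes in cookies.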
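One way to follow both rules (full headers on every request, one shared cookie jar) is to route every hop through a single factory method. A minimal sketch; `BrowserSession` and its `Create` method are hypothetical names, and the header values are the IE ones captured above:

```csharp
using System;
using System.Net;

class BrowserSession
{
    // One CookieContainer shared by every request, so cookies set by
    // earlier responses are sent with later requests.
    public readonly CookieContainer Cookies = new CookieContainer();

    // An HttpWebRequest is single-use, so build a fresh one for each hop
    // and re-apply the full header set every time.
    public HttpWebRequest Create(string url, string method, string referer)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = method;
        request.Accept = "text/html, application/xhtml+xml, */*";
        request.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
        request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
        request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
        request.KeepAlive = true;
        request.CookieContainer = Cookies;
        if (!string.IsNullOrEmpty(referer))
            request.Referer = referer;
        return request;
    }
}
```

Each step of the crawl then calls `session.Create(...)` again rather than reusing the previous `HttpWebRequest`, so nothing depends on headers surviving from one request to the next.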
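P.S. Addendum 2015-11-18: ViewState, ViewStateGenerator and EventValidation (the hidden fields __VIEWSTATE, __VIEWSTATEGENERATOR and __EVENTVALIDATION) appear in classic ASP.NET Web Forms pages. If they are present, update all three together: take the new values from the latest response and send them back with the next POST.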
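A minimal sketch of pulling those three fields out of the latest response so they can be sent back with the next POST. It assumes the usual Web Forms markup, where each `<input type="hidden">` has its `name` attribute before `value`; a real HTML parser would be sturdier than this regex. `WebFormsState` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class WebFormsState
{
    static readonly string[] FieldNames = { "__VIEWSTATE", "__VIEWSTATEGENERATOR", "__EVENTVALIDATION" };

    // Returns whichever of the three hidden fields are present in the page.
    // The closing quote after the name keeps "__VIEWSTATE" from matching
    // "__VIEWSTATEGENERATOR".
    public static Dictionary<string, string> Extract(string html)
    {
        var result = new Dictionary<string, string>();
        foreach (var name in FieldNames)
        {
            var match = Regex.Match(html,
                "name=\"" + Regex.Escape(name) + "\"[^>]*?value=\"([^\"]*)\"",
                RegexOptions.IgnoreCase);
            if (match.Success)
                result[name] = WebUtility.HtmlDecode(match.Groups[1].Value);
        }
        return result;
    }
}
```

The values are Base64 and often contain `+`, `/` and `=`, so URL-encode them before putting them into postData.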
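P.S. Addendum 2016-03-25: To switch to another browser (e.g. Chrome), use Fiddler to compare that browser's requests and responses with the ones in this post, then change the code to send that browser's headers.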
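Keeping each browser's captured headers in one table makes that switch a one-line change. A minimal sketch using the IE 11 and Chrome 53 values from this post; `BrowserProfiles` is a hypothetical name. `Accept` and `User-Agent` are restricted headers on `HttpWebRequest` (setting them through `Headers.Set` throws an `ArgumentException` under .NET Framework), so they go through their properties:

```csharp
using System;
using System.Collections.Generic;
using System.Net;

static class BrowserProfiles
{
    // Header values copied from the two Fiddler captures in this post.
    public static readonly Dictionary<string, string> InternetExplorer11 = new Dictionary<string, string>
    {
        { "Accept", "text/html, application/xhtml+xml, */*" },
        { "Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3" },
        { "User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko" },
    };

    public static readonly Dictionary<string, string> Chrome53 = new Dictionary<string, string>
    {
        { "Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8" },
        { "Accept-Language", "zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4,zh-CN;q=0.2" },
        { "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36" },
        { "Upgrade-Insecure-Requests", "1" },
    };

    // Restricted headers go through their properties; the rest use Headers.Set.
    public static void Apply(HttpWebRequest request, Dictionary<string, string> profile)
    {
        foreach (var pair in profile)
        {
            switch (pair.Key)
            {
                case "Accept": request.Accept = pair.Value; break;
                case "User-Agent": request.UserAgent = pair.Value; break;
                default: request.Headers.Set(pair.Key, pair.Value); break;
            }
        }
    }
}
```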
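P.S. Addendum 2017-07-20: If the site uses HTTPS and requires TLS 1.2, add the lines below, before creating any request (.NET Framework 4.5 must be installed on the machine for this to work; the project itself can target either 4.0 or 4.5):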
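ServicePointManager.Expect100Continue = true;//true is already the default; if a POST fails with (417) Expectation Failed, use false instead
ServicePointManager.SecurityProtocol = (SecurityProtocolType)3072;//3072 = SecurityProtocolType.Tls12
ServicePointManager.DefaultConnectionLimit = 9999;//raise the limit on concurrent connections per host (default is 2 for client apps)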
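`(SecurityProtocolType)3072` is the numeric value of `SecurityProtocolType.Tls12`; the cast is what lets a project targeting .NET 4.0, whose enum has no `Tls12` member, still ask for TLS 1.2 when 4.5 is installed. Assigning it as above replaces the whole protocol list. A variant sketch that adds TLS 1.2 to whatever is already enabled, with `TlsSetup` as a hypothetical name:

```csharp
using System;
using System.Net;

static class TlsSetup
{
    // Numeric value of SecurityProtocolType.Tls12, usable from code
    // compiled against .NET 4.0 as long as 4.5+ is installed at run time.
    const SecurityProtocolType Tls12 = (SecurityProtocolType)3072;

    // OR the flag in rather than overwriting, so protocols that were
    // already enabled stay enabled. Call once, before creating any request.
    public static void EnableTls12()
    {
        ServicePointManager.SecurityProtocol |= Tls12;
    }
}
```

If only 4.0 is installed, the runtime does not recognise 3072 and the assignment throws a `NotSupportedException`, which is why the 4.5 requirement above applies to the machine rather than the project setting.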