[robot] Sending an HttpWebRequest and receiving the Response (GET, POST)

Summary: [robot] sending an HttpWebRequest (GET, POST)

Using IE as the example browser, here are samples written after observing the traffic in Fiddler.

GET:

Raw data captured by Fiddler:

GET http://w2.land.taipei.gov.tw/land4/loina.asp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Accept-Language: zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko
Accept-Encoding: gzip, deflate
Host: w2.land.taipei.gov.tw
DNT: 1
Proxy-Connection: Keep-Alive
Cookie: ASPSESSIONIDQQDQDBBB=KPCIMLIBOFBMJBGAMPIBKAPL
 

C#:


//namespaces used by the snippets in this article
using System.IO;
using System.Net;
using System.Text;

HttpWebRequest request;
CookieContainer cookies = new CookieContainer();
string url = "http://w2.land.taipei.gov.tw/land4/loina.asp";

string html = "";

request = WebRequest.Create(url) as HttpWebRequest;
//if you need to go through a proxy...
WebProxy _proxy = new WebProxy("http://myproxy.com.tw:8888", true);
_proxy.Credentials = CredentialCache.DefaultCredentials;
request.Proxy = _proxy;
//end of proxy
request.Method = "GET";
request.Accept = "text/html, application/xhtml+xml, */*";
request.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
//the client tells the server it accepts gzip/deflate compression; the server decides whether to use it
request.Headers.Set("Accept-Encoding", "gzip, deflate");
//if the response comes back gzip/deflate compressed, decompress it automatically; without this line you may not be able to read it
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.Host = "w2.land.taipei.gov.tw";
request.CookieContainer = cookies;
//the default is true; if you deliberately set it to false, you may fail to fetch the HTML
request.KeepAlive = true;
		   
using (var response = (HttpWebResponse)request.GetResponse())
{
	using (var responseStream = response.GetResponseStream())
	{
		using (var reader = new StreamReader(responseStream, Encoding.Default))
		{
			html = reader.ReadToEnd();
		}
	}
}          

P.S. 2016-10-09 addendum: reference code for Chrome:

string result = "";
HttpWebRequest request;
CookieContainer cookies = new CookieContainer();
string url = "http://www.yoururl.com"; //WebRequest.Create needs an absolute URI including the scheme
request = WebRequest.Create(url) as HttpWebRequest;
request.Method = "GET";
request.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8";
request.Headers.Set("Accept-Encoding", "gzip, deflate, sdch");
//decompress automatically if the server actually returns gzip/deflate
request.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
request.Headers.Set("Accept-Language", "zh-TW,zh;q=0.8,en-US;q=0.6,en;q=0.4,zh-CN;q=0.2");
request.UserAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36";
request.CookieContainer = cookies;
request.Headers.Set("Upgrade-Insecure-Requests", "1");
//the default is true; if you deliberately set it to false, you may fail to fetch the HTML
request.KeepAlive = true;

using (var response = (HttpWebResponse)request.GetResponse())
{
	using (var responseStream = response.GetResponseStream())
	{
		using (var reader = new StreamReader(responseStream, Encoding.UTF8))
		{
			result = reader.ReadToEnd();
		}
	}
}     

return result;

POST:

Raw data captured by Fiddler:

POST http://w2.land.taipei.gov.tw/land4/loina.asp HTTP/1.1
Accept: text/html, application/xhtml+xml, */*
Referer: http://w2.land.taipei.gov.tw/land4/loina.asp
Accept-Language: zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip, deflate
Proxy-Connection: Keep-Alive
Content-Length: 40
DNT: 1
Host: w2.land.taipei.gov.tw
Pragma: no-cache
Cookie: ASPSESSIONIDQQDQDBBB=KPCIMLIBOFBMJBGAMPIBKAPL
 
destrict=03&section=&land_mom=&land_son=
 

C#:


HttpWebRequest requestPost;
CookieContainer cookiesPost = new CookieContainer();
requestPost = WebRequest.Create(url) as HttpWebRequest;
string html = "";
//district 03 = Zhongzheng District; remember to HttpUtility.UrlEncode any value that contains special characters (see the sketch after this block)
string postData = "destrict=03&section=&land_mom=&land_son=";
requestPost.Method = "POST";
requestPost.Accept = "text/html, application/xhtml+xml, */*";
requestPost.Referer = "http://w2.land.taipei.gov.tw/land4/loina.asp";
requestPost.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
requestPost.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
requestPost.ContentType = "application/x-www-form-urlencoded";
requestPost.Headers.Set("Accept-Encoding", "gzip, deflate");
requestPost.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
//ContentLength is the byte count of the body; for pure ASCII data this equals the string length
requestPost.ContentLength = Encoding.UTF8.GetByteCount(postData);
requestPost.Host = "w2.land.taipei.gov.tw";
requestPost.Headers.Set("Pragma", "no-cache");
requestPost.CookieContainer = cookiesPost;
//if you hit a "(417) Expectation Failed" error, add the following line:
//System.Net.ServicePointManager.Expect100Continue = false;

using (var stream = requestPost.GetRequestStream())
{
	using (var writer = new StreamWriter(stream))
	{
		writer.Write(postData);
	}
}

using (var response = (HttpWebResponse)requestPost.GetResponse())
{
	using (var responseStream = response.GetResponseStream())
	{
		using (var reader = new StreamReader(responseStream, Encoding.Default))
		{
			html = reader.ReadToEnd();
		}
	}
}
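
About the UrlEncode reminder in the comment above: here is a minimal sketch of building the form body with encoded values. The field values are made up for illustration; HttpUtility lives in System.Web, so the project needs a reference to that assembly (on .NET 4.5, WebUtility.UrlEncode from System.Net does the same job):

using System.Web;   //reference System.Web for HttpUtility

//hypothetical values; anything containing &, =, spaces or Chinese characters must be encoded
string landMom = "1234-0000";
string landSon = "A & B section";

string postData =
    "destrict=" + HttpUtility.UrlEncode("03") +
    "&section=" + HttpUtility.UrlEncode("") +
    "&land_mom=" + HttpUtility.UrlEncode(landMom) +
    "&land_son=" + HttpUtility.UrlEncode(landSon);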

P.S. Remember: if you reuse the request variable, you have to set Method, Accept, Referer, Accept-Language, and so on again every single time,

because .NET resets those headers after each request is sent.
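
One way to avoid forgetting a header on one of the later requests is to build every request through a small factory method and share a single CookieContainer. This is only a sketch using the header values from this article; the CreateRequest name and its parameters are illustrative, not part of the original code:

static HttpWebRequest CreateRequest(string url, string method, CookieContainer cookies)
{
    //every call returns a brand-new HttpWebRequest with the full set of headers,
    //so no request in the sequence is sent half-configured
    var req = (HttpWebRequest)WebRequest.Create(url);
    req.Method = method;
    req.Accept = "text/html, application/xhtml+xml, */*";
    req.Headers.Set("Accept-Language", "zh-Hant-TW,zh-Hant;q=0.8,en-US;q=0.5,en;q=0.3");
    req.UserAgent = "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko";
    req.Headers.Set("Accept-Encoding", "gzip, deflate");
    req.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
    req.CookieContainer = cookies;   //share one CookieContainer across all requests
    req.KeepAlive = true;
    return req;
}

Usage would be: create one CookieContainer up front, then call CreateRequest(url, "GET", cookies) or CreateRequest(url, "POST", cookies) before every GetResponse(), so each request starts fully configured.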

Addendum 2015-11-16:

Generally, scraping a site takes several consecutive request + response round trips before you reach the target data, and every one of those requests has to have its headers set again. For example, you cannot get lazy and set request.Accept = "text/html, application/xhtml+xml, */*"; on the first request only; you will not get any data back that way, because .NET appears to clear the headers that were set on the previous request

(the debugger shows the variables as if nothing were cleared, but if you only configure the first request, in practice you really will find nothing).

Besides that, the cookies must be carried on every request as well, because the state that links consecutive requests is sometimes kept in ViewState and sometimes in cookies (the round-trip sketch after the next note shows both).

P.S. addendum 2015-11-18: the ViewState, ViewStateGenerator, and EventValidation parameters show up on classic ASP.NET Web Forms pages; if they are present, all three have to be updated together on every post-back.
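
To make the two notes above concrete, here is a rough sketch of one GET + POST round trip against a hypothetical Web Forms page. The URL http://www.example.com/page.aspx, the destrict=03 field, and the regex-based extraction are all illustrative assumptions; check the real field names and values in Fiddler before copying anything:

using System;
using System.IO;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;
using System.Web;   //for HttpUtility.UrlEncode

CookieContainer cookies = new CookieContainer();
string url = "http://www.example.com/page.aspx";   //hypothetical target page

//1st request: GET the page so we receive the session cookie and the hidden fields
var get = (HttpWebRequest)WebRequest.Create(url);
get.Method = "GET";
get.CookieContainer = cookies;
get.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
string firstHtml;
using (var resp = (HttpWebResponse)get.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream(), Encoding.UTF8))
{
	firstHtml = reader.ReadToEnd();
}

//quick-and-dirty extraction of a hidden field's value from the first response
Func<string, string> hidden = name =>
	Regex.Match(firstHtml, "id=\"" + name + "\" value=\"([^\"]*)\"").Groups[1].Value;

string postData =
	"__VIEWSTATE=" + HttpUtility.UrlEncode(hidden("__VIEWSTATE")) +
	"&__VIEWSTATEGENERATOR=" + HttpUtility.UrlEncode(hidden("__VIEWSTATEGENERATOR")) +
	"&__EVENTVALIDATION=" + HttpUtility.UrlEncode(hidden("__EVENTVALIDATION")) +
	"&destrict=03";   //plus whatever form fields the page actually expects

//2nd request: POST the form back, reusing the same CookieContainer
var post = (HttpWebRequest)WebRequest.Create(url);
post.Method = "POST";
post.ContentType = "application/x-www-form-urlencoded";
post.CookieContainer = cookies;   //same cookies as the first request
post.AutomaticDecompression = DecompressionMethods.Deflate | DecompressionMethods.GZip;
post.ContentLength = Encoding.UTF8.GetByteCount(postData);
using (var writer = new StreamWriter(post.GetRequestStream()))
{
	writer.Write(postData);
}
string secondHtml;
using (var resp = (HttpWebResponse)post.GetResponse())
using (var reader = new StreamReader(resp.GetResponseStream(), Encoding.UTF8))
{
	secondHtml = reader.ReadToEnd();
}

The key points are that the same CookieContainer object is passed to both requests, and that the three hidden fields extracted from the first response are posted back together.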

P.S. addendum 2016-03-25: if you want to impersonate a different browser (e.g. Chrome), use Fiddler to compare that browser's request and response with the ones in this article, then change the headers accordingly.

P.S. addendum 2017-07-20: if the site is HTTPS and requires TLS 1.2 (or a similarly high-grade transport encryption), add the following. The machine needs .NET Framework 4.5 installed for this to work; the project itself can target 4.0 or 4.5:

ServicePointManager.Expect100Continue = true;
ServicePointManager.SecurityProtocol = (SecurityProtocolType)3072; //3072 = SecurityProtocolType.Tls12 (the named enum value is not available when the project targets 4.0)
ServicePointManager.DefaultConnectionLimit = 9999;