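CefSharp is a .NET wrapper around the Chromium Embedded Framework (CEF). For scraping work I find it considerably more convenient than Selenium, which was built as a page-testing tool and only later pressed into service as a scraper. When we drive a web page and send a message or submit a form, we need to know whether the action actually succeeded, because some pages fail to submit when the network is slow. What we want is the server's exact response to the message we sent. Reading the result from a popup on the page is tedious, and a single message can produce several different outcomes: the submission may succeed, it may fail, or it may time out. Being able to read the response to the request directly makes that judgement much easier.
As a running example, this post captures the response to the GET request that fires when you click the Baidu search box. CefSharp provides callbacks for the different stages of a page's network activity, similar to the lifecycle hooks of the Vue front-end framework. It lets you modify the submitted parameters (the postData) when a POST or GET request is sent, and it can also intercept and modify images, JS files, CSS, and so on. This post only records how to read the JSON, XML, or HTML returned for a GET or POST request. All of these customizations go through the IResourceRequestHandler interface, so the first step is a class that implements it and exposes each callback as an overridable method.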
using System;
using System.Collections.Generic;
using System.IO;
using CefSharp;

// Base class that implements IResourceRequestHandler and forwards every callback to a
// protected virtual method, so subclasses override only what they need.
// In all callbacks, browser and frame may be null if the request originates from a
// ServiceWorker or CefURLRequest.
public class ResourceRequestHandler : IResourceRequestHandler
{
    ICookieAccessFilter IResourceRequestHandler.GetCookieAccessFilter(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request)
    {
        return GetCookieAccessFilter(chromiumWebBrowser, browser, frame, request);
    }

    // Called on the CEF IO thread before a resource request is loaded. The request can be modified here.
    // Return an ICookieAccessFilter to filter cookies for the request, otherwise return null.
    protected virtual ICookieAccessFilter GetCookieAccessFilter(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request)
    {
        return null;
    }

    IResourceHandler IResourceRequestHandler.GetResourceHandler(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request)
    {
        return GetResourceHandler(chromiumWebBrowser, browser, frame, request);
    }

    // Called on the CEF IO thread before a resource is loaded. The request cannot be modified here.
    // Return null to load the resource with the default network loader, otherwise return an
    // IResourceHandler with a valid stream.
    protected virtual IResourceHandler GetResourceHandler(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request)
    {
        return null;
    }

    IResponseFilter IResourceRequestHandler.GetResourceResponseFilter(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response)
    {
        return GetResourceResponseFilter(chromiumWebBrowser, browser, frame, request, response);
    }

    // Called on the CEF IO thread to optionally filter resource response content.
    // Neither request nor response can be modified here.
    // Return an IResponseFilter to intercept this response, otherwise return null.
    protected virtual IResponseFilter GetResourceResponseFilter(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response)
    {
        return null;
    }

    CefReturnValue IResourceRequestHandler.OnBeforeResourceLoad(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IRequestCallback callback)
    {
        return OnBeforeResourceLoad(chromiumWebBrowser, browser, frame, request, callback);
    }

    // Called on the CEF IO thread before a resource request is loaded. The request can be modified here;
    // changing the request URL is treated as a redirect.
    // Return CefReturnValue.Continue to continue immediately, CefReturnValue.Cancel to cancel immediately,
    // or CefReturnValue.ContinueAsync and later call IRequestCallback.Continue or IRequestCallback.Cancel.
    protected virtual CefReturnValue OnBeforeResourceLoad(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IRequestCallback callback)
    {
        return CefReturnValue.Continue;
    }

    bool IResourceRequestHandler.OnProtocolExecution(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request)
    {
        return OnProtocolExecution(chromiumWebBrowser, browser, frame, request);
    }

    // Called on the CEF UI thread to handle requests for URLs with an unknown protocol component.
    // SECURITY WARNING: use this method to enforce restrictions based on scheme, host, or other URL
    // analysis before allowing OS execution.
    // Return true to attempt execution via the registered OS protocol handler, otherwise false.
    protected virtual bool OnProtocolExecution(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request)
    {
        return false;
    }

    void IResourceRequestHandler.OnResourceLoadComplete(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response, UrlRequestStatus status, long receivedContentLength)
    {
        OnResourceLoadComplete(chromiumWebBrowser, browser, frame, request, response, status, receivedContentLength);
    }

    // Called on the CEF IO thread when a resource load has completed. It is called for all requests,
    // including requests aborted by CEF shutdown or destruction of the associated browser, in which case
    // it may arrive after ILifeSpanHandler.OnBeforeClose. Check IFrame.IsValid and do not call browser or
    // frame methods that modify state (LoadURL, SendProcessMessage, etc.) if the frame is invalid.
    // status is the load completion status; receivedContentLength is the number of response bytes read.
    protected virtual void OnResourceLoadComplete(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response, UrlRequestStatus status, long receivedContentLength)
    {
    }

    void IResourceRequestHandler.OnResourceRedirect(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response, ref string newUrl)
    {
        OnResourceRedirect(chromiumWebBrowser, browser, frame, request, response, ref newUrl);
    }

    // Called on the CEF IO thread when a resource load is redirected. request holds the old URL,
    // response is the response that caused the redirect, and newUrl can be changed if desired.
    protected virtual void OnResourceRedirect(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response, ref string newUrl)
    {
    }

    bool IResourceRequestHandler.OnResourceResponse(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response)
    {
        return OnResourceResponse(chromiumWebBrowser, browser, frame, request, response);
    }

    // Called on the CEF IO thread when a resource response is received. Return false to let the load
    // proceed unmodified. Returning true after modifying request redirects or retries the load, but
    // requests handled by the default network loader cannot be redirected here.
    // WARNING: redirecting from this method is deprecated; use OnBeforeResourceLoad or GetResourceHandler.
    protected virtual bool OnResourceResponse(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response)
    {
        return false;
    }

    // Called when the unmanaged resource is freed. Unmanaged resources are ref counted and freed when
    // the last reference is released, which works differently from .NET garbage collection.
    protected virtual void Dispose()
    {
    }

    void IDisposable.Dispose()
    {
        Dispose();
    }
}

This listing mirrors the ResourceRequestHandler base class in the CefSharp.Handler namespace of recent CefSharp releases, so with such a release you can most likely derive from the built-in class instead of copying it.

The response to a request is received in the methods of the IResponseFilter interface, so create a second class that implements IResponseFilter.

public class TestJsonFilter : IResponseFilter
{
    // Accumulates every byte of the response body as it streams through the filter.
    public List<byte> DataAll = new List<byte>();

    public FilterStatus Filter(Stream dataIn, out long dataInRead, Stream dataOut, out long dataOutWritten)
    {
        if (dataIn == null || dataIn.Length == 0)
        {
            dataInRead = 0;
            dataOutWritten = 0;
            return FilterStatus.Done;
        }

        // dataOut can be smaller than dataIn, so copy only as many bytes as fit.
        int count = (int)Math.Min(dataIn.Length, dataOut.Length);
        byte[] bs = new byte[count];
        int read = dataIn.Read(bs, 0, count);

        // Pass the bytes through to the page unchanged and keep a copy for ourselves.
        dataOut.Write(bs, 0, read);
        DataAll.AddRange(new ArraySegment<byte>(bs, 0, read));

        dataInRead = read;
        dataOutWritten = read;

        // Ask to be called again if part of dataIn is still unread.
        return read < dataIn.Length ? FilterStatus.NeedMoreData : FilterStatus.Done;
    }

    public bool InitFilter()
    {
        return true;
    }

    public void Dispose()
    {
    }
}
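Since the output stream handed to a response filter can be smaller than the incoming data, a body may have to pass through the filter in several chunks. The sketch below simulates that with plain MemoryStream objects and no CefSharp dependency. BoundedCopy, CopyChunk, Demo, and the 4-byte output buffer are hypothetical names and values chosen for illustration only; they are not part of CefSharp, and the loop only approximates how CEF calls a filter repeatedly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper (not a CefSharp type): copies as many bytes as fit into dataOut
// and records the same bytes in 'captured'.
public static class BoundedCopy
{
    // Returns true once all of dataIn has been consumed, false if another call is needed.
    public static bool CopyChunk(Stream dataIn, Stream dataOut, List<byte> captured,
                                 out long dataInRead, out long dataOutWritten)
    {
        long available = dataIn.Length - dataIn.Position;
        long capacity = dataOut.Length - dataOut.Position;
        var buffer = new byte[(int)Math.Min(available, capacity)];

        int read = dataIn.Read(buffer, 0, buffer.Length);
        dataOut.Write(buffer, 0, read);
        captured.AddRange(new ArraySegment<byte>(buffer, 0, read));

        dataInRead = read;
        dataOutWritten = read;
        return dataIn.Position >= dataIn.Length;
    }
}

public static class Demo
{
    public static void Main()
    {
        var dataIn = new MemoryStream(Encoding.UTF8.GetBytes("{\"ok\":true}"));
        var captured = new List<byte>();

        bool done = false;
        while (!done)
        {
            // A MemoryStream over a byte array cannot grow, which stands in for a fixed-size buffer.
            var dataOut = new MemoryStream(new byte[4]);
            done = BoundedCopy.CopyChunk(dataIn, dataOut, captured, out long r, out long w);
        }

        Console.WriteLine(Encoding.UTF8.GetString(captured.ToArray()));
    }
}
```

The 11-byte body goes through in chunks of 4, 4, and 3 bytes, and the captured bytes decode back to the original text. Copying everything in one call would instead throw as soon as the body is larger than the output buffer.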
Next, create a helper class that keeps track of the filters so the response can be read back later.

public class FilterManager
{
    // Maps a request identifier to the filter that captured that request's response.
    private static Dictionary<string, IResponseFilter> dataList = new Dictionary<string, IResponseFilter>();

    public static IResponseFilter CreateFilter(string guid)
    {
        lock (dataList)
        {
            var filter = new TestJsonFilter();
            // The indexer overwrites an existing entry instead of throwing on a duplicate key.
            dataList[guid] = filter;
            return filter;
        }
    }

    public static IResponseFilter GetFileter(string guid)
    {
        lock (dataList)
        {
            // Returns null when no filter was registered for this request.
            IResponseFilter filter;
            return dataList.TryGetValue(guid, out filter) ? filter : null;
        }
    }
}

Then override the GetResourceResponseFilter and OnResourceLoadComplete methods; the response becomes available in OnResourceLoadComplete. The request parameter there is an IRequest object describing the request that has just finished, while the body itself comes from the filter. Set a breakpoint on the `if (request.Url.ToLower().Contains("sugrec"))` line to inspect the object, then add your own conditions to select the responses you care about. Here I first check that the request method is GET, then filter on a keyword in the request URL, and finally print the body to the console window attached to the WinForms program.

public class WinFormResourceRequestHandler : ResourceRequestHandler
{
    protected override IResponseFilter GetResourceResponseFilter(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response)
    {
        var filter = FilterManager.CreateFilter(request.Identifier.ToString());
        return filter;
    }

    protected override void OnResourceLoadComplete(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, IResponse response, UrlRequestStatus status, long receivedContentLength)
    {
        // First select the request method, POST or GET.
        if (request.Method == "GET")
        {
            // Then filter on the URL.
            if (request.Url.ToLower().Contains("sugrec"))
            {
                var filter = FilterManager.GetFileter(request.Identifier.ToString()) as TestJsonFilter;
                if (filter != null)
                {
                    UTF8Encoding encoding = new UTF8Encoding();
                    // The captured response body is decoded here.
                    var data = encoding.GetString(filter.DataAll.ToArray());
                    System.Console.WriteLine("742行: " + data);
                }
            }
        }
    }
}

public class WinFormsRequestHandler : RequestHandler
{
    protected override IResourceRequestHandler GetResourceRequestHandler(IWebBrowser chromiumWebBrowser, IBrowser browser, IFrame frame, IRequest request, bool isNavigation, bool isDownload, string requestInitiator, ref bool disableDefaultHandling)
    {
        // NOTE: In most cases you examine the request.Url and only handle requests you are interested in.
        if (request.Url.ToLower().Contains("login".ToLower()))
        {
            using (var postData = request.PostData)
            {
                if (postData != null)
                {
                    var elements = postData.Elements;
                    var charSet = request.GetCharSet();
                    foreach (var element in elements)
                    {
                        if (element.Type == PostDataElementType.Bytes)
                        {
                            // body holds the submitted form data as text; inspect or log it here.
                            var body = element.GetBody(charSet);
                        }
                    }
                }
            }
        }
        return new WinFormResourceRequestHandler();
    }
}

Running the program prints the captured response in the console window.
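The goal stated at the start is to know whether a submission succeeded, failed, or timed out. One way to get there is to let the scraping code await the captured body with a timeout. The following is only a minimal sketch under stated assumptions: ResponseWaiter, Publish, and WaitAsync are hypothetical names and not part of CefSharp. The idea is that the OnResourceLoadComplete override calls Publish with the decoded text, while the code that triggered the submission awaits WaitAsync.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not a CefSharp type): hands one captured response body
// from the network callback to the code that is waiting for it.
public sealed class ResponseWaiter
{
    // RunContinuationsAsynchronously keeps the awaiting code from running inline
    // on the thread that calls Publish (for example the CEF IO thread).
    private readonly TaskCompletionSource<string> _tcs =
        new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Intended to be called with the decoded body once the load has completed.
    public void Publish(string body)
    {
        _tcs.TrySetResult(body);
    }

    // Returns the body, or null if nothing was published before the timeout elapsed.
    public async Task<string> WaitAsync(TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(_tcs.Task, Task.Delay(timeout)).ConfigureAwait(false);
        return finished == _tcs.Task ? await _tcs.Task.ConfigureAwait(false) : null;
    }
}

public static class WaiterDemo
{
    public static async Task Main()
    {
        var waiter = new ResponseWaiter();

        // Stand-in for the network callback delivering a response a moment later.
        _ = Task.Run(() => waiter.Publish("{\"errno\":0}"));

        string body = await waiter.WaitAsync(TimeSpan.FromSeconds(5));
        Console.WriteLine(body ?? "timed out");
    }
}
```

A null result covers the timeout case, and the returned text can then be checked for whatever success or failure marker the target site uses.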
To use these custom classes, assign the request handler to the browser control when the form loads:

using System;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Windows.Forms;
using CefSharp;
using CefSharp.WinForms;
using CefSharp.Handler;
using System.Runtime.InteropServices;
using System.Threading;
using System.Text.RegularExpressions;
using System.Security.Cryptography.X509Certificates;
using System.IO;

public partial class Form1 : Form
{
    ChromiumWebBrowser different;

    [DllImport("kernel32.dll")]
    public static extern bool AllocConsole();

    [DllImport("kernel32.dll")]
    public static extern bool FreeConsole();

    public Form1()
    {
        InitializeComponent();
        AllocConsole(); // Attach a console window for displaying output.
    }

    private void Form1_Load(object sender, EventArgs e)
    {
        CefSettings settings = new CefSettings();
        // Cookie storage directory: C:\Users\×××(system user name)\AppData\Local\Know
        settings.CachePath = Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData) + @"\Know";
        Cef.Initialize(settings); // Initialize the CEF component.

        different = new ChromiumWebBrowser("https://www.baidu.com");
        different.RequestHandler = new WinFormsRequestHandler(); // Apply the interception rules.
        // Opens new pages inside the current page. CefLifeSpanHandler is the author's own class
        // and is not shown in this post.
        different.LifeSpanHandler = new CefLifeSpanHandler();
        different.BrowserSettings = new BrowserSettings()
        {
            WebGl = CefState.Enabled,
            ImageLoading = CefState.Enabled,
            RemoteFonts = CefState.Enabled,
            AcceptLanguageList = "zh-CN"
        };

        // Add the browser control to the layout container.
        tableLayoutPanel1.Controls.Add(different, 0, 1);
    }

    // Called before the window closes.
    private void Form1_FormClosing(object sender, FormClosingEventArgs e)
    {
        FreeConsole(); // Release the attached console, otherwise an error is raised.
    }
}

Reference: https://www.cnblogs.com/heifengwll/p/13277232.html
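For intercepting and replacing page resources such as JS and CSS, see "CefSharp请求资源拦截及自定义处理" (CefSharp request resource interception and custom handling) on the Tencent Cloud Developer Community.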