I have the following code; however, when I run it, it only ever picks up a few of the URLs.
    while (stopFlag != true)
    {
        WebRequest request = WebRequest.Create(urlList[i]);
        using (WebResponse response = request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
            {
                string sitecontent = reader.ReadToEnd();

                // Process the content and add the links to the list
                //Regex urlRx = new Regex(@"((http|https|ftp|file)\://|www.)[A-Za-z0-9\-\.]+(/[A-Za-z0-9\?\&\=\+!'\(\)\*\-\._~%]*)*", RegexOptions.IgnoreCase);
                Regex urlRx = new Regex(@"(?<url>(http:[/][/]|www.)([a-z]|[A-Z]|[0-9]|[/.]|[~])*)", RegexOptions.IgnoreCase);
                MatchCollection matches = urlRx.Matches(sitecontent);
                foreach (Match match in matches)
                {
                    string cleanMatch = cleanUP(match.Value);
                    urlList.Add(cleanMatch);
                    updateResults(result, "\"" + cleanMatch + "\",\n");
                }
            }
        }
    }

I think the error is within the regex.
What I'm trying to achieve: pull down a webpage, extract all the links from that page and add them to the list, then fetch the page for each list item and repeat the process.

Instead of trying to parse the HTML with a regex, I suggest using a good HTML parser - the HTML Agility Pack.
What is exactly the HTML Agility Pack (HAP)? It is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you actually don't have to understand XPath or XSLT to use it, don't worry...). It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to the one System.Xml proposes, but for HTML documents (or streams).
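As a rough sketch of how the crawler's link-extraction step could look with the HTML Agility Pack (assuming the HtmlAgilityPack NuGet package is installed; the start URL is a placeholder, and urlList is carried over from the question's code):

    using System;
    using System.Collections.Generic;
    using HtmlAgilityPack;

    class LinkExtractor
    {
        static void Main()
        {
            var urlList = new List<string> { "http://example.com/" };

            // HtmlWeb downloads the page and parses it into a DOM in one step.
            var web = new HtmlWeb();
            HtmlDocument doc = web.Load(urlList[0]);

            // Select every <a> element that has an href attribute.
            var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
            if (anchors != null)   // SelectNodes returns null when nothing matches
            {
                foreach (HtmlNode anchor in anchors)
                {
                    urlList.Add(anchor.GetAttributeValue("href", string.Empty));
                }
            }
        }
    }

Because the parser builds a real DOM, this picks up every anchor regardless of quoting, attribute order, or broken markup, which is exactly where a URL regex tends to miss matches.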