c# - Cause for Inconsistent Extractions of PDF Annotations with iText(Sharp)


Scenario:

I have an application that uses iTextSharp to process PDF files for hyperlinks.

In the PDF file structure, a hyperlink is a subtype of an "annotation object", so my code essentially (1) reads a file, (2) loops through the pages, (3) gets the annotations array for the page, and (4) extracts the hyperlink annotations for the page.

Problem:

Sometimes the "PDF dictionary" object that represents a given page does not have an annotations collection (the Annots key does not exist), so attempting to retrieve such a collection returns null. This is a problem because it happens even when the page in question clearly shows clickable links.

Note that the word "clickable" is important here: I know that a page can also contain URL addresses as plain text, but I do not care about those, only about actual hyperlinks.

Code:

I took the code I am already using almost exactly from the answer given to a similar question (http://stackoverflow.com/questions/6959076/reading-hyperlinks-from-pdf-file). The important difference is:

    // My code
    var pdfAnnotations = (PdfArray)PdfReader.GetPdfObject(pageDict.Get(PdfName.ANNOTS));
    foreach (PdfObject annotation in pdfAnnotations.ArrayList) {}

    // Chris' code
    var annotsArray = pageDict.GetAsArray(PdfName.ANNOTS);
    foreach (var annotation in annotsArray.ArrayList) {}

    // Both my pageDict.Get() and Chris' pageDict.GetAsArray() calls
    // return null, because no ANNOTS key exists in the page.

Question:

Why the null value? Why is there no annotations array in a PDF document with clearly visible, clickable links?

Thanks

Let me try with a guess. (There is no way to do anything else, with no sample file to analyze.)

BTW, there is never an /ANNOTS inside the PDF code. PDF keys are case sensitive! It is always /Annots.

In PDF source code, a name such as /Annots can be written in any of the following alternative ways, all of which are valid and equivalent according to the PDF specification (see paragraph 7.3.5, Name Objects):

    /Annots
    /#41nnots      # '#41' is the hex representation of ASCII 'A' in PDF
    /A#6Enots      # '#6E' is the hex representation of ASCII 'n' in PDF
    /A#6E#6Eots    # '#6E' is the hex representation of ASCII 'n' in PDF
    ...
    /Annot#73      # '#73' is the hex representation of ASCII 's' in PDF

You get the idea... (Each of the six characters can be written either literally or hex-escaped, so by a quick count that makes 2^6 = 64 variations of this...)
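To make the equivalence of these spellings concrete, here is a minimal C# sketch that decodes the `#xx` escapes in a raw name token so that every variant compares equal to /Annots. `PdfNameNormalizer` and `Decode` are hypothetical names made up for this illustration, not part of iTextSharp, and the sketch assumes the decoded bytes are plain ASCII.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only; not an iTextSharp API.
public static class PdfNameNormalizer
{
    // Replaces every "#xx" (two hex digits) in a raw PDF name token
    // with the character it encodes. Assumes ASCII-only names.
    public static string Decode(string rawName)
    {
        var sb = new StringBuilder(rawName.Length);
        for (int i = 0; i < rawName.Length; i++)
        {
            char c = rawName[i];
            bool isEscape = c == '#'
                && i + 2 < rawName.Length
                && Uri.IsHexDigit(rawName[i + 1])
                && Uri.IsHexDigit(rawName[i + 2]);
            if (isEscape)
            {
                // Parse the two hex digits and skip past them.
                sb.Append((char)Convert.ToInt32(rawName.Substring(i + 1, 2), 16));
                i += 2;
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        string[] samples = { "/Annots", "/#41nnots", "/A#6Enots", "/Annot#73" };
        foreach (string s in samples)
        {
            // Every sample decodes to "/Annots".
            Console.WriteLine(s + " -> " + PdfNameNormalizer.Decode(s));
        }
    }
}
```

A PDF library is expected to do this decoding itself when it parses name objects; the sketch only shows what that step amounts to, which can help when inspecting raw file contents by hand.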

This is, BTW, one of the simplest ways that black-hat hackers obfuscate a /#4Aava#53cript key in their malware PDFs (and it is only one of many potential approaches).

Maybe your version of iTextSharp (which you did not specify) does not correctly recognize such an escaped /Annots name?

If so, my suggestion is to normalize each PDF before searching it for /Annots. You can do that with the help of the qpdf command line tool (it also has an API):

    qpdf --qdf helloworld.pdf qdf---helloworld.pdf

Let's see:

    kp@mbp:~$ grep nnots helloworld.pdf
      /#41nnots 57 0 R
    kp@mbp:~$ qpdf --qdf helloworld.pdf qdf---helloworld.pdf
    kp@mbp:~$ grep nnots qdf---helloworld.pdf
      /Annots 57 0 R
