The PDF File Format
Structure
The header identifies the file as a PDF and declares the format version. It starts with
%PDF-<version.number>
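The header can also be checked programmatically. The Python sketch below pulls the version out of the header. It assumes, as many readers do, that junk may precede the header, and it searches only the first 1024 bytes; that window is a reader convention, not something stated in these notes. `pdf_version` is a hypothetical helper name.

```python
import re
from typing import Optional

# Matches "%PDF-" followed by a version such as 1.7 or 2.0.
HEADER_RE = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_version(data: bytes) -> Optional[str]:
    """Return the version declared in the header, or None if no header is found."""
    # Assumption: tolerate leading junk, but only within the first 1024 bytes.
    match = HEADER_RE.search(data[:1024])
    return match.group(1).decode("ascii") if match else None


print(pdf_version(b"%PDF-1.7\n%\xe2\xe3\xcf\xd3\n"))  # 1.7
```

A header that does not sit at offset 0 may itself be worth noting during triage.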
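The body contains the objects. A cross-reference table, stored after the body, records the byte offset of each object so a reader can locate it without scanning the whole file. An object's start and end look like the following example, where 1 is the object number and 0 the generation number: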
1 0 obj
<<
[...]
>>endobj
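To see how objects are delimited, the Python sketch below lists every "N G obj ... endobj" block in a file and returns the object number, the generation number, and the raw body. `list_objects` is a hypothetical helper and deliberately naive: it ignores the cross-reference table, and an "endobj" inside stream data would cut an object short.

```python
import re

# "<object number> <generation number> obj", then the shortest run up to "endobj".
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)


def list_objects(data: bytes) -> list[tuple[int, int, bytes]]:
    """Return (object number, generation number, raw body) for each object found."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(data)
    ]


sample = (
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>endobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)
for number, generation, body in list_objects(sample):
    print(number, generation, body)
```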
The footer, or trailer, contains the trailer dictionary, the byte offset at which the cross-reference table starts (after the startxref keyword), and the end-of-file marker:
trailer
<< [...] >>
startxref
<byte offset of the cross-reference table>
%%EOF
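Readers start at the end of the file: they find the last startxref keyword and jump to the byte offset that follows it. The Python sketch below performs that lookup; `startxref_offset` is a hypothetical name, and the sketch does not check that a cross-reference table really begins at the returned offset. Taking the last occurrence matters because incremental updates append a new trailer, with its own startxref, to the end of the file.

```python
import re
from typing import Optional


def startxref_offset(data: bytes) -> Optional[int]:
    """Return the byte offset given after the last 'startxref' keyword."""
    pos = data.rfind(b"startxref")
    if pos == -1:
        return None
    # The keyword is followed by whitespace and a decimal offset.
    match = re.match(rb"startxref\s+(\d+)", data[pos:])
    return int(match.group(1)) if match else None


tail = b"trailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n116\n%%EOF\n"
print(startxref_offset(tail))  # 116
```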
Multimedia Keywords
The PDF format supports scripts, actions, and multimedia content within a single document. Lenny Zeltser's cheat sheet for Analyzing Malicious Documents lists the keywords that deserve attention:
/OpenAction and /AA specify the script or action to run automatically.
/JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
/URI accesses a URL, perhaps for phishing.
/SubmitForm and /GoToR can send data to a URL.
/ObjStm can hide objects inside an object stream.
/XObject can embed an image for phishing.
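In the spirit of pdfid.py, the Python sketch below counts how often each keyword from the list above appears in a file. It is an approximation, not pdfid.py's implementation: it only sees uncompressed bytes, so keywords inside compressed streams or /ObjStm object streams are missed, and it does not decode #xx escapes. A keyword counts only as a complete PDF name, so it is not matched as the prefix of a longer name.

```python
import re
from collections import Counter

# Keywords taken from the list above.
KEYWORDS = ["/OpenAction", "/AA", "/JavaScript", "/JS", "/AcroForm", "/XFA",
            "/URI", "/SubmitForm", "/GoToR", "/ObjStm", "/XObject"]

# A PDF name ends at whitespace or a delimiter character (or end of data).
NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def count_keywords(data: bytes) -> Counter:
    """Count whole-name occurrences of each keyword in the raw file bytes."""
    counts = Counter()
    for keyword in KEYWORDS:
        pattern = re.escape(keyword.encode("ascii")) + NAME_END
        counts[keyword] = len(re.findall(pattern, data))
    return counts


sample = b"<< /Type /Catalog /OpenAction 2 0 R >>\n<< /S /JavaScript /JS (app.alert(1)) >>"
for keyword, count in count_keywords(sample).items():
    print(f"{keyword:12} {count}")
```

A non-zero count is only a pointer to objects worth inspecting, not a verdict on its own.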
Be mindful of names obfuscated with hex escapes. A name may encode any character as #xx, so /J#61vaScript is the same name as /JavaScript (0x61 is the hex code for "a").
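Because such escapes hide keywords from a plain byte search, it helps to normalize names before matching. The Python sketch below decodes #xx escapes inside PDF names; both function names are hypothetical, and the name-matching regex is an approximation (for example, it would also rewrite slash-prefixed text inside string literals).

```python
import re

# "#" followed by two hex digits encodes one byte inside a PDF name.
HEX_ESCAPE = re.compile(rb"#([0-9A-Fa-f]{2})")
# A name is "/" followed by regular characters (not whitespace or delimiters).
NAME = re.compile(rb"/[^\s()<>\[\]{}/%]*")


def normalize_name(name: bytes) -> bytes:
    """Replace each #xx escape in a single name with the byte it encodes."""
    return HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), name)


def normalize_names(data: bytes) -> bytes:
    """Apply normalize_name to every name-like token in the data."""
    return NAME.sub(lambda m: normalize_name(m.group(0)), data)


print(normalize_names(b"<< /S /J#61vaScript /JS (app.alert(1)) >>"))
# prints b'<< /S /JavaScript /JS (app.alert(1)) >>'
```

A keyword search run on the normalized bytes then finds /JavaScript even when the file spells it /J#61vaScript.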
Triage keywords
To triage keywords, use jesparza's peepdf or Didier Stevens' PDF tools, such as pdfid.py. Parse individual objects with pdf-parser.py:
pdf-parser.py --search <keyword> file.pdf
pdf-parser.py --object <objectNo.> file.pdf
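For uncompressed objects, the two pdf-parser.py commands above can be roughly approximated in Python: `search_objects` returns the numbers of objects whose raw body contains a keyword, and `get_object` returns one object's body. Both names are hypothetical, the naive "obj ... endobj" regex can be fooled by stream contents, and the sketch does not look inside compressed object streams.

```python
import re
from typing import Optional

# "<number> <generation> obj ... endobj", matched lazily.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.S)


def search_objects(data: bytes, keyword: bytes) -> list[int]:
    """Return the numbers of objects whose raw body contains keyword."""
    return [int(m.group(1)) for m in OBJ_RE.finditer(data) if keyword in m.group(3)]


def get_object(data: bytes, number: int) -> Optional[bytes]:
    """Return the raw body of the last object with the given number, if any."""
    body = None
    for m in OBJ_RE.finditer(data):
        if int(m.group(1)) == number:
            # Keep the last match: incremental updates append newer versions.
            body = m.group(3).strip()
    return body


sample = (
    b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /S /JavaScript /JS (app.alert(1)) >>\nendobj\n"
)
print(search_objects(sample, b"/JavaScript"))  # [2]
print(get_object(sample, 2))
```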
In interactive mode, peepdf decodes the contents of an object:
peepdf -i file.pdf
[...]
PPDF> object <No.>
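peepdf applies a stream's filters for you. For the common /FlateDecode filter, the decoding step is zlib decompression, which the Python sketch below performs on the raw body of one object. It assumes a single /FlateDecode filter without /DecodeParms predictors; other filters and filter chains are not handled, and `decode_flate_streams` is a hypothetical name.

```python
import re
import zlib

# Stream data starts after "stream" plus an end-of-line marker; the lookbehind
# keeps "endstream" itself from being taken as a new "stream" keyword.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)


def decode_flate_streams(obj_body: bytes) -> list[bytes]:
    """Inflate every stream in an object body that declares /FlateDecode."""
    if b"/FlateDecode" not in obj_body:
        return []
    decoded = []
    for m in STREAM_RE.finditer(obj_body):
        # decompressobj stops at the end of the zlib data, so the EOL before
        # "endstream" is ignored instead of raising an error.
        decoded.append(zlib.decompressobj().decompress(m.group(1)))
    return decoded


payload = b"app.alert('hi');"
data = zlib.compress(payload)
body = b"<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(data) + data + b"\nendstream"
print(decode_flate_streams(body))  # [b"app.alert('hi');"]
```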