The PDF File Format

Structure

The PDF header contains metadata about the file and starts with the version marker:

%PDF-<version.number>
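
As a rough illustration of the header check, the following Python sketch reads the version from the start of a file. The function name `pdf_version` and the 1024-byte search window are assumptions made for this example (some readers are reported to accept a header that does not sit at offset 0); this is not how any particular tool does it.

```python
import re

# Assumption: look for the marker in the first 1024 bytes, since lenient
# readers may accept leading junk before the header.
HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_version(data: bytes):
    """Return (major, minor) from the %PDF- header, or None if it is missing."""
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    sample = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
    print(pdf_version(sample))
```

A missing or late header is not proof of malice, but it is worth noting during triage.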

The body contains the objects, and a cross-reference table is used to locate the objects inside the file. An object's start and end look like the following example:

1 0 obj
<<
[...]
>>endobj
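
The `obj` / `endobj` framing above can be used to enumerate objects with a regular expression. This is only a minimal sketch: `list_objects` is a hypothetical helper, it does not look inside compressed object streams, and a stream that happens to contain the bytes `endobj` would confuse it.

```python
import re

# "<number> <generation> obj ... endobj", matched non-greedily across lines.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def list_objects(data: bytes):
    """Return a list of (object number, generation, raw body) tuples."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(data)
    ]


if __name__ == "__main__":
    sample = b"1 0 obj\n<<\n/Type /Catalog\n>>endobj\n2 0 obj\n<< /Length 0 >>\nendobj\n"
    for number, generation, body in list_objects(sample):
        print(number, generation, body)
```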

The footer, or trailer, contains the offset of the start of the cross-reference table and the end-of-file marker:

trailer
<>
<cross-reference-table>
%%EOF
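
A short Python sketch of how the end of a file might be inspected. It relies on the `startxref` keyword, which in the PDF specification precedes the byte offset of the cross-reference table; the helper name `trailer_info` and the returned dictionary keys are made up for this example.

```python
import re

# "startxref", the byte offset of the cross-reference table, then "%%EOF".
STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")
EOF_MARKER = b"%%EOF"


def trailer_info(data: bytes) -> dict:
    """Collect basic facts about the trailer of a PDF held in memory."""
    offsets = [int(m.group(1)) for m in STARTXREF_RE.finditer(data)]
    last_eof = data.rfind(EOF_MARKER)
    return {
        # Offset announced by the last startxref entry, if any.
        "xref_offset": offsets[-1] if offsets else None,
        # More than one marker usually means incremental updates were appended.
        "eof_count": data.count(EOF_MARKER),
        # Bytes after the final marker; large values may deserve a closer look.
        "trailing_bytes": None if last_eof == -1 else len(data) - last_eof - len(EOF_MARKER),
    }


if __name__ == "__main__":
    body = b"%PDF-1.4\n1 0 obj\n<<>>\nendobj\n"
    xref = b"xref\n0 1\n0000000000 65535 f \ntrailer\n<< /Size 1 >>\n"
    sample = body + xref + b"startxref\n" + str(len(body)).encode() + b"\n%%EOF\n"
    info = trailer_info(sample)
    print(info)
    # Check that the announced offset really points at a classic xref table.
    print(sample[info["xref_offset"]:info["xref_offset"] + 4] == b"xref")
```

Files written with cross-reference streams point `startxref` at an object instead of the `xref` keyword, so the final check only applies to classic tables.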

Multimedia Keywords

The PDF format supports multimedia and active content inside a single document. Zeltser's Analyzing Malicious Documents lists the following risky keywords:

/OpenAction and /AA specify the script or action to run automatically.
/JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
/URI accesses a URL, perhaps for phishing.
/SubmitForm and /GoToR can send data to URL.
/ObjStm can hide objects inside an object stream.
/XObject can embed an image for phishing.
Be mindful of obfuscation with hex codes, such as /JavaScript vs. /J#61vaScript
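
The hex obfuscation can be undone before counting keywords. The sketch below decodes `#xx` escapes in PDF names and then counts the keywords listed above. It is a simplified illustration, not a reimplementation of pdfid.py; `decode_name` and `count_keywords` are hypothetical names, and only the raw bytes are scanned, so names inside compressed streams are missed.

```python
import re
from collections import Counter

KEYWORDS = {
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/AcroForm", "/XFA",
    "/URI", "/SubmitForm", "/GoToR", "/ObjStm", "/XObject",
}

# A name starts with "/" and runs until whitespace or a PDF delimiter.
NAME_RE = re.compile(rb"/[^\x00\t\n\x0c\r ()<>\[\]{}/%]+")
# "#" followed by two hex digits encodes a single byte inside a name.
HEX_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Turn /J#61vaScript into /JavaScript."""
    decoded = HEX_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> Counter:
    """Count risky keywords after undoing hex escapes."""
    counts = Counter()
    for match in NAME_RE.finditer(data):
        name = decode_name(match.group(0))
        if name in KEYWORDS:
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = b"<< /OpenAction 2 0 R /J#61vaScript 3 0 R /JS (app.alert(1)) >>"
    print(count_keywords(sample))
```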

Triage keywords

To triage keywords, use jesparza's peepdf or Didier Stevens' PDF tools such as pdfid.py. Parsing is done with pdf-parser.py:

pdf-parser.py --search <keyword> file.pdf
pdf-parser.py --object <objectNo.> file.pdf

Peepdf decodes the values of an object in interactive mode:

peepdf -i file.pdf
[..]
PPDF> object <No.>