# The PDF File Format

## Structure

The PDF header contains metadata and starts with

```
%PDF-
```

The body contains objects and a cross-reference table to locate objects inside the file. An object's start and end look like the following example

```
1 0 obj
<< [...] >>
endobj
```

The footer, or trailer, points to the start of the cross-reference table and contains the end-of-file marker

```
trailer
<>
%%EOF
```

## Multimedia Keywords

The PDF format provides keywords for embedding multimedia and active content in a single document. An example is given by [Zeltser's Analyzing Malicious Documents](https://zeltser.com/media/docs/analyzing-malicious-document-files.pdf)

```
/OpenAction and /AA specify the script or action to run automatically.
/JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
/URI accesses a URL, perhaps for phishing.
/SubmitForm and /GoToR can send data to URL.
/ObjStm can hide objects inside an object stream.
/XObject can embed an image for phishing.
Be mindful of obfuscation with hex codes, such as /JavaScript vs. /J#61vaScript
```

### Triage keywords

To triage keywords, use [jesparza's peepdf](https://github.com/jesparza/peepdf) or [Didier Stevens' PDF tools](https://blog.didierstevens.com/programs/pdf-tools/) such as pdfid.py. Parsing is done with pdf-parser.py.

```sh
pdf-parser.py --search <keyword> file.pdf
pdf-parser.py --object <id> file.pdf
```

Peepdf decodes the values of an object in interactive mode

```sh
peepdf -i file.pdf
[..]
PPDF> object <id>
```
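
### Keyword counting sketch

To illustrate what keyword triage involves, the following is a minimal Python sketch that counts the keywords listed above in the raw bytes of a file. It undoes `#xx` hex escapes in names first, so `/J#61vaScript` is counted as `/JavaScript`. It is not how pdfid.py or peepdf are implemented. The names `KEYWORDS`, `decode_name` and `count_keywords` are hypothetical. The sketch assumes uncompressed content, so names hidden inside a compressed `/ObjStm` stream will not be seen.

```python
import re

# Keywords from the list above, without the leading slash.
KEYWORDS = ["OpenAction", "AA", "JavaScript", "JS", "AcroForm", "XFA",
            "URI", "SubmitForm", "GoToR", "ObjStm", "XObject"]

# A PDF name starts with "/" and runs until whitespace or a delimiter.
NAME_RE = re.compile(rb"/([^\x00\t\n\f\r ()<>\[\]{}/%]+)")
# Inside a name, "#" followed by two hex digits encodes one byte.
HEX_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(raw: bytes) -> str:
    """Replace #xx escapes with the byte they encode."""
    decoded = HEX_RE.sub(lambda m: bytes([int(m.group(1), 16)]), raw)
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> dict:
    """Count occurrences of each triage keyword among the names in data."""
    counts = {keyword: 0 for keyword in KEYWORDS}
    for match in NAME_RE.finditer(data):
        name = decode_name(match.group(1))
        if name in counts:
            counts[name] += 1
    return counts
```

For a file on disk, the function could be called as `count_keywords(open("file.pdf", "rb").read())`. Non-zero counts for `/OpenAction` together with `/JS` or `/JavaScript` are a reason to inspect the referenced objects with pdf-parser.py or peepdf.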