# The PDF File Format
## Structure
The PDF header is the first line of the file. It identifies the file as a PDF and states the version of the format it follows:
```
%PDF-<version.number>
```
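As a quick illustration, the header can be checked with a few lines of Python. This is only a sketch: `pdf_version` is a hypothetical helper, and the 1024-byte search window is an assumption based on the fact that some readers tolerate junk bytes before the header.

```python
import re

# Matches the header line, e.g. %PDF-1.7
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")


def pdf_version(data: bytes, window: int = 1024):
    """Return (major, minor) from the %PDF- header, or None if it is absent.

    The window size is an assumption, not a rule taken from this document.
    """
    match = HEADER_RE.search(data[:window])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
print(pdf_version(sample))  # (1, 7)
```

A header that does not sit at offset 0 is not proof of malice, but it is worth noting during triage.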
The body contains the document's objects. A cross-reference table records where each
object sits inside the file. An object starts with its object number, its generation
number, and the keyword `obj`, and ends with `endobj`:
```
1 0 obj
<<
[...]
>>endobj
```
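To make the object delimiters concrete, the following sketch lists the objects of a raw PDF with a regular expression. `list_objects` is a hypothetical helper and the approach is deliberately naive: it is not a real parser, it will miss objects stored inside compressed object streams, and it can be fooled by stream data that happens to contain `endobj`.

```python
import re

# "<object number> <generation number> obj ... endobj"
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)


def list_objects(data: bytes):
    """Return (object number, generation number, raw body) for each match."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(data)
    ]


sample = (
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
    b"2 0 obj\n<< /Length 0 >>\nendobj\n"
)
for number, generation, body in list_objects(sample):
    print(number, generation, body)
```

For real samples, a dedicated parser is the safer choice; the sketch only shows how the `obj` and `endobj` markers frame each object.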
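The footer, or trailer, contains the trailer dictionary, the byte offset at which the
cross-reference table starts (after the `startxref` keyword), and the end-of-file marker:
```
trailer
<<
[...]
>>
startxref
<byte-offset-of-cross-reference-table>
%%EOF
```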
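In a typical file the end-of-file marker is preceded by the keyword `startxref` and the byte offset of the cross-reference table. The sketch below pulls out that offset; `last_startxref` is a hypothetical helper, and taking the last match is an assumption that covers files with incremental updates, where several trailers are appended one after another.

```python
import re

# startxref, then the byte offset of the cross-reference table, then %%EOF
STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")


def last_startxref(data: bytes):
    """Return the offset named by the final startxref entry, or None."""
    matches = STARTXREF_RE.findall(data)
    return int(matches[-1]) if matches else None


sample = (
    b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n116\n%%EOF\n"
)
print(last_startxref(sample))  # 116
```

More than one `%%EOF` marker in a file indicates that it was updated after its initial creation, which can be useful context when comparing revisions.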
## Multi Media Keywords
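The PDF format can combine scripts, forms, links, and embedded media in a single document, and each of these features is introduced by a keyword.
The keywords most relevant to analysis are listed in [Lenny Zeltser's Analyzing Malicious Documents cheat sheet](https://zeltser.com/media/docs/analyzing-malicious-document-files.pdf):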
```
/OpenAction and /AA specify the script or action to run automatically.
/JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
/URI accesses a URL, perhaps for phishing.
/SubmitForm and /GoToR can send data to URL.
/ObjStm can hide objects inside an object stream.
/XObject can embed an image for phishing.
Be mindful of obfuscation with hex codes, such as /JavaScript vs. /J#61vaScript
```
<embed src="./CheatSheets/analyzing-malicious-document-files.pdf" type="application/pdf">
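The hex obfuscation mentioned in the cheat sheet can be undone before counting keywords. The following is a rough sketch of that idea, not a replacement for a real triage tool: `decode_name` and `count_keywords` are hypothetical helpers, the keyword list is copied from the cheat sheet excerpt, and only uncompressed data is inspected, so anything hidden inside an object stream is not seen.

```python
import re
from collections import Counter

# Keywords taken from the cheat sheet excerpt
KEYWORDS = {
    "/OpenAction", "/AA", "/JavaScript", "/JS", "/AcroForm", "/XFA",
    "/URI", "/SubmitForm", "/GoToR", "/ObjStm", "/XObject",
}

# A name starts with "/" and runs until whitespace or a delimiter
NAME_RE = re.compile(rb"/[^\s/<>\[\](){}%]+")
# "#" followed by two hex digits encodes a single byte inside a name
HEX_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def decode_name(name: bytes) -> str:
    """Replace #xx escapes with the byte they encode, e.g. /J#61vaScript."""
    decoded = HEX_RE.sub(lambda m: bytes([int(m.group(1), 16)]), name)
    return decoded.decode("latin-1")


def count_keywords(data: bytes) -> Counter:
    """Count the keywords of interest after decoding hex escapes."""
    counts = Counter()
    for raw in NAME_RE.findall(data):
        name = decode_name(raw)
        if name in KEYWORDS:
            counts[name] += 1
    return counts


sample = b"<< /OpenAction 2 0 R >>\n<< /S /J#61vaScript /JS (app.alert(1)) >>"
print(dict(count_keywords(sample)))
```

Comparing counts with and without decoding is a simple way to spot names that were deliberately obfuscated.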
### Triage keywords
To triage keywords, use [jesparza's peepdf](https://github.com/jesparza/peepdf)
or [Didier Stevens' PDF
tools](https://blog.didierstevens.com/programs/pdf-tools/). pdfid.py counts the
keywords found in a file, and pdf-parser.py searches for a keyword or prints a single object:
```sh
pdf-parser.py --search <keyword> file.pdf
pdf-parser.py --object <objectNo.> file.pdf
```
peepdf decodes the values of an object in interactive mode:
```sh
peepdf -i file.pdf
[..]
PPDF> object <No.>
```