# The PDF File Format

## Structure

The PDF header contains metadata and starts with the version marker:

```
%PDF-<version.number>
```

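As a minimal sketch of reading that marker, the snippet below pulls the version out of raw file bytes. The function name `pdf_version` and the 1024-byte search window are assumptions made for illustration, not part of any tool mentioned in these notes.

```python
import re

# Matches the "%PDF-<major>.<minor>" marker shown above.
HEADER_RE = re.compile(rb"%PDF-(\d+)\.(\d+)")

def pdf_version(data: bytes):
    """Return (major, minor) from the header, or None if no marker is found."""
    # Assumption: only the first 1024 bytes are searched, so that a marker
    # preceded by a little junk is still found.
    match = HEADER_RE.search(data[:1024])
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

print(pdf_version(b"%PDF-1.7\n1 0 obj\n<<>>\nendobj\n"))  # (1, 7)
```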
The body contains objects and a cross-reference table to locate the objects
inside the file. The start and end of an object look like the following example:

```
1 0 obj
<<
[...]
>>endobj
```

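The object framing above can be located with a regular expression. This is a rough sketch under simplifying assumptions: it scans raw bytes, so it misses objects stored inside object streams and can be fooled by stream data that happens to contain `endobj`. The name `list_objects` and the sample bytes are hypothetical.

```python
import re

# "<number> <generation> obj ... endobj"; non-greedy so each object matches separately.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)endobj", re.DOTALL)

def list_objects(data: bytes):
    """Return (object number, generation, raw body) for each object found."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in OBJ_RE.finditer(data)
    ]

# Illustrative sample, not taken from a real document.
sample = b"1 0 obj\n<<\n/Type /Catalog\n>>endobj\n2 0 obj\n<< /Length 0 >>\nendobj\n"
for number, generation, body in list_objects(sample):
    print(number, generation, body)
```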
The footer, or trailer, contains the start of the cross-reference table and the
end-of-file marker:

```
trailer
<< [...] >>
<cross-reference-table>
%%EOF
```

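In real files the start of the cross-reference table is given as a byte offset after the `startxref` keyword, just before `%%EOF`. The keyword is not shown in the example above; it comes from the PDF specification. The sketch below reads that offset; the function name `xref_offset` and the sample bytes (including the offset value) are made up for illustration.

```python
import re

# "startxref", the byte offset of the cross-reference table, then "%%EOF".
STARTXREF_RE = re.compile(rb"startxref\s+(\d+)\s+%%EOF")

def xref_offset(data: bytes):
    """Return the cross-reference table offset from the last trailer, or None."""
    matches = STARTXREF_RE.findall(data)
    # Incrementally updated files can carry several trailers; the last one is used here.
    return int(matches[-1]) if matches else None

# Illustrative sample; the offset 116 does not point at a real table.
sample = b"%PDF-1.4\n...\ntrailer\n<< /Size 3 /Root 1 0 R >>\nstartxref\n116\n%%EOF\n"
print(xref_offset(sample))  # 116
```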
## Multimedia Keywords

The PDF format has properties for embedding multimedia in a single document.
[Zeltser's Analyzing Malicious Documents](https://zeltser.com/media/docs/analyzing-malicious-document-files.pdf)
gives examples of keywords to look for:

```
/OpenAction and /AA specify the script or action to run automatically.
/JavaScript, /JS, /AcroForm, and /XFA can specify JavaScript to run.
/URI accesses a URL, perhaps for phishing.
/SubmitForm and /GoToR can send data to URL.
/ObjStm can hide objects inside an object stream.
/XObject can embed an image for phishing.
Be mindful of obfuscation with hex codes, such as /JavaScript vs. /J#61vaScript
```

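The keyword list and the hex-code warning can be combined into a small counter. This is a crude sketch, not a replacement for the tools below: it decodes every `#xx` sequence in the raw bytes (including any that are not part of a name) and does not look inside compressed streams. The helper names `decode_hex_escapes` and `count_keywords` and the sample bytes are hypothetical.

```python
import re

# Keywords taken from the list above.
KEYWORDS = [b"/OpenAction", b"/AA", b"/JavaScript", b"/JS", b"/AcroForm", b"/XFA",
            b"/URI", b"/SubmitForm", b"/GoToR", b"/ObjStm", b"/XObject"]

def decode_hex_escapes(data: bytes) -> bytes:
    """Replace "#xx" escapes with the byte they encode, e.g. "#61" -> "a"."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), data)

def count_keywords(data: bytes) -> dict:
    """Count each keyword as a whole name, so /JS does not match a longer name."""
    normalized = decode_hex_escapes(data)
    counts = {}
    for keyword in KEYWORDS:
        pattern = re.escape(keyword) + rb"(?![A-Za-z0-9])"
        counts[keyword.decode()] = len(re.findall(pattern, normalized))
    return counts

# Illustrative sample with an obfuscated /JavaScript name.
sample = (b"1 0 obj\n<< /OpenAction 2 0 R >>\nendobj\n"
          b"2 0 obj\n<< /S /J#61vaScript /JS (app.alert(1)) >>\nendobj\n")
hits = {name: n for name, n in count_keywords(sample).items() if n}
print(hits)  # {'/OpenAction': 1, '/JavaScript': 1, '/JS': 1}
```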
<embed src="./CheatSheets/analyzing-malicious-document-files.pdf" type="application/pdf">

### Triage keywords

To triage keywords, use [jesparza's peepdf](https://github.com/jesparza/peepdf)
or [Didier Stevens' PDF
tools](https://blog.didierstevens.com/programs/pdf-tools/) like pdfid.py.
Parsing is done via pdf-parser.py.

```sh
pdf-parser.py --search <keyword> file.pdf
pdf-parser.py --object <objectNo.> file.pdf
```

peepdf decodes the values of an object in interactive mode:

```sh
peepdf -i file.pdf
[..]
PPDF> object <No.>
```