tpcpr/README.md

This is experimental

The software is written in Rust (2018 edition, safe mode only). At the moment I am having fun writing Rust and testing language features. The code should be modular enough that you can replace any function you deem awful. Error handling is subpar at the moment. There has been no real unit testing to speak of since the switch to asynchronous functionality. Testing will come back.

This version is the successor of the _POSIX_C_SOURCE 200809L implementation, in which all data parsed from pcap/pcapng files is written as a single, simple query. Ingestion is rather fast (tested writes: 100*10^3 TCP packets in ~1.8 sec), but the procedure may be insecure. See the other repository for more information. The idea of this iteration is to use a prepared statement and chunk the data according to the maximum input size. Postgres databases have a custom maximum limit on each insert query issued as a prepared statement. The chunk size is set through the config/interface file parser.json as insert_max. Data can be read from PCAP/PCAPNG files as well as from network devices.
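The chunking idea can be sketched with the standard library alone. This is a minimal illustration, not the code from this repository: `chunk_rows` is a hypothetical helper, and only the name `insert_max` is taken from parser.json.

```rust
/// Split parsed rows into batches of at most `insert_max` elements.
/// Hypothetical helper for illustration; only the name `insert_max`
/// comes from parser.json.
fn chunk_rows<T: Clone>(rows: &[T], insert_max: usize) -> Vec<Vec<T>> {
    // `chunks` panics on a chunk size of zero, so fail with a clear message.
    assert!(insert_max > 0, "insert_max must be at least 1");
    rows.chunks(insert_max)
        // Each chunk would be bound to one prepared INSERT statement.
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Calling `chunk_rows(&rows, 4)` on ten rows yields batches of 4, 4 and 2 rows. Only the last batch can be shorter than `insert_max`, so a statement prepared for a full batch would presumably need a second variant for the remainder.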

UPDATE 0.2.0: Chunking can be omitted completely by using PostgreSQL's COPY to transfer binary data instead of using INSERT. This is not only somewhat faster, it also takes quite a few lines of code less. Only parsing from a network device still needs chunks at the moment. The other recent change is that only non-NULL protocol data of a packet is serialized to JSON, so the inserted rows should be smaller. Furthermore, an mpsc sync_channel has been implemented, which reduces main memory pressure.
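Why a bounded channel helps with memory can be shown in a few lines. The following is a sketch under assumptions, not the parser's actual pipeline: `run_pipeline`, the dummy 64-byte packets and the `bound` parameter are made up for illustration. Only `std::sync::mpsc::sync_channel` itself is what the text refers to.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Send `total` dummy packets through a bounded channel and return how many
/// the consumer received. Hypothetical stand-in for parser and database writer.
fn run_pipeline(total: usize, bound: usize) -> usize {
    // Once `bound` packets are queued, `send` blocks, so the producer
    // cannot run arbitrarily far ahead of the consumer.
    let (tx, rx) = sync_channel::<Vec<u8>>(bound);

    let producer = thread::spawn(move || {
        for i in 0..total {
            // Stand-in for one parsed packet.
            let packet = vec![(i % 256) as u8; 64];
            tx.send(packet).expect("receiver hung up");
        }
        // `tx` is dropped here, which ends the consumer loop below.
    });

    let mut received = 0;
    for _packet in rx {
        // Stand-in for writing the packet to PostgreSQL.
        received += 1;
    }
    producer.join().expect("producer thread panicked");
    received
}
```

With `run_pipeline(1000, 8)`, at most eight packets are buffered in the channel at any time, instead of the whole capture sitting in memory before the first write.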

Currently, Ethernet, IPv4, IPv6, TCP, UDP and ARP/RARP network protocols are handled. Any additional session layer/wrapped data can be found in packet->data[u8], for now. For testing purposes, the table layout is serialized JSON, which makes it somewhat "dynamic". Any protocols not recognized in a parsed packet will be marked as NULL in the resulting table row. A query may look like this: select packet->>'ipv4_header' from json_dump; or like this: select packet from json_dump where packet->>'reg_res' is not null; to show data parsed via regex.
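The "dynamic" layout can be pictured as a packet whose layers are all optional, with absent layers left out of the JSON. The sketch below is a simplification and not the repository's code: the `Packet` struct is hypothetical, `tcp_header` is an invented field name (only `ipv4_header` and `reg_res` appear in the queries above), values are plain strings assumed to need no JSON escaping, and the JSON is assembled by hand only to keep the example free of dependencies.

```rust
/// Hypothetical, heavily reduced packet: every protocol layer is optional.
struct Packet {
    ipv4_header: Option<String>,
    tcp_header: Option<String>, // invented field name, for illustration only
    reg_res: Option<String>,
}

/// Build a JSON object containing only the layers that are present.
/// Assumes the values contain no characters that need JSON escaping.
fn to_sparse_json(p: &Packet) -> String {
    let fields = [
        ("ipv4_header", p.ipv4_header.as_ref()),
        ("tcp_header", p.tcp_header.as_ref()),
        ("reg_res", p.reg_res.as_ref()),
    ];
    let mut parts = Vec::new();
    for (key, value) in fields.iter() {
        // Layers that were not parsed are skipped entirely.
        if let Some(v) = value {
            parts.push(format!("\"{}\":\"{}\"", key, v));
        }
    }
    format!("{{{}}}", parts.join(","))
}
```

A packet with only an IPv4 layer then serializes to an object with the single key `ipv4_header`. On the database side, `packet->>'reg_res'` returns NULL for a key that is missing, which is why the second query above can filter with `is not null`.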

Another subgoal was the ability to compile a static binary, which (last time I tested) works without dependencies other than libpcap itself. It even executes on Oracle Linux, after linking directly against the elf64 interpreter. If you ever had the pleasure of using this derivative, that may come as a surprise to you. The key is to compile for the x86_64-unknown-linux-musl target. See: https://doc.rust-lang.org/edition-guide/rust-2018/platform-and-target-support/musl-support-for-fully-static-binaries.html

Caveats: Regex syntax support is limited and needs some love. Escaping common regular expression syntax does not work properly, but character classes do. I still have to figure out the syntactically correct way to get a pattern out of the JSON file and into a raw string. For already supported regular expression syntax see: https://docs.rs/regex/1.3.9/regex/#syntax , also see the example in parser.json, which parses some top-level domains.

If this whole thing turns out to be viable, some future features may be:

  • A database containing the already implemented file hash map, to compare file status/sizes after the parser may have crashed, or to join a complete overview of any existing PCAP files inserted at previous CTFs.
  • Updating the file hash map at runtime through the inotify crate.
  • Reassembly of fragmented IPv4 packets.
  • SIMD (via autovectorization), which is easy enough to do in Rust.
  • Support for more network protocols.

There are many other things left to be desired.

Benchmarking was initially done with the same file that was used for the previous C implementation. Inserting non-chunked data resulted in ~20 minutes of querying the database. Now, inserting chunked data takes less than 12 seconds after compiler optimization.

Speaking of optimization: Do yourself a favor and run release code, not debug code: cargo run --release. The compiler does rather hefty optimization and you will save some time waiting for your precious data to be inserted. I did no further optimization besides trying to enable the compiler to do a better job. Just blackboxing, no assembly tweaking yet.

TESTRUNS:

  • Run 001 on 24.06.2020: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 3627,41s user 156,47s system 23% cpu 4:29:19,27 total. The PostgreSQL 12 server was a vanilla docker pull postgres container on a 2008 MacBook (2,4GHz dual core, 6GB RAM) connected via wifi. Memory usage of the client was about 11.5GB out of 14.7GB, i.e. 0.78 utilization. (A tokio mpsc pipe will be the next improvement, so memory usage may be lower afterwards.)
  • Run 002 on 25.06.2020: iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 3669,68s user 163,23s system 23% cpu 4:27:19,14 total. The PostgreSQL 12 server was a vanilla docker pull postgres container on a 2008 MacBook (2,4GHz dual core, 6GB RAM) connected via wifi. Memory usage of the client was about 11.5GB out of 14.7GB, i.e. 0.78 utilization. (A tokio mpsc pipe will be the next improvement, so memory usage may be lower afterwards.)
  • Run 003 on 26.06.2020: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 3847,69s user 236,93s system 25% cpu 4:22:45,90 total.
  • Run 004 on 26.06.2020: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 1176,24s user 146,11s system 30% cpu 1:12:49,93 total on localhost docker.
  • Run 005: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 1181,33s user 139,35s system 29% cpu 1:15:40,24 total on localhost docker.
  • Run 006 on 29.06.2020: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 1640,72s user 224,14s system 44% cpu 1:09:49,42 total on localhost docker, std::mpsc::sync_channel.
  • Run 007 on 29.06.2020: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 1243,53s user 166,47s system 33% cpu 1:09:24,14 total on localhost docker, std::mpsc::sync_channel.
  • Run 008 on 29.06.2020: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 1518,17s user 162,07s system 37% cpu 1:13:42,22 total on localhost docker, std::mpsc::sync_channel.
  • Run 009 on 30.06.2020: complete iCTF2020 PCAP files (bpf filter: 'tcp') resulted in a table of roughly 74GB (du -hs) with 30808676 rows. cargo run --release 1359,90s user 148,15s system 36% cpu 1:09:03,58 total on localhost docker, std::mpsc::sync_channel.