solved logic bug, clean up

gurkenhabicht 2020-06-19 02:03:28 +02:00
parent 0e1b9435b1
commit 2e2c8c5e5b
3 changed files with 9 additions and 6 deletions

View File

@@ -1,9 +1,9 @@
# This is experimental
-The software is written in Rust (2018, safe mode only). At the current state I have some fun writing and testing language features. The code should be modular enough to change any function you deem awfull enough.
+The software is written in Rust (2018, safe mode only). In its current state I am having fun writing Rust and testing language features. The code should be modular enough to change any function you deem awful enough.
Error handling is subpar at the moment, and there has been no real unit testing to speak of since the switch to asynchronous functionality. Testing will come back.
-This version is a successor of the _POSIX_C_SOURCE 200809L implementation in which all data parsed from a cap/pcapng files is written as a single and simple query. The ingestion time is rather fast (tested writes: 100*10^3 tcp packets in ~1.8 sec) but may be insecure. See the other repository.
+This version is the successor of the _POSIX_C_SOURCE 200809L implementation, in which all data parsed from cap/pcapng files is written as a single, simple query. Ingestion is rather fast (tested writes: 100*10^3 TCP packets in ~1.8 sec) but may be insecure. See the other repository for more information.
The idea of this iteration is to use a prepared statement and to chunk the data according to a maximum input size, since Postgres puts a maximum limit on each insert query issued through a prepared statement. Said chunk size is initialized through the config/interface file called `parser.json` as `insert_max`; a sketch of the idea follows below. Data can be read from PCAP/PCAPNG files as well as from network devices.
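A minimal sketch of that chunking idea, assuming tokio-postgres (with its `with-serde_json-1` feature) and illustrative names; `insert_chunked`, the single-JSON-column table layout, and the generated placeholder string are assumptions, not the project's actual API:

```rust
use serde_json::Value;
use tokio_postgres::types::{Json, ToSql};
use tokio_postgres::{Client, Error};

// Split the serialized packets into chunks of at most `insert_max` rows
// and run one prepared INSERT per chunk, one parameter per packet.
async fn insert_chunked(
    client: &Client,
    packets: &[Value],
    insert_max: usize,
    tablename: &str,
) -> Result<(), Error> {
    for chunk in packets.chunks(insert_max) {
        // "INSERT INTO <table> (packet) VALUES ($1), ($2), ..." for this chunk.
        let placeholders: Vec<String> =
            (1..=chunk.len()).map(|i| format!("(${})", i)).collect();
        let insert_str = format!(
            "INSERT INTO {} (packet) VALUES {}",
            tablename,
            placeholders.join(", ")
        );
        let statement = client.prepare(&insert_str).await?;
        let params: Vec<Json<&Value>> = chunk.iter().map(Json).collect();
        let param_refs: Vec<&(dyn ToSql + Sync)> =
            params.iter().map(|p| p as &(dyn ToSql + Sync)).collect();
        client.execute(&statement, &param_refs).await?;
    }
    Ok(())
}
```

Preparing the statement once per chunk size, rather than once per chunk, would be the next obvious optimization, since all full chunks share the same query string.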
The process is as follows:
@@ -27,6 +27,9 @@ Speaking of serialization: After profiling it turns out that ~20% of cpu time is
Another subgoal was the ability to compile a static binary, which -- last time I tested -- works without any dependencies apart from libpcap itself. It even executes on Oracle Linux after being linked directly against the elf64 interpreter. If you ever had the pleasure of using this derivative, that may come as a surprise to you. The key is to compile for the `x86_64-unknown-linux-musl` target. See: https://doc.rust-lang.org/edition-guide/rust-2018/platform-and-target-support/musl-support-for-fully-static-binaries.html
Caveats: Regex syntax is limited at the moment, because the pattern is not compiled from a raw string but from a common one. Escaping does not work properly yet; character classes do. I still have to figure out the correct syntactical way to get the pattern out of the JSON file and into a raw string. For the regular expression syntax that is already supported see: https://docs.rs/regex/1.3.9/regex/#syntax , and also see the example in `parser.json`.
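Since the caveat above is about escaping: the usual workaround is that backslashes written in the JSON file must be doubled so they reach the regex crate intact. A small self-contained sketch (the pattern and names are illustrative only, not the project's code):

```rust
use regex::Regex;

fn main() {
    // A raw string literal hands backslashes to the regex engine untouched.
    let raw = Regex::new(r"https?://\w+\.\w+").unwrap();

    // The same pattern loaded from JSON must double-escape its backslashes:
    // the file contains "https?://\\w+\\.\\w+", which parses to https?://\w+\.\w+ .
    let from_json: String = serde_json::from_str(r#""https?://\\w+\\.\\w+""#).unwrap();
    let escaped = Regex::new(&from_json).unwrap();

    assert!(raw.is_match("https://example.com"));
    assert!(escaped.is_match("https://example.com"));
}
```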
Transmitting the data of the previously described test table layout results in a rather big table. HDD space has been no issue so far.
If this whole thing turns out to be viable, some future features may be:
- A database table containing a file hash map, used to compare file status/sizes after a parser crash, or to build a complete overview of all existing PCAP files.

View File

@@ -74,7 +74,7 @@ async fn main() -> Result<(), Error> {
// packets_serialized.extend(serializer::serialize_packets(v));
/* Do chunks and query data */
-let chunker = &packets_serialized.len() < &config.insert_max;
+let chunker = (&packets_serialized.len() < &config.insert_max) && (0 < packets_serialized.len());
match chunker {
NON_CHUNKED => {
let insert_str = query_string(&packets_serialized.len(), &config.tablename);
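That appears to be the logic bug from the commit title: with an empty packet batch the old predicate was still true, so the NON_CHUNKED arm would build an INSERT statement with zero value tuples. A more idiomatic spelling of the fixed predicate could be (a sketch, not the committed code):

```rust
// true  => the batch is non-empty and fits into a single prepared statement;
// false => the batch is empty or exceeds `insert_max` and takes the chunked
//          path, which issues no query at all for an empty batch.
let chunker = !packets_serialized.is_empty()
    && packets_serialized.len() < config.insert_max;
```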

View File

@@ -1,11 +1,11 @@
{
"insert_max": 16000,
"filter": "!ip6 && tcp",
"insert_max": 20000,
"filter": "tcp",
"regex_filter": "(?:http|https)[[:punct:]]+[[:alnum:]]+[[:punct:]][[:alnum:]]+[[:punct:]](?:com|de|org)",
"from_device": false,
"parse_device": "enp7s0",
"pcap_file": "not in use right now",
"pcap_dir": "../target/files",
"pcap_dir": "../target",
"database_tablename": "json_dump",
"database_user": "postgres",
"database_host": "localhost",