grammar n stuff

This commit is contained in:
gurkenhabicht 2020-06-15 21:00:55 +02:00
parent 4708292353
commit d453fafc30
1 changed file with 6 additions and 6 deletions

@@ -1,17 +1,17 @@
# This is experimental
This version is a successor to the _POSIX_C_SOURCE 200809L implementation, in which all of the data of a parsed pcap/pcapng file is written as a single, simple query. Ingestion is rather fast (tested writes: 100*10^3 TCP packets in ~1.8 sec) but may be insecure.
The idea of this iteration is to use a prepared statement and chunk the data according to a maximum input size. Postgres databases have a custom maximum limit on each insert query made through prepared statements. Said chunk size is initialized through the config/interface file called parser.json as `insert_max`. Data can be read from PCAP/PCAPNG files as well as from network devices.
The software is written in Rust (no unsafe code). In its current state I am testing language features. The code should be modular enough to change any awfully written function.
Error handling is subpar at the moment. There is no real unit testing to speak of since switching to asynchronous functionality.
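As a rough illustration of the chunking idea (not the actual implementation), the sketch below reads `insert_max` from parser.json and splits placeholder rows into per-statement chunks. The `Config` struct and the row type are assumptions, and serde/serde_json (with the derive feature) are assumed as dependencies.

```rust
use std::fs::File;

use serde::Deserialize;

// Hypothetical config shape; only `insert_max` is taken from the README,
// any further fields in parser.json are simply ignored by serde.
#[derive(Deserialize)]
struct Config {
    insert_max: usize,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config: Config = serde_json::from_reader(File::open("parser.json")?)?;

    // Placeholder rows standing in for parsed QryData entries.
    let rows: Vec<u64> = (0..100_000).collect();

    // One prepared INSERT per chunk keeps each query below the configured maximum.
    for chunk in rows.chunks(config.insert_max) {
        println!("would insert {} rows with one prepared statement", chunk.len());
    }
    Ok(())
}
```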
The process is as follows:
- Choose between a network device (which is supported as well) or file input
- Choosing a device is straightforward -> data gets parsed and chunked, and queries are prepared according to the `insert_max` size
- The encapsulation type / linktype is chosen beforehand. Currently, Ethernet and RawIp are supported.
- Choosing file input means selecting a directory where your PCAP/PCAPNG files reside.
- A hash map is created out of key(paths):value(metadata) of the pcap files in the specified directory (a sketch follows this list).
- The parser gets invoked and calls the appropriate protocol handler on the byte data of the yielded packets. A vector of type QryData is returned after EOF has been hit.
@@ -20,7 +20,7 @@ Process is as follows:
- Prepared statements are prepared according to the chunk size
- Afterwards, the chunked data is queried to the database
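For the file-input path, the hash map step could look roughly like this std-only sketch; storing `fs::Metadata` as the value is an assumption, the real code may keep sizes or hashes instead.

```rust
use std::collections::HashMap;
use std::fs;
use std::path::PathBuf;

/// Builds the key(path) -> value(metadata) map for every pcap/pcapng
/// file found directly in `dir` (illustrative sketch only).
fn build_file_map(dir: &str) -> std::io::Result<HashMap<PathBuf, fs::Metadata>> {
    let mut map = HashMap::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        match path.extension().and_then(|ext| ext.to_str()) {
            Some("pcap") | Some("pcapng") => {
                map.insert(path.clone(), fs::metadata(&path)?);
            }
            _ => {}
        }
    }
    Ok(map)
}

fn main() -> std::io::Result<()> {
    let map = build_file_map("./pcaps")?; // hypothetical directory
    println!("found {} capture files", map.len());
    Ok(())
}
```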
Currently, the Ethernet, IPv4, IPv6, TCP, UDP and ARP/RARP network protocols are handled.
For testing purposes, the layout of the table is serialized JSON. Only the protocols present inside a packet are non-null inside the serialized JSON data.
A query may look like this: `select packet from json_dump where packet->>'ipv4_header' is not null;`
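The null behaviour follows from optional fields: in a serde-style layout, any header that is `None` serializes to JSON null. The struct below is purely illustrative and not the actual QryData definition.

```rust
use serde::Serialize;

// Illustrative only: every protocol header is optional, so headers that
// are absent from a packet end up as null in the serialized JSON.
#[derive(Serialize)]
struct PacketJson {
    ipv4_header: Option<String>,
    ipv6_header: Option<String>,
    tcp_header: Option<String>,
}

fn main() {
    let pkt = PacketJson {
        ipv4_header: Some("src 10.0.0.1 dst 10.0.0.2".to_string()),
        ipv6_header: None, // not present in this packet -> null in the JSON
        tcp_header: Some("sport 443 dport 51234".to_string()),
    };
    // Prints: {"ipv4_header":"src 10.0.0.1 dst 10.0.0.2","ipv6_header":null,"tcp_header":"sport 443 dport 51234"}
    println!("{}", serde_json::to_string(&pkt).unwrap());
}
```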
@@ -31,13 +31,13 @@ Another subgoal was the ability to compile a static binary, which --last time I
If this whole thing turns out to be viable, some future features may be:
- A database containing the file hash map, to compare file status/sizes after the parser may have crashed, or to join a complete overview of any existing PCAP files.
- Concurrency. There are some interesting ways of parallelization I am working on to find a model that really benefits the use case. MPSC looks promising at the moment (a minimal sketch follows this list). That is why the tokio crate is already implemented for db queries, but it has no performance benefit at the moment.
- Updating the file hash map through the inotify crate during runtime.
- Reassembly of fragmented IPv4 packets.
- SIMD (via autovectorization), which is easy enough to do in Rust.
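As a rough sketch of the MPSC direction mentioned in the list above (assuming the tokio crate with its rt, macros and sync features; this is not the project's actual concurrency model), parser tasks could send chunks through a tokio mpsc channel to a single database writer task:

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Bounded channel: multiple parser tasks could hold clones of `tx`.
    let (tx, mut rx) = mpsc::channel::<Vec<u64>>(16);

    // Single writer task that would own the db connection and run the
    // prepared INSERT for every chunk it receives.
    let writer = tokio::spawn(async move {
        while let Some(chunk) = rx.recv().await {
            println!("would write a chunk of {} rows", chunk.len());
        }
    });

    // Producer side: send parsed chunks into the channel.
    for start in (0..100u64).step_by(10) {
        let chunk: Vec<u64> = (start..start + 10).collect();
        tx.send(chunk).await.expect("writer task has gone away");
    }
    drop(tx); // closing the last sender lets the writer loop finish

    writer.await.unwrap();
}
```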
There are many other things left to be desired.
The file used for testing was identical to the one used in the previous C implementation. Inserting non-chunked data resulted in ~20 minutes of querying to the database. Now, chunked data takes below 20 seconds after compiler optimization.
Speaking of optimization: do yourself a favor and run release code, not debug code: `cargo run --release`. The compiler does a rather hefty optimization and you will save some time waiting for your precious data to be inserted. I did no further optimization besides trying to enable the compiler to do a better job. Just black-boxing, no assembly tweaking yet.