introduced full tokio + std mpsc chain, introduced jemallocator

gurkenhabicht 2020-06-29 02:08:26 +02:00
parent 381cecd710
commit b571cb06f5
8 changed files with 875 additions and 43 deletions

@@ -6,6 +6,13 @@ edition = "2018"
 # See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
+[profile.release]
+debug = true
+#lto = "fat"
+#codegen-units = 1
+#panic = "abort"
 [dependencies]
 tokio-postgres = { version="0.5.4", features = ["runtime","with-eui48-0_4","with-serde_json-1"] }
 tokio = { version = "0.2", features = ["full"] }
@@ -20,3 +27,4 @@ serde = { version = "1.0.3", features = ["derive"] }
 rayon = "1.3"
 regex = "1.3.7"
 futures = "~0.3.5"
+jemallocator = "~0.3.2"

@@ -3,10 +3,10 @@
 The software is written in Rust (2018, safe mode only). At the current state I am having fun writing in Rust and testing language features. The code should be modular enough to change any function you deem awful enough.
 Error handling is subpar at the moment. There is no real unit testing to speak of since switching to asynchronous functionality. Testing will come back.
-This version is a successor of the _POSIX_C_SOURCE 200809L implementation in which all data parsed from pcap/pcapng files is written as a single and simple query. The ingestion time is rather fast (tested writes: 100*10^3 TCP packets in ~1.8 sec) but may be insecure. See the other repository for more information.
+This version is a successor of the _POSIX_C_SOURCE 200809L implementation in which all data parsed from pcap/pcapng files is written as a single and simple query. The ingestion time is rather fast (tested writes: 100*10^3 TCP packets in ~1.8 sec) but the procedure may be insecure. See the other repository for more information.
 The idea of this iteration is to use a prepared statement and chunk the data according to maximum input. PostgreSQL caps the number of bind parameters in a single prepared statement (65535), so each insert query can only carry a limited number of rows. Said chunk size is set through the config/interface file parser.json as `insert_max`. Data can be read from PCAP/PCAPNG files, as well as from network devices.
-**UPDATE 0.2.0**: Chunking can be omitted completely when using PostgreSQL's `COPY` to transfer binary data instead of using `INSERT`. This is not only somewhat faster, but also takes quite a few fewer lines of code. Only parsing from a network device still needs chunks.
+**UPDATE 0.2.0**: Chunking (more on this in the next paragraph) can be omitted completely when using PostgreSQL's `COPY` to transfer binary data instead of using `INSERT`. This is not only somewhat faster, but also takes quite a few fewer lines of code. Only parsing from a network device still needs chunks.
 The other recent change is that only non-NULL protocol data of a packet is serialized to JSON. Table insertion should be smaller this way.
 Process is as follows:
@@ -22,18 +22,13 @@ Process is as follows:
 - Statements are prepared according to the chunk size
 - Data is then inserted in chunks
-Currently, Ethernet, IPv4, IPv6, TCP, UDP and ARP/RARP network protocols are handled.
+Currently, Ethernet, IPv4, IPv6, TCP, UDP and ARP/RARP network protocols are handled; any additional session-layer/wrapped data can be found in packet->data[u8], for now.
 For testing purposes, the table layout is serialized JSON. The table layout is somewhat "dynamic": any protocols not recognized in a parsed packet will be marked as NULL inside the resulting table row.
-A query may look like this `select packet from json_dump where packet->>'ipv4_header' is not null;`
+A query may look like this `select packet->>'ipv4_header' from json_dump;` or this `select packet from json_dump where packet->>'reg_res' is not null;` to show data parsed via regex.
-Speaking of serialization: After profiling, it turns out that ~20% of CPU time is spent on serialization to JSON. This, of course, could be saved completely.
 Another subgoal was the ability to compile a static binary, which, last time I tested, works without dependencies except for libpcap itself. It even executes on Oracle Linux, after linking against the elf64 interpreter directly. If you ever had the pleasure of using this derivative, it may come as a surprise to you. The key is to compile for the `x86_64-unknown-linux-musl` target. See: https://doc.rust-lang.org/edition-guide/rust-2018/platform-and-target-support/musl-support-for-fully-static-binaries.html
 Caveats: Regex syntax is limited at the moment, because the pattern is not compiled from a raw string but from a regular one. Escaping does not work properly; character classes do. I still have to figure out the correct syntactic way to get it out of the JSON file and into a raw string. For already supported regular expression syntax see: https://docs.rs/regex/1.3.9/regex/#syntax , also see the example in `parser.json`.
-Transmitting all the data of the formerly described testing table layout results in a rather big table size. HDD space was no issue so far. Ingesting 30808676 TCP/IP packets taken from iCTF 2020 PCAPs results in 99.4GB of JSON data. See: https://docs.docker.com/engine/reference/run/#runtime-constraints-on-resources for more details.
-Gotchas: My test setup consists of a PostgreSQL db inside a docker container. Main memory usage of said container is low (~300MB), but I had to set `--oom-score-adj=999` so the container would not get killed automatically. `--oom-kill-disable` would turn the OOM killer off completely, I guess. I did no fine tuning of this value yet.
 If this whole thing turns out to be viable, some future features may be:
@@ -49,3 +44,13 @@ There are many other things left to be desired.
 Benchmarking was done with the same file that was used in the previous C implementation. Inserting non-chunked data resulted in ~20 minutes of querying the database. Now, chunked data takes below 12 seconds after compiler optimization.
 Speaking of optimization: Do yourself a favor and run release code, not debug code: `cargo run --release`. The compiler does rather hefty optimization and you will save some time waiting for your precious data to be inserted. I did no further optimization besides trying to enable the compiler to do a better job. Just blackboxing, no assembly tweaking yet.
+**TESTRUNS**:
+Run 001 at 24.06.2020 of iCTF2020 PCAP (bpf filter: 'tcp') files resulted in a table of roughly 74GB (`$du -hs`), 30808676 rows, `cargo run --release 3627,41s user 156,47s system 23% cpu 4:29:19,27 total`. The PostgreSQL 12 server was a vanilla `docker pull postgres` container on a 2008 MacBook (2.4GHz dual core, 6GB RAM) connected via WiFi.
+Memory usage of the client was about 11.5GB out of 14.7GB, i.e. 0.78 utilization. (A tokio mpsc pipe will be the next improvement, so memory usage may be lower afterwards.)
+Run 002 at 25.06.2020 of iCTF2020 PCAP (bpf filter: 'tcp') files resulted in a table of roughly 74GB (`$du -hs`), 30808676 rows, `cargo run --release 3669,68s user 163,23s system 23% cpu 4:27:19,14 total`. The PostgreSQL 12 server was a vanilla `docker pull postgres` container on a 2008 MacBook (2.4GHz dual core, 6GB RAM) connected via WiFi. Memory usage of the client was about 11.5GB out of 14.7GB, i.e. 0.78 utilization. (A tokio mpsc pipe will be the next improvement, so memory usage may be lower afterwards.)
+Run 003: `cargo run --release 3847,69s user 236,93s system 25% cpu 4:22:45,90 total`
+Run 004: `cargo run --release 1176,24s user 146,11s system 30% cpu 1:12:49,93 total` on localhost docker
+Run 005: `cargo run --release 1181,33s user 139,35s system 29% cpu 1:15:40,24 total` on localhost docker
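The test runs quote the shell's `time` summary (the layout looks like zsh's built-in `time`, with a German decimal comma). As a quick sanity check of those figures, here is a std-only sketch that recomputes the reported CPU percentage as (user + system) / total; `parse_total` and `cpu_utilization` are hypothetical helpers, not part of the project, and assume the `h:mm:ss,cs` layout seen above.

```rust
/// Parses a wall-clock field such as "4:29:19,27" (h:mm:ss with a decimal
/// comma) into seconds. Each ':'-separated part shifts the total by 60.
pub fn parse_total(total: &str) -> Option<f64> {
    let mut seconds = 0.0;
    for part in total.split(':') {
        // Accept the locale's decimal comma as well as a decimal point.
        let value: f64 = part.replace(',', ".").parse().ok()?;
        seconds = seconds * 60.0 + value;
    }
    Some(seconds)
}

/// CPU utilization as `time` reports it: (user + system) / wall-clock.
pub fn cpu_utilization(user_s: f64, system_s: f64, total_s: f64) -> f64 {
    (user_s + system_s) / total_s
}
```

For Run 001 this gives about 0.234 (the 23% shown) and for Run 004 about 0.303 (30%). The wall-clock ratio between those two runs is roughly 3.7x, although Run 004 also moved the database to a localhost docker container.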

src/flamegraph.svg (new file, 419 lines, 182 KiB)

File diff suppressed because one or more lines are too long

@@ -10,6 +10,13 @@ use tokio_postgres::{Error, NoTls};
 use tokio_postgres::binary_copy::{BinaryCopyInWriter};
 use futures::{pin_mut};
 use tokio::task;
+use tokio::sync::mpsc;
+extern crate jemallocator;
+#[global_allocator]
+static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;
 /* conditionals */
 const FROM_FILE: bool = false;
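`#[global_allocator]` is the std hook that the added `static ALLOC: jemallocator::Jemalloc` plugs into. Since jemallocator is an external crate, here is a std-only sketch of the same hook: a hypothetical `CountingAlloc` that forwards every request to `System` and counts allocations, which could help check whether the mpsc change actually reduces allocation pressure. Note that implementing `GlobalAlloc` requires `unsafe`, unlike the rest of this "safe mode only" code base.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator wrapper: delegates to the system allocator and
/// counts how many allocation requests were made (monotonic counter).
pub struct CountingAlloc;

pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        System.dealloc(ptr, layout)
    }
}

// Same mechanism as `static ALLOC: jemallocator::Jemalloc = ...` in the diff;
// only one #[global_allocator] may exist per binary.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;
```

Swapping `System` for another allocator inside the wrapper keeps the counting while changing the backing allocator.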
@@ -64,26 +71,66 @@ async fn main() -> Result<(), Error> {
 match config.is_device {
 FROM_FILE => {
 for (_pcap_file, _pcap_info) in pcap_map.iter() {
-println!("{:?}: {:?}", &_pcap_file, &_pcap_info);
-let v: Vec<parser::QryData> =
-parser::parse(&_pcap_file, &config.filter, &config.regex_filter);
-let packets_serialized = serializer::serialize_packets(v);
+//println!("{:?}: {:?}", &_pcap_file, &_pcap_info);
+/* MPSC channeled serialization */
+// This is just patched up atm, mix between std::sync::mpsc and tokio::sync::mpsc
+let (qry_data, h1) = parser::mpsc_parser(_pcap_file.to_owned(), config.filter.to_owned(), config.regex_filter.to_owned()).await;
+// let (data_serialized, h2) = serializer::mpsc_serialize(qry_data);
+// let packets_serialized = serializer::mpsc_collect_serialized(data_serialized);
+// let _r1 = h1.join().unwrap();
+// let _r2 = h2.join().unwrap();
+/* Deprecated */
+// let v: Vec<parser::QryData> =
+// parser::parse(&_pcap_file, &config.filter, &config.regex_filter);
+// let len = v.len();
+/* tokio mpsc channel */
+let (tx, mut rx) = mpsc::channel(100);
+// let pcap_file = _pcap_file.clone();
+// let filter = config.filter.clone();
+// let regex_filter = config.regex_filter.clone();
+// let join_handle: task::JoinHandle<Vec<parser::QryData>> = task::spawn( async move {
+// parser::tokio_parse(pcap_file, &filter, &regex_filter).await
+// });
+// let v = join_handle.await.unwrap();
+// for packet in v.into_iter(){
+for packet in qry_data {
+let mut tx = tx.clone();
+tokio::spawn( async move {
+//println!("serializing!number {:?}", i);
+let packet_serialized = serializer::tokio_serialize(packet).await;
+tx.send(packet_serialized).await.unwrap();
+});
+}
+drop(tx);
 let sink = client.copy_in("COPY json_dump(packet) from STDIN BINARY").await.unwrap();
 let writer = BinaryCopyInWriter::new(sink, &[Type::JSON]);
 let join = task::spawn( async move {
 pin_mut!(writer);
-for pack in packets_serialized {
-writer.as_mut().write(&[&pack]).await.unwrap();
+//for pack in packets_serialized {
+while let Some(res) = rx.recv().await {
+writer.as_mut().write(&[&res]).await.unwrap();
+drop(res);
 // Reminder: write_raw() behavior is very strange, so it's write() for now.
 // writer.as_mut().write_raw(chunk.into_iter().map(|p| p as &dyn ToSql).collect()).await.unwrap();
 }
+//thread::sleep(time::Duration::from_millis(3000));
 writer.finish().await.unwrap();
 });
 assert!(join.await.is_ok());
+let _r1 = h1.join().unwrap();
 // TODO: MPSC channel
 // let mut v = Vec::<parser::QryData>::with_capacity(100000);
 // v.extend(parser::parse(&_pcap_file, &config.filter, &config.regex_filter));
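As far as I can read the new FROM_FILE loop, every packet gets its own `tokio::spawn`ed task holding a clone of the sender. The `mpsc::channel(100)` bound then only limits values already sitting in the channel; tasks waiting on `send` still hold their packets, so memory is not really capped, and packets may reach the COPY stream out of order. Iterating the std `Receiver` from `mpsc_parser` inside `async fn main` also means blocking `recv()` calls inside async code. A std-only sketch of the alternative, one long-lived worker per stage connected by bounded channels, keeps order and caps buffered items per stage at `capacity`; `bounded_stage` is a hypothetical helper, and a tokio version would be a single spawned task looping over the receiver.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread::{self, JoinHandle};

/// One pipeline stage: a thread that applies `f` to each item from `input`
/// and forwards results over a bounded channel holding at most `capacity`
/// items, so a fast producer blocks instead of buffering a whole capture.
pub fn bounded_stage<T, U, F>(
    input: Receiver<T>,
    capacity: usize,
    f: F,
) -> (Receiver<U>, JoinHandle<()>)
where
    T: Send + 'static,
    U: Send + 'static,
    F: Fn(T) -> U + Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    let handle = thread::spawn(move || {
        // Items are processed one by one, so output order equals input order.
        for item in input {
            // send() blocks while the queue is full (backpressure) and fails
            // once the consumer has hung up, which ends the stage.
            if tx.send(f(item)).is_err() {
                break;
            }
        }
    });
    (rx, handle)
}
```

The stage ends on its own once its input sender is dropped, so the consumer can simply iterate the returned receiver until it is exhausted.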

src/main_bkp (new file, 211 lines)

@@ -0,0 +1,211 @@
extern crate serde_json;
extern crate tokio;
extern crate tokio_postgres;
mod configure;
mod parser;
mod serializer;
//use postgres::{Client, NoTls};
//use postgres::types::ToSql;
//use postgres::binary_copy::{BinaryCopyInWriter};
use tokio_postgres::types::{Type, ToSql};
use tokio_postgres::{Error, NoTls};
use tokio_postgres::binary_copy::{BinaryCopyInWriter};
use futures::{pin_mut};
use tokio::task;
use tokio::sync::mpsc;
//use std::thread::{spawn, JoinHandle};
//use std::sync::mpsc::{channel, Receiver};
//use std::sync::mpsc;
//use std::alloc::System;
//
//#[global_allocator]
//static A: System = System;
/* conditionals */
const FROM_FILE: bool = false;
const FROM_DEVICE: bool = true;
//const NON_CHUNKED: bool = true;
//const CHUNKED: bool = false;
fn query_string(insert_max: &usize, table_name: &str) -> String {
let mut insert_template = String::with_capacity(insert_max * 8 + 96);
insert_template.push_str(&*format!("INSERT INTO {} (packet) Values ", table_name));
for insert in 0..insert_max - 1 {
insert_template.push_str(&*format!("(${}), ", insert + 1));
}
insert_template.push_str(&*format!("(${})", insert_max));
insert_template
}
#[tokio::main(core_threads = 4)]
async fn main() -> Result<(), Error> {
/* Init values from file */
let config: configure::Config = configure::from_json_file().unwrap();
let pcap_map = configure::map_pcap_dir(&config.pcap_dir).unwrap();
// TODO: Create db table with pcap file hashes
// TODO: hash file metadata, so its state is comparable with future file updates and can be written to a db table (and read e.g. after system crash)
// This db table should include UUIDs as primary keys, so it can be joined effectively with past and future runs.
// TODO: Use inotify crate to update pcap_map according to files created while parser is running
/* db connection */
let (client, connection) = tokio_postgres::connect(&config.connection, NoTls).await?;
tokio::spawn(async move {
if let Err(e) = connection.await {
eprintln!("connection error: {}", e);
}
});
client
.execute(&*format!("DROP TABLE IF EXISTS {}", &config.tablename), &[])
.await?;
client
.execute(
&*format!(
"CREATE TABLE {} ( ID serial NOT NULL PRIMARY KEY, packet json NOT NULL)",
&config.tablename
),
&[],
)
.await?;
/* device or file input */
match config.is_device {
FROM_FILE => {
for (_pcap_file, _pcap_info) in pcap_map.iter() {
//println!("{:?}: {:?}", &_pcap_file, &_pcap_info);
/* MPSC channeled serialization */
// let (qry_data, h1) = parser::mpsc_parser(_pcap_file.to_owned(), config.filter.to_owned(), config.regex_filter.to_owned());
// let (data_serialized, h2) = serializer::mpsc_serialize(qry_data);
// let packets_serialized = serializer::mpsc_collect_serialized(data_serialized);
// let _r1 = h1.join().unwrap();
// let _r2 = h2.join().unwrap();
/* This is serializing data without mpsc, which results in higher memory consumption, it's faster but 12GB main memory is needed */
// let v: Vec<parser::QryData> =
// parser::parse(&_pcap_file, &config.filter, &config.regex_filter);
// let len = v.len();
/* tokio mpsc channel */
let (tx, mut rx) = mpsc::channel(1000);
let pcap_file = _pcap_file.clone();
let filter = config.filter.clone();
let regex_filter = config.regex_filter.clone();
let join_handle: task::JoinHandle<Vec<parser::QryData>> = task::spawn( async move {
parser::tokio_parse(pcap_file, &filter, &regex_filter).await
});
let v = join_handle.await.unwrap();
for packet in v.into_iter(){
let mut tx = tx.clone();
tokio::spawn( async move {
//println!("serializing!number {:?}", i);
let packet_serialized = serializer::tokio_serialize(packet).await;
tx.send(packet_serialized).await.unwrap();
});
}
drop(tx);
// let mut packets_serialized: Vec<serde_json::Value> = Vec::new();
// while let Some(res) = rx.recv().await {
// //println!("collecting");
// packets_serialized.push(res);
// //let packets_serialized = serializer::serialize_packets(v);
// }
let sink = client.copy_in("COPY json_dump(packet) from STDIN BINARY").await.unwrap();
let writer = BinaryCopyInWriter::new(sink, &[Type::JSON]);
let join = task::spawn( async move {
pin_mut!(writer);
//for pack in packets_serialized {
while let Some(res) = rx.recv().await {
writer.as_mut().write(&[&res]).await.unwrap();
drop(res);
// Reminder: write_raw() behavior is very strange, so it's write() for now.
// writer.as_mut().write_raw(chunk.into_iter().map(|p| p as &dyn ToSql).collect()).await.unwrap();
}
//thread::sleep(time::Duration::from_millis(3000));
writer.finish().await.unwrap();
});
assert!(join.await.is_ok());
// TODO: MPSC channel
// let mut v = Vec::<parser::QryData>::with_capacity(100000);
// v.extend(parser::parse(&_pcap_file, &config.filter, &config.regex_filter));
// let mut packets_serialized = Vec::<serde_json::Value>::with_capacity(100000);
// packets_serialized.extend(serializer::serialize_packets(v));
// Reminder: If COPY doesn't cut it and INSERT is the way to go, uncomment and use following logic inside FROM_FILE
// /* Do chunks and query data */
// let chunker = (&packets_serialized.len() < &config.insert_max) && (0 < packets_serialized.len()) ;
// match chunker {
// NON_CHUNKED => {
// let insert_str = query_string(&packets_serialized.len(), &config.tablename);
// let statement = client.prepare(&insert_str).await?;
// client
// .query_raw(
// &statement,
// packets_serialized.iter().map(|p| p as &dyn ToSql),
// )
// .await?;
// }
// CHUNKED => {
// let insert_str = query_string(&config.insert_max, &config.tablename);
// let statement = client.prepare(&insert_str).await?;
//
// for chunk in packets_serialized.chunks_exact(config.insert_max) {
// client
// .query_raw(&statement, chunk.iter().map(|p| p as &dyn ToSql))
// .await?;
// }
// let remainder_len = packets_serialized
// .chunks_exact(config.insert_max)
// .remainder()
// .len();
// if 0 < remainder_len {
// let rem_str = query_string(&remainder_len, &config.tablename);
// let statement = client.prepare(&rem_str).await?;
// client
// .query_raw(
// &statement,
// packets_serialized
// .chunks_exact(config.insert_max)
// .remainder()
// .iter()
// .map(|p| p as &dyn ToSql),
// )
// .await?;
// }
// }
// }
}
}
FROM_DEVICE => {
let insert_str = query_string(&config.insert_max, &config.tablename);
let statement = client.prepare(&insert_str).await?;
loop {
let v: Vec<parser::QryData> = parser::parse_device(
&config.device,
&config.filter,
&config.insert_max,
&config.regex_filter,
);
let packets_serialized = serializer::serialize_packets(v);
client
.query_raw(
&statement,
packets_serialized.iter().map(|p| p as &dyn ToSql),
)
.await?;
}
}
}
Ok(())
}
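The README explains that INSERT-based ingestion chunks rows by `insert_max`, and `main_bkp` keeps that logic (the commented-out CHUNKED branch) next to `query_string`. Here is a std-only sketch of the same plan: build the placeholder list with one bind parameter per row, then split the rows into full chunks plus a remainder via `chunks_exact`. `plan_batches` is a hypothetical helper; with one parameter per row, `insert_max` has to stay below PostgreSQL's 65535 bind-parameter limit, which the configured 20000 does.

```rust
/// Builds "INSERT INTO <table> (packet) Values ($1), ($2), ..., ($n)",
/// producing the same text as `query_string` in src/main_bkp.
pub fn query_string(insert_max: usize, table_name: &str) -> String {
    // The original computes `insert_max - 1`, which underflows for 0.
    assert!(insert_max > 0, "insert_max must be at least 1");
    let placeholders: Vec<String> = (1..=insert_max).map(|i| format!("(${})", i)).collect();
    format!("INSERT INTO {} (packet) Values {}", table_name, placeholders.join(", "))
}

/// Returns (statement, batch size) for every batch: full chunks of
/// `insert_max` rows, then one shorter statement for the remainder.
pub fn plan_batches<T>(rows: &[T], insert_max: usize, table: &str) -> Vec<(String, usize)> {
    let chunks = rows.chunks_exact(insert_max); // panics if insert_max == 0
    let remainder = chunks.remainder().len();
    let mut plan: Vec<(String, usize)> = chunks
        .map(|chunk| (query_string(chunk.len(), table), chunk.len()))
        .collect();
    if remainder > 0 {
        plan.push((query_string(remainder, table), remainder));
    }
    plan
}
```

All full chunks share the same SQL text, so in practice that statement is prepared once (as the commented-out branch does) and a second one only for the remainder.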

@@ -1,6 +1,6 @@
 {
 "insert_max": 20000,
-"filter": "ip6 && tcp",
+"filter": "tcp",
 "regex_filter": "(?:http|https)[[:punct:]]+[[:alnum:]]+[[:punct:]][[:alnum:]]+[[:punct:]](?:com|de|org|net)",
 "from_device": false,
 "parse_device": "enp7s0",
@@ -8,6 +8,6 @@
 "pcap_dir": "../target",
 "database_tablename": "json_dump",
 "database_user": "postgres",
-"database_host": "localhost",
-"database_password": "password"
+"database_host": "192.168.0.11",
+"database_password": "docker"
 }

@@ -7,8 +7,11 @@ use regex::bytes::Regex;
 use std::convert::TryInto;
 use std::str;
 use serde::Serialize;
-//use std::thread::{spawn, JoinHandle};
-//use std::sync::mpsc::{channel, Receiver};
+use std::thread::{spawn, JoinHandle};
+use std::sync::mpsc::{channel, Receiver};
+use tokio::sync::mpsc;
+use tokio::stream::{self, StreamExt};
+use tokio::task;
 /* protocol ids, LittleEndian */
 const ETH_P_IPV6: usize = 0xDD86;
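The protocol ids in the parser are annotated as little-endian: `ETH_P_IPV6` is `0xDD86`, the byte-swapped form of the IPv6 EtherType `0x86DD` (the value Linux defines as `ETH_P_IPV6`). That matches reading the two EtherType bytes from the wire with a little-endian conversion. A std-only sketch showing both readings; `ethertype_le`, `ethertype_be` and the two constants are hypothetical names and assume an untagged Ethernet II frame, where the EtherType sits at byte offset 12.

```rust
/// Reads the EtherType (bytes 12 and 13) as a little-endian integer,
/// the convention the parser's "LittleEndian" constants imply.
pub fn ethertype_le(frame: &[u8]) -> Option<u16> {
    Some(u16::from_le_bytes([*frame.get(12)?, *frame.get(13)?]))
}

/// Conventional network-byte-order (big-endian) reading.
pub fn ethertype_be(frame: &[u8]) -> Option<u16> {
    Some(u16::from_be_bytes([*frame.get(12)?, *frame.get(13)?]))
}

/// IPv6 EtherType in both notations; swap_bytes converts between them.
pub const ETHERTYPE_IPV6_BE: u16 = 0x86DD;
pub const ETHERTYPE_IPV6_LE: u16 = ETHERTYPE_IPV6_BE.swap_bytes();
```

Either convention works as long as the constants and the reading agree; frames shorter than 14 bytes yield `None` here instead of panicking.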
@@ -188,6 +191,7 @@ impl QryData {
 /* Regex parse _complete_ package */
 fn flag_carnage(re: &Regex, payload: &[u8]) -> Option<String> {
 let mut flags: String = String::new();
+if !re.as_str().is_empty() {
 for mat in re.find_iter(payload) {
 // TODO: Test benchmark format! vs. push_str()
 // flags.push_str(&format!("{} ",std::str::from_utf8(mat.as_bytes()).unwrap()));
@@ -195,15 +199,17 @@ fn flag_carnage(re: &Regex, payload: &[u8]) -> Option<String> {
 flags.push_str(std::str::from_utf8(mat.as_bytes()).unwrap());
 flags.push_str(";");
 }
-if flags.len() > 0{
-println!("{:?}", flags);
 }
-match 0 < flags.len() {
-false => None,
-true => Some(flags),
+if !flags.is_empty(){
+// println!("{:?}", flags);
+}
+match flags.is_empty() {
+true => None,
+false => Some(flags),
 }
 }
+#[allow(dead_code)]
 pub fn parse(parse_file: &std::path::Path, filter_str: &str, regex_filter: &str) -> Vec<QryData> {
 let mut v: Vec<QryData> = Vec::new();
 let mut cap = Capture::from_file(parse_file).unwrap();
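The reworked `flag_carnage` now skips matching when the configured regex is empty. That guard matters: an empty pattern matches the empty string at every position of the payload, which would otherwise append one `;` per position. A std-only sketch of the same control flow, with `str::match_indices` on a literal pattern standing in for `regex::bytes::Regex::find_iter` (so no regex syntax is supported here); `collect_matches` is a hypothetical name.

```rust
/// Joins every match of `pattern` in `payload` with ';' separators.
/// Returns None when no pattern is configured or nothing matched.
pub fn collect_matches(pattern: &str, payload: &str) -> Option<String> {
    // Guard: an empty pattern would match at every position.
    if pattern.is_empty() {
        return None;
    }
    let mut flags = String::new();
    for (_, m) in payload.match_indices(pattern) {
        flags.push_str(m);
        flags.push(';');
    }
    // Same mapping as `match flags.is_empty() { true => None, false => Some(flags) }`.
    (!flags.is_empty()).then(|| flags)
}
```

Relatedly, `Some(flag_carnage(&re, packet.data)).unwrap()` in the callers is equivalent to plain `flag_carnage(&re, packet.data)`, since wrapping in `Some` and unwrapping cancel out.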
@@ -220,21 +226,22 @@ pub fn parse(parse_file: &std::path::Path, filter_str: &str, regex_filter: &str)
 };
 me.time = (packet.header.ts.tv_usec as f64 / 1000000.0) + packet.header.ts.tv_sec as f64;
-me.reg_res = Some(flag_carnage(&re, packet.data)).unwrap(); // Regex overhead is between 4-9% --single threaded-- on complete packet [u8] data
+me.reg_res = Some(flag_carnage(&re, packet.data)).unwrap(); // Regex parser on complete packet [u8] data
 //v.push(me.clone());
-v.push(QryData {
-id: 0,
-time: me.time,
-data: me.data,
-ether_header: me.ether_header,
-ipv4_header: me.ipv4_header,
-ipv6_header: me.ipv6_header,
-tcp_header: me.tcp_header,
-udp_header: me.udp_header,
-arp_header: me.arp_header,
-reg_res: me.reg_res,
-});
+v.push(me.clone());
+/* TODO: Will clone() call destructors correctly and the method below won't? */
+// v.push(QryData {
+// id: 0,
+// time: me.time,
+// data: me.data,
+// ether_header: me.ether_header,
+// ipv4_header: me.ipv4_header,
+// ipv6_header: me.ipv6_header,
+// tcp_header: me.tcp_header,
+// udp_header: me.udp_header,
+// arp_header: me.arp_header,
+// reg_res: me.reg_res,
+// });
 }
 v
 }
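`me.time` folds the pcap timestamp into one `f64` of seconds. A sketch of the same formula as a hypothetical `pcap_ts_to_f64`, assuming 64-bit `tv_sec`/`tv_usec` as on x86_64 Linux. At 2020 epoch values (~1.59e9 s) adjacent `f64` values are about 2.4e-7 s apart, so microsecond resolution survives the conversion.

```rust
/// Converts a pcap timestamp (seconds + microseconds) into fractional
/// seconds, the same arithmetic as `me.time` in the parser.
pub fn pcap_ts_to_f64(tv_sec: i64, tv_usec: i64) -> f64 {
    tv_sec as f64 + tv_usec as f64 / 1_000_000.0
}
```

On the TODO about `clone()`: since `me` is not used after the push, `v.push(me)` would move it and save one copy per packet. Either way every value is dropped exactly once, so destructors are not a concern.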
@@ -272,3 +279,106 @@ pub fn parse_device(
 }
 v
 }
#[allow(dead_code)]
pub async fn mpsc_parser (parse_file: std::path::PathBuf, filter_str: String, regex_filter: String) -> (Receiver<QryData>, JoinHandle<()>) {
let (sender, receiver) = channel();
let handle = spawn( move || {
let mut cap = Capture::from_file(parse_file).unwrap();
Capture::filter(&mut cap, &filter_str).unwrap();
let linktype = cap.get_datalink();
//println!("{:?}", &linktype);
let re = Regex::new(&regex_filter).unwrap();
while let Ok(packet) = cap.next() {
let mut me = QryData::new();
match linktype {
Linktype(1) => me.encap_en10mb(packet.data).unwrap(), // I reversed encapsulation/linktype bytes in pcap/pcapng file by looking at https://www.tcpdump.org/linktypes.html
Linktype(12) => me.encap_raw(packet.data).unwrap(), // Either this source + my implementation is wrong or pcap crate sucks
_ => (),
};
me.time = (packet.header.ts.tv_usec as f64 / 1000000.0) + packet.header.ts.tv_sec as f64;
me.reg_res = Some(flag_carnage(&re, packet.data)).unwrap(); // Regex parser on complete packet [u8] data
//v.push(me.clone());
if sender.send(me.clone()).is_err(){
break;
}
}
});
(receiver, handle)
}
#[allow(dead_code)]
pub async fn tokio_parse <'a> (parse_file: std::path::PathBuf, filter_str: &'a str, regex_filter: &'a str ) -> Vec<QryData> {
let mut v: Vec<QryData> = Vec::new();
let mut cap = Capture::from_file(parse_file).unwrap();
Capture::filter(&mut cap, &filter_str).unwrap();
let linktype = cap.get_datalink();
// println!("{:?}", &linktype);
let re = Regex::new(&regex_filter).unwrap();
while let Ok(packet) = cap.next() {
let mut me = QryData::new();
match linktype {
Linktype(1) => me.encap_en10mb(packet.data).unwrap(), // I reversed encapsulation/linktype bytes in pcap/pcapng file by looking at https://www.tcpdump.org/linktypes.html
Linktype(12) => me.encap_raw(packet.data).unwrap(), // Either this source + my implementation is wrong or pcap crate sucks
_ => (),
};
me.time = (packet.header.ts.tv_usec as f64 / 1000000.0) + packet.header.ts.tv_sec as f64;
me.reg_res = Some(flag_carnage(&re, packet.data)).unwrap(); // Regex parser on complete packet [u8] data
v.push(me.clone());
// drop(me);
//std::mem::replace(&mut me, QryData::new());
}
v
}
//pub async fn tokio_parse(parse_file: &std::path::Path, filter_str: &str, regex_filter: &str) -> tokio::stream::Iter<Result<QryData>> {
// //let mut v: Vec<QryData> = Vec::new();
// let mut cap = Capture::from_file(parse_file).unwrap();
// Capture::filter(&mut cap, &filter_str).unwrap();
// let linktype = cap.get_datalink();
// let re = Regex::new(regex_filter).unwrap();
// while let Ok(packet) = cap.next() {
// let mut me = QryData::new();
// match linktype {
// Linktype(1) => me.encap_en10mb(packet.data).unwrap(), // I reversed encapsulation/linktype bytes in pcap/pcapng file by looking at https://www.tcpdump.org/linktypes.html
// Linktype(12) => me.encap_raw(packet.data).unwrap(), // Either this source + my implementation is wrong or pcap crate sucks
// _ => (),
// };
//
// me.time = (packet.header.ts.tv_usec as f64 / 1000000.0) + packet.header.ts.tv_sec as f64;
// me.reg_res = Some(flag_carnage(&re, packet.data)).unwrap(); // Regex parser on complete packet [u8] data
// //v.push(me.clone());
// tx.send
// }
// let x = stream::iter(&mut v).collect().await
//}
//pub async fn tokio_parse (parse_file: std::path::PathBuf, filter_str: &'static str, regex_filter: &'static str) {
// //let mut v: Vec<QryData> = Vec::new();
// let (mut tx, rx) = mpsc::channel(1000);
//
// tokio::spawn (async move {
// let mut cap = Capture::from_file(parse_file).unwrap();
// Capture::filter(&mut cap, &filter_str).unwrap();
// let linktype = cap.get_datalink();
// let re = Regex::new(regex_filter).unwrap();
// while let Ok(packet) = cap.next() {
// let mut me = QryData::new();
// match linktype {
// Linktype(1) => me.encap_en10mb(packet.data).unwrap(), // I reversed encapsulation/linktype bytes in pcap/pcapng file by looking at https://www.tcpdump.org/linktypes.html
// Linktype(12) => me.encap_raw(packet.data).unwrap(), // Either this source + my implementation is wrong or pcap crate sucks
// _ => (),
// };
//
// me.time = (packet.header.ts.tv_usec as f64 / 1000000.0) + packet.header.ts.tv_sec as f64;
// me.reg_res = Some(flag_carnage(&re, packet.data)).unwrap(); // Regex parser on complete packet [u8] data
// //v.push(me.clone());
// tx.send(me.clone()).await.unwrap();
// }
//
// });
//
//}
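`mpsc_parser` follows the usual std shutdown idiom: the parsing thread stops as soon as `send` returns `Err`, which only happens once the `Receiver` has been dropped. A std-only sketch of that shape with a generic source; `spawn_producer` is a hypothetical name. Two hedged observations: `channel()` is unbounded, so a parser faster than its consumer can still buffer a large part of a capture (`std::sync::mpsc::sync_channel` would cap it), and since `mpsc_parser` only spawns an OS thread and never awaits anything, marking it `async` does not add concurrency.

```rust
use std::sync::mpsc::{channel, Receiver};
use std::thread::{spawn, JoinHandle};

/// Spawns a producer thread that forwards every item of `source` until the
/// source is exhausted or the receiver is dropped. The thread returns how
/// many items it managed to send.
pub fn spawn_producer<I>(source: I) -> (Receiver<I::Item>, JoinHandle<usize>)
where
    I: IntoIterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (sender, receiver) = channel();
    let handle = spawn(move || {
        let mut sent = 0;
        for item in source {
            // send() only fails once the Receiver is gone: stop producing.
            if sender.send(item).is_err() {
                break;
            }
            sent += 1;
        }
        sent
    });
    (receiver, handle)
}
```

Even an endless source terminates this way: the consumer takes what it needs and drops the receiver.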

@@ -1,8 +1,12 @@
 extern crate serde_json;
 use crate::parser;
 use rayon::prelude::*;
-//use serde::ser::{Serialize, SerializeStruct, Serializer};
+use std::thread::{spawn, JoinHandle};
+use std::sync::mpsc::{channel, Receiver};
+// This is not needed atm
+//use serde::ser::{Serialize, SerializeStruct, Serializer};
 //impl Serialize for parser::QryData {
 // fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
 // where
@@ -30,7 +34,6 @@ pub fn serialize_packets(v: Vec<parser::QryData>) -> Vec<serde_json::Value> {
 .map(|x| serde_json::to_value(x).unwrap())
 .collect();
 // let packets_serialized: Vec<serde_json::Value> = v.par_iter().map(|x| json!(x)).collect();
 packets_serialized
 }
@@ -46,3 +49,32 @@ pub fn serialize_packets_as_string(v: Vec<parser::QryData>) -> Vec<serde_json::V
 packets_serialized
 }
#[allow(dead_code)]
pub fn mpsc_serialize(packet: Receiver<parser::QryData>) -> (Receiver<serde_json::Value>, JoinHandle<()>) {
let (sender, receiver) = channel();
let handle = spawn( move || {
for p in packet{
let serialized = serde_json::to_value(p).unwrap();
if sender.send(serialized).is_err(){
return;
}
}
});
(receiver, handle)
}
#[allow(dead_code)]
pub fn mpsc_collect_serialized( packet: Receiver<serde_json::Value> ) -> Vec<serde_json::Value> {
let mut packets_serialized: Vec<serde_json::Value> = Vec::new();
for p in packet {
packets_serialized.push(p);
}
packets_serialized
}
pub async fn tokio_serialize(packet: parser::QryData) -> serde_json::Value {
serde_json::to_value(packet).unwrap()
}
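`serialize_packets` relies on rayon's `par_iter().map(..).collect()`, which keeps the input order in the resulting `Vec`. For illustration, here is a std-only sketch of an order-preserving parallel map using `std::thread::scope` (Rust 1.63+); it splits the input into contiguous chunks and concatenates results in chunk order, and is not a replacement for rayon's work stealing. `parallel_map_ordered` and the `format!` "serializer" in the test are hypothetical.

```rust
use std::thread;

/// Maps `f` over `items` on up to `workers` scoped threads and returns the
/// results in input order: each thread handles one contiguous chunk, and
/// the chunks' results are joined back in chunk order.
pub fn parallel_map_ordered<T, U, F>(items: &[T], workers: usize, f: F) -> Vec<U>
where
    T: Sync,
    U: Send,
    F: Fn(&T) -> U + Sync,
{
    if items.is_empty() {
        return Vec::new();
    }
    let workers = workers.max(1);
    let chunk_len = (items.len() + workers - 1) / workers; // ceiling division
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<U>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

Separately, `mpsc_collect_serialized` could be written as `packet.into_iter().collect()`, because `std::sync::mpsc::Receiver` implements `IntoIterator` and iteration ends once all senders are gone.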