Hadoop
Distributed storage and computing
* Hadoop Attack Libs
Terminology
- Cluster, forms the datalake
- Node, single host inside the cluster
- NameNode, node that keeps the directory tree of the Hadoop file system
- DataNode, slave node that stores files and is instructed by the NameNode
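- Primary NameNode, currently active node responsible for keeping the directory structure
- Secondary NameNode, helper node that periodically checkpoints the NameNode metadata. Despite its name it is not a hot standby; in high-availability setups that role belongs to Standby NameNodes, of which there may be several inside the cluster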
- Master Node, runs Hadoop management services such as the HDFS NameNode or the YARN ResourceManager
- Slave Node, runs Hadoop worker services such as an HDFS DataNode or MapReduce tasks. A node can be master and slave at the same time
- Edge Node, hosts Hadoop user applications such as Zeppelin or Hue
- Kerberised, security enabled cluster through Kerberos
- HDFS, Hadoop Distributed File System, storage device for unstructured data
- Hive, primary DB for structured data
- YARN, scheduling jobs and resource management
- MapReduce, distributed filtering, sorting and reducing
- HUE, GUI for HDFS and Hive
- Zookeeper, cluster management
- Kafka, message broker
- Ranger, centralized access control policies (ACLs)
- Zeppelin, data analytics inside a web UI
Zeppelin
Keytabs
- Find keytab files (for example ones generated with ktpass) to authenticate at the Kerberos TGS
- List the principals stored in a keytab and use them to obtain a ticket with kinit
klist -k <keytabfile>
kinit -V -k -t <keytabfile> <principal name>
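The keytab steps above can be sketched as a small POSIX shell helper. This is a minimal sketch under assumptions, not a definitive tool: it assumes the MIT Kerberos `klist` and `kinit` binaries are on the PATH, that keytab files carry a `.keytab` suffix, and that `klist -k` prints data rows beginning with a numeric KVNO. The function names `find_keytabs`, `keytab_principals` and `try_keytab` are hypothetical, as is the path in the usage comment.

```shell
#!/bin/sh
# Search the given directories for files that look like keytabs.
# Assumption: keytabs are named *.keytab; other naming schemes are missed.
find_keytabs() {
    find "$@" -type f -name '*.keytab' 2>/dev/null
}

# Read `klist -k <keytabfile>` output on stdin and print each principal once.
# Data rows start with a numeric KVNO; header and separator rows do not.
keytab_principals() {
    awk '$1 ~ /^[0-9]+$/ { print $2 }' | sort -u
}

# Try to obtain a ticket for every principal stored in one keytab.
# Each successful kinit overwrites the default credential cache.
try_keytab() {
    keytab=$1
    klist -k "$keytab" | keytab_principals | while read -r principal; do
        if kinit -k -t "$keytab" "$principal"; then
            echo "ticket obtained for $principal"
        fi
    done
}

# Usage (hypothetical path):
#   find_keytabs /etc /home
#   try_keytab /etc/security/keytabs/example.keytab
```

Because every successful `kinit` replaces the default credential cache, only the ticket for the last working principal remains afterwards.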
HDFS
- Use the hdfs utility to enumerate the distributed network storage
- Current user and user on the storage do not have to correspond
- Touched files on the storage may be owned by root
hdfs dfs -touchz /tmp/testfile
hdfs dfs -ls /tmp
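To see which identity the storage actually assigned to a touched file, the listing can be reduced to owner and path. The following is a rough sketch that assumes the usual eight-column `hdfs dfs -ls` layout (permissions, replication, owner, group, size, date, time, path) and paths without spaces; `hdfs_ls_owners` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `hdfs dfs -ls` output on stdin and print "owner path" per entry.
# The "Found N items" summary line has fewer than 8 fields and is skipped.
# Assumption: paths contain no whitespace.
hdfs_ls_owners() {
    awk 'NF >= 8 { print $3, $8 }'
}

# Usage:
#   hdfs dfs -ls /tmp | hdfs_ls_owners
```

Comparing the printed owner with the output of `id -un` on the local host shows whether the two users correspond.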
- Impersonate a user by authenticating with that user's keytab file. NodeManager is the highest user in regard to permissions
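Impersonation via a keytab could look like the sketch below. It assumes MIT Kerberos `kinit` and the `hdfs` client are installed, and that the client picks up the ticket cache named in the `KRB5CCNAME` environment variable. The helper name `hdfs_as` and the keytab path and principal in the usage comment are hypothetical. A throwaway credential cache is used so the operator's own ticket is not overwritten.

```shell
#!/bin/sh
# Run one `hdfs dfs` command as the principal stored in a keytab.
# Arguments: <keytabfile> <principal> <hdfs dfs arguments...>
hdfs_as() {
    keytab=$1
    principal=$2
    shift 2
    cache=$(mktemp) || return 1
    status=0
    (
        # Point Kerberos at a temporary cache for this subshell only.
        export KRB5CCNAME="FILE:$cache"
        kinit -k -t "$keytab" "$principal" >/dev/null && hdfs dfs "$@"
    ) || status=$?
    # Remove the temporary cache together with the ticket it holds.
    rm -f "$cache"
    return "$status"
}

# Usage (hypothetical keytab and principal):
#   hdfs_as /tmp/user.keytab user@EXAMPLE.COM -ls /tmp
```

Running the commands in a subshell keeps `KRB5CCNAME` from leaking into the calling shell.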