# Hadoop

Distributed storage and computing

* [Hadoop Attack Libs](https://github.com/wavestone-cdt/hadoop-attack-library.git)

## Terminology

* __Cluster__, forms the data lake
* __Node__, single host inside the cluster
* __NameNode__, node that keeps the directory tree of the Hadoop file system
* __DataNode__, slave node that stores files and is instructed by the NameNode
* __Primary NameNode__, currently active node responsible for keeping the directory structure
* __Secondary NameNode__, hot standby for the Primary NameNode. There may be multiple standby nodes inside the cluster
* __Master Node__, Hadoop management app like HDFS or YARN Manager
* __Slave Node__, Hadoop worker like HDFS or MapReduce. A node can be master and slave at the same time
* __Edge Node__, hosts Hadoop user apps like Zeppelin or Hue
* __Kerberised__, security-enabled cluster using Kerberos
* __HDFS__, Hadoop Distributed File System, storage layer for unstructured data
* __Hive__, primary DB for structured data
* __YARN__, job scheduling and resource management
* __MapReduce__, distributed filtering, sorting and reducing
* __HUE__, GUI for HDFS and Hive
* __Zookeeper__, cluster management
* __Kafka__, message broker
* __Ranger__, access control policies (privileges and ACLs)
* __Zeppelin__, web UI notebooks for data analytics

## Zeppelin

* Try [default logins](https://zeppelin.apache.org/docs/0.8.2/setup/security/shiro_authentication.html#4-login)
* Try code execution inside notebooks

## Keytabs

* Find keytab files (e.g. created with `ktpass`) to authenticate against the Kerberos KDC
* List the principals stored in a keytab and use one of them to initialise a ticket

```sh
klist -k <keytab>
kinit -k -V -t <keytab> <principal>
```

## HDFS

* Use the `hdfs` utility to enumerate the distributed network storage

```sh
hdfs dfs -ls /
```

* The current local user and the user on the storage do not have to correspond
* Files created on the storage may be owned by root

```sh
hdfs dfs -touchz /tmp/testfile
hdfs dfs -ls /tmp
```

* Impersonate a user by initialising a ticket from their keytab file with `kinit`. The __NodeManager__ is the most privileged user with regard to permissions
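When a keytab from the Keytabs section holds several entries, the list of principals can be extracted from `klist -k` output before picking one for `kinit`. The helper below is a minimal sketch that assumes the MIT Kerberos layout of `klist -k` without extra flags (three header lines, then `KVNO PRINCIPAL` rows); `keytab_principals` is a hypothetical name, and Heimdal or `klist -kt`/`-ke` output would need different parsing.

```sh
#!/bin/sh
# Hypothetical helper: print each principal found in MIT `klist -k` output once.
# Assumption: line 1 is "Keytab name: ...", line 2 the column header,
# line 3 the dashed separator; each following row is "KVNO PRINCIPAL".
keytab_principals() {
    # Keep the second column of every data row, de-duplicated (one row per key/enctype).
    awk 'NR > 3 && NF >= 2 { print $2 }' | LC_ALL=C sort -u
}

# Usage on a target host (paths and principal are placeholders):
#   klist -k /path/to/service.keytab | keytab_principals
#   kinit -k -V -t /path/to/service.keytab "<principal from the list>"
```

A keytab usually contains one row per encryption type for the same principal, which is why the output is de-duplicated.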
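Before trying `hdfs dfs -touchz` as in the HDFS section, it can help to spot directories whose "other" permission bits allow writing. This is a minimal sketch that filters `hdfs dfs -ls` output; it assumes the usual eight-column listing with the permission string first and the path last, does not handle paths containing spaces, and `world_writable` is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print paths whose permission string grants write to "other".
# Assumption: columns are perms, replication, owner, group, size, date, time, path.
world_writable() {
    # Character 9 of e.g. "drwxrwxrwt" is the "other" write bit; skip "Found N items".
    awk '$1 ~ /^[d-]/ && NF >= 8 && substr($1, 9, 1) == "w" { print $NF }'
}

# Usage on a live cluster:
#   hdfs dfs -ls -R / 2>/dev/null | world_writable
```

A sticky bit (`t`, as on `/tmp`) still allows creating files but prevents deleting other users' files.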