In the Open

Hadoop

Distributed storage and computing

  • Hadoop Attack Libs

Terminology

  • Cluster, set of nodes that forms the data lake
  • Node, single host inside the cluster
  • NameNode, node that keeps the directory tree of the Hadoop file system
  • DataNode, slave node that stores files and is instructed by the NameNode
  • Primary NameNode, current active node responsible for keeping the directory structure
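  • Secondary NameNode, checkpointing helper that periodically merges the edit log into the fsimage for the Primary NameNode. It is not a hot standby; in HA clusters that role belongs to Standby NameNodes, of which there may be several inside the cluster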
  • Master Node, runs Hadoop management daemons like the HDFS NameNode or the YARN ResourceManager
  • Slave Node, runs Hadoop worker daemons like the HDFS DataNode or MapReduce tasks. A node can be master and slave at the same time
  • Edge Node, hosts Hadoop user apps like Zeppelin or Hue
  • Kerberised, cluster with security enabled through Kerberos

  • HDFS, Hadoop Distributed File System, storage layer for unstructured data

  • Hive, primary DB for structured data
  • YARN, scheduling jobs and resource management
  • MapReduce, distributed filtering, sorting and reducing
  • Hue, GUI for HDFS and Hive
  • ZooKeeper, cluster coordination
  • Kafka, message broker
  • Ranger, centralized access control policies (ACLs)
  • Zeppelin, data analytics inside a web UI

Zeppelin

Keytabs

  • Find keytab files to authenticate against the Kerberos KDC
  • List the principals stored in a keytab and use one of them to obtain a ticket with kinit
klist -k <keytabfile>
kinit -V -k -t <keytabfile> <principal name>
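
The two steps above can be sketched as a small script. This is only an illustration under assumptions: the function names find_keytabs and list_principals are hypothetical, keytabs are assumed to follow the common *.keytab naming pattern, and the search root has to be adjusted per cluster.

```shell
#!/usr/bin/env bash
# Sketch: locate candidate keytab files and print the principals they hold.
# Assumptions: files are named *.keytab; the default search root /etc is a guess.

find_keytabs() {
    # Print every regular file ending in .keytab below the given root.
    local root="${1:-/etc}"
    find "$root" -type f -name '*.keytab' 2>/dev/null
}

list_principals() {
    # Read keytab paths from stdin and run klist -k on each of them.
    local kt
    while IFS= read -r kt; do
        echo "== $kt"
        klist -k "$kt"
    done
}

# Usage (the directory is an assumption, some distributions keep keytabs there):
#   find_keytabs /etc/security/keytabs | list_principals
```

Keytabs that do not match the naming pattern are missed by this sketch, so a broader search by file content or permissions may still be worthwhile.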

HDFS

  • Use the hdfs utility to enumerate the distributed network storage
hdfs dfs -ls /
  • The local user and the user on the storage do not have to correspond
  • Files touched on the storage may be owned by root
hdfs dfs -touchz /tmp/testfile
hdfs dfs -ls /tmp
  • Impersonate a user by running kinit with that user's keytab file. In HDFS, the account the NameNode runs as (often hdfs) is the superuser and has the highest permissions
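
The impersonation step could look roughly like the following sketch. The helper name run_as_keytab, the keytab path and the principal are hypothetical; the sketch assumes kinit and the hdfs client are installed and that the client picks up the ticket cache named in KRB5CCNAME.

```shell
#!/usr/bin/env bash
# Sketch: run one command with a ticket obtained from someone else's keytab.
# A private credential cache keeps the current user's own tickets untouched.

run_as_keytab() {
    local keytab="$1" principal="$2"
    shift 2
    local cache rc
    cache="$(mktemp)"
    # Obtain a ticket into the private cache, then run the command against it.
    KRB5CCNAME="$cache" kinit -k -t "$keytab" "$principal" &&
        KRB5CCNAME="$cache" "$@"
    rc=$?
    rm -f "$cache"
    return "$rc"
}

# Usage (path and principal are placeholders):
#   run_as_keytab /path/to/user.keytab user@EXAMPLE.COM hdfs dfs -ls /
#   run_as_keytab /path/to/user.keytab user@EXAMPLE.COM hdfs dfs -stat "%u %n" /tmp/testfile
```

The second usage line prints the owner and name of the touched file, which helps confirm which identity the storage actually sees.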