Introduction Apache Spark is a fast and general engine for large-scale data processing. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since. Spark provides…
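To give a flavour of the kind of large-scale processing Spark is built for, here is a minimal word-count sketch using Spark's Java API. It assumes Spark 2.x; the class name, the local[*] master, and the input/output paths are placeholders for illustration, not details taken from the article.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

public class SparkWordCount {
    public static void main(String[] args) {
        // Local mode for illustration; on a real cluster the master URL would point at the cluster manager.
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // "input.txt" and "output" are placeholder paths.
        JavaRDD<String> lines = sc.textFile("input.txt");
        JavaPairRDD<String, Integer> counts = lines
            .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator()) // split each line into words
            .mapToPair(word -> new Tuple2<>(word, 1))                      // pair each word with a count of 1
            .reduceByKey(Integer::sum);                                    // sum counts per word

        counts.saveAsTextFile("output");
        sc.close();
    }
}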
Setup Hadoop on Ubuntu (Multi-Node Cluster)
Running Hadoop on Ubuntu Linux (Multi-Node Cluster) From single-node clusters to a multi-node cluster: we will build a multi-node cluster by merging three or more single-node clusters into one, in which one Ubuntu box will become the designated master,…
Setup Hadoop on Ubuntu (Single-Node Cluster)
Wiki Apache Hadoop is an open-source software framework written in Java for distributed storage and distributed processing of very large data sets (Big Data) on computer clusters built from commodity hardware. All the modules in Hadoop…
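To make "distributed processing" concrete, below is the classic MapReduce word-count job in Java, in the form commonly shown in the Hadoop documentation. It is a sketch of the standard example, not something specified in this excerpt; the class names and the use of command-line arguments for the input and output paths are the usual conventions.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Mapper: emits (word, 1) for every word in each input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: sums the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input path passed on the command line
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output path passed on the command line
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

On a running cluster the same jar works unchanged whether the cluster is single-node or multi-node; the framework splits the input across the DataNodes and schedules map and reduce tasks near the data.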