Compose: Starting Spark Components Individually (Docker)

Notes on the Dockerfile

FROM openjdk:8u131-jre-alpine
# openjdk:8u131-jre-alpine is used as the base image: it is small and ships with a JVM.
MAINTAINER <eway>
# Switch to the root user to avoid permission problems, use a UTF-8 locale, install the bash shell, and set the timezone to Asia/Shanghai.
USER root
ENV LANG=C.UTF-8
RUN apk add --no-cache  --update-cache bash
ENV TZ=Asia/Shanghai
RUN apk --update add wget bash tzdata \
    && cp /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo $TZ > /etc/timezone
# Download and extract Spark (older releases may only be available under archive.apache.org/dist/spark)
WORKDIR /usr/local
RUN wget "http://www.apache.org/dist/spark/spark-2.0.2/spark-2.0.2-bin-hadoop2.7.tgz" \
   && tar -zxvf spark-*  \
   && mv spark-2.0.2-bin-hadoop2.7 spark \
   && rm -rf spark-2.0.2-bin-hadoop2.7.tgz

# Configure environment variables and expose ports
ENV SPARK_HOME=/usr/local/spark
ENV JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
ENV PATH=${PATH}:${JAVA_HOME}/bin:${SPARK_HOME}/bin

EXPOSE 6066 7077 8080 8081 4044
WORKDIR $SPARK_HOME
CMD ["/bin/bash"]
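To build an image from this Dockerfile, a minimal sketch (the tag yaosong5/bigdata:2.0 is assumed here only because the run commands below reference it):

# Build the image from the directory containing the Dockerfile
docker build -t yaosong5/bigdata:2.0 .
# Optional sanity check: print the Spark version baked into the image
docker run --rm yaosong5/bigdata:2.0 spark-submit --version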

Starting individual components
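All of the docker run commands below attach containers to a user-defined bridge network named br. If that network does not exist yet, it has to be created once beforehand (a sketch; the name br is taken from the --net=br option used below):

# Create the user-defined bridge network referenced by --net=br
docker network create --driver bridge br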

# Start the master container
docker run -itd --name spark-master --net=br  -h spark-master yaosong5/bigdata:2.0  sh -c " source /etc/profile && spark-class org.apache.spark.deploy.master.Master"

Start the worker container

docker run -itd  --net=br --link  spark-master:worker01 yaosong5/bigdata:2.0 sh -c " source /etc/profile && spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077 "


A variant that first checks that the namenode is reachable and then starts the worker (a bounded ping is used so the worker actually gets launched):

docker run -it  --net=br --link  spark-master:worker01 yaosong5/bigdata:2.0 sh -c " source /etc/profile && ping -c 3 namenode && spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077 "

Note: --link <target container name> means that this container uses services provided by the target container. It therefore fixes the startup order of the containers, and it adds a DNS entry for the target container name to this container's hosts file.
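To confirm the effect of --link and that the worker actually registered, something like the following can be used (the container ID placeholder is whatever docker ps shows for the worker; the exact log wording may vary by Spark version):

# The worker's hosts file should contain an entry for spark-master (alias worker01)
docker exec <worker-container-id> cat /etc/hosts
# The master log should show the worker registering
docker logs spark-master | grep -i "registering worker"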

Start the historyserver container

docker run -itd --net=br --name=spark-history --link namenode:namenode --link  spark-master \
-e SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=10 \
-Dspark.history.fs.logDirectory=hdfs://namenode:9000/sparkhistory" \
yaosong5/bigdata:2.0   sh -c " source /etc/profile  && ping -c 3 namenode && spark-class org.apache.spark.deploy.history.HistoryServer"
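The history server refuses to start if the configured log directory does not exist, so the HDFS directory should be created beforehand. A sketch, assuming a running container named namenode with the HDFS client on its PATH:

# Create the event-log directory referenced by spark.history.fs.logDirectory
docker exec namenode hdfs dfs -mkdir -p /sparkhistory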



The following throwaway containers can be used to verify that the namenode is reachable from the br network, once linked through spark-master and once linked directly to namenode:

docker run -it --rm --net=br --link spark-master yaosong5/bigdata:2.0 sh -c " source /etc/profile && ping -c 3 namenode "

docker run -it --rm --net=br --link namenode yaosong5/bigdata:2.0 sh -c " source /etc/profile && ping -c 3 namenode "


The history server picks up its settings from the SPARK_HISTORY_OPTS environment variable, as in the docker run command above:

-e SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://namenode:9000/sparkhistory" spark-class org.apache.spark.deploy.history.HistoryServer

Note that org.apache.spark.deploy.history.HistoryServer does not accept --conf arguments on the command line; the same spark.history.* properties can alternatively be placed in conf/spark-defaults.conf inside the container.



/usr/spark/bin/spark-submit --master spark://spark-master:7077 --class org.apache.spark.examples.SparkPi /usr/spark/examples/jars/spark-examples_2.11-2.2.0.jar
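The submit command has to run somewhere that can resolve spark-master, for example inside the master container itself. A sketch assuming the image built from the Dockerfile above, where Spark 2.0.2 lives under /usr/local/spark (paths and the jar version differ in the prebuilt yaosong5 images, as the command above shows):

# Run the SparkPi example from inside the master container
docker exec -it spark-master sh -c "source /etc/profile && spark-submit --master spark://spark-master:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark/examples/jars/spark-examples_2.11-2.0.2.jar"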

Running a Spark program

spark-submit --conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://namenode:9000/sparkhistory \
--master spark://spark-master:7077 --class org.apache.spark.examples.SparkPi /usr/spark/examples/jars/spark-examples_2.11-2.0.2.jar

An annotated breakdown of the submit command:

spark-submit                                                          # submit the Spark application
--conf spark.eventLog.enabled=true                                    # record event logs so the finished application can be viewed at <ip>:18080
--conf spark.eventLog.dir=hdfs://namenode:9000/user/spark/history     # directory the event logs are written to
--master spark://spark-master:7077                                    # the standalone master that spark-submit sends the application to
--class org.apache.spark.examples.SparkPi ./examples/jars/spark-examples_2.11-2.0.2.jar

The event logs are kept on HDFS here; a locally mounted directory would also work, but a distributed file system is replicated and reachable from every host. Note that spark.eventLog.dir must point at the same directory as the history server's spark.history.fs.logDirectory, otherwise finished applications will not show up in the history UI.
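Once the application finishes, it should appear in the history server UI. One way to check this from inside the br network (the container name spark-history comes from the run command above):

# Fetch the history server landing page from a throwaway container on the same network
docker run --rm --net=br yaosong5/bigdata:2.0 sh -c "wget -qO- http://spark-history:18080 | head -n 20"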

Deploying the services with docker-compose

version: "2.0"
services:
  master:
    image: yaosong5/hadoop:3.0
    command: bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
    hostname: spark-master
    container_name: spark-master
    network_mode: "br"
    environment:
      MASTER: spark://spark-master:7077
      SPARK_MASTER_OPTS: "-Dspark.eventLog.dir=hdfs://namenode:9000/user/spark/history"
      SPARK_PUBLIC_DNS: spark-master


  worker:
    image: yaosong5/hadoop:3.0
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    hostname: worker
    container_name: worker
    network_mode: "br"
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 1g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: spark-master
    links:
      - master

  historyServer:
    image: yaosong5/hadoop:3.0
    command: spark-class org.apache.spark.deploy.history.HistoryServer
    hostname: historyServer
    container_name: historyServer
    network_mode: "br"
    environment:
      MASTER: spark://spark-master:7077
      SPARK_PUBLIC_DNS: spark-master
      SPARK_HISTORY_OPTS: "-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://namenode:9000/spark/history"
    links:
      - master
    expose:
      - 18080
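Because the services use network_mode: "br", the external br network must already exist (see docker network create earlier). Also note that spark.history.fs.logDirectory must point at the same HDFS path that applications use for spark.eventLog.dir, or finished jobs will not appear on port 18080. A minimal usage sketch:

# Start the stack in the background, check its state, and follow the master's logs
docker-compose up -d
docker-compose ps
docker-compose logs -f master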
