Compose-Spark components: standalone startup (docker)

Notes on the Dockerfile

FROM openjdk:8u131-jre-alpine
# openjdk:8u131-jre-alpine as the base image: small footprint, ships with a JRE.
MAINTAINER <eway>
# (MAINTAINER is deprecated in current Docker; LABEL maintainer="..." is the modern form.)

# Switch to root to avoid permission problems, use the UTF-8 locale, install the
# bash interpreter, and install tzdata so the clock can be set to the Shanghai timezone.
USER root
ENV LANG=C.UTF-8
ENV TZ=Asia/Shanghai
RUN apk add --no-cache wget bash tzdata \
    && cp /usr/share/zoneinfo/$TZ /etc/localtime \
    && echo $TZ > /etc/timezone

# Download and unpack Spark. (Old releases are removed from www.apache.org/dist;
# the 2.0.2 tarball may have to be fetched from archive.apache.org instead.)
WORKDIR /usr/local
RUN wget "http://www.apache.org/dist/spark/spark-2.0.2/spark-2.0.2-bin-hadoop2.7.tgz" \
   && tar -zxvf spark-* \
   && mv spark-2.0.2-bin-hadoop2.7 spark \
   && rm -rf spark-2.0.2-bin-hadoop2.7.tgz

# Configure environment variables and expose the Spark ports
# (6066 REST submission, 7077 master, 8080/8081 web UIs, 4044 application UI).
ENV SPARK_HOME=/usr/local/spark
ENV JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
ENV PATH=${PATH}:${JAVA_HOME}/bin:${SPARK_HOME}/bin

EXPOSE 6066 7077 8080 8081 4044
WORKDIR $SPARK_HOME
CMD ["/bin/bash"]
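
With the Dockerfile saved in an otherwise empty directory, the image can be built and smoke-tested roughly as follows. A minimal sketch; the yaosong5/bigdata:2.0 tag is an assumption chosen so it lines up with the run commands below:

# Build the image from the directory containing the Dockerfile
docker build -t yaosong5/bigdata:2.0 .
# Quick smoke test: the Spark scripts should already be on PATH
docker run --rm yaosong5/bigdata:2.0 spark-submit --version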

Starting components individually

# Start the master container
docker run -itd --name spark-master --net=br  -h spark-master yaosong5/bigdata:2.0  sh -c " source /etc/profile && spark-class org.apache.spark.deploy.master.Master"
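
To check that the master came up, inspect its log; a healthy master prints its bind address. A quick check, assuming the container name used above:

docker logs spark-master
# expect a line like: Starting Spark master at spark://spark-master:7077
# the master web UI is then reachable on the container's port 8080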

Start the worker container

docker run -itd --net=br --link spark-master:worker01 yaosong5/bigdata:2.0 sh -c "source /etc/profile && spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077"

# Variant that first checks that the HDFS namenode is reachable, then starts the worker.
# ping must be bounded with -c so it terminates; an unbounded ping after "&&" would block,
# and a ping placed after the Worker would never run, since the Worker stays in the foreground.
docker run -it --net=br --link spark-master:worker01 yaosong5/bigdata:2.0 sh -c "source /etc/profile && ping -c 3 namenode && spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077"
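
A started worker should register itself with the master, which can be confirmed from its log (a quick check; replace <container-id> with the worker's ID from docker ps):

docker ps                   # find the worker container's ID
docker logs <container-id>
# expect a line like: Successfully registered with master spark://spark-master:7077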

Note: --link <target-container> means this container consumes a service provided by the target container. It therefore fixes the startup order, and Docker adds the target container's name and IP to this container's /etc/hosts file so it can be resolved by name.
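
--link is a legacy mechanism. The commands here already rely on a user-defined bridge network named br, and on such a network containers resolve one another by container name even without --link. If the network does not exist yet, create it first:

docker network create br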

Start the historyserver container

docker run -itd --net=br --name=spark-history --link namenode:namenode --link spark-master \
-e SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 \
-Dspark.history.retainedApplications=10 \
-Dspark.history.fs.logDirectory=hdfs://namenode:9000/sparkhistory" \
yaosong5/bigdata:2.0 sh -c "source /etc/profile && ping -c 3 namenode && spark-class org.apache.spark.deploy.history.HistoryServer"

Note: the ping must be bounded with -c; an unbounded ping never exits, so the HistoryServer after "&&" would never start.
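
The history server expects the HDFS directory it reads from to exist; it can be created up front from the namenode container (a sketch, assuming the hdfs CLI is available there):

docker exec -it namenode hdfs dfs -mkdir -p /sparkhistory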



One-off connectivity checks used while debugging: each container just sources the profile and pings the namenode (ping is given -c so the check terminates instead of running forever):

docker run -itd --net=br --link spark-master:worker01 yaosong5/bigdata:2.0 sh -c "source /etc/profile && ping -c 3 namenode"
docker run -it --net=br --link spark-master yaosong5/bigdata:2.0 sh -c "source /etc/profile && ping -c 3 namenode"
docker run -itd --name=history --net=br --link spark-master yaosong5/bigdata:2.0 sh -c "source /etc/profile && ping -c 3 namenode"
docker run -itd --net=br --link namenode yaosong5/bigdata:2.0 sh -c "source /etc/profile && ping -c 3 namenode"


The history server takes its spark.history.* settings from the SPARK_HISTORY_OPTS environment variable, which is why the full command above passes them to docker run with -e:

-e SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=10 -Dspark.history.fs.logDirectory=hdfs://namenode:9000/sparkhistory"

Note that spark-class does not understand spark-submit style --conf flags, so a command like spark-class org.apache.spark.deploy.history.HistoryServer --conf spark.history.ui.port=18080 ... does not work; pass the settings through SPARK_HISTORY_OPTS instead.
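
Equivalently, the same settings can be baked into $SPARK_HOME/conf/spark-defaults.conf, which the history server also reads on startup (a sketch; the values mirror the -e variant above):

spark.history.ui.port                 18080
spark.history.retainedApplications    10
spark.history.fs.logDirectory         hdfs://namenode:9000/sparkhistory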



A quick test submission of the bundled SparkPi example against the standalone master (the jar version matches the Spark 2.0.2 install used throughout this page):

/usr/spark/bin/spark-submit --master spark://spark-master:7077 --class org.apache.spark.examples.SparkPi /usr/spark/examples/jars/spark-examples_2.11-2.0.2.jar
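
If the submission succeeds, the driver output ends with the estimate printed by SparkPi, along the lines of (the exact digits vary from run to run):

Pi is roughly 3.14...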

Running a Spark program

spark-submit --conf spark.eventLog.enabled=true \
--conf spark.eventLog.dir=hdfs://namenode:9000/sparkhistory \
--master spark://spark-master:7077 --class org.apache.spark.examples.SparkPi /usr/spark/examples/jars/spark-examples_2.11-2.0.2.jar

Flag by flag, the submission breaks down as:

spark-submit                                                  # submit the Spark application
--conf spark.eventLog.enabled=true                            # record the application's event log, so the finished run can be looked up at ip:18080
--conf spark.eventLog.dir=hdfs://namenode:9000/sparkhistory   # where the event logs are written; must match the history server's spark.history.fs.logDirectory

Store the Spark application logs on Hadoop's distributed file system (HDFS); a local directory can also be mounted instead.

A distributed file system is the safer place for them, since replication protects the logs and they stay reachable across hosts. The remaining flags:

--master spark://spark-master:7077              # submit the application to the master given here
--class org.apache.spark.examples.SparkPi       # main class of the application
./examples/jars/spark-examples_2.11-2.0.2.jar   # the application jar
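
After a run with event logging enabled completes, the log file should appear under the configured directory and the application should be listed in the history server UI. A quick check, assuming the container names used above:

docker exec -it namenode hdfs dfs -ls /sparkhistory
# then browse http://<docker-host>:18080 for the finished application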

Deploying the services with docker-compose

version: "2.0"
services:
  master:
    image: yaosong5/hadoop:3.0
    # -h must match the hostname below, or the master binds to a name that does not resolve
    command: bin/spark-class org.apache.spark.deploy.master.Master -h spark-master
    hostname: spark-master
    container_name: spark-master
    networks:
      - br
    environment:
      MASTER: spark://spark-master:7077
      SPARK_MASTER_OPTS: "-Dspark.eventLog.dir=hdfs://namenode:9000/user/spark/history"
      SPARK_PUBLIC_DNS: spark-master

  worker:
    image: yaosong5/hadoop:3.0
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077
    hostname: worker
    container_name: worker
    networks:
      - br
    environment:
      SPARK_WORKER_CORES: 1
      SPARK_WORKER_MEMORY: 1g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: spark-master
    depends_on:
      - master

  historyServer:
    image: yaosong5/hadoop:3.0
    command: bin/spark-class org.apache.spark.deploy.history.HistoryServer
    hostname: historyServer
    container_name: historyServer
    networks:
      - br
    environment:
      MASTER: spark://spark-master:7077
      SPARK_PUBLIC_DNS: spark-master
      # the log directory must match spark.eventLog.dir used by the applications
      SPARK_HISTORY_OPTS: "-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://namenode:9000/user/spark/history"
    depends_on:
      - master
    expose:
      - 18080

# the pre-existing bridge network shared with the namenode containers
networks:
  br:
    external: true
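
With the file above saved as docker-compose.yml (and the br network created as shown earlier), the whole stack starts in one step; a minimal usage sketch:

docker network create br    # once, if it does not already exist
docker-compose up -d
docker-compose ps           # master, worker and historyServer should be Up
docker-compose logs -f historyServer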