Table of Contents

- Configuration Notes
- Installing Hadoop
- Installing Spark
- Testing the Installation

Configuration Notes

Scala - 2.13.8 · Spark - 3.5.0 · Hadoop - 3.3.6
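Spark 3.5.x runs on Java 8, 11, or 17. Before starting, it is worth confirming that a compatible JDK is on the PATH; the machine used here has JDK 1.8.0_311, the same JDK referenced later in spark-env.sh:

> java -version

The first line of output should read something like java version "1.8.0_311".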

Installing Hadoop

Download the matching version of Hadoop from the Apache Hadoop release page (https://hadoop.apache.org/releases.html). After downloading, extract it and configure the system environment variables:

> sudo vim /etc/profile

Add the following two lines:

export HADOOP_HOME=/Users/collinsliu/hadoop-3.3.6

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Replace the path with the location of your own installation, then reload the profile so the variables take effect:

> source /etc/profile
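As a quick sanity check (not part of the original steps), the hadoop command should now resolve from any directory:

> hadoop version

The first line of output should report Hadoop 3.3.6.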

Installing Spark

Download the matching version of Spark from the Apache Spark download page (https://spark.apache.org/downloads.html). After downloading, extract it and, as with Hadoop, configure the system environment variables:

> sudo vim /etc/profile

Add the following two lines:

export SPARK_HOME=/Users/collinsliu/spark-3.5.0

export PATH=$PATH:$SPARK_HOME/bin

Again, replace the path with your own installation location, then reload the profile so the variables take effect:

> source /etc/profile
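As with Hadoop, a quick check confirms Spark is now on the PATH:

> spark-submit --version

This prints the Spark version banner and should report version 3.5.0.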

Next, configure Spark to connect to Hadoop, giving a working local mode: a. First, enter the conf directory:

> cd /Users/collinsliu/spark-3.5.0/conf

b. Then create the working configuration file from its template and open it:

> cp spark-env.sh.template spark-env.sh

> vim spark-env.sh

c. Add the following three lines so that Spark can find the JDK, the Hadoop configuration, and the corresponding Hadoop jars:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_311.jdk/Contents/Home

export HADOOP_CONF_DIR=/Users/collinsliu/hadoop-3.3.6/etc/hadoop

export SPARK_DIST_CLASSPATH=$(/Users/collinsliu/hadoop-3.3.6/bin/hadoop classpath)
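The third line is what actually wires Spark to Hadoop: spark-env.sh is sourced at startup, so the $(...) command substitution runs hadoop classpath each time and places the result in SPARK_DIST_CLASSPATH. This is required for Spark builds packaged "without Hadoop" and harmless for builds that bundle Hadoop. You can preview what it expands to:

> /Users/collinsliu/hadoop-3.3.6/bin/hadoop classpath

which prints a colon-separated list of Hadoop's configuration directory and jar paths.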

Testing the Installation

1. Test with the built-in SparkPi example:

> cd /Users/collinsliu/spark-3.5.0/

> ./bin/run-example SparkPi

A large amount of log output scrolls by; near the end you should find:

...
24/02/07 00:31:33 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks resource profile 0
24/02/07 00:31:33 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0) (192.168.0.100, executor driver, partition 0, PROCESS_LOCAL, 8263 bytes)
24/02/07 00:31:33 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1) (192.168.0.100, executor driver, partition 1, PROCESS_LOCAL, 8263 bytes)
24/02/07 00:31:33 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
24/02/07 00:31:33 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
24/02/07 00:31:34 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1101 bytes result sent to driver
24/02/07 00:31:34 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1101 bytes result sent to driver
24/02/07 00:31:34 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 1120 ms on 192.168.0.100 (executor driver) (1/2)
24/02/07 00:31:34 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 923 ms on 192.168.0.100 (executor driver) (2/2)
24/02/07 00:31:34 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
24/02/07 00:31:34 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 1.737 s
24/02/07 00:31:34 INFO DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
24/02/07 00:31:34 INFO TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
24/02/07 00:31:34 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 1.807145 s
Pi is roughly 3.1405357026785135

This confirms the installation works.
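run-example forwards trailing arguments to the example program; SparkPi takes an optional number of partitions (the default is 2, which matches the two tasks in the log above), so a slightly heavier run would be:

> ./bin/run-example SparkPi 10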

2. Open the Spark shell:

> spark-shell

The following output appears:

24/02/07 00:48:12 WARN Utils: Your hostname, Collinss-MacBook-Air.local resolves to a loopback address: 127.0.0.1; using 192.168.0.100 instead (on interface en0)
24/02/07 00:48:12 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.5.0
      /_/

Using Scala version 2.13.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_311)
Type in expressions to have them evaluated.
Type :help for more information.
24/02/07 00:48:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.0.100:4040
Spark context available as 'sc' (master = local[*], app id = local-1707238103536).
Spark session available as 'spark'.

scala>

This again confirms that the installation succeeded.
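As one more hands-on check, you can run a small job directly at the scala> prompt; this one-liner (my own example, not from the original steps) counts the even numbers from 1 to 1000 through the full local execution path:

scala> sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()

res0: Long = 500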
