Table of Contents

1 Submitting Applications
2 Bundling Your Application’s Dependencies
3 Launching Applications with spark-submit
4 Master URLs
5 Loading Configuration from a File
6 Advanced Dependency Management
7 More Information

1 Submitting Applications

The spark-submit script in Spark’s bin directory is used to launch applications on a cluster. It can use all of Spark’s supported cluster managers through a uniform interface so you don’t have to configure your application especially for each one.

2 Bundling Your Application’s Dependencies

If your code depends on other projects, you will need to package them alongside your application in order to distribute the code to a Spark cluster. To do this, create an assembly jar (or “uber” jar) containing your code and its dependencies. Both sbt and Maven have assembly plugins. When creating assembly jars, list Spark and Hadoop as provided dependencies; these need not be bundled since they are provided by the cluster manager at runtime. Once you have an assembled jar you can call the bin/spark-submit script as shown below while passing your jar.
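As a minimal sketch of that workflow with sbt and the sbt-assembly plugin — the project layout, main class, master URL, and jar path below are assumptions for illustration, not values from this article:

```bash
# Build the uber jar (assumes the sbt-assembly plugin is configured in
# project/plugins.sbt). With Spark and Hadoop marked "provided" in
# build.sbt, they are excluded from the jar and supplied by the cluster
# manager at runtime.
sbt assembly

# Submit the assembled jar; the class name, master URL, and jar path
# are placeholders for this sketch.
./bin/spark-submit \
  --class com.example.MyApp \
  --master spark://203.0.113.10:7077 \
  target/scala-2.12/my-app-assembly-1.0.jar
```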

For Python, you can use the --py-files argument of spark-submit to add .py, .zip or .egg files to be distributed with your application. If you depend on multiple Python files we recommend packaging them into a .zip or .egg. For third-party Python dependencies, see Python Package Management.
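For instance, a submission that ships a zip of helper modules alongside the main script might look like the following; deps.zip and main.py are hypothetical names used only for this sketch:

```bash
# Distribute extra Python modules with the application via --py-files.
# deps.zip and main.py are placeholder names for this sketch.
./bin/spark-submit \
  --master yarn \
  --py-files deps.zip \
  main.py
```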

3 Launching Applications with spark-submit

Once a user application is bundled, it can be launched using the bin/spark-submit script. This script takes care of setting up the classpath with Spark and its dependencies, and can support the different cluster managers and deploy modes that Spark supports:
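```bash
./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]
```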
