Summary

Priority, from lowest to highest:

spark-submit parameters < configuration set in Scala/Java code < Spark SQL hints
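To make the ordering concrete, here is a minimal sketch (the app name and the 800/200 values are illustrative, not from the original job): a value passed on the spark-submit command line is overridden by one set explicitly on SparkConf in code.

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    object ConfPrecedenceDemo {
      def main(args: Array[String]): Unit = {
        // Imagine this job is launched with:
        //   spark-submit --conf spark.sql.shuffle.partitions=800 ...
        val conf = new SparkConf()
          .setAppName("conf-precedence-demo")
          .set("spark.sql.shuffle.partitions", "200") // explicitly set in code

        val spark = SparkSession.builder().config(conf).getOrCreate()

        // Prints 200, not 800: properties set directly on SparkConf take
        // precedence over flags passed to spark-submit.
        println(spark.conf.get("spark.sql.shuffle.partitions"))

        spark.stop()
      }
    }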

spark-submit parameters

#!/usr/bin/env bash

source /home/work/batch_job/product/common/common.sh
spark_version="/home/work/opt/spark"
export SPARK_CONF_DIR=${spark_version}/conf/
# Derive both client paths from ${spark_version} so they stay consistent.
spark_shell="${spark_version}/spark3-client/bin/spark-shell"
spark_sql="${spark_version}/spark3-client/bin/spark-sql"
echo ${spark_sql}
echo ${spark_shell}
# Note: passing the same --conf key twice silently keeps only the last value,
# so the executor JVM options are merged into a single flag below.
${spark_shell} --master yarn \
        --queue test \
        --name "development_sun-data-new_spark_shell" \
        --conf "spark.speculation=true" \
        --conf "spark.network.timeout=400s" \
        --conf "spark.executor.cores=2" \
        --conf "spark.executor.memory=4g" \
        --conf "spark.executor.instances=300" \
        --conf "spark.driver.maxResultSize=4g" \
        --conf "spark.sql.shuffle.partitions=800" \
        --conf "spark.driver.extraJavaOptions=-Dfile.encoding=utf-8" \
        --conf "spark.executor.extraJavaOptions=-Dfile.encoding=utf-8 -XX:+UseG1GC" \
        --conf "spark.driver.memory=8g" \
        --conf "spark.sql.autoBroadcastJoinThreshold=-1" \
        --conf "spark.sql.turing.pooledHiveClientEnable=false" \
        --conf "spark.sql.hive.metastore.jars=/home/work/opt/spark/spark3-client/hive_compatibility/*" \
        --conf "spark.driver.extraClassPath=./__spark_libs__/hive-extensions-2.0.0.0-SNAPSHOT.jar:./hive_jar/parquet-hadoop-bundle-1.6.0.jar:/home/work/opt/spark/spark3-client/hive_compatibility/parquet-hadoop-bundle-1.6.0.jar" \
        --conf "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2" \
        --conf "spark.sql.legacy.timeParserPolicy=LEGACY" \
        --conf "spark.sql.storeAssignmentPolicy=LEGACY" \
        --jars ./online-spark-1.0-SNAPSHOT.jar
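Once the shell above is up, it is worth confirming which values actually took effect. A quick check from the Scala prompt (a sketch; spark is the session that spark-shell creates automatically):

    // Read the runtime SQL config of the active session.
    spark.conf.get("spark.sql.shuffle.partitions") // "800" from the flags above
    spark.conf.get("spark.executor.memory")        // "4g"

    // List every explicitly-set property to cross-check the submit flags.
    spark.sparkContext.getConf.getAll.sorted.foreach(println)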

Configuration parameters in Scala/Java code

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    // event_day is supplied by the surrounding job.
    val conf = new SparkConf().setAppName(s"production_data-new_UserOverview_${event_day}")
    val spark = SparkSession.builder().config("spark.debug.maxToStringFields", "500").config(conf).getOrCreate()
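For spark.sql.* settings there is also a runtime option: spark.conf.set on the live session, which overrides both the spark-submit flag and the SparkConf value for the rest of the session. A sketch (the value 400 is illustrative):

    // Runtime override of a dynamic SQL config; wins over submit flags
    // and over values baked into SparkConf at startup.
    spark.conf.set("spark.sql.shuffle.partitions", "400")

    // Static configs such as spark.executor.memory cannot be changed this
    // way; Spark 3 throws an AnalysisException if you try.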

SQL hint

SELECT /*+ MERGEJOIN(t2) */ * FROM t1 INNER JOIN t2 ON t1.key = t2.key;
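The hint wins even when the session config points the other way. For example, with spark.sql.autoBroadcastJoinThreshold=-1 (broadcast joins disabled, as in the submit script above), a BROADCAST hint still forces a broadcast join for that one statement. A sketch, assuming tables t1 and t2 are registered:

    // Broadcast joins are disabled session-wide...
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    // ...yet the per-query hint still forces the strategy for t2.
    spark.sql(
      """SELECT /*+ BROADCAST(t2) */ *
        |FROM t1 INNER JOIN t2 ON t1.key = t2.key""".stripMargin
    ).explain() // physical plan shows BroadcastHashJoin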

Reference: Hints – Spark 3.5.0 Documentation

Original article: https://blog.csdn.net/u010003835/article/details/134666922
