Common ways to cache in Spark SQL, DataFrame, and RDD

Caching an RDD

// elementA, elementB, elementC are (key, value) tuples defined elsewhere
val testRDD = sc.parallelize(Seq(elementA, elementB, elementC))
  .map(x => (x._1, x._2))
  .setName("testRDD")

// Same as persist(StorageLevel.MEMORY_ONLY); data is stored on the first action
testRDD.cache()
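The snippet above relies on elements it does not show. Here is a minimal self-contained sketch. It assumes a local SparkSession and uses hypothetical sample tuples in place of elementA..elementC. The sketch shows three things: cache() only marks the RDD with MEMORY_ONLY, the first action fills the cache, and you must call unpersist() before persisting at a different storage level.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object RddCacheSketch {
  // Returns the RDD's storage level right after cache() and after re-persisting.
  def run(spark: SparkSession): (StorageLevel, StorageLevel) = {
    val sc = spark.sparkContext
    // Hypothetical sample pairs standing in for elementA, elementB, elementC.
    val testRDD = sc.parallelize(Seq(("a", 1), ("b", 2), ("c", 3)))
      .map(x => (x._1, x._2))
      .setName("testRDD")

    // cache() only marks the RDD with MEMORY_ONLY; partitions are stored
    // when the first action (count) computes them.
    testRDD.cache()
    val afterCache = testRDD.getStorageLevel
    testRDD.count()

    // The level of a persisted RDD cannot be changed in place,
    // so unpersist first, then persist with the new level.
    testRDD.unpersist()
    testRDD.persist(StorageLevel.MEMORY_AND_DISK)
    val afterPersist = testRDD.getStorageLevel
    testRDD.count()

    testRDD.unpersist()
    (afterCache, afterPersist)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("rdd_cache_sketch")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```

Use MEMORY_AND_DISK when a partition that does not fit in memory should spill to disk. Otherwise it would be recomputed on the next access.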

Caching data as a table through the catalog

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// BASEPATH and event_day are defined elsewhere in the job
val conf = new SparkConf().setAppName("test_app")
val spark = SparkSession.builder().config(conf).getOrCreate()
spark.read.parquet(s"${BASEPATH}/dws_live_mid_stat_order_di/event_day=${event_day}")
  .createOrReplaceTempView("dwd_flow_sessionid_di")

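// Mark the view as cached (lazy: filled by the first query that scans it)
spark.catalog.cacheTable("dwd_flow_sessionid_di")
// Release the cached data once it is no longer needed
spark.catalog.uncacheTable("dwd_flow_sessionid_di")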
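A DataFrame can be cached directly, or through the catalog by the name of its temp view. The following is a minimal sketch that assumes a local SparkSession. It replaces the parquet input with hypothetical in-memory rows and compares the two approaches, using spark.catalog.isCached to observe the catalog state.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object CatalogCacheSketch {
  // Returns (DataFrame level after cache(), isCached after cacheTable, isCached after uncacheTable).
  def run(spark: SparkSession): (StorageLevel, Boolean, Boolean) = {
    import spark.implicits._
    // Hypothetical in-memory rows replacing the article's parquet input.
    val df = Seq(("s1", 10L), ("s2", 20L)).toDF("session_id", "pv")

    // DataFrame cache: lazy, MEMORY_AND_DISK by default for Datasets.
    df.cache()
    val dfLevel = df.storageLevel
    df.count()        // the first action fills the cache
    df.unpersist()

    // Table cache through the catalog, keyed by the temp view name.
    df.createOrReplaceTempView("dwd_flow_sessionid_di")
    spark.catalog.cacheTable("dwd_flow_sessionid_di")   // registered now, filled on first scan
    val cached = spark.catalog.isCached("dwd_flow_sessionid_di")
    spark.table("dwd_flow_sessionid_di").count()
    spark.catalog.uncacheTable("dwd_flow_sessionid_di")
    val afterUncache = spark.catalog.isCached("dwd_flow_sessionid_di")

    (dfLevel, cached, afterUncache)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("catalog_cache_sketch")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```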

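Caching a query result as a table with SQL

// Creates the temp view flow_basic_tmp and caches it eagerly
spark.sql(
  """
    |CACHE TABLE flow_basic_tmp AS
    |SELECT *
    |FROM test.tmp_live_mid_stat_order_di
    |""".stripMargin)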
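By default, CACHE TABLE is eager. Adding LAZY defers filling the cache until the table is first scanned. The sketch below assumes a local SparkSession and registers a hypothetical temp view in place of test.tmp_live_mid_stat_order_di. The name flow_lazy_tmp is also made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SqlCacheSketch {
  // Returns isCached for an eager and a LAZY CACHE TABLE ... AS SELECT.
  def run(spark: SparkSession): (Boolean, Boolean) = {
    import spark.implicits._
    // Hypothetical temp view standing in for test.tmp_live_mid_stat_order_di.
    Seq(("o1", 3), ("o2", 5)).toDF("order_id", "qty")
      .createOrReplaceTempView("tmp_live_mid_stat_order_di")
    // Drop any earlier copies so the sketch can be re-run in the same session.
    spark.catalog.dropTempView("flow_basic_tmp")
    spark.catalog.dropTempView("flow_lazy_tmp")

    // Eager: creates temp view flow_basic_tmp and materializes its cache immediately.
    spark.sql("CACHE TABLE flow_basic_tmp AS SELECT * FROM tmp_live_mid_stat_order_di")
    val eager = spark.catalog.isCached("flow_basic_tmp")

    // LAZY: the view is registered as cached but filled only on its first scan.
    spark.sql("CACHE LAZY TABLE flow_lazy_tmp AS SELECT order_id FROM tmp_live_mid_stat_order_di")
    val lazyCached = spark.catalog.isCached("flow_lazy_tmp")

    (eager, lazyCached)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql_cache_sketch")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```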

To release a cache created this way, use UNCACHE TABLE:

UNCACHE TABLE [ IF EXISTS ] table_identifier
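The next sketch assumes a local SparkSession and applies the syntax to flow_basic_tmp. UNCACHE TABLE drops only the cached data and leaves the temp view registered. With IF EXISTS, a missing name is ignored rather than raising an error. The name no_such_table is hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object UncacheSketch {
  // Returns isCached for flow_basic_tmp before and after UNCACHE TABLE.
  def run(spark: SparkSession): (Boolean, Boolean) = {
    spark.catalog.dropTempView("flow_basic_tmp")    // allow re-running in one session
    spark.sql("CACHE TABLE flow_basic_tmp AS SELECT 1 AS id")
    val before = spark.catalog.isCached("flow_basic_tmp")

    // Drops the cached data; the temp view itself stays registered.
    spark.sql("UNCACHE TABLE IF EXISTS flow_basic_tmp")
    val after = spark.catalog.isCached("flow_basic_tmp")

    // With IF EXISTS, a missing name is ignored instead of raising an error.
    spark.sql("UNCACHE TABLE IF EXISTS no_such_table")

    (before, after)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("uncache_sketch")
      .master("local[*]")
      .getOrCreate()
    try println(run(spark)) finally spark.stop()
  }
}
```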
