Although the following parameters are not required, they can help applications run smoothly and avoid timeout and memory-related errors. We advise that you set these in the spark-defaults configuration file.
0.8 The lower this is, the more frequently spills and cached data eviction occur.
5
true When set to true, this property compresses the RDDs, which can save substantial space at the cost of some extra CPU time.
true When set to true, this property compresses the map output to save space.
true When set to true, this property compresses the data spilled during shuffles.
org.apache.spark.serializer.KryoSerializer The default, Java serialization, works with any Serializable Java object but is quite slow, so we recommend using org.apache.spark.serializer.KryoSerializer and configuring Kryo serialization when speed is necessary.
-XX:+UseG1GC -XX:+G1SummarizeConcMark Several garbage collectors are available to evict old objects and make room for new ones in memory. However, the newer Garbage-First Garbage Collector (G1GC) overcomes the latency and throughput limitations of the older garbage collectors.
-XX:+UseG1GC -XX:+G1SummarizeConcMark Several garbage collectors are available to evict old objects and make room for new ones in memory. However, the newer Garbage-First Garbage Collector (G1GC) overcomes the latency and throughput limitations of the older garbage collectors.
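As a rough illustration, the values above could be written to the spark-defaults configuration file as whitespace-separated key/value lines. The list above gives only values and descriptions, so every property name in this sketch is an assumption inferred from the descriptions. In particular, pairing the two identical JVM option rows with the executor and driver Java options is a guess. The value 5 is left out because nothing in the list identifies its property. The helper `render_spark_defaults` is a hypothetical name used only for this example.

```python
# Sketch only: the property names are ASSUMPTIONS inferred from the
# descriptions above; the source lists values without their property names.
RECOMMENDED = {
    "spark.memory.fraction": "0.8",
    "spark.rdd.compress": "true",
    "spark.shuffle.compress": "true",
    "spark.shuffle.spill.compress": "true",
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    # Assumed pairing of the two identical JVM option rows.
    "spark.executor.extraJavaOptions": "-XX:+UseG1GC -XX:+G1SummarizeConcMark",
    "spark.driver.extraJavaOptions": "-XX:+UseG1GC -XX:+G1SummarizeConcMark",
}


def render_spark_defaults(settings):
    """Render settings as spark-defaults.conf lines (key, whitespace, value)."""
    width = max(len(key) for key in settings)
    return "\n".join(
        f"{key.ljust(width)}  {value}" for key, value in settings.items()
    )


if __name__ == "__main__":
    # Print the lines so they can be reviewed before being added to the file.
    print(render_spark_defaults(RECOMMENDED))
```

Review the printed lines against your Spark and JVM versions before copying them into the configuration file, since not every JVM release accepts the same garbage collector flags.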