The pipe operator in Spark lets developers process RDD data with external applications. In data analysis, we sometimes need an external library that is not written in Java or Scala, for example a Fortran math library. In that case, Spark's pipe operator lets us send the RDD data to the external application.
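As a rough illustration, the sketch below pipes a small RDD through the Unix `tr` command, which stands in for any external program. `RDD.pipe` writes each element of a partition to the process's standard input, one per line, and each line the process prints becomes an element of the resulting `RDD[String]`. The object name `PipeExample`, the helper `upperCaseViaPipe`, the sample words, and the local master setting are assumptions made for this example, and it presumes `tr` is installed on every worker.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PipeExample {
  // Hypothetical helper: upper-cases words by piping them through `tr`.
  // Each element goes to the external process on stdin, one per line;
  // each stdout line becomes one element of the result.
  def upperCaseViaPipe(sc: SparkContext, words: Seq[String]): Array[String] = {
    val rdd = sc.parallelize(words, numSlices = 2)
    // The Seq form passes the arguments to the command without shell parsing.
    rdd.pipe(Seq("tr", "a-z", "A-Z")).collect()
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption, used only to keep the sketch self-contained.
    val conf = new SparkConf().setAppName("pipe-example").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      upperCaseViaPipe(sc, Seq("spark", "pipe", "operator")).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

One external process is started per partition, so a program with a costly startup is better served by fewer, larger partitions.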
Accumulators in Spark are highly useful for side-effect-based operations. For example, the following code calculates both the sum and the sum of squares as a side effect.
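A minimal sketch of such a computation, assuming Spark 2.x or later and its `longAccumulator` API, might look like this. The object name `AccumulatorExample`, the helper `sumAndSumOfSquares`, the accumulator names, and the sample values are illustrative assumptions rather than part of any original listing.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AccumulatorExample {
  // Hypothetical helper: returns (sum, sum of squares) gathered via accumulators.
  def sumAndSumOfSquares(sc: SparkContext, values: Seq[Long]): (Long, Long) = {
    val sum = sc.longAccumulator("sum")
    val sumOfSquares = sc.longAccumulator("sumOfSquares")

    // foreach is an action, so the accumulator updates happen as a side effect
    // of traversing the RDD once on the executors.
    sc.parallelize(values).foreach { x =>
      sum.add(x)
      sumOfSquares.add(x * x)
    }

    // Accumulator values are read back on the driver after the action completes.
    (sum.sum, sumOfSquares.sum)
  }

  def main(args: Array[String]): Unit = {
    // Local master is an assumption, used only to keep the sketch self-contained.
    val conf = new SparkConf().setAppName("accumulator-example").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (s, sq) = sumAndSumOfSquares(sc, Seq(1L, 2L, 3L, 4L))
      println(s"sum = $s, sum of squares = $sq")
    } finally {
      sc.stop()
    }
  }
}
```

Updating the accumulators inside an action such as `foreach`, rather than inside a transformation such as `map`, avoids double counting when a task is retried or a stage is recomputed.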