Logisland on Spark Cluster
Learn how to use Logisland on a standalone Spark cluster.
This guide covers:
- How to install Logisland on a standalone Spark cluster
- How to run a Logisland job on a standalone Spark cluster
1. Prerequisites
To complete this guide, you need:
- less than 30 minutes
- familiarity with Logisland (if not, please go through the other guides first)
- an installed Spark cluster using the standalone manager (see the Spark documentation)
2. Installation of Logisland
Choose the release of Logisland you want to install. A Logisland installation contains jars. To use it on a standalone Spark cluster, you have to make the engine jars accessible to all Spark worker nodes and to the Spark master node.
The engine jars should be located in $LOGISLAND_HOME/lib/engines/. You cannot change this path: it must be the same on every node and match the Logisland installation you are using. Also make sure the jars are readable by the Spark user.
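The requirement above can be checked on each node with a small script. The following is a minimal sketch, not part of Logisland: the function name `find_engine_jars` is hypothetical, and it only assumes the `lib/engines` layout described above.

```python
import os
from pathlib import Path


def find_engine_jars(logisland_home):
    """Return the engine jars found under <logisland_home>/lib/engines.

    Raises FileNotFoundError if the directory is missing, and
    PermissionError if a jar exists but the current user cannot read it.
    """
    engines_dir = Path(logisland_home) / "lib" / "engines"
    if not engines_dir.is_dir():
        raise FileNotFoundError(f"missing engines directory: {engines_dir}")

    # Only *.jar files are considered; other files are ignored.
    jars = sorted(engines_dir.glob("*.jar"))
    unreadable = [jar for jar in jars if not os.access(jar, os.R_OK)]
    if unreadable:
        raise PermissionError(f"jars not readable: {unreadable}")
    return jars
```

Running `find_engine_jars(os.environ["LOGISLAND_HOME"])` as the Spark user on the master and on each worker should list the same jars everywhere if the installation is consistent.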
3. Run logisland jobs
3.1. Client mode
In Spark client mode, you only need Logisland installed on the node from which you submit your job, because the runtime dependencies are automatically uploaded to the workers.
So running a job in client mode on a standalone Spark cluster is pretty straightforward.
3.2. Cluster mode
In cluster mode, the driver may run on a machine other than the local one, so your job conf file has to be accessible from that machine. That is why, as with the engine jars, you should make sure that your job conf file is available on all Spark nodes of your cluster (with read access for the Spark user).
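One way to satisfy this is to copy the job conf file to the same path on every node. The sketch below only builds the `scp` commands and does not run them; the host names, the conf path and the `spark` user name are placeholders, and it assumes the target directory already exists on each node.

```python
import shlex


def build_copy_commands(conf_path, hosts, user="spark"):
    """Build one scp command per host, copying conf_path to the same path."""
    commands = []
    for host in hosts:
        target = f"{user}@{host}:{conf_path}"
        commands.append(["scp", conf_path, target])
    return commands


def render(commands):
    """Render the commands as shell-safe strings, e.g. for review or logging."""
    return [shlex.join(cmd) for cmd in commands]
```

For example, `render(build_copy_commands("/opt/logisland/conf/my-job.yml", ["node-1", "node-2"]))` yields two `scp` lines that can be reviewed before being executed. A shared file system mounted at the same location on all nodes would serve the same purpose.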
3.3. Available options in engine conf
On a standalone Spark cluster, the following options are available in your Logisland job conf:
Property | Required | Description | Accepted value or example |
---|---|---|---|
spark.app.name | No | Name of the Spark job to use | String |
spark.master | Yes | Spark master URI (see Spark doc) | starts with "spark://" |
spark.deploy-mode | Yes | Deploy mode to use | "client" or "cluster" |
spark.total.executor.cores | No | Adds the specified --total-executor-cores X to spark-submit (see Spark doc) | a number |
spark.executor.memory | No | Adds the specified --executor-memory X to spark-submit (see Spark doc) | 10G |
spark.supervise | No | Adds --supervise to spark-submit (see Spark doc) | true |
spark.executor.cores | No | Adds --executor-cores to spark-submit (see Spark doc) | a number |
spark.driver.memory | No | Adds --driver-memory to spark-submit (see Spark doc) | 10G |
spark.driver.cores | No | Adds --driver-cores to spark-submit (see Spark doc) | only available in cluster mode |
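To make the table concrete, the sketch below turns an engine configuration into the corresponding spark-submit flags. It is an illustration only, not Logisland's actual implementation: the flags for the executor, driver and supervise options come from the table, while mapping spark.app.name, spark.master and spark.deploy-mode to `--name`, `--master` and `--deploy-mode` is an assumption based on the standard spark-submit options.

```python
# Properties whose value is passed along with the flag.
# The first three entries are assumed; the rest follow the table above.
VALUE_FLAGS = {
    "spark.app.name": "--name",
    "spark.master": "--master",
    "spark.deploy-mode": "--deploy-mode",
    "spark.total.executor.cores": "--total-executor-cores",
    "spark.executor.memory": "--executor-memory",
    "spark.executor.cores": "--executor-cores",
    "spark.driver.memory": "--driver-memory",
    "spark.driver.cores": "--driver-cores",
}


def spark_submit_flags(conf):
    """Return the spark-submit flags implied by an engine conf dict."""
    flags = []
    for prop, flag in VALUE_FLAGS.items():
        if prop in conf:
            flags += [flag, str(conf[prop])]
    # --supervise is a switch: it is added without a value.
    if str(conf.get("spark.supervise", "")).lower() == "true":
        flags.append("--supervise")
    return flags
```

Properties that are absent from the configuration produce no flag, which matches the "Required: No" rows of the table.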
3.3.1. Example of engine configuration in a job conf for a Spark cluster
Here is an example of a Logisland job conf with settings specific to a standalone Spark cluster in the engine conf.
#########################################################################################################
# Logisland configuration script template
#########################################################################################################
version: 1.4.0
documentation: LogIsland analytics main config file. Put here every engine or component config
#########################################################################################################
engine:
  component: com.hurence.logisland.engine.spark.KafkaStreamProcessingEngine
  type: engine
  documentation: An example of engine configuration using spark cluster standalone manager
  configuration:
    # Spark cluster standalone conf part
    spark.app.name: test_logisland_spark_standalone
    spark.master: spark://azvmescwap3.escardio.net:7077
    spark.deploy-mode: cluster
    spark.total.executor.cores: 2
    spark.executor.memory: 2G
    spark.supervise: true
    spark.executor.cores: 2
    spark.driver.memory: 1G
    spark.driver.cores: 1
    # Other conf for kafka streaming
    spark.serializer: org.apache.spark.serializer.KryoSerializer
    spark.ui.port: 4054
    spark.monitoring.driver.port: 7091
    spark.streaming.backpressure.enabled: true
    spark.streaming.unpersist: false
    spark.streaming.blockInterval: 500
    spark.streaming.timeout: -1
    spark.streaming.kafka.maxRetries: 3
    spark.streaming.ui.retainedBatches: 200
    spark.streaming.receiver.writeAheadLog.enable: false
    spark.streaming.kafka.maxRatePerPartition: 3000
    spark.streaming.batchDuration: 5000
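
Before submitting a job, the constraints listed in section 3.3 can be checked against the engine configuration. The following is a minimal sketch under the assumption that the `configuration` section has already been loaded into a Python dict; `validate_engine_conf` is a hypothetical helper, not a Logisland API.

```python
def validate_engine_conf(conf):
    """Return a list of problems found in the engine configuration dict."""
    problems = []

    # spark.master is required and must be a standalone master URI.
    master = conf.get("spark.master")
    if master is None:
        problems.append("spark.master is required")
    elif not str(master).startswith("spark://"):
        problems.append("spark.master must start with 'spark://'")

    # spark.deploy-mode is required and accepts two values.
    mode = conf.get("spark.deploy-mode")
    if mode is None:
        problems.append("spark.deploy-mode is required")
    elif mode not in ("client", "cluster"):
        problems.append("spark.deploy-mode must be 'client' or 'cluster'")

    # spark.driver.cores is documented as available in cluster mode only.
    if "spark.driver.cores" in conf and mode != "cluster":
        problems.append("spark.driver.cores is only available in cluster mode")
    return problems
```

Applied to the configuration above, which uses cluster mode with a `spark://` master, the function returns an empty list. Switching spark.deploy-mode to client while keeping spark.driver.cores would report one problem.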