Spark on yarn

Spark provides three locations to configure the system:
∙ Spark properties control most application parameters and can be set by using a SparkConf object, or through Java system properties.
∙ Environment variables can be used to set per-machine settings, such as the IP address, through the conf/spark-env.sh script on each node.
∙ Logging can be configured through log4j.properties.
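As a concrete sketch of the second mechanism, a conf/spark-env.sh fragment might set per-machine values like these (the IP address and directory below are illustrative placeholders, not defaults):

```shell
# Hypothetical conf/spark-env.sh fragment; every value here is an example.
export SPARK_LOCAL_IP=192.168.1.10            # bind this node to a specific IP
export SPARK_LOCAL_DIRS=/mnt/fast-disk/spark  # scratch space on a fast local disk
```

Because this script runs on each node, it is the right place for settings that differ from machine to machine.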
Spark Properties

Spark properties control most application settings and are configured separately for each application. These properties can be set directly on a SparkConf passed to your SparkContext. SparkConf allows you to configure some of the common properties (e.g. master URL and application name), as well as arbitrary key-value pairs through the set() method. For example, we could initialize an application as follows:
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("CountingSheep")
  .set("spark.executor.memory", "1g")
val sc = new SparkContext(conf)
Dynamically Loading Spark Properties

In some cases, you may want to avoid hard-coding certain configurations in a SparkConf. For instance, if you'd like to run the same application with different masters or different amounts of memory, Spark allows you to simply create an empty conf:
val sc = new SparkContext(new SparkConf())

Then, you can supply configuration values at runtime:
./bin/spark-submit --name "My app" --master local[4] \
  --conf spark.shuffle.spill=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  myApp.jar
The Spark shell and spark-submit tool support two ways to load configurations dynamically. The first are command line options, such as --master, as shown above. spark-submit can accept any Spark property using the --conf flag, but uses special flags for properties that play a part in launching the Spark application. Running ./bin/spark-submit --help will show the entire list of these options.
bin/spark-submit will also read configuration options from conf/spark-defaults.conf, in which each line consists of a key and a value separated by whitespace. For example:

spark.master            spark://5.6.7.8:7077
spark.executor.memory   512m
spark.eventLog.enabled  true
spark.serializer        org.apache.spark.serializer.KryoSerializer
Any values specified as flags or in the properties file will be passed on to the application and merged with those specified through SparkConf. Properties set directly on the SparkConf take highest precedence, then flags passed to spark-submit or spark-shell, then options in the spark-defaults.conf file.
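To make the precedence order concrete, here is a hedged sketch; the property values are invented for illustration:

```scala
import org.apache.spark.SparkConf

// Suppose spark-defaults.conf contains:  spark.executor.memory 512m
// and the job is launched with:          spark-submit --conf spark.executor.memory=1g ...
val conf = new SparkConf().set("spark.executor.memory", "2g")
// The application sees 2g: the value set on SparkConf beats the --conf flag,
// which in turn beats the entry in spark-defaults.conf.
```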
Viewing Spark Properties

The application web UI at http://<driver>:4040 lists Spark properties in the "Environment" tab. This is a useful place to check to make sure that your properties have been set correctly. Note that only values explicitly specified through either spark-defaults.conf or SparkConf will appear. For all other configuration properties, you can assume the default value is used.
Available Properties

Most of the properties that control internal settings have reasonable default values. Some of the most common options to set are:
Application Properties

spark.app.name
Default: (none)
The name of your application. This will appear in the UI and in log data.

spark.master
Default: (none)
The cluster manager to connect to. See the list of allowed master URLs.

spark.executor.memory
Default: 512m
Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g).

spark.serializer
Default: org.apache.spark.serializer.JavaSerializer
Class to use for serializing objects that will be sent over the network or need to be cached in serialized form. The default of Java serialization works with any Serializable Java object but is quite slow, so we recommend using org.apache.spark.serializer.KryoSerializer and configuring Kryo serialization when speed is necessary. Can be any subclass of org.apache.spark.Serializer.

spark.kryo.registrator
Default: (none)
If you use Kryo serialization, set this class to register your custom classes with Kryo. It should be set to a class that extends KryoRegistrator. See the tuning guide for more details.

spark.local.dir
Default: /tmp
Directory to use for "scratch" space in Spark, including map output files and RDDs that get stored on disk. This should be on a fast, local disk in your system. It can also be a comma-separated list of multiple directories on different disks. NOTE: In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) environment variables set by the cluster manager.

spark.logConf
Default: false
Logs the effective SparkConf as INFO when a SparkContext is started.
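As a sketch of how spark.serializer and spark.kryo.registrator fit together, a registrator might look like the following; the Sheep class and the MyRegistrator name are hypothetical:

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator

// Hypothetical application class that will be shuffled or cached.
case class Sheep(id: Int, name: String)

// Registers application classes with Kryo so they serialize compactly.
class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[Sheep])
  }
}

// Wire both properties up on the SparkConf:
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "MyRegistrator")
```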
Apart from these, the following properties are also available, and may be useful in some situations:

Runtime Environment
spark.executor.memory
Default: 512m
Amount of memory to use per executor process, in the same format as JVM memory strings (e.g. 512m, 2g).

spark.executor.extraJavaOptions
Default: (none)
A string of extra JVM options to pass to executors. For instance, GC settings or other logging. Note that it is illegal to set Spark properties or heap size settings with this option. Spark properties should be set using a SparkConf object or the spark-defaults.conf file used with the spark-submit script. Heap size settings can be set with spark.executor.memory.

spark.executor.extraClassPath
Default: (none)
Extra classpath entries to append to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.

spark.executor.extraLibraryPath
Default: (none)
Set a special library path to use when launching executor JVMs.

spark.files.userClassPathFirst
Default: false
(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in executors. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature.

spark.python.worker.memory
Default: 512m
Amount of memory to use per Python worker process during aggregation, in the same format as JVM memory strings (e.g. 512m, 2g). If the memory used during aggregation goes above this amount, it will spill the data into disks.

spark.executorEnv.[EnvironmentVariableName]
Default: (none)
Add the environment variable specified by EnvironmentVariableName to the executor process. The user can specify multiple of these to set multiple environment variables.

spark.mesos.executor.home
Default: driver-side SPARK_HOME
Set the directory in which Spark is installed on the executors in Mesos. By default, the executors will simply use the driver's Spark home directory, which may not be visible to them. Note that this is only relevant if a Spark binary package is not specified through spark.executor.uri.
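For instance, a spark-submit invocation could combine several of the runtime-environment properties above; the variable name, path, and jar below are made-up placeholders:

```
./bin/spark-submit \
  --conf spark.executorEnv.DATA_DIR=/mnt/data \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails" \
  --conf spark.executor.memory=2g \
  myApp.jar
```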
Shuffle Behavior

spark.shuffle.consolidateFiles
Default: false
If set to "true", consolidates intermediate files created during a shuffle. Creating fewer files can improve filesystem performance for shuffles with large numbers of reduce tasks. It is recommended to set this to "true" when using ext4 or xfs filesystems. On ext3, this option might degrade performance on machines with many (>8) cores due to filesystem limitations.

spark.shuffle.spill
Default: true
If set to "true", limits the amount of memory used during reduces by spilling data out to disk. This spilling threshold is specified by spark.shuffle.memoryFraction.

spark.shuffle.spill.compress
Default: true
Whether to compress data spilled during shuffles. Compression will use spark.io.compression.codec.

spark.shuffle.memoryFraction
Default: 0.2
Fraction of Java heap to use for aggregation and cogroups during shuffles, if spark.shuffle.spill is true. At any given time, the collective size of all in-memory maps used for shuffles is bounded by this limit, beyond which the contents will begin to spill to disk. If spills happen often, consider increasing this value at the expense of spark.storage.memoryFraction.

spark.shuffle.compress
Default: true
Whether to compress map output files. Generally a good idea. Compression will use spark.io.compression.codec.

spark.shuffle.file.buffer.kb
Default: 32
Size of the in-memory buffer for each shuffle file output stream, in kilobytes. These buffers reduce the number of disk seeks and system calls made in creating intermediate shuffle files.

spark.reducer.maxMbInFlight
Default: 48
Maximum size (in megabytes) of map outputs to fetch simultaneously from each reduce task. Since each output requires us to create a buffer to receive it, this represents a fixed memory overhead per reduce task, so keep it small unless you have a large amount of memory.

spark.shuffle.manager
Default: HASH
Implementation to use for shuffling data. A hash-based shuffle manager is the default, but starting in Spark 1.1 there is an experimental sort-based shuffle manager that is more memory-efficient in environments with small executors, such as YARN. To use that, change this value to SORT.

spark.shuffle.sort.bypassMergeThreshold
Default: 200
(Advanced) In the sort-based shuffle manager, avoid merge-sorting data if there is no map-side aggregation and there are at most this many reduce partitions.
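Putting a few of these together, a spark-defaults.conf fragment for trying the sort-based shuffle might look like the following; the values are illustrative, not recommendations:

```
spark.shuffle.manager           SORT
spark.shuffle.consolidateFiles  true
spark.shuffle.memoryFraction    0.3
```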
Spark UI

spark.ui.port
Default: 4040
Port for your application's dashboard, which shows memory and workload data.

spark.ui.retainedStages
Default: 1000
How many stages the Spark UI remembers before garbage collecting.

spark.ui.killEnabled
Default: true
Allows stages and corresponding jobs to be killed from the web UI.

spark.eventLog.enabled
Default: false
Whether to log Spark events, useful for reconstructing the web UI after the application has finished.

spark.eventLog.compress
Default: false
Whether to compress logged events, if spark.eventLog.enabled is true.

spark.eventLog.dir
Default: file:///tmp/spark-events
Base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a sub-directory for each application, and logs the events specific to the application in this directory.
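For example, to keep event logs around for later inspection, the relevant spark-defaults.conf lines might look like this; the HDFS namenode address and path are hypothetical:

```
spark.eventLog.enabled   true
spark.eventLog.compress  true
spark.eventLog.dir       hdfs://namenode:8021/spark-events
```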