V H Shastri, V Sreeprada
Source: International Journal of Emerging Trends and Technology in Computer Science, 2016, 38(2): 99-103
Word count: 2,291 English words (12,196 characters); 3,868 Chinese characters
Original English text:
A Study of Data Mining with Big Data
Abstract: Data has become an important part of every economy, industry, organization, business, function and individual. Big Data is a term used to identify data sets whose size is typically larger than a conventional database can handle. Big Data introduces unique computational and statistical challenges, and it is at present expanding in most domains of engineering and science. Data mining helps to extract useful information from data sets that are hard to handle because of their volume, variability and velocity. This article presents the HACE theorem, which characterizes the features of the Big Data revolution, and proposes a Big Data processing model from the data mining perspective.
Keywords: Big Data, Data Mining, HACE theorem, structured and unstructured data.
I. Introduction
Big Data refers to the enormous amount of structured and unstructured data that overflows an organization. If this data is properly used, it can lead to meaningful information. Big Data comprises a large amount of data that requires a great deal of processing in real time. It offers room to discover new value, to gain in-depth knowledge from hidden values, and to manage the data effectively. A database is an organized collection of logically related data that can be easily managed, updated and accessed. Data mining is the process of discovering interesting knowledge, such as associations, patterns, changes, anomalies and significant structures, from large amounts of data stored in databases or other repositories.
Big Data has three V's as its characteristics: volume, velocity and variety. Volume is the amount of data generated every second; it concerns data at rest and is also known as the scale characteristic. Velocity is the speed with which the data is generated; data generated from social media is an example of high-speed data. Variety means that different types of data can be included, such as audio, video or documents. The data can be numerals, images, time series, arrays, etc.
Data mining analyses data from different perspectives and summarizes it into useful information that can be used for business solutions and for predicting future trends. Data mining (DM), also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It applies many computational techniques from statistics, information retrieval, machine learning and pattern recognition. Data mining extracts only the required patterns from the database in a short time span. Based on the type of patterns to be mined, data mining tasks can be classified into summarization, classification, clustering, association and trend analysis.
Big Data is expanding in all domains of science and engineering, including the physical, biological and biomedical sciences.
II. BIG DATA with DATA MINING
Generally, Big Data refers to a collection of large volumes of data generated from various sources such as the internet, social media, business organizations, sensors, etc. We can extract useful information from it with the help of data mining, a technique for discovering patterns, as well as descriptive and understandable models, from large-scale data.
Volume is the size of the data, which is now on the order of terabytes and petabytes. This scale, and the rate at which it grows, makes the data difficult to store and analyse using traditional tools. Big Data techniques are needed to mine large amounts of data within a predefined period of time. Traditional database systems were designed to handle small amounts of structured and consistent data, whereas Big Data includes a wide variety of data such as geospatial data, audio, video, unstructured text and so on.
Big Data mining refers to the activity of going through big data sets to look for relevant information. Hadoop is used to process large volumes of data from different sources quickly. Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. Its distributed file system supports fast data transfer rates among nodes and allows the system to continue operating uninterrupted when a node fails. It runs MapReduce for distributed data processing and works with both structured and unstructured data.
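The MapReduce model mentioned above can be illustrated with a minimal single-process sketch in Python. This is not Hadoop's API: the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions chosen for illustration, and a real cluster would run the map and reduce calls on different nodes.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values seen for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    """Run map, shuffle and reduce in sequence over all input splits."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Hypothetical input: each string stands for a split stored on a different node.
splits = ["big data needs data mining", "data mining finds patterns"]
counts = word_count(splits)
print(counts["data"], counts["mining"])  # prints: 3 2
```

Because each map call sees only its own split and each reduce call sees only one key, both phases can in principle run in parallel, which is what makes the model suitable for distributed processing.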
III. BIG DATA characteristics - HACE THEOREM
We have a large volume of heterogeneous data, and complex relationships exist among the data. We need to discover useful information from this voluminous data.
Let us imagine a scenario in which blind men are asked to draw an elephant. From the information each one collects, a blind man may take the trunk for a wall, a leg for a tree, the body for a wall and the tail for a rope. Each holds only a partial view, but the blind men can exchange information with each other.
Figure 1: Blind men and the giant elephant
Some of these characteristics are:
i. Vast data with heterogeneous and diverse sources:
One of the fundamental characteristics of Big Data is the large volume of data represented by heterogeneous and diverse dimensions. For example, in the biomedical world a single human being is represented by name, age, gender, family history, etc., while X-ray and CT scans are represented by images and videos. Heterogeneity refers to the different types of representation of the same individual, and diversity refers to the variety of features used to represent a single piece of information.
ii. Autonomous sources with distributed and decentralized control:
The sources are autonomous, i.e., they generate information automatically and without any centralized control. We can compare this with the World Wide Web (WWW), where each server provides a certain amount of information without depending on other servers.
iii. Complex and evolving relationships:
As the size of the data grows very large, the relationships within it grow as well. In the early stages, when the data is small, there is no complexity in the relationships among the data. Data generated from social media and other sources have complex relationships.
IV. TOOLS: OPEN SOURCE REVOLUTION
Large companies such as Facebook, Yahoo, Twitter and LinkedIn benefit from and contribute to open source projects. In Big Data mining, there are many open source initiatives. The most popular of them are:
Apache Mahout:
Scalable machine learning and data mining open source software based mainly on Hadoop. It has implementations of a wide range of machine learning and data mining algorithms: clustering, classification, collaborative filtering and frequent pattern mining.
R:
Open source programming language and software environment designed for statistical computing and visualization. R was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, beginning in 1993, and is used for statistical analysis of very large data sets.
MOA:
Open source software for stream data mining in real time. It has implementations of classification, regression, clustering, frequent itemset mining and frequent graph mining. It started as a project of the Machine Learning group of the University of Waikato, New Zealand, famous for the WEKA software. The streams framework provides an environment for defining and running stream processes using simple XML-based definitions and is able to use MOA, Android and Storm.
SAMOA:
A new, upcoming software project for distributed stream mining that will combine S4 and Storm with MOA.
Vowpal Wabbit:
Open source project started at Yahoo! Research and continuing at Microsoft Research to design a fast, scalable, useful learning algorithm. VW is able to learn from terafeature data sets. Via parallel learning, it can exceed the throughput of any single machine's network interface when doing linear learning.
V. DATA MINING for BIG DATA
Data mining is the process by which data coming from different sources is analysed to discover useful information. Data mining algorithms fall into 4 categories. They are:
1. Association Rule
2. Clustering
3. Classification
4. Regression
Association is used to search for relationships between variables. It is applied in searching for frequently visited items; in short, it establishes relationships among objects. Clustering discovers groups and structures in the data. Classification deals with associating an unknown structure with a known structure. Regression finds a function to model the data.
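As an illustration of the association category, the sketch below computes the support of item pairs and the confidence of a rule over a handful of transactions. It is a simplified sketch, not the full Apriori algorithm (it counts pairs only and does no level-wise candidate pruning); the function names and the basket data are assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """Return {pair: support} for item pairs found in at least min_support of transactions."""
    n = len(transactions)
    pair_counts = Counter()
    for items in transactions:
        # Sort so that each pair has one canonical ordering.
        for pair in combinations(sorted(set(items)), 2):
            pair_counts[pair] += 1
    return {pair: count / n for pair, count in pair_counts.items()
            if count / n >= min_support}

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: P(consequent | antecedent)."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    return sum(consequent in t for t in with_antecedent) / len(with_antecedent)

# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
pairs = frequent_pairs(baskets, min_support=0.5)
rule_confidence = confidence(baskets, "bread", "milk")
print(pairs)
print(round(rule_confidence, 3))
```

Here the pairs (bread, milk) and (bread, butter) each appear in half of the baskets, and the rule bread -> milk holds in two of the three baskets that contain bread.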
The different data mining algorithms are:
Category | Algorithm
Association | Apriori, FP-growth
Clustering | K-Means, Expectation-Maximization
Classification | Decision trees, SVM
Regression | Multivariate linear regression
Table 1. Classification of Algorithms
Data mining algorithms can be converted into MapReduce algorithms so that they run on a parallel computing basis.
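One way such a conversion can look is sketched below for a single K-Means iteration: the map step assigns each point to its nearest centroid, and the reduce step averages the points of each cluster. This is a single-process sketch under stated assumptions, not a Hadoop or Mahout implementation; the function names, the partitions and the starting centroids are all hypothetical.

```python
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def map_assign(points, centroids):
    """Map: emit (cluster index, point) for each point in one data partition."""
    return [(nearest(point, centroids), point) for point in points]

def reduce_mean(assigned):
    """Reduce: average the points assigned to each cluster index."""
    groups = defaultdict(list)
    for index, point in assigned:
        groups[index].append(point)
    return {index: tuple(sum(dim) / len(points) for dim in zip(*points))
            for index, points in groups.items()}

def kmeans_step(partitions, centroids):
    """One K-Means iteration expressed as a map step followed by a reduce step."""
    assigned = [pair for part in partitions for pair in map_assign(part, centroids)]
    means = reduce_mean(assigned)
    # Keep the old centroid if no point was assigned to it.
    return [means.get(i, centroids[i]) for i in range(len(centroids))]

# Hypothetical data: two partitions, as if stored on two different nodes.
partitions = [[(1.0, 1.0), (1.0, 3.0)], [(9.0, 9.0), (11.0, 9.0)]]
centroids = kmeans_step(partitions, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)  # prints: [(1.0, 2.0), (10.0, 9.0)]
```

Repeating kmeans_step until the centroids stop moving gives the usual algorithm; only the current centroids need to be shared with every mapper between iterations.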
Big Data | Data Mining
It is everything in the world now. | It is the old Big Data.
The size of the data is larger. | The size of the data is smaller.
Involves storage and processing of large data sets. | Interesting patterns can be found.
Big Data is the term for a large data set. | Data mining refers to the activity of going through a big data set to look for relevant information.
Big Data is the asset. | Data mining is the handler which provides beneficial results.
"Big data" varies depending on the capabilities of the organization managing the set, and on the capabilities of the applications that are traditionally used to process and analyse the data. | Data mining refers to operations that involve relatively sophisticated search operations.
Table 2. Differences between Data Mining and Big Data
VI. Challenges in BIG DATA
M