datavault.docx
datavault
Data Vault Series 1 - Data Vault Overview
1.0 Introduction
The purpose of this paper is to present and discuss a patent-pending technique called the Data Vault™ – the next evolution in data modeling for enterprise data warehousing. This is a highly technical paper meant for an audience of data modelers, data architects and database administrators. It is not meant for business analysts, project managers, or mainframe programmers. Readers should have a base level of knowledge of common data modeling terms such as table, relationship, parent, child, key (primary/foreign), dimension and fact. The topics in this paper are as follows:
∙ Defining a Data Vault.
∙ A brief history of data modeling for data warehousing.
∙ The problems of existing data warehouse data modeling architectures.
∙ The importance of architecture and design for enterprise data warehousing.
∙ The components of a Data Vault.
∙ Solving the pain of data warehouse architectures.
∙ The foundations of the Data Vault architecture.
∙ Possible applications/implications of the Data Vault.
Several of the things you may learn from this paper are:
∙ What a Data Vault is and why it makes sense.
∙ How to build a small Data Vault of your own.
∙ What doesn’t work from an enterprise data warehousing perspective.
For too long we have waited for data structures to finally catch up with artificial intelligence and data mining applications. Most data mining technology has to import flat file information in order to join the form with the function. Unfortunately, volumes in data warehouses are growing rapidly, and exporting this information for data mining purposes is becoming increasingly difficult. It simply doesn’t make sense to have this discontinuity between form (structure), function (artificial intelligence), and execution (the act of data mining).
Marrying form, function and execution holds tremendous power for the artificial intelligence (AI) and data mining communities. Having data structures that are mathematically sound increases the ability to bring these technologies back into the database. The Data Vault is based on mathematical principles that allow it to be extensible and capable of handling massive volumes of information. The architecture and structure are designed to handle dynamic changes to relationships between information.
A stretch of the imagination might be to one day encapsulate the data with the functions of data mining, hopefully to move towards a “self-aware” independent piece of information – but that’s just a dream for now. It is possible to form, drop, and evaluate relationships between data sets dynamically. This changes the landscape of what is possible with a data model, essentially bringing the data model into a dynamic state of flux (through the use of data mining/artificial intelligence).
By implementing reference architectures on top of a Data Vault structure, the functions that access the content may begin to execute in parallel and in an automated, dynamic fashion. The Data Vault solves some of the structural and storage problems of enterprise data warehousing from a normalized, best-of-breed perspective. The concepts provide a whole host of opportunities for applying this technology.
“You must strive to do that which you think you cannot do.” Eleanor Roosevelt.
2.0 Defining a Data Vault
Definition:
The Data Vault is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business. It is a hybrid approach encompassing the best of breed between 3rd normal form (3NF) and star schema. The design is flexible, scalable, consistent and adaptable to the needs of the enterprise. It is a data model that is architected specifically to meet the needs of enterprise data warehouses.
The Data Vault is architected to meet the needs of the data warehouse, which is not to be confused with a data mart. It can double as an Operational Data Store (ODS) if the correct hardware and database engine are in place to support it. The Data Vault can handle massive sets of granular data in a smaller, more normalized physical space in comparison to both 3NF and star schema. The Data Vault is foundationally strong. It is based on the mathematical principles that support the normalized data models. Inside the Data Vault model are familiar structures that match traditional definitions of star schema and 3NF, including dimensions, many-to-many linkages and standard table structures. The differences lie in relationship representations, field structuring and granular time-based data storage. The modeling techniques built into the Data Vault have undergone years of design and testing across many different scenarios, providing them with a solid foundational approach to data warehousing.
2.1 A Brief History of Data Modeling for Data Warehousing
3NF was originally built in the early 1970’s (Codd & Date) for On-Line Transaction Processing (OLTP) systems. In the early 1980’s it was adapted to meet the growing needs of data warehouses. Essentially a date-time stamp was added to the primary keys in each of the table structures. (See Figure 1 below)
In the mid to late 1980’s star schema data modeling was introduced and perfected. It was architected to solve subject-oriented problems including (but not limited to) aggregations, data model structural change, query performance, reusable or shared information, ease of use, and the ability to support On-Line Analytical Processing (OLAP). This single subject centric architecture became known as a data mart. Soon thereafter it too was adapted to multi-subject data warehousing as an attempt to meet the growing needs of enterprise data warehousing. The term for this is Conformed Data Marts.
Performance and other weaknesses of 3NF and star schema (when used within an enterprise data warehouse) began to show in the 90’s as the volume of data increased. The Data Vault is architected to overcome these shortcomings while retaining the strengths of 3NF and star schema architectures. Within the past year (of the date on this article), this technique has been favorably received by industry experts. The Data Vault is the next evolution in data modeling because it’s architected specifically for enterprise data warehouses.
2.2 The Problems of Existing Data Warehouse Data Modeling Architectures
Each modeling technique has limitations when it is applied to enterprise data warehouse architecture. This is because each is an adaptation of a design rather than a design built specifically for the task. These limitations reduce usability and constantly contribute to the “holy wars” in the data warehousing world. The following paragraphs concern these architectures as applied to data warehouses, not their respective original purposes.
3NF has the following issues to contend with:
time driven primary key issues causing parent-child complexities, cascading change impacts, difficulties in near real time loading, troublesome query access, problematic drill-down analysis, top down architecture and unavoidable top-down implementation. The following figure is an original 3NF model adapted to data warehousing architecture. One particularly thorny problem is evident when a date-time stamp is placed into the primary key of a parent table (See Figure 2 below). This is necessary in order to represent changes to detail data over time.
The problem is scalability and flexibility. If an additional parent table is added, the change is forced to cascade down through all subordinate table structures. Also, when a new row is inserted with an existing parent key (the only field to change is the date-time stamp) all child rows must be reassigned to the new parent key. This cascading effect has a tremendous impact on the processes and the data model - the larger the model the greater the impact. This makes it difficult (if not impossible) to extend and maintain an enterprise-wide data model. The architecture and design suffer as a result.
Figure 2. Date Time Stamped 3NF
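The cascading effect described above can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustration under stated assumptions: the tables customer and customer_order, their columns, and the helper functions are hypothetical and do not come from the paper or its figures. The sketch shows that inserting a new time-stamped version of a parent row forces every child row to be re-pointed at the new parent key.

```python
import sqlite3


def build_demo():
    """Create a hypothetical parent/child pair whose parent key carries a timestamp."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER NOT NULL,
            load_ts     TEXT    NOT NULL,  -- date-time stamp inside the primary key
            name        TEXT    NOT NULL,
            PRIMARY KEY (customer_id, load_ts)
        );
        CREATE TABLE customer_order (
            order_id         INTEGER PRIMARY KEY,
            customer_id      INTEGER NOT NULL,
            customer_load_ts TEXT    NOT NULL,
            amount           REAL    NOT NULL,
            FOREIGN KEY (customer_id, customer_load_ts)
                REFERENCES customer (customer_id, load_ts)
        );
        """
    )
    conn.execute("INSERT INTO customer VALUES (1, '2001-01-01T00:00:00', 'Acme')")
    conn.executemany(
        "INSERT INTO customer_order VALUES (?, 1, '2001-01-01T00:00:00', ?)",
        [(10, 25.0), (11, 40.0), (12, 15.0)],
    )
    conn.commit()
    return conn


def add_parent_version(conn, customer_id, new_ts, name):
    """Insert a new parent version and re-point its children.

    Returns the number of child rows that had to be rewritten.
    """
    # ISO-8601 text timestamps sort chronologically, so MAX finds the current version.
    (old_ts,) = conn.execute(
        "SELECT MAX(load_ts) FROM customer WHERE customer_id = ?", (customer_id,)
    ).fetchone()
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?)", (customer_id, new_ts, name)
    )
    # Only the timestamp changed, yet every child must move to the new key.
    cur = conn.execute(
        "UPDATE customer_order SET customer_load_ts = ? "
        "WHERE customer_id = ? AND customer_load_ts = ?",
        (new_ts, customer_id, old_ts),
    )
    conn.commit()
    return cur.rowcount
```

On the demo data, a single call such as add_parent_version(conn, 1, "2002-06-01T00:00:00", "Acme Corp") rewrites all three order rows even though only the customer name changed. In a deeper hierarchy the same rewrite would repeat at every level below the parent.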
The conformed data mart also has trouble. It is a collection of fact tables that are linked together via primary/foreign keys – in other words, a linked set of related star schemas. The problems this creates are numerous:
isolated subject oriented information, possible data redundancy, inconsistent query structuring, aggravated scalability issues, difficulties with fact table linkages (incompatible grain), synchronization issues in near real time loading, limited enterprise views and troublesome data mining. While the star schema is typically bottom up architecture, bottom up implementation - the conformed data mart should be top down architecture and bottom up implementation. However, informal polling has shown that bottom up architecture and bottom up implementation appear to be the standard.
One of the most difficult issues of a conformed data mart (or conformed fact tables) is getting the grain right. That means understanding the data as it is aggregated for each fact table and assuring that the aggregation will stay consistent for all time (during the life of the relationship) and the structure of each fact table will not change (i.e., no new dimensions will be added to either fact table). This limits design, scalability and flexibility of the data model. Another issue is the “helper table.” This table is defined to be a dimension-to-dimension relationship Link. Granularity is very important, as is the stability of the design of the dimension. This too limits design, scalability and flexibility of the data model.
Figure 3. Conformed Data Mart
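The grain problem can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the shipment fact table, the column names and the figures are invented, and only the Revenue Fact name echoes the figure. Two fact tables are linked on their shared dimension keys; once a store dimension is added to one of them, its grain becomes finer and the same link overstates revenue.

```python
from collections import defaultdict


def link_facts(left, right, keys):
    """Join two fact tables (lists of dicts) on their shared dimension keys."""
    index = defaultdict(list)
    for row in right:
        index[tuple(row[k] for k in keys)].append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[k] for k in keys), []):
            joined.append({**match, **row})
    return joined


# Hypothetical Revenue Fact at the grain (date, product).
revenue_fact = [
    {"date": "2001-01-01", "product": "P1", "revenue": 100.0},
    {"date": "2001-01-01", "product": "P2", "revenue": 50.0},
]

# Hypothetical second fact table at the same grain: the link is one-to-one.
shipment_fact = [
    {"date": "2001-01-01", "product": "P1", "units": 10},
    {"date": "2001-01-01", "product": "P2", "units": 5},
]

# The same table after a store dimension is added: the grain is now
# (date, product, store), so P1 appears once per store.
shipment_fact_by_store = [
    {"date": "2001-01-01", "product": "P1", "store": "S1", "units": 6},
    {"date": "2001-01-01", "product": "P1", "store": "S2", "units": 4},
    {"date": "2001-01-01", "product": "P2", "store": "S1", "units": 5},
]

shared_keys = ("date", "product")
total_before = sum(
    r["revenue"] for r in link_facts(revenue_fact, shipment_fact, shared_keys)
)
total_after = sum(
    r["revenue"]
    for r in link_facts(revenue_fact, shipment_fact_by_store, shared_keys)
)
print(total_before, total_after)  # 150.0 250.0
```

The revenue for P1 is counted once per store after the change, so the linked total rises from 150.0 to 250.0 although no revenue row was touched. Keeping the link correct requires that neither fact table ever gains a dimension, which is the design constraint the paragraph above describes.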
If the granularity of the Revenue Fact is altered, then it is no longer the same (duplicate) fact table. By adding a dimension to one of the fact tables the granularity frequently changes. It has also been suggested that fact tables can be linked together just becaus