Essential Protein Structure Analysis Workflow Tutorial (Word download).docx
- Document ID: 20695505
- Uploaded: 2023-01-25
- Format: DOCX
- Pages: 18
- Size: 44.68KB
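Experimental data
Many kinds of experimental data can aid the structure prediction process, including:
• Disulphide bonds, which fix the spatial positions of cysteines
• Spectroscopic data, which can give an idea of the protein's secondary structure content
• Site-directed mutagenesis studies, which can reveal residues involved in active or binding sites
• Protease cleavage sites and post-translational modifications such as phosphorylation or glycosylation, which suggest residues that must be exposed
• Others
Be aware of all such data when making predictions.
Always ask: does the prediction agree with the experimental results?
If not, the approach needs to be revised.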
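Protein sequence data
Some preliminary analysis of the protein sequence is worthwhile.
For example, if the protein comes directly from a gene prediction, it may contain multiple domains.
More seriously, it may contain regions that are unlikely to be globular or soluble.
This flowchart assumes that your protein is soluble, probably comprises a single domain, and contains no non-globular regions.
Consider the following:
• Is the protein a transmembrane protein, or does it contain transmembrane segments?
There are many methods for predicting these segments, including:
o TMAP (EMBL)
o PredictProtein (EMBL/Columbia)
o TMHMM (CBS, Denmark)
o TMpred (Baylor College)
o DAS (Stockholm)
• Does it contain coiled coils? You can predict coiled coils at the COILS server or download the COILS program (recently rewritten; note that the GCG package includes a version of COILS).
• Does the protein contain low-complexity regions?
Proteins often contain runs of polyglutamine or polyserine, and these regions are hard to predict.
You can check for them with SEG (a version of SEG is included in the GCG package).
If any of the above applies, you should break the sequence into pieces, or ignore particular stretches of it, and so on.
This problem is closely related to that of locating domains.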
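As a rough illustration of what a transmembrane check looks for, the sketch below computes a sliding-window hydropathy profile with the Kyte-Doolittle scale. It is not the algorithm used by TMAP, TMHMM, TMpred or DAS. The function names, the 19-residue window and the 1.6 cutoff (the commonly quoted Kyte-Doolittle rule of thumb) are assumptions chosen for this example.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full-length window along the sequence.

    Residues outside the 20 standard letters are scored as 0.0
    (an assumption of this sketch).
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, value in enumerate(profile) if value >= threshold]


if __name__ == "__main__":
    # Hypothetical toy sequence: a leucine-rich stretch flanked by charged residues.
    toy = "DDDDD" + "L" * 19 + "KKKKK"
    print(candidate_tm_windows(toy))
```

Windows flagged this way are only candidates. A dedicated predictor from the list above should make the final call before you cut the sequence.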
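Searching sequence databases
The obvious first step in analysing any new sequence is to search sequence databases for homologues.
Such searches can be run anywhere and on virtually any computer.
In addition, many web servers perform them: you type or paste in a sequence and receive the results interactively.
There are many methods for sequence searching; by far the best known is the BLAST program.
Versions that run locally are easy to obtain (from NCBI or Washington University), and many web pages let you compare a protein or DNA sequence against a range of gene or protein sequence databases. To name just a few:
• National Center for Biotechnology Information (USA) Searches
• European Bioinformatics Institute (UK) Searches
• BLAST search through SBASE (domain database; ICGEB, Trieste)
• Many more sites
An important recent advance in sequence comparison is the development of gapped BLAST and PSI-BLAST (position-specific iterated BLAST). Both make BLAST more sensitive. PSI-BLAST builds a profile from the results of a search and then uses that profile to search the database again for further homologues (a process that can be repeated until no new sequences are found), so it can detect very distantly related homologues.
It is important, before using the methods in the sections below, to compare your protein sequence against the databases with PSI-BLAST to see whether a homologue of known structure exists.
Other methods for comparing a single sequence against a database include:
• FASTA package (William Pearson, University of Virginia, USA)
• SCANPS (Geoff Barton, European Bioinformatics Institute, UK)
• BLITZ (Compugen's fast Smith-Waterman search)
• Other methods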
It is also possible to use multiple sequence information to perform more sensitive searches. Essentially this involves building a profile from some kind of multiple sequence alignment. A profile gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:
• PSI-BLAST (NCBI, Washington)
• ProfileScan Server (ISREC, Geneva)
• HMMER, hidden Markov models (Sean Eddy, Washington University)
• Wise package (Ewan Birney, Sanger Centre; for comparing proteins against DNA)
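To make the idea of a profile concrete, here is a minimal sketch that turns an ungapped alignment into per-position log-odds scores and slides it along a query. It is not how PSI-BLAST, ProfileScan or HMMER compute their scores. The uniform background, the pseudocount of 1 and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """One dict per column, mapping each amino acid to a log2-odds score.

    Assumes equal-length sequences and a uniform background frequency;
    characters outside the 20 standard letters are ignored when counting.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, segment):
    """Sum of the column scores for a segment as long as the profile."""
    return sum(column.get(aa, 0.0) for column, aa in zip(profile, segment))


def best_match(profile, sequence):
    """(score, 0-based start) of the best-scoring window.

    The sequence must be at least as long as the profile.
    """
    width = len(profile)
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


if __name__ == "__main__":
    # Hypothetical four-column alignment of three family members.
    family = ["HWAL", "HFAL", "HWGL"]
    print(best_match(build_profile(family), "KKKHWALKKK"))
```

Because every position contributes a score, a profile can still rank a distant relative above background even when no single residue matches the query exactly.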
A different approach for incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid some kind of score at every position in an alignment, a motif ignores all but the most invariant positions in an alignment, and just describes the key residues that are conserved and define the family. Sometimes this is called a "signature". For example, "H-[FW]-x-[LIVM]-x-G-x(5)--H-x(3)-" describes a family of DNA binding proteins. It can be translated as "histidine, followed by either a phenylalanine or tryptophan, followed by an amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine, ...".
PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:
• ExPASy
• EBI
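A PROSITE-style pattern maps almost directly onto a regular expression, which is one simple way to scan your own sequences for a signature. The sketch below handles only the core syntax (single residues, `x`, `[...]`, `{...}`, repeat counts, and `<`/`>` anchors at the ends of the pattern). The function names are hypothetical, and the pattern used is just the part of the DNA-binding example that its verbal translation spells out.

```python
import re

# One PROSITE element: x, a residue, [allowed], or {forbidden},
# optionally followed by (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    prefix = "^" if pattern.startswith("<") else ""
    suffix = "$" if pattern.endswith(">") else ""
    pattern = pattern.lstrip("<").rstrip(">")
    pieces = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."                         # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"     # any amino acid except these
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        pieces.append(core)
    return prefix + "".join(pieces) + suffix


def find_motif(pattern, sequence):
    """0-based start positions of non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(sequence.upper())]


if __name__ == "__main__":
    partial = "H-[FW]-x-[LIVM]-x-G-x(5)"   # prefix of the example pattern only
    print(prosite_to_regex(partial))
    print(find_motif(partial, "AAHWALAGKKKKKAA"))   # hypothetical sequence
```

A motif search is all-or-nothing: one substitution at a key position loses the hit, which is why it complements rather than replaces profile searches.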
It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.
One last thing to remember is that one can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand-edited by experts on the particular protein families, and thus represent probably the best alignment one can get given the data they contain (i.e. they are not always as up to date as the most recent sequence databases). These databases include:
• SMART (Oxford/EMBL)
• PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
• COGS (NCBI)
• PRINTS (UCL/Manchester)
• BLOCKS (Fred Hutchinson Cancer Research Center, Seattle)
• SBASE (ICGEB, Trieste)
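In general, there are many ways to compare a protein sequence against databases, and they are very useful for identifying domains.
Locating domains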
If you have a sequence of more than about 500 amino acids, you can be nearly certain that it will be divided into discrete functional domains. If possible, it is preferable to split such large proteins up and consider each domain separately. You can predict the location of domains in a few different ways. The methods below are given (approximately) from most to least confident.
• If homology to other sequences occurs only over a portion of the probe sequence and the other sequences are whole (i.e. not partial sequences), then this provides the strongest evidence for domain structure. You can either do database searches yourself or make use of well-curated, pre-defined databases of protein domains. Searches of these databases (see links below) will often assign domains easily.
o SMART (Oxford/EMBL)
o PFAM (Sanger Centre/Wash-U/Karolinska Institutet)
o COGS (NCBI)
o PRINTS (UCL/Manchester)
o BLOCKS (Fred Hutchinson Cancer Research Center, Seattle)
o SBASE (ICGEB, Trieste)
• You can also find domain descriptions in the annotations in SWISSPROT.
• Regions of low complexity often separate domains in multidomain proteins. Long stretches of repeated residues, particularly proline, glutamine, serine or threonine, often indicate linker sequences and are usually a good place to split proteins into domains.
Low-complexity regions can be defined using the program SEG, which is generally available in most BLAST distributions or web servers (a version of SEG is also contained within the GCG suite of programs).
• Transmembrane segments are also very good dividing points, since they can easily separate extracellular from intracellular domains. There are many methods for predicting these segments, including:
o TMAP (EMBL)
o PredictProtein (EMBL/Columbia)
o TMHMM (CBS, Denmark)
o TMpred (Baylor College)
o DAS (Stockholm)
• Something else to consider is the presence of coiled coils. These unusual structural features sometimes (but not always) indicate where proteins can be divided into domains. You can predict coiled coils at the COILS server or you can download the COILS program (recently re-written by me of all people; a version of COILS is also contained within the GCG suite of programs).
• Secondary structure prediction methods (see below) will often predict regions of proteins to have different protein structural classes. For example, one region of sequence may be predicted to contain only alpha helices and another to contain only beta sheets. These can often, though not always, suggest likely domain structure (e.g. an all-alpha domain and an all-beta domain).
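The low-complexity check in the list above can be approximated with a sliding-window Shannon entropy, which is enough to spot candidate linker regions by eye. This is a simplified stand-in and not the SEG algorithm. The 12-residue window, the 2.2-bit cutoff and the function names are assumptions of this sketch.

```python
import math
from collections import Counter


def window_entropy(segment):
    """Shannon entropy (bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq, window=12, max_entropy=2.2):
    """Merged 0-based, half-open (start, end) intervals covered by low-entropy windows."""
    flagged = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= max_entropy:
            for j in range(i, i + window):
                flagged[j] = True
    regions, start = [], None
    # The trailing False closes a region that runs to the end of the sequence.
    for i, is_low in enumerate(flagged + [False]):
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            regions.append((start, i))
            start = None
    return regions


if __name__ == "__main__":
    # Hypothetical sequence: two diverse stretches joined by a polyglutamine linker.
    diverse = "MKTAYIAKQRDEWLNGHFVC"
    print(low_complexity_regions(diverse + "Q" * 15 + diverse))
```

The reported interval usually extends a few residues into the flanking sequence, so treat its edges as approximate split points.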
If you have separated a sequence into domains, then it is very important to repeat all the database searches and alignments using the domains separately. Searches with sequences containing several domains may not find all sub-homologies, particularly if the domains are abundant in the database (e.g. kinases, SH2 domains, etc.). There may also be "hidden" domains. For example, if there is a stretch of 80 amino acids with few homologues nested in between a kinase and an SH2 domain, then you may miss matches found when searching the whole sequence against a database.
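The bookkeeping this implies can be sketched as follows: cut the sequence at the assigned domain boundaries, and list any long unassigned stretches that deserve a search of their own. The function names, the 1-based inclusive coordinates and the 40-residue minimum length are assumptions chosen for this example.

```python
def split_into_domains(name, sequence, domains):
    """Return (header, subsequence) pairs for each (label, start, end) domain.

    Coordinates are 1-based and inclusive.
    """
    records = []
    for label, start, end in domains:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"domain {label!r} lies outside the sequence")
        records.append((f"{name}_{label}_{start}-{end}", sequence[start - 1:end]))
    return records


def unassigned_regions(seq_length, domains, min_length=40):
    """Stretches of at least min_length residues not covered by any assigned domain."""
    gaps = []
    position = 1  # first residue not yet covered
    for _, start, end in sorted(domains, key=lambda d: d[1]):
        if start - position >= min_length:
            gaps.append((position, start - 1))
        position = max(position, end + 1)
    if seq_length - position + 1 >= min_length:
        gaps.append((position, seq_length))
    return gaps


if __name__ == "__main__":
    # Hypothetical 500-residue protein with two assigned domains.
    assigned = [("kinase", 1, 260), ("SH2", 341, 440)]
    print(unassigned_regions(500, assigned))
```

Each returned stretch, such as an 80-residue gap between a kinase and an SH2 domain, can then be submitted as a separate query.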
Anyway, here is my slide from the talk related to this subject:
Multiple sequence alignment
Regardless of the outcome of your searches, you will want a multiple sequence alignment containing your sequence and all the homologues you have found above.
Some sites for performing multiple alignment:
• EBI (UK) Clustalw Server
• IBCP (France) Multalin Server
• IBCP (France) Clustalw Server
• IBCP (France) Combined Multalin/Clustalw
• MSA (USA) Server
• BCM Multiple Sequence Alignment ClustalW Server (USA)
If you are going to do a lot of alignments, then it is probably best to get your own copy of one of the many programs. FTP sites for some of these are:
• HMMer (HMM method, WashU)
• SAM (HMM method, Santa Cruz)
• ClustalW (EBI, UK)
• ClustalW (USA)
• MSA (USA)
• AMPS (UK)
Note that PileUp is contained within the GCG commercial package. Most institutions with people doing this sort of work will have access to this software, so ask around if you want to use it.
Probably the most important advance since these pages first appeared is the use of Hidden Markov Models for sequence alignment. Several methods are listed above.
Alignments can provide:
• Information as to protein domain structure
• The location of residues likely to be involved in protein function
• Information on residues likely to be buried in the protein core or exposed to solvent
• More information than a single sequence for applications like homology modelling and secondary structure prediction
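One simple way to pull candidate functional residues out of an alignment is to look for columns that barely vary. The sketch below reports the most common residue in each column and the fraction of sequences carrying it. The function names and the choice to count gaps against conservation are assumptions, and the alignment programs listed above provide more refined conservation measures.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """For each column, return (most common residue, fraction of all sequences with it).

    Gaps are never reported as the consensus but still count in the denominator.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != gap]
        if not residues:
            result.append((gap, 0.0))
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append((residue, count / len(alignment)))
    return result


def conserved_positions(alignment, min_fraction=1.0):
    """1-based alignment columns whose consensus residue reaches min_fraction."""
    return [
        i + 1
        for i, (_, fraction) in enumerate(column_conservation(alignment))
        if fraction >= min_fraction
    ]


if __name__ == "__main__":
    # Hypothetical five-column alignment of three sequences.
    print(conserved_positions(["HWALG", "HFAVG", "HW-LG"]))
```

Strict invariance across a diverse family is a hint rather than proof of functional importance, so check such positions against any experimental data you have.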
Some tips
Don't just take everything found in the searches and feed it directly into the alignment program. Searches will almost always return matches that do not indicate a significant sequence similarity. Look through the output carefully and throw things out if they don't appear to be a member of the sequence family. Inclusion of non-members in your alignment will confuse things and likely lead to errors later.
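A first pass at this pruning can be automated before you inspect the survivors by hand. The sketch below assumes the hits were saved in the 12-column tabular format written by NCBI BLAST+ (`-outfmt 6`), which is an assumption about your setup and not something the text prescribes. The function name, the E-value cutoff and the sample hits are hypothetical.

```python
def filter_hits(tabular_text, max_evalue=1e-3):
    """Subject IDs whose E-value is at or below the cutoff, in first-seen order.

    Expects the default BLAST+ tabular columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    keep = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        subject, evalue = fields[1], float(fields[10])
        if evalue <= max_evalue and subject not in keep:
            keep.append(subject)
    return keep


if __name__ == "__main__":
    # Hypothetical hits: two convincing matches and one marginal one.
    rows = [
        ["query", "hitA", "62.5", "240", "90", "0", "1", "240", "5", "244", "2e-40", "310"],
        ["query", "hitB", "35.0", "200", "130", "4", "10", "209", "1", "198", "4e-08", "75.1"],
        ["query", "hitC", "24.0", "50", "38", "2", "100", "149", "30", "79", "3.2", "28.5"],
    ]
    sample = "\n".join("\t".join(row) for row in rows)
    print(filter_hits(sample))
```

An E-value cutoff only removes the obvious noise. Borderline hits still need the careful look described above, since a true family member can score poorly and a chance match can score well.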
Remember that the programs for aligning sequences aren't perfect, and do not always provide the best alignment. This is particularly so for large families of proteins with low sequence identities. If you can see a better way of