Protein Structure Analysis Workflow: An Essential Tutorial

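Experimental data

Many kinds of experimental data can aid the structure prediction process, including:

• Disulphide bonds, which fix the spatial positions of cysteines

• Spectroscopic data, which can indicate the secondary structure content of the protein

• Site-directed mutagenesis studies, which can reveal residues in active or binding sites

• Protease cleavage sites and post-translational modifications such as phosphorylation or glycosylation, which imply that the residues involved must be exposed

• Others

When making a prediction, you must be aware of all of these data.

Always ask:

Is the prediction consistent with the experimental results?

If not, the approach needs to be revised.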

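Protein sequence data

A preliminary analysis of the protein sequence is worthwhile.

For example, if the protein sequence comes directly from a gene prediction, it may contain multiple domains.

More seriously, it may contain regions that are unlikely to be globular or soluble.

This flowchart assumes that your protein is soluble, probably comprises a single domain, and contains no non-globular regions.

Consider the following:

• Is the protein a transmembrane protein, or does it contain transmembrane segments?

There are many methods for predicting these segments, including:

o TMAP (EMBL)

o PredictProtein (EMBL/Columbia)

o TMHMM (CBS, Denmark)

o TMpred (Baylor College)

o DAS (Stockholm)

• Does the protein contain coiled-coils? You can predict coiled coils at the COILS server or download the COILS program (recently rewritten; note that a version of COILS is contained within the GCG suite of programs).

• Does the protein contain low-complexity regions?

Proteins often contain several polyglutamine or polyserine stretches, and these regions are hard to predict.

You can check for them with SEG (a version of SEG is contained within the GCG suite of programs).

If any of the above applies, you will probably have to break the sequence into pieces, or ignore particular sections of the sequence, and so on.

This question is related to the problem of locating domains.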
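The low-complexity check above can be illustrated with a rough sketch. This is not the SEG algorithm. It is a simplified stand-in that flags windows whose residue composition has low Shannon entropy, and the window length and entropy cut-off are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_regions(seq: str, window: int = 12, max_entropy: float = 2.2):
    """Return merged half-open (start, end) ranges covered by low-entropy windows.

    window and max_entropy are illustrative assumptions, not SEG's parameters.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= max_entropy:
            start, end = i, i + window
            if regions and start <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], max(regions[-1][1], end))
            else:
                regions.append((start, end))
    return regions


def mask(seq: str, regions) -> str:
    """Replace residues inside the given ranges with 'x' so they can be ignored."""
    chars = list(seq)
    for start, end in regions:
        chars[start:end] = "x" * (end - start)
    return "".join(chars)
```

Because whole windows are flagged, a few residues flanking a repeat are usually masked as well. For real work a dedicated program such as SEG is the better choice.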

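Searching sequence databases

The obvious first step in analysing any new sequence is to search the sequence databases for homologous sequences.

Such a search can be done anywhere and on almost any computer.

Moreover, many web servers perform such searches: you can type or paste a sequence into the server and receive the results interactively.

There are many methods for sequence searching; the best known at present is the BLAST program.

A version that runs locally is easy to obtain (from NCBI or Washington University), and many web pages allow protein or DNA sequences to be compared against databases of gene or protein sequences. To name just a few:

• National Center for Biotechnology Information (USA) Searches

• European Bioinformatics Institute (UK) Searches

• BLAST search through SBASE (domain database; ICGEB, Trieste)

• Many more sites

An important recent advance in sequence comparison is the development of gapped BLAST and PSI-BLAST (position-specific iterated BLAST). Both make BLAST more sensitive. PSI-BLAST takes the hits from a search, builds a profile from them, and then uses that profile to search the database again for further homologues. The process can be repeated until no new sequences are found, which makes it possible to detect very distantly related homologues.

It is very important, before using the methods in the sections below, to compare your protein sequence against the databases with PSI-BLAST to find out whether a homologue of known structure exists.

Other methods for comparing a single sequence against a database include:

• FASTA package (William Pearson, University of Virginia, USA)

• SCANPS (Geoff Barton, European Bioinformatics Institute, UK)

• BLITZ (Compugen's fast Smith Waterman search)

• Other methods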

It is also possible to use multiple sequence information to perform more sensitive searches. Essentially this involves building a profile from some kind of multiple sequence alignment. A profile essentially gives a score for each type of amino acid at each position in the sequence, and generally makes searches more sensitive. Tools for doing this include:

• PSI-BLAST (NCBI, Washington)

• ProfileScan Server (ISREC, Geneva)

• HMMER, hidden Markov models (Sean Eddy, Washington University)

• Wise package (Ewan Birney, Sanger Centre; for protein versus DNA comparisons)
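To make the idea of a profile concrete, here is a minimal sketch of "a score for each type of amino acid at each position". It is not how PSI-BLAST, ProfileScan or HMMER compute their scores. The uniform background frequency, the pseudocount and the ungapped scan are simplifying assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(alignment, pseudocount=1.0):
    """Build a log-odds profile: one dict of residue -> score per alignment column.

    Background frequencies are assumed uniform (1/20) to keep the sketch short.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        # Gap characters and unknown symbols are simply not counted.
        counts = Counter(res for res in column if res in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_window(profile, segment):
    """Sum the position-specific scores for an ungapped segment."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, segment))


def best_match(profile, sequence):
    """Slide the profile along the sequence; return (best_score, offset)."""
    width = len(profile)
    return max(
        (score_window(profile, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )
```

For example, a profile built from the toy alignment `["HFKLG", "HWRIG", "HFAVG", "HWSMG"]` scores the segment `HFTLG` highly even though that exact sequence is not in the alignment, which is the source of the extra sensitivity.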

A different approach for incorporating multiple sequence information into a database search is to use a MOTIF. Instead of giving every amino acid some kind of score at every position in an alignment, a motif ignores all but the most invariant positions in an alignment, and just describes the key residues that are conserved and define the family. Sometimes this is called a "signature". For example, "H-[FW]-x-[LIVM]-x-G-x(5)--H-x(3)-" describes a family of DNA binding proteins. It can be translated as "histidine, followed by either a phenylalanine or tryptophan, followed by an amino acid (x), followed by leucine, isoleucine, valine or methionine, followed by any amino acid (x), followed by glycine, ...".

PROSITE (ExPASy Geneva) contains a huge number of such patterns, and several sites allow you to search these data:

• ExPASy

• EBI
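The pattern notation described above maps naturally onto regular expressions. The sketch below converts a PROSITE-style pattern into a Python regex. It covers only the basic elements (single residues, `x`, bracketed sets, braces for forbidden residues, and repeat counts), the function name is hypothetical, and the example uses only the first part of the motif as spelled out in words above, with a made-up sequence.

```python
import re

# One pattern element: a residue, x, [set] or {forbidden set}, plus optional (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+(?:,\d+)?)\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip("-.").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":
            core = "."                          # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any amino acid except these
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


regex = prosite_to_regex("H-[FW]-x-[LIVM]-x-G-x(5)")
hit = re.search(regex, "MKHWALTGAAAAAK")  # hypothetical sequence
```

Here `regex` is `H[FW].[LIVM].G.{5}` and the match starts at index 2 of the example sequence. Real PROSITE patterns also allow anchors such as `<` and `>`, which this sketch does not handle.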

It is best to search a few different databases in order to find as many homologues as possible. A very important thing to do, and one which is sometimes overlooked, is to compare any new sequence to a database of sequences for which 3D structure information is available. Whether or not your sequence is homologous to a protein of known 3D structure is not obvious in the output from many searches of large sequence databases. Moreover, if the homology is weak, the similarity may not be apparent at all during the search through a larger database.

One last thing to remember is that one can save a lot of time by making use of pre-prepared protein alignments. Many of these alignments are hand edited by experts on the particular protein families, and thus represent probably the best alignment one can get given the data they contain (i.e. they are not always as up to date as the most recent sequence databases). These databases include:

• SMART (Oxford/EMBL)

• PFAM (Sanger Centre/Wash-U/Karolinska Institutet)

• COGS (NCBI)

• PRINTS (UCL/Manchester)

• BLOCKS (Fred Hutchinson Cancer Research Center, Seattle)

• SBASE (ICGEB, Trieste)

Most of these databases provide ways to compare a protein sequence against them, which is very useful for identifying domains.

Locating domains

If you have a sequence of more than about 500 amino acids, you can be nearly certain that it will be divided into discrete functional domains. If possible, it is preferable to split such large proteins up and consider each domain separately. You can predict the location of domains in a few different ways. The methods below are given (approximately) from most to least confident.

• If homology to other sequences occurs only over a portion of the probe sequence and the other sequences are whole (i.e. not partial sequences), then this provides the strongest evidence for domain structure. You can either do database searches yourself or make use of well-curated, pre-defined databases of protein domains. Searches of these databases (see links below) will often assign domains easily.

o SMART (Oxford/EMBL)

o PFAM (Sanger Centre/Wash-U/Karolinska Institutet)

o COGS (NCBI)

o PRINTS (UCL/Manchester)

o BLOCKS (Fred Hutchinson Cancer Research Center, Seattle)

o SBASE (ICGEB, Trieste)

You can also find domain descriptions in the annotations in SWISSPROT.

• Regions of low complexity often separate domains in multidomain proteins. Long stretches of repeated residues, particularly proline, glutamine, serine or threonine, often indicate linker sequences and are usually a good place to split proteins into domains.

Low-complexity regions can be defined using the program SEG, which is generally available in most BLAST distributions or web servers (a version of SEG is also contained within the GCG suite of programs).

• Transmembrane segments are also very good dividing points, since they can easily separate extracellular from intracellular domains. There are many methods for predicting these segments, including:

o TMAP (EMBL)

o PredictProtein (EMBL/Columbia)

o TMHMM (CBS, Denmark)

o TMpred (Baylor College)

o DAS (Stockholm)

• Something else to consider is the presence of coiled-coils. These unusual structural features sometimes (but not always) indicate where proteins can be divided into domains. You can predict coiled coils at the COILS server or you can download the COILS program (recently re-written by me, of all people; a version of COILS is also contained within the GCG suite of programs).

• Secondary structure prediction methods (see below) will often predict regions of proteins to have different protein structural classes. For example, one region of sequence may be predicted to contain only alpha helices and another to contain only beta sheets. These can often, though not always, suggest likely domain structure (e.g. an all-alpha domain and an all-beta domain).
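Of the criteria listed above, transmembrane segments are the easiest to illustrate in a few lines. The sketch below is not the method used by TMAP, TMHMM, TMpred or DAS. It is the classic sliding-window hydropathy approach using the Kyte-Doolittle scale, with the commonly quoted window of 19 residues and a mean hydropathy cut-off of 1.6, both treated here as adjustable assumptions.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window of the given length along the sequence."""
    scores = [KYTE_DOOLITTLE[res] for res in seq]  # raises KeyError on non-standard residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merged half-open (start, end) ranges of windows whose mean exceeds threshold."""
    segments = []
    for i, mean in enumerate(hydropathy_profile(seq, window)):
        if mean > threshold:
            if segments and i <= segments[-1][1]:
                # Overlaps the previous candidate: extend it.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments
```

The ranges returned are candidate dividing points only. A dedicated predictor should be used before deciding where to split a sequence.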

If you have separated a sequence into domains, then it is very important to repeat all the database searches and alignments using the domains separately. Searches with sequences containing several domains may not find all sub-homologies, particularly if the domains are abundant in the database (e.g. kinases, SH2 domains, etc.). There may also be "hidden" domains. For example, if there is a stretch of 80 amino acids with few homologues nested in between a kinase and an SH2 domain, then you may miss matches found when searching the whole sequence against a database.
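One way to spot such "hidden" stretches is to look for parts of the query that no database hit covers. The sketch below assumes you have already extracted the query coordinates of each hit as half-open (start, end) pairs. The function name and the minimum length of 40 residues are illustrative assumptions.

```python
def uncovered_regions(query_length, hit_ranges, min_length=40):
    """Return half-open (start, end) stretches of the query not covered by any hit.

    Stretches shorter than min_length are ignored as probable linkers.
    """
    gaps = []
    position = 0  # first residue not yet covered by a hit
    for start, end in sorted(hit_ranges):
        if start - position >= min_length:
            gaps.append((position, start))
        position = max(position, end)
    if query_length - position >= min_length:
        gaps.append((position, query_length))
    return gaps


# Hypothetical coordinates: a kinase domain at 0-260 and an SH2 domain at 340-440.
candidates = uncovered_regions(440, [(0, 260), (340, 440)])
```

Here `candidates` is `[(260, 340)]`, the 80-residue stretch between the two known domains. Each stretch found this way is worth searching against the databases on its own.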

Anyway, here is my slide from the talk related to this subject:

Multiple sequence alignment

Regardless of the outcome of your searches, you will want a multiple sequence alignment containing your sequence and all the homologues you have found above.

Some sites for performing multiple alignment:

• EBI (UK) Clustalw Server

• IBCP (France) Multalin Server

• IBCP (France) Clustalw Server

• IBCP (France) Combined Multalin/Clustalw

• MSA (USA) Server

• BCM Multiple Sequence Alignment ClustalW Server (USA)

If you are going to do a lot of alignments, then it is probably best to get your own copy of one of the many programs. Some FTP sites for some of these are:

• HMMer (HMM method, WashU)

• SAM (HMM method, Santa Cruz)

• ClustalW (EBI, UK)

• ClustalW (USA)

• MSA (USA)

• AMPS (UK)

Note that PileUp is contained within the GCG commercial package. Most institutions with people doing this sort of work will have access to this software, so ask around if you want to use it.

Probably the most important advance since these pages first appeared is the use of Hidden Markov Models for sequence alignment. Several methods are listed above.

Alignments can provide:

• Information as to protein domain structure

• The location of residues likely to be involved in protein function

• Information on residues likely to be buried in the protein core or exposed to solvent

• More information than a single sequence for applications like homology modelling and secondary structure prediction.
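As a small illustration of how an alignment points to functionally important residues, the sketch below scores each column by how conserved it is. The entropy-based score and the function name are assumptions chosen for simplicity. Published conservation measures usually also account for amino acid similarity and sequence weighting.

```python
import math
from collections import Counter


def column_conservation(alignment):
    """Per-column conservation in [0, 1]: 1 - entropy / log2(20), gaps ignored."""
    max_entropy = math.log2(20)
    scores = []
    for column in zip(*alignment):
        residues = [res for res in column if res != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column carries no information
            continue
        counts = Counter(residues)
        entropy = -sum((n / len(residues)) * math.log2(n / len(residues))
                       for n in counts.values())
        scores.append(1.0 - entropy / max_entropy)
    return scores
```

Columns scoring close to 1 are invariant across the family and are natural candidates for residues involved in function or in stabilising the fold.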

Some tips

• Don't just take everything found in the searches and feed them directly into the alignment program. Searches will almost always return matches that do not indicate a significant sequence similarity. Look through the output carefully and throw things out if they don't appear to be a member of the sequence family. Inclusion of non-members in your alignment will confuse things and likely lead to errors later.
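A first pass over the search output can be automated before the manual inspection recommended above. The sketch below assumes each hit has already been parsed into an identifier, an E-value and the fraction of the query it covers. The `Hit` class, the function name and both thresholds are hypothetical and should be tuned to the family in question.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    identifier: str
    evalue: float
    coverage: float  # fraction of the query covered by the match, 0..1


def select_family_members(hits, max_evalue=1e-3, min_coverage=0.5):
    """Split hits into (kept, to_review) lists using illustrative thresholds."""
    kept, to_review = [], []
    for hit in hits:
        if hit.evalue <= max_evalue and hit.coverage >= min_coverage:
            kept.append(hit)
        else:
            to_review.append(hit)  # inspect by hand before discarding
    return kept, to_review
```

Hits that fail the thresholds are returned for review rather than dropped, since a weak match can still be a genuine but distant family member.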

• Remember that the programs for aligning sequences aren't perfect, and do not always provide the best alignment. This is particularly so for large families of proteins with low sequence identities. If you can see a better way of