#=============================== Version 1 ===============================================

The five files needed:

RELEASE/latest/iprscan_v4.8.tar.gz
BIN/4.x/iprscan_bin4.x_[PLATFORM].tar.gz
DATA/iprscan_DATA_[LATESTDATAVERSION].tar.gz
DATA/iprscan_PTHR_DATA_[LATESTDATAVERSION].tar.gz
DATA/iprscan_MATCH_DATA_[LATESTDATAVERSION].tar.gz

Extract the five files into a single folder, then run the Config.pl file found there to configure InterProScan.

$bin/iprscan -cli -iprlookup -goterms -format xml -i test.fasta -o test.out

# help http://www.chenlianfu.com/?tag=iprscan
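Two Perl modules are involved here, XML::Parser and XML::Parser::Expat. The latter must be installed first, and the former right after it. Because they are C-level modules, the expat library has to be installed beforehand; otherwise the build stops with:

Expat must be installed prior to building XML::Parser and I can't find it in the standard library directories. Install 'expat-devel' (or 'libexpat1-dev') package

Tip (requires root or sudo privileges): yum install expat-devel, or apt-get install libexpat1-dev (the exact package name depends on the distribution and version)

#============================================== Version 2 =============================================

https://github.com/ebi-pf-team/interproscan/wiki (original link)

Step 1: Environment setup

Software requirements: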
$JAVA_HOME/bin should be added to the $PATH

Step 2: Download the data
tar -pxvzf interproscan-5.27-66.0-64-bit.tar.gz
(-p preserves file permissions; -v only prints each file as it is extracted and is best left out)
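Before moving on to the test run, it may be worth confirming that the Java requirement from step 1 is actually met. The following is a minimal sketch, not part of InterProScan itself: `on_path` is a hypothetical helper name, and the script only reports whether `java` can be found through `$PATH`.

```shell
#!/bin/sh
# Pre-flight check: is a given command reachable via PATH?
# on_path is a hypothetical helper name used only in this sketch.
on_path() {
    command -v "$1" >/dev/null 2>&1
}

if on_path java; then
    echo "java found at: $(command -v java)"
else
    # Reminder of the requirement stated in step 1.
    echo "java not found; add \$JAVA_HOME/bin to PATH, for example:"
    echo '  export PATH="$JAVA_HOME/bin:$PATH"'
fi
```

If the check fails, set `JAVA_HOME` to the local Java installation before exporting `PATH`; the actual location depends on the machine.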
Step 3: Run a test

./interproscan.sh -i test_proteins.fasta -f tsv
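Once a TSV run has finished, the result can be summarised with standard tools. The sketch below counts matches per analysis. It assumes the column layout described on the InterProScan wiki, where column 4 holds the analysis name; `count_by_analysis` is a hypothetical helper name, and the sample rows are made up for illustration, not real InterProScan output.

```shell
#!/bin/sh
# Count matches per analysis in an InterProScan TSV file.
# Assumption: column 4 is the analysis name (e.g. Pfam), as described on the
# InterProScan wiki. count_by_analysis is a hypothetical helper for this sketch.
count_by_analysis() {
    awk -F '\t' 'NF >= 4 { n[$4]++ }
                 END { for (a in n) printf "%s\t%d\n", a, n[a] }' "$1" | sort
}

# Made-up sample rows (truncated to 8 columns), for illustration only.
sample=$(mktemp)
printf 'protA\tmd5a\t100\tPfam\tsigX1\tdesc\t1\t50\n'   >  "$sample"
printf 'protA\tmd5a\t100\tSMART\tsigX2\tdesc\t5\t60\n'  >> "$sample"
printf 'protB\tmd5b\t200\tPfam\tsigX3\tdesc\t10\t90\n'  >> "$sample"

summary=$(count_by_analysis "$sample")
printf '%s\n' "$summary"
rm -f "$sample"
```

On real data, pass the path of the generated TSV file to `count_by_analysis` instead of the temporary sample.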
./interproscan.sh -i test_proteins.fasta -cpu 8 -f GFF3 -goterms -iprlookup -t p -T 20171127tmp
# Parameters: -i input file; -f output format; -goterms -iprlookup add GO annotation; -t sequence type; -T name of the temporary file directory

Tip: TSV stands for tab-separated values.

#============================= Full option list ========================================

27/11/2017 14:41:35:049 Welcome to InterProScan-5.27-66.0
usage: java -XX:+UseParallelGC -XX:ParallelGCThreads=2 -XX:+AggressiveOpts -XX:+UseFastAccessorMethods -Xms128M -Xmx2048M -jar interproscan-5.jar

Please give us your feedback by sending an email to [email protected]

 -appl,--applications <ANALYSES>
     Optional, comma separated list of analyses. If this option is not set, ALL analyses will be run.

 -b,--output-file-base <OUTPUT-FILE-BASE>
     Optional, base output filename (relative or absolute path). Note that this option, the --output-dir (-d) option and the --outfile (-o) option are mutually exclusive. The appropriate file extension for the output format(s) will be appended automatically. By default the input file path/name will be used.

 -cpu,--cpu <CPU>
     Optional, number of cores for inteproscan.

 -d,--output-dir <OUTPUT-DIR>
     Optional, output directory. Note that this option, the --outfile (-o) option and the --output-file-base (-b) option are mutually exclusive. The output filename(s) are the same as the input filename, with the appropriate file extension(s) for the output format(s) appended automatically.

 -dp,--disable-precalc
     Optional. Disables use of the precalculated match lookup service. All match calculations will be run locally.

 -dra,--disable-residue-annot
     Optional, excludes sites from the XML, JSON output

 -f,--formats <OUTPUT-FORMATS>
     Optional, case-insensitive, comma separated list of output formats. Supported formats are TSV, XML, JSON, GFF3, HTML and SVG. Default for protein sequences are TSV, XML and GFF3, or for nucleotide sequences GFF3 and XML.

 -goterms,--goterms
     Optional, switch on lookup of corresponding Gene Ontology annotation (IMPLIES -iprlookup option)

 -help,--help
     Optional, display help information

 -i,--input <INPUT-FILE-PATH>
     Optional, path to fasta file that should be loaded on Master startup. Alternatively, in CONVERT mode, the InterProScan 5 XML file to convert.

 -iprlookup,--iprlookup
     Also include lookup of corresponding InterPro annotation in the TSV and GFF3 output formats.

 -ms,--minsize <MINIMUM-SIZE>
     Optional, minimum nucleotide size of ORF to report. Will only be considered if n is specified as a sequence type. Please be aware of the fact that if you specify a too short value it might be that the analysis takes a very long time!

 -o,--outfile <EXPLICIT_OUTPUT_FILENAME>
     Optional explicit output file name (relative or absolute path). Note that this option, the --output-dir (-d) option and the --output-file-base (-b) option are mutually exclusive. If this option is given, you MUST specify a single output format using the -f option. The output file name will not be modified. Note that specifying an output file name using this option OVERWRITES ANY EXISTING FILE.

 -pa,--pathways
     Optional, switch on lookup of corresponding Pathway annotation (IMPLIES -iprlookup option)

 -t,--seqtype <SEQUENCE-TYPE>
     Optional, the type of the input sequences (dna/rna (n) or protein (p)). The default sequence type is protein.

 -T,--tempdir <TEMP-DIR>
     Optional, specify temporary file directory (relative or absolute path). The default location is temp/.

 -version,--version
     Optional, display version number

 -vtsv,--output-tsv-version
     Optional, includes a TSV version file along with any TSV output (when TSV output requested)

Copyright © EMBL European Bioinformatics Institute, Hinxton, Cambridge, UK. (http://www.ebi.ac.uk) The InterProScan software itself is provided under the Apache License, Version 2.0 (http://www.apache.org/licenses/LICENSE-2.0.html). Third party components (e.g. member database binaries and models) are subject to separate licensing - please see the individual member database websites for details.

Available analyses:
 TIGRFAM (15.0) : TIGRFAMs are protein families based on Hidden Markov Models or HMMs
 SFLD (3) : SFLDs are protein families based on Hidden Markov Models or HMMs
 SUPERFAMILY (1.75) : SUPERFAMILY is a database of structural and functional annotation for all proteins and genomes.
 PANTHER (12.0) : The PANTHER (Protein ANalysis THrough Evolutionary Relationships) Classification System is a unique resource that classifies genes by their functions, using published scientific experimental evidence and evolutionary relationships to predict function even in the absence of direct experimental evidence.
 Gene3D (4.1.0) : Structural assignment for whole genes and genomes using the CATH domain structure database
 Hamap (2017_10) : High-quality Automated and Manual Annotation of Microbial Proteomes
 Coils (2.2.1) : Prediction of Coiled Coil Regions in Proteins
 ProSiteProfiles (2017_09) : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
 SMART (7.1) : SMART allows the identification and analysis of domain architectures based on Hidden Markov Models or HMMs
 CDD (3.16) : Prediction of CDD domains in Proteins
 PRINTS (42.0) : A fingerprint is a group of conserved motifs used to characterise a protein family
 ProSitePatterns (2017_09) : PROSITE consists of documentation entries describing protein domains, families and functional sites as well as associated patterns and profiles to identify them
 Pfam (31.0) : A large collection of protein families, each represented by multiple sequence alignments and hidden Markov models (HMMs)
 ProDom (2006.1) : ProDom is a comprehensive set of protein domain families automatically generated from the UniProt Knowledge Database.
 MobiDBLite (1.0) : Prediction of disordered domains Regions in Proteins
 PIRSF (3.02) : The PIRSF concept is being used as a guiding principle to provide comprehensive and non-overlapping clustering of UniProtKB sequences into a hierarchical order to reflect their evolutionary relationships.

Deactivated analyses:
 Phobius (1.01) : Analysis Phobius is deactivated, because the resources expected at the following paths do not exist: bin/phobius/1.01/phobius.pl
 SignalP_EUK (4.1) : Analysis SignalP_EUK is deactivated, because the resources expected at the following paths do not exist: bin/signalp/4.1/signalp
 SignalP_GRAM_POSITIVE (4.1) : Analysis SignalP_GRAM_POSITIVE is deactivated, because the resources expected at the following paths do not exist: bin/signalp/4.1/signalp
 TMHMM (2.0c) : Analysis TMHMM is deactivated, because the resources expected at the following paths do not exist: bin/tmhmm/2.0c/decodeanhmm, data/tmhmm/2.0c/TMHMM2.0c.model
 SignalP_GRAM_NEGATIVE (4.1) : Analysis SignalP_GRAM_NEGATIVE is deactivated, because the resources expected at the following paths do not exist: bin/signalp/4.1/signalp
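According to the messages above, Phobius, SignalP and TMHMM are deactivated only because the listed binaries and model files are absent from the installation directory. A minimal sketch for checking which of those paths exist is given below. It uses only the relative paths quoted in the help output; the function name `check_resources` and the `IPRSCAN_HOME` variable are hypothetical, and the script does not download or license any of these tools.

```shell
#!/bin/sh
# Report which optional resources are present below an InterProScan directory.
# The relative paths are copied from the "Deactivated analyses" messages;
# check_resources and IPRSCAN_HOME are hypothetical names for this sketch.
check_resources() {
    base=$1
    for rel in \
        bin/phobius/1.01/phobius.pl \
        bin/signalp/4.1/signalp \
        bin/tmhmm/2.0c/decodeanhmm \
        data/tmhmm/2.0c/TMHMM2.0c.model
    do
        if [ -e "$base/$rel" ]; then
            printf 'found    %s\n' "$rel"
        else
            printf 'missing  %s\n' "$rel"
        fi
    done
}

# Default to the current directory when IPRSCAN_HOME is not set.
check_resources "${IPRSCAN_HOME:-.}"
```

Run from the extracted interproscan directory, this prints one line per path; any path reported as missing corresponds to an analysis that stays deactivated.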