STEP
CGCfinder
CGCfinder performs MultiGeneBlast-facilitated gene cluster detection in a user-defined search database, based on sequence similarity and gene order. VRprofile uses it to identify mobile genetic elements, such as T3/4/6/7SS and ICE.
Updated: Aug-10-2016.       Recommended Browsers: Google Chrome.
Retrieve results

(i) Tool to measure the sequence similarity
    Select the alignment program  
    [The "BLAST + MUSCLE" option executes multiple sequence alignments for the BLAST hits and takes longer than the "BLAST" option.]
(ii) Query sequence
        
         Example: Escherichia coli RS218 Type VI Secretion System T6SS [JN837480]
or (1) Select a completely sequenced bacterial genome from the NCBI genome database
        
         Example: Escherichia coli O157:H7 str. EDL933 [NC_002655]
(2) Define the region (< 300 Kb) under analysis
        
      (3) Define the region (<300 Kb) under analysis within the uploaded sequence
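
A region can be checked against the size limit before submission. The following is only a minimal sketch: the function name `check_region` is hypothetical, coordinates are assumed to be 1-based and inclusive, and 1 Kb is assumed to mean 1,000 bp. It is not part of CGCfinder.

```python
MAX_REGION_BP = 300_000  # assumption: "< 300 Kb" means fewer than 300,000 bp


def check_region(start, end, seq_length, max_bp=MAX_REGION_BP):
    """Return the region length in bp, or raise ValueError if the region is invalid.

    start and end are assumed to be 1-based, inclusive coordinates.
    """
    if not (1 <= start <= end <= seq_length):
        raise ValueError("region must lie within the sequence and start <= end")
    length = end - start + 1
    if length >= max_bp:
        raise ValueError(f"region is {length} bp; it must be shorter than {max_bp} bp")
    return length


# Example: a 250-kb window in a 5-Mb sequence passes the check.
region_length = check_region(1, 250_000, 5_000_000)
```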
        
How to prepare the complete or nearly complete sequence of a bacterial genome as a reference?
To prepare the assembled contig/scaffold sequences from a partially sequenced bacterial genome:
(1) Prepare a plain-text file containing the assembled contig/scaffold nucleotide sequences in the Multi-FASTA format, like mysequence.fas (3.9 Mb).
(2) Use CDSeasy to annotate your sequences.
  Upload your file, mysequence.fas, into CDSeasy to generate a GenBank file, like mysequence_.gbk;
  It takes ~10 minutes for CDSeasy to annotate the 5.3-Mb chromosomal sequence of K. pneumoniae strain HS11286.
(3) Upload your sequences as the "Query sequence" of CGCfinder.
  Select the 'Genome Seq' tab, then click the radio button "or (1) Upload a GenBank file containing the nucleotide sequence and annotation";
  Upload the CDSeasy output file, mysequence_.gbk.
or (4) Upload your sequences as the "Subject sequence set" of CGCfinder.
  Select the 'Upload sequence (1-5)';
  Upload the CDSeasy output file, mysequence_.gbk.

For partially sequenced bacterial genomes, CDSeasy first generates a 'virtual complete genome' ('pseudochromosome') by joining the contig sequences without considering contig order, and provides both contig-specific gene coordinates and the corresponding pseudochromosome coordinates. CDSeasy outputs include the sequence and annotation files in commonly used formats, such as GenBank.
The CDSeasy-generated GenBank file can be used directly as the input for VRprofile, CGCfinder and COGviewer.
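
The relationship between contig-specific and pseudochromosome coordinates can be illustrated with a short sketch. This is not CDSeasy's implementation: the function names are hypothetical, contigs are assumed to be joined in file order with no spacer between them, and coordinates are assumed to be 1-based and inclusive.

```python
def parse_multifasta(text):
    """Parse Multi-FASTA text into a list of (contig_id, sequence) tuples."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def build_pseudochromosome(records):
    """Join contigs in input order.

    Returns (sequence, offsets), where offsets maps each contig_id to the
    0-based position at which that contig starts in the joined sequence.
    """
    offsets = {}
    parts = []
    position = 0
    for contig_id, seq in records:
        offsets[contig_id] = position
        parts.append(seq)
        position += len(seq)
    return "".join(parts), offsets


def to_pseudo_coords(offsets, contig_id, start, end):
    """Shift 1-based contig coordinates onto the pseudochromosome."""
    shift = offsets[contig_id]
    return start + shift, end + shift


# Toy example with two short contigs.
fasta_text = ">contig1\nACGT\nAC\n>contig2\nGGTT\n"
pseudo_seq, contig_offsets = build_pseudochromosome(parse_multifasta(fasta_text))
gene_on_pseudo = to_pseudo_coords(contig_offsets, "contig2", 1, 4)
```

In this toy case, a gene spanning positions 1 to 4 of `contig2` maps to positions 7 to 10 of the joined sequence, because `contig1` contributes the first 6 bases.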
(iii) Subject sequence set
(Users can submit up to 50 bacterial sequences per job, including 1-5 via the 'Select genomes' tab, 1-40 via the 'Input acc. no' tab and/or 1-5 via the 'Upload sequences' tab)
(1) Enter the NCBI RefSeq genome accession numbers, one per line (accession numbers are case-sensitive). [Format?] [Example]
or (2) Number of genomes from NCBI to compare with the query genome. Example: select '2'
or (3) Number of uploaded sequences to compare with the query sequence.
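
The per-tab limits stated above (5 selected genomes, 40 accession numbers, 5 uploaded sequences) can be checked before a job is submitted. The sketch below is illustrative only; `parse_accessions` and `check_subject_set` are hypothetical helpers and do not correspond to any CGCfinder interface.

```python
# Per-tab maxima taken from the form text; the dictionary keys are illustrative.
LIMITS = {"select": 5, "accession": 40, "upload": 5}


def parse_accessions(text):
    """Split a text box into accession numbers, one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def check_subject_set(n_selected, accessions, n_uploaded):
    """Return the total number of subject sequences, or raise ValueError."""
    counts = {
        "select": n_selected,
        "accession": len(accessions),
        "upload": n_uploaded,
    }
    for tab, count in counts.items():
        if count > LIMITS[tab]:
            raise ValueError(f"{tab}: {count} entries exceed the limit of {LIMITS[tab]}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one subject sequence is required")
    return total


# Example: two selected genomes, two accession numbers and one uploaded file.
accession_list = parse_accessions("NC_002655\n\nNC_000913\n")
total_subjects = check_subject_set(2, accession_list, 1)
```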
How to prepare your genome data?
Run with Default options
(iv) BLAST parameters (Optional)
     (1) Number of BLAST hits per gene to be mapped < (1~500)
     (2) Minimal sequence coverage of BLAST hits > (0~100) %
     (3) Minimal % identity of BLAST hits > (0~100) %
     (4) Maximum distance between genes in locus < (0~100) Kb
     (5) Weight of synteny conservation in hit sorting = (0~2.0)
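
To show how the first three thresholds interact, here is a minimal sketch of a hit filter. It is not the MultiGeneBlast or CGCfinder code: the hit records and their keys (`query_gene`, `identity`, `coverage`, `bitscore`) are hypothetical, ranking by bit score is an assumption, and the cap is applied as "at most N hits per gene". The distance and synteny parameters are not modelled.

```python
from collections import defaultdict


def filter_hits(hits, max_hits_per_gene, min_coverage, min_identity):
    """Keep the best-scoring hits per query gene that exceed both thresholds.

    hits is a list of dicts with the hypothetical keys 'query_gene',
    'identity' (%), 'coverage' (%) and 'bitscore'.
    """
    by_gene = defaultdict(list)
    for hit in hits:
        # The form states the thresholds with ">", so strict comparisons are used.
        if hit["coverage"] > min_coverage and hit["identity"] > min_identity:
            by_gene[hit["query_gene"]].append(hit)

    kept = {}
    for gene, gene_hits in by_gene.items():
        gene_hits.sort(key=lambda h: h["bitscore"], reverse=True)
        kept[gene] = gene_hits[:max_hits_per_gene]
    return kept


# Toy data: geneA has two passing hits, geneB has none.
example_hits = [
    {"query_gene": "geneA", "identity": 90.0, "coverage": 80.0, "bitscore": 200.0},
    {"query_gene": "geneA", "identity": 20.0, "coverage": 90.0, "bitscore": 50.0},
    {"query_gene": "geneA", "identity": 60.0, "coverage": 70.0, "bitscore": 300.0},
    {"query_gene": "geneB", "identity": 95.0, "coverage": 10.0, "bitscore": 400.0},
]
best_hits = filter_hits(example_hits, max_hits_per_gene=1, min_coverage=25.0, min_identity=30.0)
```

Raising the coverage or identity threshold removes weak hits before the per-gene cap is applied, so a stricter threshold can leave a gene with no mapped hits at all, as happens to `geneB` here.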
(v) Results retrieval options
Please do not close the webpage while the job is running.

Please do not close the webpage while data are uploading.
Note: When running with the time-consuming "BLAST + MUSCLE" option, please retrieve the job result from the e-mail notification.
Retrieve results
CGCfinder
Type in the Job_ID