The first line starts with a greater-than sign (">") and contains a name or other identifier for the sequence. This is the sequence header and must be on a single line. The remaining lines contain the sequence data, which can be in upper- or lower-case letters. Anything other than letters (numbers, for example) is ignored. Multiple sequences can be present in the same file as long as each sequence has its own header.
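The rules above can be illustrated with a minimal Python sketch. It is not the parser used by the server; the function name `parse_fasta` is hypothetical, and the only behaviour it takes from the text is that headers start with ">", that each sequence needs its own header, and that non-letter characters are ignored.

```python
import re


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text.

    Hypothetical helper for illustration only.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there is one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Keep letters only; digits, spaces and punctuation are ignored.
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

In this sketch, sequence lines that appear before any header are skipped, which is an assumption rather than documented behaviour.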
| R | → | G A (purine) |
| Y | → | T C (pyrimidine) |
| K | → | G T (keto) |
| M | → | A C (amino) |
| S | → | G C (strong) |
| W | → | A T (weak) |
| B | → | G T C |
| D | → | G A T |
| H | → | A C T |
| V | → | G C A |
| N | → | A G C T (any) |
| - | → | gap of indeterminate length |
Ns are accepted; IUB/GCG letters (MRWSYKVHDBX) are converted to Ns. Any other characters are deleted.
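As a rough illustration of this clean-up rule, the following Python sketch keeps A, C, G, T and N, converts the listed IUB/GCG letters to N, and drops everything else. The function name `normalize_sequence` is hypothetical, and converting the input to upper case first is an assumption made here for simplicity; the server's actual implementation is not described in this text.

```python
# Letters kept as-is (the four bases plus N).
ACCEPTED = set("ACGTN")
# IUB/GCG ambiguity letters listed in the text; each becomes N.
IUB_AMBIGUOUS = set("MRWSYKVHDBX")


def normalize_sequence(seq):
    """Apply the clean-up rule to one sequence string (illustrative only)."""
    cleaned = []
    for ch in seq.upper():  # assumption: case is folded before filtering
        if ch in ACCEPTED:
            cleaned.append(ch)
        elif ch in IUB_AMBIGUOUS:
            cleaned.append("N")
        # Any other character (digits, gaps, other letters) is dropped.
    return "".join(cleaned)
```

For example, `normalize_sequence("acgtRYn-12xZ")` keeps the four bases and the N, turns R, Y and X into N, and discards the gap, the digits and the Z.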
The FASTA format is a plain text format which looks something like this:
>Escherichia coli UTI89|886538|887045
GTTCACTGCCGTACAGGCAGCTTAGAAA
TGACGCCATATGCAGATCATTGAGGCGAAACC
GTTCACTGCCGTACAGGCAGCTTAGAAA
ACGTTCGCACCGGTCAGGGTACTGCGCAGCGT
GTTCACTGCCGTACAGGCAGCTTAGAAA
GAAACCAGAGCGCCCGCATAAAACAGGCACAA
GTTCACTGCCGTACAGGCAGCTTAGAAA
GCCAGCATAAAACCGCCTTTGATATTTTATTG
GTTCACTGCCGTACAGGCAGCTTAGAAA
TCAGCCGGAGGCTCTCAATTTCAGCCGCGCGG
GTTCACTGCCGTACAGGCAGCTTAGAAA
AGCACGGCTGCGGGGAATGGCTCAATCTCTGC
GTTCACTGCCGTACAGGCAGCTTAGAAA
TGATGGCGCAGCAGTCCTCCCTCCTGCCGCCA
GTTCACTGCCGTACAGGCAGCTTAGAAA
CTGAACGTTGAAGAGTGCGACCGTCTCTCCTT
GTTCACTGCCGTACAGGCAGTATTCACA
The parameters have been set to detect direct repeats (DRs) with a high level of homology.
It is possible to modify some parameters defining the maximal repeat and the CRISPR properties.
The Clustering model option lets users choose between three stringency levels for identifying Cas genes. The first level (General) allows a permissive search (i.e. Cas genes are detected whatever their type or subtype). The other two levels (Typing and SubTyping) produce more stringent analyses. See the MacSyFinder documentation (http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0110726) for further information.
The summary displays information on CRISPR arrays and cas gene clusters in the order in which they lie along the chromosome. Direction is the proposed orientation of the CRISPR cluster (ND stands for Not determined) according to the CRISPRdirection program. Details additionally shows the potential orientation of the CRISPR array based on the AT percentage in the 100 bp flanking sequences.
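To make the AT-percentage measure concrete, here is a small Python sketch that computes it for the flanking sequences on each side of an array. The function names, the 0-based half-open coordinates and the handling of flanks cut short by the sequence ends are all assumptions for illustration. The text does not state how the two percentages are turned into an orientation call, so the sketch stops at computing them.

```python
def at_percentage(seq):
    """Percentage of A and T bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


def flank_at_percentages(genome, start, end, flank=100):
    """AT percentage of the flanks on either side of genome[start:end].

    start and end are 0-based, half-open coordinates of the array
    (an assumption of this sketch). Flanks are truncated at the
    sequence boundaries.
    """
    left = genome[max(0, start - flank):start]
    right = genome[end:end + flank]
    return at_percentage(left), at_percentage(right)
```

Called as `flank_at_percentages(genome, start, end)`, it returns a pair of values for the left and right flank, which can then be compared by whatever rule the tool applies.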