Hi All,
As part of my PhD project I'm working on a tool to benchmark reassembly
algorithms. To do this I'm planning on doing the following:
1. Take a sequence file and break it into reads of a specified
length, adding errors during this process.
2. Reassemble these simulated reads with the reassembly programs
available in GAP4.
3. Align contigs of a useful size to the original sequence, and note
those that align within a given edit distance.
4. Calculate the percentage of the sequence that is covered by contigs.
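
For illustration, steps 1 and 4 might be sketched in Python roughly as
follows. This is only a sketch under assumptions: the function names
(simulate_reads, to_fasta, percent_covered), the sliding window with a
step of 1, the substitution-only error model, and the half-open
(start, end) contig intervals are all hypothetical choices, not a
description of the actual tool.

```python
import random


def simulate_reads(sequence, read_length, error_rate=0.0, step=1, seed=0):
    """Slide a window over `sequence` and return reads with random substitutions.

    Assumption: errors are substitutions only (no insertions or deletions).
    """
    rng = random.Random(seed)  # fixed seed so a benchmark run is repeatable
    alphabet = "ACGT"
    reads = []
    for start in range(0, len(sequence) - read_length + 1, step):
        bases = list(sequence[start:start + read_length])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Replace the base with a different one.
                bases[i] = rng.choice([b for b in alphabet if b != base])
        reads.append("".join(bases))
    return reads


def to_fasta(reads):
    """Format reads as FASTA records named R0, R1, ..."""
    return "".join(f">R{i}\n{read}\n" for i, read in enumerate(reads))


def percent_covered(intervals, reference_length):
    """Percentage of the reference covered by the union of (start, end) intervals.

    Assumption: intervals are half-open and given in reference coordinates.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(intervals):
        start = max(start, last_end)  # skip the part already counted
        if end > start:
            covered += end - start
            last_end = end
    return 100.0 * covered / reference_length
```

With error_rate left at 0.0, the reads are exact substrings of the
input, which makes a convenient sanity check before errors are added.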
I have just completed the edit-distance alignment tool and am now
beginning the process of benchmarking reassembly algorithms. Does
anybody have any thoughts or suggestions? I should say that my main
interest is short read reassembly.
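
As a rough illustration of what aligning a contig within a given edit
distance can mean, here is a minimal Python sketch of approximate
substring matching by dynamic programming. The function name best_match
and the unit costs for substitutions, insertions and deletions are
assumptions made for the example; it is not the tool mentioned above.

```python
def best_match(contig, reference):
    """Return (edit_distance, end) for the best match of `contig` in `reference`.

    The contig may start anywhere in the reference at no cost, so this finds
    the substring of the reference closest to the contig. `end` is the number
    of reference bases consumed up to the end of that match.
    """
    # Row for the empty contig prefix: zero cost at every reference position.
    prev = [0] * (len(reference) + 1)
    for i in range(1, len(contig) + 1):
        # Aligning i contig bases against nothing costs i edits.
        curr = [i] + [0] * len(reference)
        for j in range(1, len(reference) + 1):
            cost = 0 if contig[i - 1] == reference[j - 1] else 1
            curr[j] = min(
                prev[j - 1] + cost,  # match or substitution
                prev[j] + 1,         # contig base missing from the reference
                curr[j - 1] + 1,     # reference base missing from the contig
            )
        prev = curr
    end = min(range(len(prev)), key=prev.__getitem__)
    return prev[end], end
```

A contig would then count as aligned when the returned distance is at
most the chosen threshold. The table needs time proportional to
len(contig) * len(reference), so this is only practical for small
test cases.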
Secondly, I'm having a problem with GAP4. It only seems to load
19 sequences from my fasta file. My fasta file looks like this:
>R0
CCAATTAGTCCTATTAAGAC
>R1
CAATTAGTCCTATTAAGACT
>R2
AATTAGTCCTATTAAGACTG
>R3
ATTAGTCCTATTAAGACTGT
However, if I include any more than 19 sequences in my fasta file I
get the following error:
Failed files:
/home/new/A1.fasta (UNK) 'init: Unknown file type'
Is this a bug? Or am I doing something wrong?
Many thanks for reading,
Nava Whiteford