The Tools > Sample By command is used to create an output file that contains a filtered set of sequences from a source file. Source file sequences can be filtered according to one or more specified conditions, such as length, contents, and start/end sequence characters.
- Choose Tools > Sample By > (Subcommand). Available subcommands are: 5’ and 3’ End Tags, Contains Sequence, Probability or Skipping, Length, and Name.
The Project window opens with the Project and Options tabs active.
- In the Project tab, the sequences you wish to sample should be placed in the Sequences folder. To add sequences to a folder, select the folder, then right-click it and choose Import. Select the desired sequence files, and press Open. Next, select a single sequence, or use Shift+click or Ctrl/Cmd+click to select multiple sequences.
- The Options tab varies in appearance, depending on which subcommand was chosen. The number of sequences selected in the Project tab appears in the message: “‘n’ sequences will be sampled.”
- Change settings in the Options tab as applicable for the chosen subcommand.
Tools > Sample By > 5’ and 3’ End Tags samples only those sequences beginning and/or ending with a specified sequence fragment or “index tag” on the 3’ or 5’ end. For sequences of DNA or unknown type, matches can occur at the ends of either strand.
Check the Starts with and/or Ends with boxes, then enter the search sequence(s) to the right of the checked box(es).
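The end-tag matching described above can be illustrated with a minimal Python sketch. This is not the application's own code: the function names are hypothetical, and the sketch assumes that matching ignores case and that, when both boxes are checked, both tags must match on the same strand.

```python
# Illustrative sketch only; helper names are hypothetical, not part of the application.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def has_end_tags(seq, starts_with=None, ends_with=None, both_strands=True):
    """Return True if seq begins and/or ends with the given tags.

    Assumptions: comparison is case-insensitive, and when both tags are
    given they must match on the same strand.
    """
    strands = [seq.upper()]
    if both_strands:
        # For DNA, also test the opposite strand.
        strands.append(reverse_complement(seq).upper())
    for strand in strands:
        start_ok = starts_with is None or strand.startswith(starts_with.upper())
        end_ok = ends_with is None or strand.endswith(ends_with.upper())
        if start_ok and end_ok:
            return True
    return False


sequences = ["ACGTTTGGA", "TCCAAACGT", "GGGGCCCC"]
sampled = [s for s in sequences if has_end_tags(s, starts_with="ACG")]
```

Here the second sequence is kept only because its reverse complement begins with the tag, which mirrors the "either strand" behavior for DNA.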
Tools > Sample By > Contains Sequence samples only those sequences containing a specified sequence fragment. This provides a way to sample a set of sequences that contain the same subsequence. For sequences of DNA or unknown type, matches can occur on either strand.
Either type the subsequence or paste a copied sequence into the text field, or use the “load a file” link to navigate to a sequence or text file.
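As a rough illustration of matching on either strand, the following Python sketch tests a sequence for a fragment and for the fragment's reverse complement. The function name is hypothetical and case-insensitive matching is an assumption; the application's actual matching rules may differ.

```python
# Illustrative sketch only; contains_fragment is a hypothetical name.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def contains_fragment(seq: str, fragment: str) -> bool:
    """Return True if fragment occurs on either strand of a DNA sequence.

    Assumption: comparison is case-insensitive.
    """
    seq = seq.upper()
    fragment = fragment.upper()
    # A match on the opposite strand is equivalent to finding the
    # fragment's reverse complement on the given strand.
    fragment_rc = fragment.translate(_COMPLEMENT)[::-1]
    return fragment in seq or fragment_rc in seq


sequences = ["AAACGTGG", "TTTTTTTT"]
sampled = [s for s in sequences if contains_fragment(s, "CACG")]
```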
Tools > Sample By > Probability or Skipping samples a subset of sequences, either at regular intervals or at random.
To choose every nth sequence from the file, select Sample every and type the ‘n’ value (an integer) at the right. To sample a certain number of random sequences instead, select Random sample containing and type the number of sequences at the right.
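The two sampling modes can be sketched in Python as follows. The function names are hypothetical, and two details are assumptions rather than documented behavior: that "every nth" counts from the first sequence (so n = 3 keeps the 3rd, 6th, 9th, and so on), and that requesting more random sequences than exist simply returns all of them.

```python
import random


def sample_every(records, n):
    """Keep every nth record (assumed: the nth, 2nth, 3nth, ... in file order)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return records[n - 1::n]


def random_sample(records, count, seed=None):
    """Keep `count` records chosen at random without replacement.

    Assumption: if count exceeds the number of records, all are returned.
    """
    rng = random.Random(seed)  # a seed makes the selection repeatable
    return rng.sample(records, min(count, len(records)))


names = [f"seq{i}" for i in range(1, 11)]
every_third = sample_every(names, 3)
four_random = random_sample(names, 4, seed=0)
```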
Tools > Sample By > Length samples only those sequences with a length greater and/or less than a specified length.
To specify a length threshold for sampled sequences, choose between Minimum length, Maximum length, and Range. Then type the cutoff value(s) into the box at the right.
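A minimal Python sketch of the three length options is shown below. The function name is hypothetical, and treating the cutoffs as inclusive is an assumption; the application may handle sequences exactly at the cutoff differently.

```python
def length_in_bounds(seq, minimum=None, maximum=None):
    """Return True if the sequence length satisfies the given cutoffs.

    Pass only `minimum` for Minimum length, only `maximum` for Maximum
    length, or both for Range. Cutoffs are assumed to be inclusive.
    """
    n = len(seq)
    if minimum is not None and n < minimum:
        return False
    if maximum is not None and n > maximum:
        return False
    return True


sequences = ["ACGT", "ACGTACGT", "ACGTACGTACGT"]
in_range = [s for s in sequences if length_in_bounds(s, minimum=5, maximum=10)]
```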
Tools > Sample By > Name samples only those sequences whose sequence name or feature name contains the specified text.
Choose whether sampled sequences need to have the specified text in the sequence name or a feature name. Type in the text to the right of the selected option.
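The choice between sequence name and feature name can be pictured with the Python sketch below. The record layout (a dictionary with "name" and "features" keys) and the function name are hypothetical, and a case-sensitive substring match is assumed.

```python
def name_matches(record, text, search_in="sequence"):
    """Return True if `text` occurs in the sequence name or in a feature name.

    `record` is a hypothetical dict: {"name": str, "features": [str, ...]}.
    Assumption: a case-sensitive substring match.
    """
    if search_in == "sequence":
        return text in record["name"]
    if search_in == "feature":
        return any(text in feature for feature in record.get("features", []))
    raise ValueError("search_in must be 'sequence' or 'feature'")


records = [
    {"name": "chr1_contigA", "features": ["geneX", "promoter"]},
    {"name": "plasmid_B", "features": ["ori"]},
]
by_sequence_name = [r for r in records if name_matches(r, "contig")]
by_feature_name = [r for r in records if name_matches(r, "gene", search_in="feature")]
```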
- In the Options tab, press the Run button.
The resulting sequence(s) appear in the Project tab’s Results folder. The results file has the same name as the first sequence file that was sampled, followed by “_sampled”.
- (optional) If you wish to see the script that was used to run this process, press the Script tab.
- (optional) If you want to see the history of the run, including the location of the output file(s), press the History tab.