Match -> Align

Match -> Align creates a sequence alignment from a structural superposition of proteins or nucleic acids in Chimera. Residue types are not considered, only the spatial proximities of residues. Optionally, the calculation can be iterated: the structures are refit using the sequence alignment, and a new sequence alignment is generated from the adjusted superposition.

The output sequence alignment is automatically shown in Multalign Viewer, and RMSDs over the fully populated columns of the alignment are reported in the Reply Log.

For an informal introduction, see the Superpositions and Alignments tutorial. See also: MatchMaker, Multalign Viewer, and the following reference:

Tools for integrated sequence-structure analysis with UCSF Chimera. Meng EC, Pettersen EF, Couch GS, Huang CC, Ferrin TE. BMC Bioinformatics. 2006 Jul 12;7:339.

There are several ways to start Match -> Align, a tool in the Structure Comparison and Sequence categories.

Chains to be included in the sequence alignment should be chosen from the top section of the panel.

Residue-residue distance cutoff (angstroms) (default 5.0) - maximum CA-CA distance (C4'-C4' for nucleic acids) for defining membership in a column of the output sequence alignment
Residue aligned in column if within cutoff of - how the cutoff should be applied; the two choices are equivalent for alignments of only two chains
- at least one other (default)
- all others
Gap character - how to show gaps in the output sequence alignment
- . (period)
- - (dash)
- ~ (tilde)
Allow for circular permutation - whether to double sequences as needed to simultaneously align the N-terminal region of one protein with the C-terminal region of the other and vice versa. Doubling the sequence of one of a pair of proteins related by circular permutation is required because Match -> Align enforces N → C chain directionality. Information on any permutations will be sent to the Reply Log.
Iterate superposition/alignment... - whether to perform one or more cycles of refitting the structures using the sequence alignment and generating a new sequence alignment from the adjusted superposition. Superpositions will not be adjusted unless iteration is turned on. Only the final superposition and final sequence alignment will be shown. The number of fully populated columns in each intermediate sequence alignment and the corresponding match statistics will be reported in the Reply Log.
Iteration Parameters:
- Iterate alignment:
  - at most [N] times (default 3) - refit and then generate a new sequence alignment N times (or fewer, if convergence is reached)
  - until convergence - refit and then generate a new sequence alignment until the number of fully populated columns no longer increases
- Superimpose full columns:
  - across entire alignment - refit the structures using all fully populated columns of the sequence alignment
  - in stretches of at least [L] consecutive columns (default 3) - refit the structures using only the fully populated columns in consecutive stretches of L or more
- Reference chain for matching [chain] - which structure should remain fixed as the others are matched to it
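
Two of the options above can be illustrated with a short standalone Python sketch. It is not Chimera code and does not use any Chimera API; the function names (within_cutoff, columns_in_stretches), the argument names, and the input formats (one (x, y, z) coordinate per residue, one True/False flag per alignment column) are assumptions made for illustration. The first function shows the difference between "at least one other" and "all others"; the second picks out the fully populated columns that lie in stretches of at least L consecutive columns.

```python
import math

def within_cutoff(residue, others, cutoff=5.0, require_all=False):
    """Decide whether a residue may join a column (illustrative only).

    residue     -- (x, y, z) of the candidate residue (CA or C4' atom)
    others      -- coordinates of the residues from other chains in the column
    require_all -- False mimics "at least one other", True mimics "all others"
    """
    if not others:
        return False  # nothing to be close to
    close = [math.dist(residue, other) <= cutoff for other in others]
    return all(close) if require_all else any(close)

def columns_in_stretches(full_flags, min_length=3):
    """Return indices of fully populated columns that lie in runs of at
    least min_length consecutive fully populated columns."""
    kept, run = [], []
    for index, is_full in enumerate(full_flags):
        if is_full:
            run.append(index)
        else:
            if len(run) >= min_length:
                kept.extend(run)
            run = []
    if len(run) >= min_length:  # close a run that reaches the last column
        kept.extend(run)
    return kept
```

With only one other residue in the column, the two membership choices give the same answer, which is why they are equivalent for alignments of two chains.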

Save settings writes the current Match -> Align parameters to the preferences file. Reset to defaults resets the dialog to the factory default parameter settings without changing any preferences.

Clicking Apply (or OK, which also dismisses the dialog) initiates the calculation.

The output sequence alignment is automatically shown in Multalign Viewer and can be saved to a file from that tool. The fully populated columns are highlighted as a region (colored boxes). Clicking the region will select the corresponding parts of the structures, in effect their common cores. The header named RMSD shows the spatial variation per column.

The number of fully populated columns in the alignment and the corresponding pairwise and overall RMSDs are reported in the Reply Log. All RMSDs are calculated using one atom per residue: CA in amino acids, C4' in nucleic acids. Structures are not refit using the final sequence alignment; rather, the existing superpositions are simply evaluated over the fully populated columns of that alignment.
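
The evaluation described above (no refitting, one atom per residue, fully populated columns only) can be sketched in standalone Python as follows. This is an illustration, not Chimera code: the function names and the column representation (one entry per chain, either an (x, y, z) tuple or None for a gap) are assumptions. Only pairwise RMSDs are computed, because this page does not define how the overall RMSD is combined.

```python
import math
from itertools import combinations

def rmsd_without_fitting(coords_a, coords_b):
    """RMSD between paired coordinates as they currently sit in space."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))

def full_column_rmsds(columns):
    """Return (number of fully populated columns, {(i, j): RMSD}).

    columns -- list of alignment columns; each column holds one entry per
               chain, an (x, y, z) tuple or None where that chain has a gap
    """
    full = [col for col in columns if all(p is not None for p in col)]
    if not full:
        return 0, {}
    n_chains = len(full[0])
    rmsds = {}
    for i, j in combinations(range(n_chains), 2):
        rmsds[(i, j)] = rmsd_without_fitting(
            [col[i] for col in full], [col[j] for col in full]
        )
    return len(full), rmsds
```

Columns containing a gap in any chain are skipped entirely, so every pairwise value is computed over the same set of columns.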

Close dismisses the dialog without generating an alignment. Help opens this manual page in a browser window.

Notes

In most cases, a semi-heuristic algorithm is used. However, a modified Needleman-Wunsch procedure (dynamic programming) is used for the case of two chains and no allowance for circular permutation:

The score for aligning a pair of residues is:

(cutoff – distance) for distances no greater than the cutoff
–1 for distances greater than the cutoff

The gap penalty is zero, since for this application the spatial proximity should be more important than adjacency in sequence; that is, residues farther apart than the distance cutoff should not be aligned.

This process determines the sequence alignment that best represents the structural alignment.
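
As an illustration only, the two-chain case can be sketched as a textbook global dynamic-programming alignment in standalone Python, using the score and zero gap penalty given above. This is not Chimera's implementation: the page does not say how the Needleman-Wunsch procedure is modified, so the function names (pair_score, align_by_distance), the tie-breaking in the traceback, and the input format (one (x, y, z) coordinate per residue, in chain order) are assumptions.

```python
import math

def pair_score(a, b, cutoff=5.0):
    """(cutoff - distance) within the cutoff, otherwise -1."""
    distance = math.dist(a, b)
    return cutoff - distance if distance <= cutoff else -1.0

def align_by_distance(coords_a, coords_b, cutoff=5.0):
    """Return index pairs (i, j) of aligned residues, in chain order."""
    n, m = len(coords_a), len(coords_b)
    # best[i][j] = best total score for the first i and first j residues
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = pair_score(coords_a[i - 1], coords_b[j - 1], cutoff)
            best[i][j] = max(
                best[i - 1][j - 1] + s,  # align residue i with residue j
                best[i - 1][j],          # gap in chain B (penalty 0)
                best[i][j - 1],          # gap in chain A (penalty 0)
            )
    # Trace back from the corner, preferring an aligned pair on ties
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        s = pair_score(coords_a[i - 1], coords_b[j - 1], cutoff)
        if s >= 0 and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

Because gaps cost nothing, pairing two residues that are farther apart than the cutoff (score –1) can never score better than leaving both unpaired, so such pairs do not appear in the result.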

UCSF Computer Graphics Laboratory / September 2009