  K. D Pruitt, J Harrow, R. A Harte, C Wallin, M Diekhans, D. R Maglott, S Searle, C. M Farrell, J. E Loveland, B. J Ruef, E Hart, M. M Suner, M. J Landrum, B Aken, S Ayling, R Baertsch, J Fernandez Banet, J. L Cherry, V Curwen, M DiCuccio, M Kellis, J Lee, M. F Lin, M Schuster, A Shkeda, C Amid, G Brown, O Dukhanina, A Frankish, J Hart, B. L Maidak, J Mudge, M. R Murphy, T Murphy, J Rajan, B Rajput, L. D Riddick, C Snow, C Steward, D Webb, J. A Weber, L Wilming, W Wu, E Birney, D Haussler, T Hubbard, J Ostell, R Durbin and D. Lipman

Effective use of the human and mouse genomes requires reliable identification of genes and their products. Although multiple public resources provide annotation, they use different methods, which can result in similar but not identical representations of genes, transcripts, and proteins. The collaborative consensus coding sequence (CCDS) project tracks identical protein annotations on the reference mouse and human genomes with a stable identifier (CCDS ID), and ensures that they are consistently represented on the NCBI, Ensembl, and UCSC Genome Browsers. Importantly, the project coordinates manual review of protein annotations that are inconsistent between sites, as well as annotations for which new evidence suggests a revision is needed, to progressively converge on a complete protein-coding set for the human and mouse reference genomes while maintaining a high standard of reliability and biological accuracy. To date, the project has identified 20,159 human and 17,707 mouse consensus coding regions from 17,052 human and 16,893 mouse genes. Three evaluation methods indicate that the entries in the CCDS set are highly likely to represent real proteins, more so than annotations from contributing groups that are not included in CCDS. The CCDS database thus centralizes the function of identifying well-supported, identically annotated protein-coding regions.

  P Heinzelman, R Komor, A Kanaan, P Romero, X Yu, S Mohler, C Snow and F. Arnold

We describe an efficient SCHEMA recombination-based approach for screening homologous enzymes to identify stabilizing amino acid sequence blocks. This approach has been used to generate active, thermostable cellobiohydrolase class I (CBH I) enzymes from the 390 625 possible chimeras that can be made by swapping eight blocks from five fungal homologs. Constructing and characterizing the parent enzymes and just 32 ‘monomeras’ containing a single block from a homologous enzyme allowed stability contributions to be assigned to 36 of the 40 blocks from which the CBH I chimeras can be assembled. Sixteen of 16 predicted thermostable chimeras, with an average of 37 mutations relative to the closest parent, are more thermostable than the most stable parent CBH I, from the thermophilic fungus Talaromyces emersonii. Whereas none of the parent CBH Is were active >65°C, stable CBH I chimeras hydrolyzed solid cellulose at 70°C. In addition to providing a collection of diverse, thermostable CBH Is that can complement previously described stable CBH II chimeras (Heinzelman et al., Proc. Natl Acad. Sci. USA 2009;106:5610–5615) in formulating application-specific cellulase mixtures, the results show the utility of SCHEMA recombination for screening large swaths of natural enzyme sequence space for desirable amino acid blocks.
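The counts quoted in this abstract follow from simple combinatorics, which the minimal sketch below reproduces. It is an illustration only, not the authors' code: parents are represented by hypothetical integer labels 0 to 4, and the enumeration assumes that every monomera is built in a single background parent (label 0 here), an assumption the abstract does not state explicitly but which is consistent with the reported total of 32.

```python
N_PARENTS = 5  # five fungal CBH I homologs (from the abstract)
N_BLOCKS = 8   # eight swappable sequence blocks (from the abstract)


def count_chimeras(n_parents=N_PARENTS, n_blocks=N_BLOCKS):
    """Number of possible chimeras: each block position is filled by any parent."""
    return n_parents ** n_blocks


def count_blocks(n_parents=N_PARENTS, n_blocks=N_BLOCKS):
    """Number of distinct (parent, position) blocks available for assembly."""
    return n_parents * n_blocks


def monomeras(background=0, n_parents=N_PARENTS, n_blocks=N_BLOCKS):
    """List chimeras that differ from the background parent at exactly one block.

    Assumption (not stated in the abstract): one fixed background parent.
    Each chimera is a tuple of parent labels, one label per block position.
    """
    result = []
    for position in range(n_blocks):
        for donor in range(n_parents):
            if donor == background:
                continue  # same block as the background, so not a monomera
            chimera = [background] * n_blocks
            chimera[position] = donor
            result.append(tuple(chimera))
    return result


if __name__ == "__main__":
    print(count_chimeras())   # 5**8 possible chimeras
    print(count_blocks())     # 5*8 blocks
    print(len(monomeras()))   # 8 positions x 4 donor parents
```

Under that assumption, the 32 monomeras together with the parents sample each of the 40 blocks at least once, which is why so small a set can, in principle, inform per-block stability contributions.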

