Newbler parameters are -consed -a 50 -l 350 -g -m -ml 20.

The sequences of the two 16S rRNA gene copies in the genome differ from each other by up to eleven nucleotides, and differ by up to eight nucleotides from the previously published 16S rRNA sequence (AJ440991), which contains seven ambiguous base calls. The tree was inferred from 1,366 aligned characters [7,8] of the 16S rRNA gene sequence under the maximum likelihood (ML) criterion [9].
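As an illustration of how such nucleotide differences between two aligned gene copies might be counted, the following minimal sketch compares two pre-aligned sequences position by position. The function name `count_differences`, the choice to skip gaps and ambiguous base calls, and the toy sequences are assumptions made for this example; they are not the procedure or data used for the comparison reported here.

```python
# Hypothetical helper: count mismatches between two aligned sequences.
UNAMBIGUOUS = set("ACGT")


def count_differences(seq_a: str, seq_b: str) -> int:
    """Count aligned positions at which two sequences differ.

    Positions where either sequence has a gap or an ambiguous base call
    (anything other than A, C, G or T) are skipped. This is an assumption
    of the sketch, not a rule taken from the text.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    differences = 0
    for base_a, base_b in zip(seq_a.upper(), seq_b.upper()):
        if base_a not in UNAMBIGUOUS or base_b not in UNAMBIGUOUS:
            continue  # gap or ambiguous call: not counted
        if base_a != base_b:
            differences += 1
    return differences


# Toy sequences (invented for illustration only).
copy_1 = "ACGTACGT"
copy_2 = "ACGTTCGA"
print(count_differences(copy_1, copy_2))
```

Skipping ambiguous positions matters when one of the sequences contains ambiguous base calls, since counting them as mismatches would inflate the apparent difference.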

Rooting was done initially using the midpoint method [10] and then checked for its agreement with the current classification (Table 1).
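The midpoint method places the root halfway along the longest path between any two tips of the tree. The sketch below shows one way this point could be located on an unrooted tree with branch lengths. The adjacency-dictionary representation, the function names and the three-tip toy tree are assumptions made for this example; it is not the implementation cited in [10].

```python
def distances_from(tree, start):
    """Return path lengths from `start` to every node, plus parent links."""
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in tree[node].items():
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    return dist, parent


def midpoint(tree):
    """Locate the midpoint of the longest tip-to-tip path.

    Returns (node_a, node_b, offset): the midpoint lies on the edge
    node_a-node_b, at distance `offset` from node_a.
    Assumes a tree with at least two tips and positive branch lengths.
    """
    # Two passes find the ends of the longest path in a tree.
    dist, _ = distances_from(tree, next(iter(tree)))
    tip_a = max(dist, key=dist.get)
    dist, parent = distances_from(tree, tip_a)
    tip_b = max(dist, key=dist.get)
    half = dist[tip_b] / 2.0
    # Walk back from tip_b until the edge spanning the halfway point is found.
    node = tip_b
    while dist[parent[node]] > half:
        node = parent[node]
    return parent[node], node, half - dist[parent[node]]


# Toy unrooted tree (invented): tips A, B, C joined at internal node X.
toy_tree = {
    "A": {"X": 1.0},
    "B": {"X": 2.0},
    "C": {"X": 5.0},
    "X": {"A": 1.0, "B": 2.0, "C": 5.0},
}
print(midpoint(toy_tree))
```

In the toy tree the longest path runs from B to C (length 7.0), so the root would be placed on the branch leading to C, 3.5 units from that tip. A root found this way is only a starting hypothesis, which is why it is then compared with the existing classification.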

(Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ) annotation, which is not an authoritative source for nomenclature or classification.) The highest-scoring environmental sequence was EU735617 (Greengenes short name: 'archaeal structures and pristine soils China oil contaminated soil Jidong Oilfield clone SC78'), which showed an identity of 99.0% and an HSP coverage of 98.4%.

The most frequently occurring keywords within the labels of all environmental samples which yielded hits were 'librari' (3.2%), 'dure' (3.0%), 'bioremedi, broader, chromat, groundwat, microarrai, polylact, sampl, stimul, subsurfac, typic, univers' (2.9%), 'spring' (2.5%) and 'soil' (2.4%) (156 hits in total).
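A keyword tally of this kind can be sketched as a simple word count over the sample labels. The example below is a simplified illustration only: the function name and the toy labels are invented, the words are not stemmed (the keywords reported above, such as 'librari', appear to be word stems), and the relative frequency is taken over all words, which may differ from how the percentages above were derived.

```python
import re
from collections import Counter


def keyword_frequencies(labels):
    """Return the relative frequency of each lowercase word in the labels.

    No stemming and no stop-word filtering are applied; both are
    simplifications of this sketch.
    """
    words = []
    for label in labels:
        words.extend(re.findall(r"[a-z]+", label.lower()))
    counts = Counter(words)
    total = sum(counts.values())
    # most_common() orders the words from most to least frequent.
    return {word: count / total for word, count in counts.most_common()}


# Toy labels (invented for illustration only).
toy_labels = [
    "oil contaminated soil clone",
    "hot spring sediment clone",
    "pristine soil",
]
for word, share in keyword_frequencies(toy_labels).items():
    print(f"{word}: {share:.1%}")
```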

Further detailed physiological data such as carbon source utilization, carbon degradation, and enzyme activities have been reported previously [1]. The genome project is deposited in the Genomes OnLine Database [13] and the complete genome sequence is deposited in GenBank.
