Dealing with Coordinates
The primary information stored in the PDB archive consists of coordinate files that list the atoms in each structure and their 3D location in space, along with summary information about the structure, sequence, and experiment. These files are available in several formats (PDBx/mmCIF, PDB, XML). The archive also includes data files containing experimental observations that are used to determine these atomic coordinates.
To fully explore the structures in the PDB archive, it is helpful to understand a few concepts about coordinate files. In addition, this knowledge will aid in using visualization programs.
Atomic-level Data
A typical PDB entry contains atomic coordinates for a diverse collection of molecules, including proteins, small molecules, ions, and water.
Each atom in the coordinate section is identified by a sequential number in the entry file, a specific atom name, the name and number of the residue it belongs to, a one-letter code to specify the chain, its x, y, and z coordinates, and an occupancy and temperature factor (described in more detail below).
In PDBx/mmCIF format, this information is stored in the _atom_site category (please see the Beginner’s Guide to PDB Structures and the PDBx/mmCIF Format for additional information). Shown below are the first several lines of this section from a sample entry.
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 1 N N . LYS A 1 7 ? 12.364 -13.639 8.445 1.00 54.67 ? 527 LYS A N 1
ATOM 2 C CA . LYS A 1 7 ? 11.119 -12.888 8.550 1.00 49.59 ? 527 LYS A CA 1
ATOM 3 C C . LYS A 1 7 ? 9.961 -13.651 7.926 1.00 44.77 ? 527 LYS A C 1
ATOM 4 O O . LYS A 1 7 ? 9.055 -14.126 8.617 1.00 49.39 ? 527 LYS A O 1
ATOM 5 C CB . LYS A 1 7 ? 11.255 -11.538 7.841 1.00 49.41 ? 527 LYS A CB 1
ATOM 6 C CG . LYS A 1 7 ? 10.169 -10.531 8.174 1.00 53.16 ? 527 LYS A CG 1
ATOM 7 C CD . LYS A 1 7 ? 10.523 -9.771 9.432 1.00 59.71 ? 527 LYS A CD 1
ATOM 8 C CE . LYS A 1 7 ? 11.779 -8.947 9.195 1.00 63.60 ? 527 LYS A CE 1
ATOM 9 N NZ . LYS A 1 7 ? 12.353 -8.381 10.443 1.00 64.85 ? 527 LYS A NZ 1
ATOM 10 N N . ARG A 1 8 ? 10.011 -13.762 6.603 1.00 40.03 ? 528 ARG A N 1
<snip>
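In an mmCIF loop_, the item names are listed first and the values follow as whitespace-separated tokens, so one atom's row may even wrap across lines. A reader can therefore collect the column names and then regroup the flat stream of values. The sketch below assumes a simple loop with no quoted strings and no semicolon-delimited text fields (real files can contain both, so a full mmCIF library is the safer choice in practice); `parse_atom_site_loop` is a hypothetical helper, not part of any PDB tool.

```python
def parse_atom_site_loop(text):
    """Return (column_names, rows) for the _atom_site loop in an mmCIF excerpt.

    Assumes a simple loop: whitespace-separated values, no quoted strings,
    no semicolon-delimited text fields.
    """
    columns, tokens = [], []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "loop_" and not in_loop:
            in_loop = True
        elif in_loop and line.startswith("_atom_site.") and not tokens:
            columns.append(line.split(".", 1)[1])   # e.g. "Cartn_x"
        elif in_loop and columns and not line.startswith(("_", "#", "loop_")):
            tokens.extend(line.split())             # values may wrap across lines
        elif in_loop and tokens:
            break                                   # '#', a new loop, or a new category ends this loop
    if not columns:
        return [], []
    width = len(columns)
    # Regroup the flat token stream into one dict per atom.
    rows = [dict(zip(columns, tokens[i:i + width]))
            for i in range(0, len(tokens) - width + 1, width)]
    return columns, rows


sample = """\
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N LYS 12.364 -13.639
8.445
ATOM 2 CA LYS 11.119 -12.888 8.550
#
"""
cols, atoms = parse_atom_site_loop(sample)
print(atoms[1]["label_atom_id"], float(atoms[1]["Cartn_z"]))  # CA 8.55
```

Keeping every value as a string and converting only when needed (as with `float` above) avoids guessing types for items such as `?` or `.`.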
In PDB file format, ATOM records identify atoms in protein or nucleic acid residues, and HETATM records identify atoms in other groups, such as small molecules, ions, and water. Shown below are the same atoms in PDB format.
ATOM      1  N   LYS A 527      12.364 -13.639   8.445  1.00 54.67           N
ATOM      2  CA  LYS A 527      11.119 -12.888   8.550  1.00 49.59           C
ATOM      3  C   LYS A 527       9.961 -13.651   7.926  1.00 44.77           C
ATOM      4  O   LYS A 527       9.055 -14.126   8.617  1.00 49.39           O
ATOM      5  CB  LYS A 527      11.255 -11.538   7.841  1.00 49.41           C
ATOM      6  CG  LYS A 527      10.169 -10.531   8.174  1.00 53.16           C
ATOM      7  CD  LYS A 527      10.523  -9.771   9.432  1.00 59.71           C
ATOM      8  CE  LYS A 527      11.779  -8.947   9.195  1.00 63.60           C
ATOM      9  NZ  LYS A 527      12.353  -8.381  10.443  1.00 64.85           N
ATOM     10  N   ARG A 528      10.011 -13.762   6.603  1.00 40.03           N
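Unlike mmCIF, the PDB format is fixed-column: each field occupies specific character columns, so a record is best read by slicing rather than by splitting on whitespace. The sketch below assumes the standard ATOM/HETATM column layout (1-based columns converted to 0-based Python slices); `parse_pdb_atom` is a hypothetical helper name.

```python
def parse_pdb_atom(line):
    """Return a dict of the main fields of an ATOM or HETATM record."""
    line = line.rstrip("\n").ljust(80)  # pad so short lines still slice safely
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "alt_loc": line[16].strip(),      # column 17
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
        "element": line[76:78].strip(),   # columns 77-78
    }


record = "ATOM      2  CA  LYS A 527      11.119 -12.888   8.550  1.00 49.59           C"
atom = parse_pdb_atom(record)
print(atom["name"], atom["res_seq"], atom["b_factor"])  # CA 527 49.59
```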
This information gives you a lot of control when exploring the structure. For instance, most molecular graphics programs let you select portions of the molecule and color them selectively: for example, to pick out all of the carbon atoms and color them green, or to pick one particular amino acid and highlight it.
Tip: By default, many molecular graphics programs do not display water molecules, even though water is often important to the function and interactions of biological molecules. Most of these programs can display them if you select them explicitly with the program's atom-selection tools.
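Atom selection in a graphics program amounts to filtering atom records by their fields. As a minimal sketch, assuming atoms have already been parsed into dicts (the records and the `select` helper below are hypothetical), selecting carbons, waters, or one residue looks like this:

```python
# Hypothetical parsed atoms; the keys mirror fields of an ATOM/HETATM record.
atoms = [
    {"name": "CA", "res_name": "LYS", "res_seq": 527, "element": "C"},
    {"name": "NZ", "res_name": "LYS", "res_seq": 527, "element": "N"},
    {"name": "O", "res_name": "HOH", "res_seq": 238, "element": "O"},
]


def select(atoms, **criteria):
    """Return the atoms whose fields equal every keyword criterion."""
    return [a for a in atoms if all(a.get(k) == v for k, v in criteria.items())]


carbons = select(atoms, element="C")    # e.g. to color them green
waters = select(atoms, res_name="HOH")  # water molecules are named HOH in PDB entries
lys527 = select(atoms, res_name="LYS", res_seq=527)
print(len(carbons), len(waters), len(lys527))  # 1 1 2
```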
Chains and Models
Biological molecules are hierarchical, building from atoms to residues to chains to assemblies. Coordinate files contain ways to organize and specify molecules at all of these levels. As described above, the atom names and residue information are included in each atom record.
In PDBx/mmCIF format, the looping nature of the records makes it easy to represent different chains and multiple molecules.
Shown below is a segment from entry 4HHB showing the transition from chain A to chain B. The chain is designated by the _atom_site.label_asym_id item, and the _atom_site.label_entity_id item identifies the entity (distinct molecule) that the chain belongs to. Please see the Beginner’s Guide to PDB Structures and the PDBx/mmCIF Format for an introduction to entities.
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 1 N N . VAL A 1 1 ? 6.204 16.869 4.854 1.00 49.05 ? 1 VAL A N 1
ATOM 2 C CA . VAL A 1 1 ? 6.913 17.759 4.607 1.00 43.14 ? 1 VAL A CA 1
ATOM 3 C C . VAL A 1 1 ? 8.504 17.378 4.797 1.00 24.80 ? 1 VAL A C 1
<snip>
ATOM 1067 N NH1 . ARG A 1 141 ? -10.147 7.455 -6.079 1.00 23.24 ? 141 ARG A NH1 1
ATOM 1068 N NH2 . ARG A 1 141 ? -8.672 8.328 -4.506 1.00 33.34 ? 141 ARG A NH2 1
ATOM 1069 O OXT . ARG A 1 141 ? -9.474 13.682 -9.742 1.00 31.52 ? 141 ARG A OXT 1
ATOM 1070 N N . VAL B 2 1 ? 9.223 -20.614 1.365 1.00 46.08 ? 1 VAL B N 1
ATOM 1071 C CA . VAL B 2 1 ? 8.694 -20.026 -0.123 1.00 70.96 ? 1 VAL B CA 1
ATOM 1072 C C . VAL B 2 1 ? 9.668 -21.068 -1.645 1.00 69.74 ? 1 VAL B C 1
ATOM 1073 O O . VAL B 2 1 ? 9.370 -22.612 -0.994 1.00 71.82 ? 1 VAL B O 1
<snip>
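Because every mmCIF row carries its own chain and entity identifiers, a program can rebuild the chain level of the hierarchy simply by grouping rows. A minimal sketch, assuming rows reduced to (label_asym_id, label_entity_id, label_seq_id, label_atom_id) tuples taken from the 4HHB excerpt above; `group_by_chain` is a hypothetical helper. Identical chains (for example, the two alpha chains of hemoglobin) would be expected to share an entity ID.

```python
from collections import defaultdict

# (label_asym_id, label_entity_id, label_seq_id, label_atom_id) from the excerpt above
rows = [
    ("A", "1", "141", "NH2"),
    ("A", "1", "141", "OXT"),
    ("B", "2", "1", "N"),
    ("B", "2", "1", "CA"),
]


def group_by_chain(rows):
    """Collect (residue number, atom name) pairs per chain and record each chain's entity."""
    chains = defaultdict(list)
    entity_of = {}
    for asym_id, entity_id, seq_id, atom_name in rows:
        chains[asym_id].append((int(seq_id), atom_name))
        entity_of[asym_id] = entity_id
    return dict(chains), entity_of


chains, entity_of = group_by_chain(rows)
print(sorted(chains), entity_of)  # ['A', 'B'] {'A': '1', 'B': '2'}
```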
For the solution NMR ensemble in entry 1VRE, the _atom_site.pdbx_PDB_model_num item (the last value in each row) distinguishes the 29 models in the file:
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 1 N N . GLY A 1 1 ? 13.878 9.721 9.134 1.00 0.00 ? 1 GLY A N 1
ATOM 2 C CA . GLY A 1 1 ? 12.761 8.747 8.973 1.00 0.00 ? 1 GLY A CA 1
ATOM 3 C C . GLY A 1 1 ? 13.273 7.506 8.239 1.00 0.00 ? 1 GLY A C 1
<snip>
HETATM 2175 H HBD2 . HEM B 2 . ? -8.871 3.884 -8.248 1.00 0.00 ? 148 HEM A HBD2 1
HETATM 2176 C C . CMO C 3 . ? -7.184 0.894 -1.865 1.00 0.00 ? 149 CMO A C 1
HETATM 2177 O O . CMO C 3 . ? -7.008 -0.217 -1.956 1.00 0.00 ? 149 CMO A O 1
ATOM 2178 N N . GLY A 1 1 ? 11.063 9.378 8.937 1.00 0.00 ? 1 GLY A N 2
ATOM 2179 C CA . GLY A 1 1 ? 10.504 8.078 8.473 1.00 0.00 ? 1 GLY A CA 2
ATOM 2180 C C . GLY A 1 1 ? 11.648 7.196 7.970 1.00 0.00 ? 1 GLY A C 2
<snip>
HETATM 63131 H HBD2 . HEM B 2 . ? -8.603 4.604 -7.315 1.00 0.00 ? 148 HEM A HBD2 29
HETATM 63132 C C . CMO C 3 . ? -7.211 0.912 -1.966 1.00 0.00 ? 149 CMO A C 29
HETATM 63133 O O . CMO C 3 . ? -7.058 -0.203 -2.022 1.00 0.00 ? 149 CMO A O 29
#
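Since the model number is just another column, separating an ensemble into its models is a grouping step. A minimal sketch, using a few (model number, atom id, atom name) values taken from the 1VRE excerpt above; `group_by_model` is a hypothetical helper. Counting atoms per model is one quick consistency check, since the models of an NMR ensemble normally describe the same set of atoms.

```python
from collections import defaultdict

# (pdbx_PDB_model_num, id, label_atom_id) values from the 1VRE excerpt
rows = [
    (1, 1, "N"), (1, 2, "CA"),
    (2, 2178, "N"), (2, 2179, "CA"),
    (29, 63132, "C"), (29, 63133, "O"),
]


def group_by_model(rows):
    """Map each model number to the atoms that belong to it."""
    models = defaultdict(list)
    for model_num, atom_id, atom_name in rows:
        models[model_num].append((atom_id, atom_name))
    return dict(models)


models = group_by_model(rows)
atoms_per_model = {m: len(atoms) for m, atoms in models.items()}
print(sorted(models), atoms_per_model)  # [1, 2, 29] {1: 2, 2: 2, 29: 2}
```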
In PDB file format, TER records are used to separate protein and nucleic acid chains. The chains are included one after another in the file, separated by a TER record to indicate that the chains are not physically connected to each other. Most molecular graphics programs look for this TER record so that they don't draw a bond to connect different chains. Shown below is the portion of entry 4HHB where a TER record is used to separate the first copy of the alpha chain (chain A) from the first copy of the beta chain (chain B):
ATOM   1067  NH1 ARG A 141     -10.147   7.455  -6.079  1.00 23.24           N
ATOM   1068  NH2 ARG A 141      -8.672   8.328  -4.506  1.00 33.34           N
ATOM   1069  OXT ARG A 141      -9.474  13.682  -9.742  1.00 31.52           O
TER    1070      ARG A 141
ATOM   1071  N   VAL B   1       9.223 -20.614   1.365  1.00 46.08           N
ATOM   1072  CA  VAL B   1       8.694 -20.026  -0.123  1.00 70.96           C
ATOM   1073  C   VAL B   1       9.668 -21.068  -1.645  1.00 69.74           C
ATOM   1074  O   VAL B   1       9.370 -22.612  -0.994  1.00 71.82           O
ATOM   1075  CB  VAL B   1       9.283 -18.281  -0.381  1.00 59.18           C
ATOM   1076  CG1 VAL B   1       7.449 -17.518  -0.791  1.00 57.89           C
Chains B and C will be separated similarly, as will chains C and D.
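One way a program might use TER records is to cut the coordinate section into segments and then only consider bonds between atoms within the same segment. A minimal sketch of the splitting step, assuming the record name sits in columns 1-6; `split_at_ter` is a hypothetical helper, and the lines are taken from the 4HHB excerpt above.

```python
def split_at_ter(lines):
    """Group ATOM/HETATM records into segments that end at TER records."""
    segments, current = [], []
    for line in lines:
        record = line[:6].strip()          # record name, columns 1-6
        if record in ("ATOM", "HETATM"):
            current.append(line)
        elif record == "TER":
            if current:
                segments.append(current)
            current = []
    if current:                            # records after the last TER
        segments.append(current)
    return segments


lines = [
    "ATOM   1068  NH2 ARG A 141      -8.672   8.328  -4.506  1.00 33.34           N",
    "ATOM   1069  OXT ARG A 141      -9.474  13.682  -9.742  1.00 31.52           O",
    "TER    1070      ARG A 141",
    "ATOM   1071  N   VAL B   1       9.223 -20.614   1.365  1.00 46.08           N",
]
segments = split_at_ter(lines)
print([len(s) for s in segments])  # [2, 1]
```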
PDB format files use MODEL/ENDMDL records to hold multiple models in a single file. This convention was originally created to archive coordinate sets that include several different models of the same structure, such as the structural ensembles obtained by NMR analysis. When you view these files, you will see many similar molecules superimposed. MODEL records are now also used in biological assembly files to separate the symmetry-related copies of the molecule that are generated from the asymmetric unit (for more information, see the tutorial on biological assemblies).
Shown below is a section from the biological assembly file of entry 1OUT. The asymmetric unit of this entry contains half of the hemoglobin molecule (chains A and B). The full four-chain molecule is generated in the biological assembly file, where the two copies of the A/B pair are separated by MODEL records:
<snip>
MODEL        1
HETATM    1  C   ACE A   0      40.573  27.347  55.464  1.00 42.49           C
HETATM    2  O   ACE A   0      41.130  27.445  56.567  1.00 50.27           O
HETATM    3  CH3 ACE A   0      39.709  28.526  55.115  1.00 49.32           C
<snip>
HETATM 2475  O   HOH B 238       8.440  58.387  54.230  1.00 67.86           O
HETATM 2476  O   HOH B 239      23.699  54.828  72.752  1.00 71.63           O
HETATM 2477  O   HOH B 240      30.823  46.229  47.604  1.00 71.95           O
ENDMDL
MODEL        2
HETATM    1  C   ACE A   0      50.950  33.338  48.783  1.00 42.49           C
HETATM    2  O   ACE A   0      50.587  32.905  47.680  1.00 50.27           O
HETATM    3  CH3 ACE A   0      50.361  34.676  49.132  1.00 49.32           C
<snip>
HETATM 2475  O   HOH B 238      40.135  76.686  50.017  1.00 67.86           O
HETATM 2476  O   HOH B 239      35.588  61.692  31.495  1.00 71.63           O
HETATM 2477  O   HOH B 240      39.473  51.223  56.643  1.00 71.95           O
ENDMDL
MASTER        0    0    0   16    0    0    8    6 2475    2    0   23
END
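Note that atom serial numbers restart in each model of the 1OUT excerpt (HETATM 1 appears in both MODEL 1 and MODEL 2), so an atom is only uniquely identified by its model number together with its serial number. A minimal sketch of splitting a PDB-format file into models, assuming the model number is the second whitespace-separated field of the MODEL record and treating a file without MODEL records as a single model; `split_models` is a hypothetical helper.

```python
def split_models(lines):
    """Return {model_number: [ATOM/HETATM lines]} using MODEL/ENDMDL records."""
    models, current = {}, None
    for line in lines:
        record = line[:6].strip()
        if record == "MODEL":
            current = int(line.split()[1])   # model serial number
            models[current] = []
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            models[current].append(line)
    if not models:  # a single-model file may have no MODEL/ENDMDL records
        models[1] = [ln for ln in lines if ln[:6].strip() in ("ATOM", "HETATM")]
    return models


lines = [
    "MODEL        1",
    "HETATM    1  C   ACE A   0      40.573  27.347  55.464  1.00 42.49           C",
    "ENDMDL",
    "MODEL        2",
    "HETATM    1  C   ACE A   0      50.950  33.338  48.783  1.00 42.49           C",
    "HETATM    2  O   ACE A   0      50.587  32.905  47.680  1.00 50.27           O",
    "ENDMDL",
    "END",
]
models = split_models(lines)
print({m: len(v) for m, v in models.items()})  # {1: 1, 2: 2}
```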
Temperature Factors
If we were able to hold an atom rigidly fixed in one place, we could observe its distribution of electrons in an ideal situation. The image would be dense towards the center with the density falling off further from the nucleus. When you look at experimental electron density distributions, however, the electrons usually have a wider distribution than this ideal. This may be due to vibration of the atoms, or differences between the many different molecules in the crystal lattice. The observed electron density will include an average of all these small motions, yielding a slightly smeared image of the molecule.
These motions, and the resulting smearing of the electron density, are incorporated into the atomic model by a B-value or temperature factor (in Å²). The amount of smearing is proportional to the magnitude of the B-value. Values under 10 create a model of the atom that is very sharp, indicating that the atom is not moving much and is in the same position in all of the molecules in the crystal. Values greater than about 50 indicate that the atom is moving so much that it can barely be seen. This is often the case for atoms at the surface of proteins, where long side chains are free to wag in the surrounding water.
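The standard relationship between an isotropic B-value and the mean-square displacement of an atom in any one direction is B = 8π²⟨u²⟩. Converting B to a root-mean-square displacement gives a feel for the thresholds above. This is a sketch under the simplifying assumption that all of the smearing is isotropic motion; in practice B-values also absorb static disorder and model error.

```python
import math


def rms_displacement(b_factor):
    """RMS displacement (Å) implied by an isotropic B-value (Å^2), from B = 8*pi^2*<u^2>."""
    return math.sqrt(b_factor / (8 * math.pi ** 2))


for b in (10, 50):
    print(b, round(rms_displacement(b), 2))
# 10 0.36
# 50 0.8
```

So a B-value of 10 corresponds to roughly a third of an ångström of displacement, while 50 corresponds to nearly a full ångström.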
In PDBx/mmCIF format, the _atom_site.B_iso_or_equiv record is used to store temperature factor values. Again from entry 4hhb:
<snip>
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 1 N N . VAL A 1 1 ? 6.204 16.869 4.854 1.00 49.05 ? 1 VAL A N 1
ATOM 2 C CA . VAL A 1 1 ? 6.913 17.759 4.607 1.00 43.14 ? 1 VAL A CA 1
ATOM 3 C C . VAL A 1 1 ? 8.504 17.378 4.797 1.00 24.80 ? 1 VAL A C 1
<snip>
In PDB file format, the temperature factor is given in columns 61 - 66. From entry 4hhb:
<snip>
ATOM      1  N   VAL A   1       6.204  16.869   4.854  1.00 49.05           N
ATOM      2  CA  VAL A   1       6.913  17.759   4.607  1.00 43.14           C
ATOM      3  C   VAL A   1       8.504  17.378   4.797  1.00 24.80           C
<snip>
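Coloring a molecule by temperature factor usually starts from these per-atom values; averaging them per residue is one common way to summarize flexibility along a chain. A minimal sketch, assuming fixed-column PDB records with the B-value in columns 61-66; `mean_b_per_residue` is a hypothetical helper, and the three lines are the 4HHB atoms shown above.

```python
from collections import defaultdict


def mean_b_per_residue(lines):
    """Average the temperature factor (columns 61-66) over each residue."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for line in lines:
        if line[:6].strip() not in ("ATOM", "HETATM"):
            continue
        key = (line[21], int(line[22:26]), line[17:20].strip())  # chain, resSeq, resName
        sums[key] += float(line[60:66])
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


lines = [
    "ATOM      1  N   VAL A   1       6.204  16.869   4.854  1.00 49.05           N",
    "ATOM      2  CA  VAL A   1       6.913  17.759   4.607  1.00 43.14           C",
    "ATOM      3  C   VAL A   1       8.504  17.378   4.797  1.00 24.80           C",
]
for key, mean_b in mean_b_per_residue(lines).items():
    print(key, round(mean_b, 2))  # ('A', 1, 'VAL') 39.0
```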
The picture shows the whole molecule, with the atoms colored by the temperature factors. High values, indicating lots of motion, are in red and yellow, and low values are in blue. Notice that the interior of the protein has low B-values and the amino acids on the surface have higher values.
Tip: Temperature factors are a measure of our confidence in the location of each atom. If you find an atom on the surface of a protein with a high temperature factor, keep in mind that this atom is probably moving a lot, and that the coordinates specified in the PDB file are only one possible snapshot of its location.
Occupancy and Multiple Conformations
Macromolecular crystals are composed of many individual molecules packed into a symmetrical arrangement. In some crystals, there are slight differences between these molecules. For instance, a side chain on the surface may wag back and forth between several conformations, a substrate may bind in two orientations in an active site, or a metal ion may be bound to only some of the molecules. When researchers build the atomic model of these portions, they use the occupancy to estimate the fraction of molecules in the crystal that adopt each arrangement. For most atoms, the occupancy is 1, indicating that the atom is found in the same place in all of the molecules in the crystal. However, if a metal ion binds to only half of the molecules in the crystal, the researcher will see a weak image of the ion in the electron density map and can assign an occupancy of 0.5 to this atom in the structure file.
Occupancies are also commonly used to describe side chains or ligands that are observed in multiple conformations. In that case, two (or more) atom records are included for each atom, and the occupancy value gives the fraction of molecules that have each conformation: for example 0.5 and 0.5, or 0.4 and 0.6, or other fractional values that sum to 1.
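A simple consistency check follows from this: summing the occupancies of all alternate records of an atom should give about 1 for a fully present atom. A minimal sketch, assuming atoms reduced to (chain, residue number, atom name, alt loc, occupancy) tuples; `occupancy_totals` is a hypothetical helper. A total below 1 is not necessarily an error, since a partially bound ion, as described above, legitimately has occupancy below 1.

```python
from collections import defaultdict


def occupancy_totals(atoms):
    """Sum occupancies of alternate records for each (chain, residue, atom name)."""
    totals = defaultdict(float)
    for chain, res_seq, atom_name, _alt_loc, occupancy in atoms:
        totals[(chain, res_seq, atom_name)] += occupancy
    return dict(totals)


atoms = [
    ("A", 8, "CA", "", 1.00),
    ("A", 8, "CB", "A", 0.57),
    ("A", 8, "CB", "B", 0.43),
]
for key, total in occupancy_totals(atoms).items():
    print(key, round(total, 2))
# ('A', 8, 'CA') 1.0
# ('A', 8, 'CB') 1.0
```

Comparing with a tolerance (or rounding, as above) avoids false alarms from floating-point sums such as 0.57 + 0.43.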
The picture below shows the whole myoglobin molecule, highlighting all of the amino acids that have two conformations in the file.
Alternate Conformations in Myoglobin (PDB entry 1a6m)
Tip: When dealing with PDB entries that contain alternate conformations, you often need to pay close attention. It is not always possible to simply select the "A" conformations and discard the "B" conformations. Look carefully in each case to make sure that the conformations you keep do not create bad contacts between mobile side chains.
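If you do need a single conformation, one simple heuristic is to keep, for each residue, the alternate location with the highest occupancy, so that atoms from conformations A and B are never mixed within one side chain. A minimal sketch, assuming parsed atom dicts; `keep_one_conformer` is a hypothetical helper, and the occupancies are those of glutamine 8 in entry 1a6m from the excerpts that follow. As the tip warns, this does not check for clashes between the choices made in neighboring residues, so the result still needs inspection.

```python
from collections import defaultdict


def keep_one_conformer(atoms):
    """Per residue, keep blank-altloc atoms plus the altloc with the highest occupancy."""
    occ = defaultdict(dict)  # (chain, res_seq) -> {alt_loc: occupancy}
    for a in atoms:
        if a["alt_loc"]:
            res = (a["chain"], a["res_seq"])
            occ[res][a["alt_loc"]] = max(occ[res].get(a["alt_loc"], 0.0), a["occupancy"])
    # Ties keep the alt loc seen first.
    chosen = {res: max(locs, key=locs.get) for res, locs in occ.items()}
    return [a for a in atoms
            if not a["alt_loc"] or a["alt_loc"] == chosen[(a["chain"], a["res_seq"])]]


atoms = [
    {"chain": "A", "res_seq": 8, "name": "N", "alt_loc": "", "occupancy": 1.00},
    {"chain": "A", "res_seq": 8, "name": "CB", "alt_loc": "A", "occupancy": 0.57},
    {"chain": "A", "res_seq": 8, "name": "CB", "alt_loc": "B", "occupancy": 0.43},
    {"chain": "A", "res_seq": 8, "name": "CG", "alt_loc": "A", "occupancy": 0.57},
    {"chain": "A", "res_seq": 8, "name": "CG", "alt_loc": "B", "occupancy": 0.43},
]
kept = keep_one_conformer(atoms)
print([(a["name"], a["alt_loc"]) for a in kept])  # [('N', ''), ('CB', 'A'), ('CG', 'A')]
```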
In PDBx/mmCIF format, alternate conformations are indicated by the _atom_site.label_alt_id item and the occupancy by the _atom_site.occupancy item. Shown below is residue 8 from entry 1a6m.
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
<snip>
ATOM 63 N N . GLN A 1 8 ? 5.404 13.203 22.532 1.00 8.42 ? 8 GLN A N 1
ATOM 64 C CA . GLN A 1 8 ? 6.475 12.812 23.418 1.00 8.84 ? 8 GLN A CA 1
ATOM 65 C C . GLN A 1 8 ? 7.602 12.149 22.631 1.00 8.08 ? 8 GLN A C 1
ATOM 66 O O . GLN A 1 8 ? 8.769 12.399 22.918 1.00 8.39 ? 8 GLN A O 1
ATOM 67 C CB A GLN A 1 8 ? 5.987 11.822 24.520 0.57 13.03 ? 8 GLN A CB 1
ATOM 68 C CB B GLN A 1 8 ? 5.948 11.968 24.580 0.43 9.68 ? 8 GLN A CB 1
ATOM 69 C CG A GLN A 1 8 ? 7.030 11.303 25.506 0.57 16.30 ? 8 GLN A CG 1
ATOM 70 C CG B GLN A 1 8 ? 6.967 12.094 25.688 0.43 12.07 ? 8 GLN A CG 1
ATOM 71 C CD A GLN A 1 8 ? 7.981 10.227 25.063 0.57 15.61 ? 8 GLN A CD 1
ATOM 72 C CD B GLN A 1 8 ? 6.439 11.470 26.952 0.43 14.43 ? 8 GLN A CD 1
ATOM 73 O OE1 A GLN A 1 8 ? 7.688 9.392 24.214 0.57 19.54 ? 8 GLN A OE1 1
ATOM 74 O OE1 B GLN A 1 8 ? 5.419 10.767 26.918 0.43 17.46 ? 8 GLN A OE1 1
ATOM 75 N NE2 A GLN A 1 8 ? 9.219 10.114 25.607 0.57 21.38 ? 8 GLN A NE2 1
ATOM 76 N NE2 B GLN A 1 8 ? 7.067 11.762 28.084 0.43 14.03 ? 8 GLN A NE2 1
In PDB file format, alternate conformations are given in column 17 using an alternate location indicator and occupancy is given in columns 55 - 60. Shown below from entry 1a6m is the glutamine residue 8 modeled in two different conformations, A and B, where conformation A is given 57% occupancy and conformation B is given 43% occupancy:
ATOM     63  N   GLN A   8       5.404  13.203  22.532  1.00  8.42           N
ATOM     64  CA  GLN A   8       6.475  12.812  23.418  1.00  8.84           C
ATOM     65  C   GLN A   8       7.602  12.149  22.631  1.00  8.08           C
ATOM     66  O   GLN A   8       8.769  12.399  22.918  1.00  8.39           O
ATOM     67  CB AGLN A   8       5.987  11.822  24.520  0.57 13.03           C
ATOM     68  CB BGLN A   8       5.948  11.968  24.580  0.43  9.68           C
ATOM     69  CG AGLN A   8       7.030  11.303  25.506  0.57 16.30           C
ATOM     70  CG BGLN A   8       6.967  12.094  25.688  0.43 12.07           C
ATOM     71  CD AGLN A   8       7.981  10.227  25.063  0.57 15.61           C
ATOM     72  CD BGLN A   8       6.439  11.470  26.952  0.43 14.43           C
ATOM     73  OE1AGLN A   8       7.688   9.392  24.214  0.57 19.54           O
ATOM     74  OE1BGLN A   8       5.419  10.767  26.918  0.43 17.46           O
ATOM     75  NE2AGLN A   8       9.219  10.114  25.607  0.57 21.38           N
ATOM     76  NE2BGLN A   8       7.067  11.762  28.084  0.43 14.03           N
<snip>
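Because the alternate location indicator in column 17 sits directly against the residue name, splitting a PDB-format record on whitespace produces tokens such as "AGLN". Reading the fixed columns avoids this. A minimal sketch, assuming the standard column layout; `altloc_and_occupancy` is a hypothetical helper, and the line is atom 67 of entry 1a6m from the excerpt above.

```python
def altloc_and_occupancy(line):
    """Read the alternate location indicator (column 17) and occupancy (columns 55-60)."""
    return line[16].strip(), float(line[54:60])


line = "ATOM     67  CB AGLN A   8       5.987  11.822  24.520  0.57 13.03           C"
print(line.split()[3])             # AGLN (alt loc and residue name run together)
print(altloc_and_occupancy(line))  # ('A', 0.57)
```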