PDB I/O

The scripts in this section manipulate PDB files. Except for PDBParser, they are standalone and live in the stb_proteinart repository. A module that does the same job is planned for a future version of PDBParser.

Change Residue Numbering

Residue numbering in a PDB file does not always match the UniProt sequence. For example, some scientists exclude the signal peptide from the numbering while others include it. With the renumber.py script you can set the starting residue number for each chain. The script uses MODELLER and renumbers the residues consecutively.
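To illustrate what changing the starting residue number of a chain involves, here is a minimal sketch in plain Python. It is not renumber.py and does not use MODELLER; the function name `shift_chain_start` and its `new_starts` argument are hypothetical, and the sketch assumes standard fixed-column ATOM/HETATM records (chain ID in column 22, residue number in columns 23-26).

```python
def shift_chain_start(pdb_lines, new_starts):
    """Shift residue numbers so each listed chain starts at the requested number.

    new_starts maps a chain ID to the desired number of its first residue,
    e.g. {"A": 1}. Chains that are not listed are left untouched.
    Only ATOM and HETATM records are handled in this sketch.
    """
    offsets = {}
    shifted = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]                 # column 22: chain ID
            if chain in new_starts:
                resseq = int(line[22:26])    # columns 23-26: residue number
                # The first residue seen for a chain fixes the offset for that chain.
                offsets.setdefault(chain, new_starts[chain] - resseq)
                new_number = resseq + offsets[chain]
                # Write the number back right-justified in its 4-character field.
                line = f"{line[:22]}{new_number:>4d}{line[26:]}"
        shifted.append(line)
    return shifted


# Made-up records: chain A starts at residue 25, chain B at residue 5.
example = [
    "ATOM      1  N   MET A  25      11.104  13.207   2.100  1.00 20.00           N",
    "ATOM      2  CA  MET A  25      12.560  14.100   3.250  1.00 20.00           C",
    "ATOM      3  N   LYS A  26      15.000  16.000   4.500  1.00 20.00           N",
    "ATOM      4  N   GLY B   5      18.000  19.000   5.500  1.00 20.00           N",
]

renumbered = shift_chain_start(example, {"A": 1})
for line in renumbered:
    print(line)
```

Because the sketch applies a constant offset per chain, any jumps in the original numbering are preserved. Numbers that no longer fit in four characters would break the column layout and are not checked here.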

Sometimes a large non-terminal domain is removed from a crystal structure, but the residues are still numbered according to the reference sequence. The jump in numbering looks like a gap in the structure although it is not one. Some programs require continuous residue numbering. This script takes a PDB file and outputs a partially renumbered PDB file.

*KNOWN ISSUES:* When the beta field in a PDB file contains 6 digits, partial renumbering fails because the script expects a space between the z and beta fields.
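One way to avoid depending on whitespace between fields is to read and write the fixed columns of each record directly. The sketch below is an assumption about how such a renumbering could be written, not the script from the repository: the function name `renumber_consecutively` is hypothetical, it closes every numbering jump within a chain, and it blanks the insertion code in column 27.

```python
def renumber_consecutively(pdb_lines, start=1):
    """Renumber residues of each chain consecutively, beginning at `start`.

    Fields are read by column position, so a B-factor such as 100.00 that
    fills its whole field and touches the previous column causes no trouble.
    Only ATOM and HETATM records are handled in this sketch.
    """
    state = {}   # chain ID -> [last original residue key, current new number]
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            chain = line[21]          # column 22: chain ID
            res_key = line[22:27]     # columns 23-27: residue number + insertion code
            if chain not in state:
                state[chain] = [res_key, start]
            elif state[chain][0] != res_key:
                # A new residue begins: advance the counter by exactly one.
                state[chain][0] = res_key
                state[chain][1] += 1
            # Rewrite the number and replace the insertion code with a blank.
            line = f"{line[:22]}{state[chain][1]:>4d} {line[27:]}"
        out.append(line)
    return out


# Made-up records with a numbering jump (11 -> 250) and a 6-character B-factor.
example = [
    "ATOM      1  CA  GLY A  10      11.104  13.207   2.100  1.00100.00           C",
    "ATOM      2  CA  ALA A  11      12.560  14.100   3.250  1.00100.00           C",
    "ATOM      3  CA  SER A 250      15.000  16.000   4.500  1.00100.00           C",
]

# Whitespace splitting fuses occupancy and B-factor into a single token.
print(example[0].split())

closed = renumber_consecutively(example, start=1)
for line in closed:
    print(line)
```

Passing `start=10` instead would keep the numbering of the first residues and only pull the residues after the jump back into sequence.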

Mutating a Residue

The example in this GitHub folder shows how to make a mutation using PyMOL. It is not generalized; you have to change the file manually.

PDBParser

PDBParser is a set of Python scripts that I wrote for use with eBDIMs. Initially I parsed the PDB headers to extract the necessary chains, but because headers are sometimes inaccurate, the pre-alpha release of the package avoids this as much as possible. You can install PDBParser as a package or import it directly. If you want to use run.py as it is, you should install the package. Otherwise, import PDBParser directly and use the commands in run.py as a guide. The run.py executable offers two modes.

The first mode takes two structures, given as local files or PDB IDs. Everything except the CA coordinates is stripped from these structures, and they are then aligned with BioPython's pairwise alignment algorithm. The gap penalty is set to an extreme value so that point mutations are aligned rather than split by gaps. The resulting structures are written out as start.pdb and target.pdb, and you can use them to generate eBDIMs trajectories (a webserver is under development).
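The effect of an extreme gap penalty can be seen in a small, self-contained sketch. PDBParser relies on BioPython for the alignment; the code below swaps in a plain Needleman-Wunsch global alignment written from scratch so that it runs without dependencies. The function name `global_align`, the scoring values and the two sequences are all made up for illustration.

```python
def global_align(a, b, match=2, mismatch=-3, gap=-100):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Integer scores keep the traceback comparisons exact.
    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Two made-up sequences that differ by a single point mutation (A -> G).
start_seq = "MKTAYIAKQR"
target_seq = "MKTGYIAKQR"

strict = global_align(start_seq, target_seq, gap=-100)  # extreme gap penalty
loose = global_align(start_seq, target_seq, gap=-1)     # cheap gaps

print(strict[0], strict[1], sep="\n")
print(loose[0], loose[1], sep="\n")
```

With the extreme penalty the mutated position is paired as a mismatch, so every CA atom in one structure has a partner in the other. With cheap gaps the same position is split into two gaps, which would leave both residues unpaired.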

The second mode prepares an ensemble from a UniProtKB ID. A Python notebook is associated with this process. Although the run.py executable can generate the structures automatically, I strongly recommend using the notebook: a chimera or a missing large terminal domain will produce undesired results.

You can reach the PDBParser GitHub repository from here.