Perl extension for manipulating PDB files

  • Author

  • Source code

  • Requirements
    • Unix or Unix-like system
    • Perl 5
    • LAPACK (tested with version 3.8.0)
    • GNU compilers (gcc and g77 or gfortran)

  • How to install
    1. Download the source code package.
    2. Unpack it by running "tar xvfz PDB.tgz".
    3. Change directory to PDB.
    4. Edit install.sh to adjust the LAPACK_DIR and LD variables.
    5. Run install.sh.
    Note that LAPACK should be compiled with the "-fPIC" option.
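
    After installation, a quick load test can confirm that Perl finds the module. The following is only a sketch: 'installation_directory' is the same placeholder used in the examples, and pdb_module_available is a hypothetical helper name, not part of the package.

    ```perl
    use strict;
    use warnings;

    # Placeholder path: replace with the directory the module was installed to.
    use lib 'installation_directory';

    # Return 1 if the PDB module can be loaded, 0 otherwise.
    # The eval keeps a failed load from aborting the script.
    sub pdb_module_available {
        my $ok = eval { require PDB; 1 };
        return $ok ? 1 : 0;
    }

    if (pdb_module_available()) {
        print "PDB module loaded\n";
    } else {
        print "PDB module not found; check the 'use lib' path\n";
    }
    ```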

  • Example 1: Superimposing Cα atoms of a protein on those of another protein based on the alignment generated by CE. (ex1.tgz)
    #!/usr/bin/perl -w
    
    use lib 'installation_directory';
    use PDB;
    
    $a=PDB->new();
    $a->read(fname => "1AUA.pdb");
    $b=PDB->new();
    $b->read(fname => "3B7N.pdb");
    
    @lista=$a->select(resnum => [8..221,243..271,276..299], atnam => ["CA"], chain => ["A"]);
    @listb=$b->select(resnum => [4..91,98..223,245..273,278..301], atnam => ["CA"], chain => ["A"]);
    
    $rmsd=$a->fit(moving_pdb => $b, moving_atoms => \@listb, ref_atoms => \@lista);
    
    print "RMSD=$rmsd\n";
    
    $b->write(fname => "3B7N_fit.pdb");
    

  • Example 2: Selecting protein atoms within a 5-Å cut-off distance from a ligand. (ex2.tgz)
    #!/usr/bin/perl -w
    
    use lib 'installation_directory';
    use PDB;
    
    $cut=5.0;
    
    $a=PDB->new();
    $a->read(fname => "1I0V.pdb");
    @ligand =$a->select(resnum => [106], alt => [" ","A"]);
    @protein=$a->select(resnum => [1..104], alt => [" ","A"]);
    
    %contact=();
    foreach $i (@protein) {
      foreach $j (@ligand) {
        $d=$a->distance($i,$j);
        if($d < $cut) {
          $contact{$i}++;
          last;
        }
      }
    }
    @list=sort {$a <=> $b} keys(%contact);
    $a->write(fname => "1I0V_contact.pdb", selection => \@list);
    

  • Example 3: Renumbering the residues of chain B.
    #!/usr/bin/perl -w
    
    use lib 'installation_directory';
    use PDB;
    
    @files=<*[0-9].pdb>;
    
    foreach $file (@files) {
      $out=$file;
      $out =~ s/\.pdb$/.renum.pdb/;
      $a=PDB->new();
      $a->read(fname => $file);
      for($i=0;$i<$a->{natom};$i++) {
        if($a->{chain}->[$i] eq "B") {
          $a->{resnum}->[$i]-=407;
        }
      }
      $a->write(fname => $out);
    }
    

  • Data structure
    Data are stored in a hash table. An element of PDB object $a is accessed by its hash key as $a->{Key}. When the hash value is a reference to an array, an array element is accessed by its index as $a->{Key}->[Index]. Atom indexes run from 0 to natom-1, as in Example 3.
    Key      Description of value                               Type of value
    -------  -------------------------------------------------  ----------------------------------------------------------
    natom    Number of atoms (natom)                            Integer value
    seqres   Primary sequence from SEQRES record of each chain  Reference to a hash table
    atnum    Atom serial number                                 Reference to an integer array of length natom
    atnam    Atom name                                          Reference to a character array of length natom
    alt      Alternate location indicator                       Reference to a character array of length natom
    resnam   Residue name                                       Reference to a character array of length natom
    chain    Chain identifier                                   Reference to a character array of length natom
    resnum   Residue sequence number                            Reference to an integer array of length natom
    insert   Code for insertion of residues                     Reference to a character array of length natom
    x        Orthogonal coordinates for X in Angstroms          Reference to a floating-point number array of length natom
    y        Orthogonal coordinates for Y in Angstroms          Reference to a floating-point number array of length natom
    z        Orthogonal coordinates for Z in Angstroms          Reference to a floating-point number array of length natom
    q        Occupancy                                          Reference to a floating-point number array of length natom
    b        Temperature factor                                 Reference to a floating-point number array of length natom
    segname  Segment name (used in CHARMM)                      Reference to a character array of length natom
    element  Element symbol                                     Reference to a character array of length natom
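
    The layout can be tried out without the module. The sketch below builds a plain hash by hand with a few of the keys listed above and reads it back with the $a->{Key}->[Index] pattern. The two atoms and their coordinates are made up for illustration; a real object would be filled by read().

    ```perl
    use strict;
    use warnings;

    # Hand-built stand-in for a PDB object holding two atoms.
    # Only some of the documented keys are shown; the values are invented.
    my $pdb = {
        natom  => 2,
        atnam  => ["N", "CA"],
        resnam => ["MET", "MET"],
        chain  => ["A", "A"],
        resnum => [1, 1],
        x      => [27.340, 26.266],
        y      => [24.430, 25.413],
        z      => [2.614, 2.842],
    };

    # Scalar value: $pdb->{natom}. Array element: $pdb->{Key}->[Index].
    my @lines;
    for my $i (0 .. $pdb->{natom} - 1) {
        push @lines, sprintf("%-4s %3s %1s %4d %8.3f %8.3f %8.3f",
            $pdb->{atnam}->[$i], $pdb->{resnam}->[$i], $pdb->{chain}->[$i],
            $pdb->{resnum}->[$i],
            $pdb->{x}->[$i], $pdb->{y}->[$i], $pdb->{z}->[$i]);
    }
    print "$_\n" for @lines;
    ```

    Because every per-atom array has length natom, one index identifies the same atom across all keys.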

  • Methods
    • new
      • Synopsis
        $a=PDB->new();
      • Description
        This method creates a new PDB object.
    • read
      • Synopsis
        $a->read(fname => $file_name);
        $a->read(fh => $fh);
      • Description
        This method reads a PDB file, either from the file whose name is specified by $file_name or from file handle $fh, and stores the data in the data structure of the PDB object.
    • write
      • Synopsis
        $a->write(fname => $file_name, selection => \@list);
        $a->write(fh => $fh, selection => \@list);
      • Description
        This method writes the PDB data to a file whose name is specified by $file_name or to file handle $fh. When the value for selection is given, only the atoms whose indexes are listed in @list are written.
    • select
      • Synopsis
        @list=$a->select(chain => \@chain_id_list, resnum => \@residue_number_list, atnam => \@atom_name_list, alt => \@alt_id_list);
      • Description
        This method makes a list of atoms that have chain identifiers listed in @chain_id_list, residue numbers listed in @residue_number_list, atom names listed in @atom_name_list and alternate location indicators listed in @alt_id_list. When a key => value pair is omitted, the property associated with that key is ignored in the selection.
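        The selection rule can be sketched in plain Perl on a hash laid out like the data structure of a PDB object. This is an illustration only, not the module's own code: select_atoms is a hypothetical helper, the sample atoms are invented, and the sketch assumes exact string matches and 0-based atom indexes in the returned list.

        ```perl
        use strict;
        use warnings;

        # Keep the atoms that match one of the listed values for every key given.
        # Keys that are not passed place no restriction on the selection.
        sub select_atoms {
            my ($pdb, %crit) = @_;
            my @list;
            ATOM: for my $i (0 .. $pdb->{natom} - 1) {
                for my $key (keys %crit) {
                    my $value = $pdb->{$key}->[$i];
                    next ATOM unless grep { $_ eq $value } @{ $crit{$key} };
                }
                push @list, $i;
            }
            return @list;
        }

        # Invented four-atom structure.
        my $pdb = {
            natom  => 4,
            chain  => ["A", "A", "A", "B"],
            resnum => [1, 1, 2, 1],
            atnam  => ["N", "CA", "CA", "CA"],
        };

        my @ca_a = select_atoms($pdb, chain => ["A"], atnam => ["CA"]);
        print "@ca_a\n";
        ```

        With no criteria at all, the sketch returns every atom index.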
    • zone
      • Synopsis
        @list=$a->zone(center_id => $i, cut => $cut);
        @list=$a->zone(center_xyz => [$x, $y, $z], cut => $cut);
      • Description
        This method makes a list of atoms that are within a sphere of radius $cut Å centered on atom $i or on the point ($x, $y, $z).
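        The sphere test can be sketched in plain Perl on a hash laid out like the data structure of a PDB object. The helper atoms_in_sphere and the coordinates are hypothetical. Whether the module counts atoms lying exactly on the sphere surface, or includes the center atom itself, is not stated here, so the sketch's choices (both included) are assumptions.

        ```perl
        use strict;
        use warnings;

        # Indexes of the atoms within $cut of the point given in $center.
        sub atoms_in_sphere {
            my ($pdb, $center, $cut) = @_;
            my ($cx, $cy, $cz) = @$center;
            my @list;
            for my $i (0 .. $pdb->{natom} - 1) {
                my $dx = $pdb->{x}->[$i] - $cx;
                my $dy = $pdb->{y}->[$i] - $cy;
                my $dz = $pdb->{z}->[$i] - $cz;
                # Compare squared distances to avoid a square root per atom.
                push @list, $i if $dx*$dx + $dy*$dy + $dz*$dz <= $cut*$cut;
            }
            return @list;
        }

        # Invented coordinates in Angstroms.
        my $pdb = {
            natom => 4,
            x => [0.0, 3.0, 0.0, 6.0],
            y => [0.0, 0.0, 4.0, 0.0],
            z => [0.0, 0.0, 0.0, 0.0],
        };

        # Center on a point, as with center_xyz.
        my @near_point = atoms_in_sphere($pdb, [0, 0, 0], 5.0);

        # Center on an atom, as with center_id, by passing its coordinates.
        my $c = 3;
        my @near_atom = atoms_in_sphere($pdb, [map { $pdb->{$_}->[$c] } qw(x y z)], 3.5);

        print "@near_point\n@near_atom\n";
        ```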
    • fit
      • Synopsis
        $rmsd=$a->fit(moving_pdb => $b, moving_atoms => \@listb, ref_atoms => \@lista);
      • Description
        This method superimposes the atoms listed in @listb of PDB object $b on the atoms listed in @lista of PDB object $a and returns the RMSD value in Å.
    • rmsd
      • Synopsis
        $rmsd=$a->rmsd(comp_pdb => $b, comp_atoms => \@listb, ref_atoms => \@lista);
      • Description
        This method calculates the RMSD between the atoms listed in @listb of PDB object $b and the atoms listed in @lista of PDB object $a without fitting, that is, using the coordinates as they are.
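        For reference, the RMSD without fitting is the square root of the mean squared distance between paired atoms. A minimal plain-Perl sketch follows; rmsd_no_fit is a hypothetical helper working on hashes laid out like the data structure of a PDB object, the coordinates are invented, and pairing the two lists element by element is an assumption about how the atom lists correspond.

        ```perl
        use strict;
        use warnings;

        # RMSD between paired atoms of two structures, with no superposition.
        sub rmsd_no_fit {
            my ($ref, $ref_atoms, $comp, $comp_atoms) = @_;
            die "atom lists differ in length\n" unless @$ref_atoms == @$comp_atoms;
            die "empty atom list\n" unless @$ref_atoms;
            my $sum = 0;
            for my $n (0 .. $#$ref_atoms) {
                my ($i, $j) = ($ref_atoms->[$n], $comp_atoms->[$n]);
                for my $axis (qw(x y z)) {
                    my $d = $ref->{$axis}->[$i] - $comp->{$axis}->[$j];
                    $sum += $d * $d;
                }
            }
            return sqrt($sum / scalar(@$ref_atoms));
        }

        # Two invented structures; the second is the first shifted by 1 Angstrom in z.
        my $ref  = { natom => 2, x => [0, 1], y => [0, 0], z => [0, 0] };
        my $comp = { natom => 2, x => [0, 1], y => [0, 0], z => [1, 1] };

        my $rmsd = rmsd_no_fit($ref, [0, 1], $comp, [0, 1]);
        printf "RMSD=%.3f\n", $rmsd;
        ```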
    • distance
      • Synopsis
        $d=$a->distance($i, $j);
      • Description
        This method calculates the distance between atoms $i and $j.
    • angle
      • Synopsis
        $t=$a->angle($i, $j, $k);
      • Description
        This method calculates the angle formed by atoms $i, $j, and $k.
    • dihedral
      • Synopsis
        $p=$a->dihedral($i, $j, $k, $l);
      • Description
        This method calculates the dihedral angle formed by atoms $i, $j, $k, and $l.
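        The geometry behind distance, angle and dihedral can be sketched in plain Perl on a hash laid out like the data structure of a PDB object. The subroutines below are illustrative stand-ins, not the module's code. Returning angles in degrees and the sign convention of the dihedral (atan2 form, range -180 to 180) are assumptions; the coordinates are invented.

        ```perl
        use strict;
        use warnings;
        use constant PI => 4 * atan2(1, 1);

        # Vector from atom $i to atom $j.
        sub diff_vec {
            my ($p, $i, $j) = @_;
            return [ map { $p->{$_}->[$j] - $p->{$_}->[$i] } qw(x y z) ];
        }
        sub dot {
            my ($u, $v) = @_;
            return $u->[0]*$v->[0] + $u->[1]*$v->[1] + $u->[2]*$v->[2];
        }
        sub cross {
            my ($u, $v) = @_;
            return [ $u->[1]*$v->[2] - $u->[2]*$v->[1],
                     $u->[2]*$v->[0] - $u->[0]*$v->[2],
                     $u->[0]*$v->[1] - $u->[1]*$v->[0] ];
        }

        sub distance {
            my ($p, $i, $j) = @_;
            my $v = diff_vec($p, $i, $j);
            return sqrt(dot($v, $v));
        }

        # Angle at atom $j between the directions j->i and j->k, in degrees.
        sub angle {
            my ($p, $i, $j, $k) = @_;
            my ($u, $v) = (diff_vec($p, $j, $i), diff_vec($p, $j, $k));
            my $n = cross($u, $v);
            return atan2(sqrt(dot($n, $n)), dot($u, $v)) * 180 / PI;
        }

        # Signed dihedral about the j-k bond, in degrees.
        sub dihedral {
            my ($p, $i, $j, $k, $l) = @_;
            my ($b1, $b2, $b3) = (diff_vec($p, $i, $j), diff_vec($p, $j, $k), diff_vec($p, $k, $l));
            my ($n1, $n2) = (cross($b1, $b2), cross($b2, $b3));
            my $y = dot(cross($n1, $n2), $b2) / sqrt(dot($b2, $b2));
            return atan2($y, dot($n1, $n2)) * 180 / PI;
        }

        # Four invented atoms forming right angles.
        my $pdb = {
            natom => 4,
            x => [1, 0, 0, 0],
            y => [0, 0, 1, 1],
            z => [0, 0, 0, 1],
        };
        printf "d=%.3f angle=%.1f dihedral=%.1f\n",
            distance($pdb, 0, 1), angle($pdb, 0, 1, 2), dihedral($pdb, 0, 1, 2, 3);
        ```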
    • set_dihedral
      • Synopsis
        $a->set_dihedral(dih_atoms => [$i, $j, $k, $l], angle => $phi, moving_atoms => \@list);
      • Description
        This method changes the dihedral angle formed by atoms $i, $j, $k, and $l to $phi degrees. The coordinates of the atoms listed in @list are transformed accordingly.
    • copy
      • Synopsis
        $b=$a->copy(selection => \@list);
      • Description
        This method creates a copy of PDB object $a. When the value for selection is given, only the atoms whose indexes are listed in @list are copied.
    • append
      • Synopsis
        $a->append($b);
      • Description
        This method appends the atoms of PDB object $b to those of PDB object $a. Note that SEQRES data of $a are not altered.
    • getchains
      • Synopsis
        @list=$a->getchains();
      • Description
        This method makes a list of chain identifiers.
    • getseq
      • Synopsis
        @seq3=$a->getseq(three => 1, chain => $c);
        $seq=$a->getseq(chain => $c);
      • Description
        This method obtains the primary sequence of chain $c from ATOM records and returns it in the three-letter format (with three => 1) or in the one-letter format.
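        The relation between the two formats can be sketched as a table lookup over the 20 standard amino acids. three_to_one is a hypothetical helper, and printing 'X' for residue names outside the table is an assumption; how the module treats such residues is not stated here.

        ```perl
        use strict;
        use warnings;

        # Three-letter to one-letter codes of the 20 standard amino acids.
        my %one_letter = (
            ALA => 'A', ARG => 'R', ASN => 'N', ASP => 'D', CYS => 'C',
            GLN => 'Q', GLU => 'E', GLY => 'G', HIS => 'H', ILE => 'I',
            LEU => 'L', LYS => 'K', MET => 'M', PHE => 'F', PRO => 'P',
            SER => 'S', THR => 'T', TRP => 'W', TYR => 'Y', VAL => 'V',
        );

        # Join the one-letter codes; unknown residue names become 'X' (assumption).
        sub three_to_one {
            return join '', map { exists $one_letter{$_} ? $one_letter{$_} : 'X' } @_;
        }

        my @seq3 = qw(MET ALA GLY MSE);
        my $seq  = three_to_one(@seq3);
        print "$seq\n";
        ```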
    • getseqres
      • Synopsis
        @seq3=$a->getseqres(three => 1, chain => $c);
        $seq=$a->getseqres(chain => $c, missing => $m);
      • Description
        This method obtains the primary sequence of chain $c from SEQRES records and returns it in the three-letter format (with three => 1) or in the one-letter format. When the value of missing is given, missing residues are shown by the character $m (typically '-') in the one-letter format.
    • is_donor
      • Synopsis
        $flag=$a->is_donor($i);
      • Description
        This method returns 1 if atom $i is an atom of a standard amino acid and an H-bond donor, and otherwise returns 0.
    • is_acceptor
      • Synopsis
        $flag=$a->is_acceptor($i);
      • Description
        This method returns 1 if atom $i is an atom of a standard amino acid and an H-bond acceptor, and otherwise returns 0.


Your questions and comments are welcome. Please send them to the author by e-mail.