Prediction of soluble protein expression in Escherichia coli


  • SoluProt v1 – standalone Python script, useful for processing large sets of proteins. Requires Python 3.7, scikit-learn 0.20.1, Biopython 1.74, pandas, tqdm, TMHMM and USEARCH. We recommend Miniconda for easy installation of the dependencies.
  • SoluProt data – training and test sets, TargetTrack keywords, and lists of identified and manually checked E. coli protocols.
  • NESG dataset – original NESG dataset used to build the SoluProt test set.
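
Before running the standalone script on a large protein set, it can help to confirm that the listed dependencies are actually present. The following is a minimal sketch of such a check and is not part of SoluProt: the function name check_environment is hypothetical, and the executable names "tmhmm" and "usearch" are assumptions that may differ on your system. The pinned versions are the ones quoted in the requirements above; pandas and tqdm are listed there without a version.

```python
import importlib
import shutil
import sys

# Import name -> version quoted in the requirements list (None = not pinned).
REQUIRED_MODULES = {
    "sklearn": "0.20.1",  # scikit-learn
    "Bio": "1.74",        # Biopython
    "pandas": None,
    "tqdm": None,
}

# Assumed executable names; adjust them to match the local installation.
REQUIRED_TOOLS = ["tmhmm", "usearch"]


def check_environment(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS, python=(3, 7)):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Compare only the major and minor interpreter version.
    found_python = tuple(sys.version_info[:2])
    if found_python != tuple(python):
        problems.append(
            "Python %d.%d expected, found %d.%d" % (tuple(python) + found_python)
        )

    # Try to import each module and compare its reported version if one is pinned.
    for name, wanted in modules.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            problems.append("missing Python module: %s" % name)
            continue
        found = getattr(module, "__version__", None)
        if wanted is not None and found != wanted:
            problems.append("%s version %s expected, found %s" % (name, wanted, found))

    # External tools only need to be discoverable on PATH.
    for tool in tools:
        if shutil.which(tool) is None:
            problems.append("executable not on PATH: %s" % tool)

    return problems
```

Calling check_environment() with no arguments checks the full list above and returns one message per problem; an empty list suggests the environment matches the stated requirements. The check only looks at version strings and PATH entries, so it does not verify that TMHMM or USEARCH actually run.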