Welcome to the SELPHI2 server

The SELPHI 2.0 (Systematic Extraction of Linked PhosphoInteractions 2.0) server provides a platform for analysing phosphoproteomics data, including a list of high-confidence kinase-substrate predictions for the phosphosites in your data. SELPHI 2 contains more than 73 million kinase-substrate predictions. You can also fit the kinase-substrate predictions to your data set to identify context-specific sub-networks, run pathway enrichment analyses and download high-probability edges supported by evidence from external publications.


If you find SELPHI2.0 useful please cite:

Maier BD#, Petursson B#, Lussana A# and Petsalaki E, SELPHI 2: Data-driven extraction of human kinase-substrate relationships from omics datasets. bioRxiv, 2024. https://doi.org/10.1101/2022.01.15.476449

(#) These authors contributed equally.

All code for reproducing this project as well as a Docker image of the web server can be found at: https://gitlab.ebi.ac.uk/petsalakilab/selphi_2


License for SELPHI2 server

Copyright (c) 2024 Petsalaki Group, EMBL-EBI

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Upload data for SELPHI2 server

Please upload data for analysis. The default format has columns representing samples and rows representing phosphosites, with proteins identified by HGNC symbols. If your data do not conform to this format, you can indicate which columns contain data and which columns contain the protein names (UniProt or HGNC) and the position within the protein or the peptide. If the position given is within the peptide, the peptide is mapped onto the UniProt sequence to find the position within the protein.
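The peptide-to-protein mapping described above can be illustrated with a minimal sketch. The function name, the toy sequence and the handling of missing or repeated peptides are assumptions made for illustration; they are not taken from the SELPHI2 code.

```python
def peptide_to_protein_position(protein_seq, peptide, pos_in_peptide):
    """Convert a 1-based position within a peptide to a 1-based protein position.

    Returns None when the peptide is absent from the sequence or occurs more
    than once (an assumption; the server may resolve such cases differently).
    """
    start = protein_seq.find(peptide)  # 0-based offset of the peptide
    if start == -1:
        return None
    if protein_seq.find(peptide, start + 1) != -1:
        return None  # ambiguous: the peptide matches in several places
    return start + pos_in_peptide


# Toy sequence for illustration only (not a real UniProt entry).
protein = "MKTAYIAKQRSPLSDEG"
peptide = "QRSPLSDE"

site = peptide_to_protein_position(protein, peptide, 3)
print(site, protein[site - 1])  # position in the protein and its residue
```

The peptide position is treated as 1-based, so adding it to the 0-based offset of the peptide directly yields the 1-based protein position.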

Predictions can be made for the phosphosites in the data set in the following ways:
(i) Correlation-based predictions: Spearman's correlation between phosphosites found on kinases and the rest of the phosphosites. If the kinase has more than one phosphosite, the highest correlation coefficient is chosen as the edge score.
(ii) Random forest: a random forest classifier was used to generate a list of kinase-substrate predictions, as described in our recent publication[1]. You can download kinase predictions for the phosphosites in your data that are included in this prediction list.
(iii) Random forest functional: same as (ii), but only for phosphosites that are likely to be functional according to a previously developed functional score[2].
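As an illustration of option (i), the sketch below computes Spearman's correlation between each phosphosite of a kinase and a candidate substrate site, then keeps the highest coefficient as the edge score. It uses only the standard library; the site names and values are hypothetical, and the server's own implementation may differ in details such as the handling of missing values.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def spearman(x, y):
    """Spearman's correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def edge_score(kinase_sites, substrate_profile):
    """Highest Spearman coefficient over all phosphosites of one kinase."""
    return max(spearman(p, substrate_profile) for p in kinase_sites.values())


# Hypothetical profiles across five samples.
kinase_sites = {
    "KIN1_S10": [1.0, 2.0, 3.0, 4.0, 5.0],
    "KIN1_T25": [5.0, 3.0, 4.0, 1.0, 2.0],
}
substrate_profile = [2.0, 4.0, 6.0, 8.0, 11.0]

print(edge_score(kinase_sites, substrate_profile))
```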

References

1. Maier BD#, Petursson B#, Lussana A# and Petsalaki E, SELPHI 2: Data-driven extraction of human kinase-substrate relationships from omics datasets. bioRxiv, 2024. https://doi.org/10.1101/2022.01.15.476449
2. Ochoa, D., Jarnuczak, A.F., Viéitez, C. et al. The functional landscape of the human phosphoproteome. Nat Biotechnol 38, 365–373 (2020). https://doi.org/10.1038/s41587-019-0344-3

Enrichment of regulated phosphosites

This is the enrichment submission page. Please select a log ratio threshold to define regulated phosphosites. Enrichment analysis is applied to the proteins that contain a regulated phosphosite. Enrichment is conducted using the enrichR[1] and clusterProfiler[2] packages. Separate heatplots are generated for the down- and up-regulated phosphosites.
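The thresholding step can be pictured as follows. This is a minimal sketch that assumes a symmetric threshold (sites at or above +t count as up-regulated, sites at or below -t as down-regulated) and a hypothetical "GENE_position" site identifier; neither detail is confirmed by the server documentation.

```python
def split_regulated(log_ratios, threshold):
    """Split phosphosites into up- and down-regulated sets by log ratio."""
    up = {site for site, value in log_ratios.items() if value >= threshold}
    down = {site for site, value in log_ratios.items() if value <= -threshold}
    return up, down


def proteins_of(sites):
    """Proteins carrying the given sites (assumes 'GENE_position' identifiers)."""
    return sorted({site.split("_")[0] for site in sites})


# Hypothetical log ratios for four phosphosites.
log_ratios = {
    "GENEA_S10": 1.8,
    "GENEA_T22": 0.2,
    "GENEB_Y5": -1.3,
    "GENEC_S77": -0.4,
}

up, down = split_regulated(log_ratios, threshold=1.0)
print(proteins_of(up), proteins_of(down))
```

The two protein lists would then be passed separately to the enrichment step, matching the separate heatplots for up- and down-regulated sites.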

References

1. Kuleshov, Maxim V., Matthew R. Jones, Andrew D. Rouillard, Nicolas F. Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, et al. 2016. “Enrichr: A Comprehensive Gene Set Enrichment Analysis Web Server 2016 Update.” Nucleic Acids Res 44 (Web Server issue): W90–W97.
2. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G (2021). “clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.” The Innovation, 2(3), 100141. doi:10.1016/j.xinn.2021.100141.

Select a clustering method (kmeans or Mclust) to cluster the data and run enrichment analysis on the derived clusters. The silhouette method is used to select the number of clusters. This step might take a few minutes ...
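To show how a silhouette criterion can pick the number of clusters, the sketch below scores two candidate clusterings of the same toy points and keeps the one with the higher mean silhouette width. The labelings are hard-coded here; in practice they would come from kmeans or Mclust. The function name, the Euclidean distance and the toy data are assumptions for illustration.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette width over all points, using Euclidean distance."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidate_labelings = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 2, 2, 2],
}

scores = {k: silhouette_score(points, labels) for k, labels in candidate_labelings.items()}
best_k = max(scores, key=scores.get)
print(best_k, scores)
```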

Prize-collecting Steiner forest

This is the prize-collecting Steiner forest (PCSF) tab[1]. A signalling network is fitted to your data to generate a context-specific network for the samples in your data. The network is generated by combining the kinase-substrate predictions with a kinase-kinase regulatory network. You can choose between the probabilistic kinase-kinase network formulated in a previous publication[2] and the literature network described in OmniPath[3].
PCSF has three tunable parameters:
(i) w: the number of trees in the forest
(ii) b: the node prizes (the higher the value, the greater the prize for including nodes)
(iii) μ: the edge penalty
You can select the minimum and maximum values for each of these parameters and the increment. All possible parameter combinations are tested, and the solution with the best F1 score with regard to kinase-substrate relationships is selected.
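The parameter search can be sketched as a plain grid search scored by F1. In the sketch below, `toy_run` is a hypothetical stand-in for an actual PCSF run (it simply keeps edges whose confidence is at least μ and ignores w and b), and the edge and reference lists are invented for illustration; only the grid construction and the F1 scoring are meant to mirror the description above.

```python
from itertools import product


def grid(minimum, maximum, step):
    """Values from minimum to maximum (inclusive) in increments of step."""
    count = int((maximum - minimum) / step + 1e-9) + 1
    return [round(minimum + i * step, 10) for i in range(count)]


def f1_score(predicted_edges, reference_edges):
    """F1 score of predicted kinase-substrate edges against a reference set."""
    predicted, reference = set(predicted_edges), set(reference_edges)
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


def best_parameters(run, reference_edges, w_values, b_values, mu_values):
    """Try every (w, b, mu) combination and return (best F1, parameters)."""
    best = None
    for w, b, mu in product(w_values, b_values, mu_values):
        score = f1_score(run(w, b, mu), reference_edges)
        if best is None or score > best[0]:
            best = (score, (w, b, mu))
    return best


# Hypothetical candidate edges with confidences, and a reference edge set.
scored_edges = [(("K1", "S1"), 0.9), (("K1", "S2"), 0.7),
                (("K2", "S3"), 0.4), (("K3", "S4"), 0.2)]
reference = {("K1", "S1"), ("K1", "S2"), ("K2", "S3")}


def toy_run(w, b, mu):
    # Stand-in for a PCSF run, NOT the real algorithm: keep confident edges.
    return [edge for edge, confidence in scored_edges if confidence >= mu]


best = best_parameters(toy_run, reference,
                       grid(1, 3, 1), grid(1, 2, 1), grid(0.1, 0.9, 0.4))
print(best)
```

Ties are resolved in favour of the first combination tested, which is a design choice of this sketch rather than documented server behaviour.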

References

1. Akhmedov M, Kedaigle A, Chong RE, Montemanni R, Bertoni F, et al. (2017) PCSF: An R-package for network-based interpretation of high-throughput data. PLOS Computational Biology 13(7): e1005694. https://doi.org/10.1371/journal.pcbi.1005694
2. Brandon M. Invergo, Borgthor Petursson, Nosheen Akhtar, David Bradley, Girolamo Giudice, Maruan Hijazi, Pedro Cutillas, Evangelia Petsalaki, Pedro Beltrao, 2020, Prediction of Signed Protein Kinase Regulatory Circuits, Cell systems, Pages 384-396.e9, ISSN 2405-4712, https://doi.org/10.1016/j.cels.2020.04.005.
3. D Turei, T Korcsmaros and J Saez-Rodriguez (2016) OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nature Methods 13 (12)

Kinase activity heatmap

Kinase activities can be calculated for kinases that are predicted to have substrates in your data. You can select a probability threshold for the kinase-substrate predictions. The Kolmogorov-Smirnov statistic is used to assess kinase activities by quantifying the overrepresentation of a kinase's substrates at the top of the phosphorylation distribution. For comparison, the same activities are calculated using kinase substrates from PhosphoSitePlus as well as experimentally derived edges from Sugiyama et al.
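The idea behind a Kolmogorov-Smirnov based activity score can be sketched with a signed two-sample statistic that compares the values of a kinase's substrates with those of all other sites. The sign convention (positive when substrates sit at the top of the distribution), the function names and the toy values are assumptions for illustration, not the server's exact procedure.

```python
def ecdf(sample, x):
    """Empirical cumulative distribution function of sample evaluated at x."""
    return sum(1 for value in sample if value <= x) / len(sample)


def ks_activity(substrate_values, background_values):
    """Signed two-sample Kolmogorov-Smirnov statistic.

    Positive when substrate values are shifted above the background,
    negative when they are shifted below it.
    """
    points = sorted(set(substrate_values) | set(background_values))
    # Substrates shifted upwards have an ECDF lying below the background ECDF.
    d_up = max(ecdf(background_values, x) - ecdf(substrate_values, x) for x in points)
    d_down = max(ecdf(substrate_values, x) - ecdf(background_values, x) for x in points)
    return d_up if d_up >= d_down else -d_down


# Hypothetical log ratios in one sample.
substrates = [2.1, 1.8, 2.5]
background = [-1.0, -0.5, 0.0, 0.3, 0.9]

print(ks_activity(substrates, background))
```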


Download heatmaps



Experimentally supported edges

Here you can download a list of experimentally supported SELPHI2 predictions. Two recent high-throughput studies are used to corroborate the kinase-substrate predictions[1],[2]. The boxplot presented below shows the probability assigned to the edges that are supported by either or both of these external studies, compared with the background of unsupported edges.

References

1. Hijazi, M., Smith, R., Rajeeve, V., Bessant, C. and Cutillas, P. R. Reconstructing kinase network topologies from phosphoproteomics data reveals cancer-associated rewiring. Nat. Biotechnol. 38, 493–502 (2020)
2. Sugiyama, N., Imamura, H. and Ishihama, Y. Large-scale Discovery of Substrates of the Human Kinome. Sci. Rep. 9, 10503 (2019)

Download

Welcome to the SELPHI2 server

1. To upload data, please press Browse under the Choose File option. Alternatively, one of three examples can be picked under select examples.

2. To upload your data correctly, please indicate whether the first line contains sample names, then select the separator, that is, the character that separates the values in your table.

3. The standard format for SELPHI2 has sample names in the first row, gene names and positions in the first column, and data values in all subsequent columns. If your data is, for example, processed output from a mass spectrometry analysis, the following steps need to be followed:


(i) Check the "Does the input need to be reformated?" option.
(ii) Indicate which column contains the protein names.
(iii) Indicate which column contains the residue number.
(iv) Give a substring that identifies the relevant data columns. For example, if your samples are named sample_1, sample_2, ..., sample_n, the substring "sample" will identify the data columns. Please make sure that the substring occurs only in the relevant columns.

4. Please select the number of rows to skip in the file. This option is for files with titles or descriptions at the head of the file.

5. Select the type of protein IDs. SELPHI2 currently supports UniProt accessions and HGNC symbols.

6. Predictions can be made for the phosphosites in the data set in the following ways:
(i) Correlation-based predictions: Spearman's correlation between phosphosites found on kinases and the rest of the phosphosites. If the kinase has more than one phosphosite, the highest correlation coefficient is chosen as the edge score.
(ii) Random forest: a random forest classifier was used to generate a list of kinase-substrate predictions, as described in our recent publication[1]. You can download kinase predictions for the phosphosites in your data that are included in this prediction list.
(iii) Random forest functional: same as (ii), but only for phosphosites that are likely to be functional according to a previously developed functional score[2].

All available predictions can also be downloaded.
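The upload options in steps 2 to 5 above (separator, rows to skip, protein and position columns, and the substring that identifies data columns) can be sketched as a small parser. The column names, the example table and the "PROTEIN_position" row identifier are hypothetical; the sketch only shows how the options interact, not how the server parses files.

```python
import csv


def read_phospho_table(text, separator=",", skip_rows=0,
                       protein_col="Protein", position_col="Position",
                       sample_substring="sample"):
    """Parse a delimited table into sample names and per-site value lists."""
    lines = text.splitlines()[skip_rows:]  # drop title or description rows
    reader = csv.DictReader(lines, delimiter=separator)
    # Data columns are those whose header contains the given substring.
    sample_cols = [name for name in reader.fieldnames if sample_substring in name]
    table = {}
    for row in reader:
        site = f"{row[protein_col]}_{row[position_col]}"
        # Empty cells are kept as None (missing values).
        table[site] = [float(row[c]) if row[c] else None for c in sample_cols]
    return sample_cols, table


# Hypothetical export with one description line before the header.
raw = """# exported by a hypothetical search engine
Protein,Position,Score,sample_1,sample_2
GENEA,10,0.99,1.5,2.0
GENEB,5,0.95,-0.7,
"""

samples, table = read_phospho_table(raw, separator=",", skip_rows=1)
print(samples)
print(table)
```

Because the "Score" column does not contain the substring "sample", it is ignored, which is why the substring should occur only in the relevant columns.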

If all information was provided correctly, the main panel should look similar to the figure below.

References

1. Maier BD#, Petursson B#, Lussana A# and Petsalaki E, SELPHI 2: Data-driven extraction of human kinase-substrate relationships from omics datasets. bioRxiv, 2024. https://doi.org/10.1101/2022.01.15.476449
2. Ochoa, D., Jarnuczak, A.F., Viéitez, C. et al. The functional landscape of the human phosphoproteome. Nat Biotechnol 38, 365–373 (2020). https://doi.org/10.1038/s41587-019-0344-3

Enrichment of regulated phosphosites

This is the enrichment submission page.

1. Please select the database to enrich against. Available databases are Reactome, KEGG, Jensen's diseases and GO.

2. Please select a log ratio threshold to determine which phosphosites count as regulated. Enrichment analysis is applied to the proteins that include a regulated phosphosite.

3. Select a clustering method. This clusters the data and uses each cluster as a set for enrichment.

Enrichment is conducted using the enrichR[1] and clusterProfiler[2] packages. Separate heatplots are generated for the down- and up-regulated phosphosites.

4. To enrich up/downregulated sites, select UP/DOWNREGULATED PSITES and hit start.

5. To start clustering enrichment, select clusters and press start.

A successful submission results in a figure as shown below.

References

1. Kuleshov, Maxim V., Matthew R. Jones, Andrew D. Rouillard, Nicolas F. Fernandez, Qiaonan Duan, Zichen Wang, Simon Koplev, et al. 2016. “Enrichr: A Comprehensive Gene Set Enrichment Analysis Web Server 2016 Update.” Nucleic Acids Res 44 (Web Server issue): W90–W97.
2. Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, Fu X, Liu S, Bo X, Yu G (2021). “clusterProfiler 4.0: A universal enrichment tool for interpreting omics data.” The Innovation, 2(3), 100141. doi:10.1016/j.xinn.2021.100141.

Map signalling network onto data

A signalling network is fitted to the uploaded data to generate a context-specific network for the samples in the data. The network is generated by combining the kinase-substrate predictions with a kinase-kinase regulatory network. You can choose between the probabilistic kinase-kinase network formulated in a previous publication[2] and the literature network described in OmniPath[3].

1. Choose pruning: None includes all proteins in the data; TS/PPS/KS includes only phosphatases, kinases and transcription factors.

2. Choose the network probability to set the kinase-substrate probability threshold.

3. Choose between the probabilistic kinase-kinase network formulated in a previous publication[2] and the literature network described in OmniPath[3].

4. Select a log2 threshold to select the phosphosites that are to be included in the sub-network.

5. Parameter tuning: PCSF[1] has three tunable parameters:
(i) w: the number of trees in the forest
(ii) b: the node prizes; the higher the value, the greater the prize for including nodes
(iii) μ: the edge penalty; the higher the value, the higher the penalisation for including edges
You can select the minimum and maximum values for each of these parameters and the increment. All possible parameter combinations are tested, and the solution with the best F1 score with regard to kinase-substrate relationships is selected.

6. Press start
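One plausible reading of the two thresholds in steps 2 and 4 is sketched below: phosphosites are kept when their absolute log2 value reaches the log2 threshold, and kinase-substrate edges are kept when their probability reaches the network probability and they point to a kept site. The use of the absolute value, the data layout and all names are assumptions for illustration, not confirmed server behaviour.

```python
def select_network_inputs(edges, site_log2, prob_threshold, log2_threshold):
    """Filter phosphosites and kinase-substrate edges before network fitting.

    edges: iterable of (kinase, site, probability) tuples.
    site_log2: mapping of site identifier to log2 value.
    """
    kept_sites = {site for site, value in site_log2.items()
                  if abs(value) >= log2_threshold}
    kept_edges = [(kinase, site, prob) for kinase, site, prob in edges
                  if prob >= prob_threshold and site in kept_sites]
    return kept_sites, kept_edges


# Hypothetical predictions and measurements.
edges = [
    ("KIN1", "GENEA_S10", 0.92),
    ("KIN1", "GENEB_Y5", 0.40),
    ("KIN2", "GENEC_S77", 0.85),
]
site_log2 = {"GENEA_S10": 1.8, "GENEB_Y5": -1.3, "GENEC_S77": -0.4}

kept_sites, kept_edges = select_network_inputs(
    edges, site_log2, prob_threshold=0.8, log2_threshold=1.0)
print(kept_sites, kept_edges)
```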

A successful run will look like the image below.

References

1. Akhmedov M, Kedaigle A, Chong RE, Montemanni R, Bertoni F, et al. (2017) PCSF: An R-package for network-based interpretation of high-throughput data. PLOS Computational Biology 13(7): e1005694. https://doi.org/10.1371/journal.pcbi.1005694
2. Brandon M. Invergo, Borgthor Petursson, Nosheen Akhtar, David Bradley, Girolamo Giudice, Maruan Hijazi, Pedro Cutillas, Evangelia Petsalaki, Pedro Beltrao, 2020, Prediction of Signed Protein Kinase Regulatory Circuits, Cell systems, Pages 384-396.e9, ISSN 2405-4712, https://doi.org/10.1016/j.cels.2020.04.005.
3. D Turei, T Korcsmaros and J Saez-Rodriguez (2016) OmniPath: guidelines and gateway for literature-curated signaling pathway resources. Nature Methods 13 (12)

Kinase activity heatmap

Kinase activities can be calculated for kinases that are predicted to have substrates in your data. You can select a probability threshold for the kinase-substrate predictions. The Kolmogorov-Smirnov statistic is used to assess kinase activities by quantifying the overrepresentation of a kinase's substrates at the top of the phosphorylation distribution. For comparison, the same activities are calculated using kinase substrates from PhosphoSitePlus as well as experimentally derived edges from Sugiyama et al.