
Table 5 The dataset

From: Mutual information and variants for protein domain-domain contact prediction

Dataset

| Protein | D1 | D2 | Sequences | Species |
|---|---|---|---|---|
| 1A45 | 1-82 | 83-173 | 160 | E(146)N(14) |
| 1BIB | 67-270 | 271-317 | 236 | A(12)B(201)N(23) |
| 1BKS | 1-188 | 189-268 | 478 | A(21)B(401)E(10)N(46) |
| 1FNB | 19-152 | 153-314 | 58 | B(22)E(34)N(2) |
| 1G8A | 1-51 | 52-227 | 75 | A(47)E(20)N(8) |
| 1G8P | 18-216 | 261-350 | 230 | A(10)B(143)E(49)N(28) |
| 1I39 | 1-158 | 159-200 | 688 | A(32)B(538)E(7)V(1)U(1)N(109) |
| 1J5X | 2-169 | 170-319 | 252 | A(9)B(183)E(5)N(55) |
| 1LAP | 1-147 | 148-484 | 454 | A(2)B(331)E(84)N(37) |
| 1LLD | 7-148 | 149-319 | 709 | A(33)B(389)E(221)N(66) |
| 1MRI | 1-162 | 163-246 | 68 | B(2)E(65)N(1) |
| 1PII | 1-255 | 256-452 | 75 | B(65)N(10) |
| 1RHD | 1-156 | 157-293 | 505 | A(26)B(365)E(57)U(1)N(56) |
| 1THM | 1-127 | 128-208 | 106 | A(1)B(62)E(34)N(9) |
| 1W98 | 88-227 | 228-357 | 70 | E(64)N(6) |
| 1WRU | 3-176 | 177-346 | 64 | B(58)V(2)N(4) |
| 1X2G | 1-246 | 247-337 | 224 | A(2)B(155)E(42)N(25) |
| 2AAA | 1-376 | 377-484 | 245 | B(141)E(74)N(30) |
| 2AHE | 16-108 | 109-253 | 144 | B(25)E(100)N(19) |
| 2D3V | 3-95 | 96-195 | 77 | E(71)N(6) |
| 2D8N | 16-97 | 102-189 | 240 | E(195)N(45) |
| 2E64 | 1-188 | 189-235 | 294 | A(9)B(231)E(4)U(1)N(49) |
| 2I00 | 10-300 | 301-406 | 116 | A(2)B(80)N(34) |
| 2IU5 | 1-71 | 72-180 | 65 | B(56)N(9) |
| 2NPO | 3-76 | 77-188 | 224 | A(3)B(182)U(1)N(38) |
| 2NRC | 1-247 | 261-480 | 188 | A(9)B(96)E(68)N(15) |
| 2OF7 | 17-67 | 68-207 | 204 | B(135)N(69) |
| 2OI8 | 8-86 | 87-216 | 215 | B(151)N(64) |
| 2PGD | 1-172 | 178-433 | 317 | B(211)E(78)N(28) |
| 2PGE | 3-136 | 137-368 | 138 | A(6)B(102)E(1)N(29) |
| 2PGX | 2-56 | 57-250 | 102 | B(87)N(15) |
| 2PHZ | 20-142 | 143-296 | 420 | A(4)B(343)N(73) |
| 2QY9 | 201-284 | 285-495 | 471 | A(32)B(344)E(15)N(80) |
| 2REB | 23-268 | 269-328 | 482 | B(434)E(12)N(36) |
| 2TS1 | 1-220 | 248-319 | 598 | B(512)E(34)N(52) |
| 4ENL | 1-126 | 127-436 | 649 | A(32)B(448)E(122)N(47) |
| 4MDH | 1-154 | 155-333 | 339 | A(6)B(173)E(134)N(26) |
| 5FBP | 1-201 | 202-335 | 355 | A(3)B(213)E(112)N(27) |
| 6GST | 1-82 | 90-217 | 374 | B(10)E(312)N(52) |
| 8TLN | 1-135 | 136-316 | 44 | A(1)B(36)E(2)N(5) |

  1. The “Protein” column lists PDB identifiers [40]. The D1 and D2 columns give the start and end PDB residues of domains 1 and 2, respectively. For all PDB entries listed, these residues are located in chain A of the structure, except for 1W98, where the domains are in chain B, and 8TLN, where they are in chain E. The “Sequences” column gives the number of sequences in the multiple sequence alignment (MSA). The final column gives the distribution of the sequences in each MSA across taxonomic groups: eukaryotes (E); archaea (A); bacteria (B); viruses (V); unclassified (U); and not found (N), i.e. sequences that could not be found in the NCBI Taxonomy Database. This dataset was taken from Hamer et al. [12].
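
The compact encoding used in the final column, for example `A(12)B(201)N(23)`, can be unpacked programmatically. The following is a minimal sketch, not part of the original work: the names `GROUPS`, `parse_species` and `is_consistent` are hypothetical helpers introduced here for illustration. It parses one such entry into per-group counts and checks that the counts add up to the value in the number-of-sequences column.

```python
import re

# Single-letter codes as defined in the table footnote.
GROUPS = {
    "E": "eukaryotes",
    "A": "archaea",
    "B": "bacteria",
    "V": "viruses",
    "U": "unclassified",
    "N": "not found",
}


def parse_species(field):
    """Turn an entry such as 'A(12)B(201)N(23)' into {'A': 12, 'B': 201, 'N': 23}."""
    counts = {}
    for code, number in re.findall(r"([A-Z])\((\d+)\)", field):
        if code not in GROUPS:
            raise ValueError(f"unknown group code: {code!r}")
        counts[code] = int(number)
    return counts


def is_consistent(sequences, field):
    """Check that the per-group counts sum to the stated number of sequences."""
    return sum(parse_species(field).values()) == sequences


# Row 1BIB from the table: 236 sequences, distributed as A(12)B(201)N(23).
print(parse_species("A(12)B(201)N(23)"))
print(is_consistent(236, "A(12)B(201)N(23)"))
```

Applied to every row, a check of this kind gives a quick way to catch transcription errors when the table is copied into another format.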