Data preparation
Convert the data into the format DSSM expects: one query, one positive document (pos_doc), and four negative documents (neg_doc).
| | label | q1 | q2 |
|---|---|---|---|
| 0 | 1 | Q397345 | Q538594 |
| 1 | 0 | Q193805 | Q699273 |
| 2 | 0 | Q085471 | Q676160 |
| 3 | 0 | Q189314 | Q438123 |
| 4 | 0 | Q267714 | Q290126 |
| | label | query | pos_doc | neg_doc |
|---|---|---|---|---|
| 0 | 1 | Q397345 | Q538594 | [Q521609, Q175780, Q068667, Q632305] |
| 1 | 1 | Q369715 | Q658908 | [Q696189, Q428940, Q198861, Q578218] |
| 2 | 1 | Q537991 | Q268444 | [Q011513, Q022092, Q229357, Q498790] |
| 3 | 1 | Q639518 | Q053248 | [Q392805, Q657314, Q647857, Q539673] |
| 4 | 1 | Q683881 | Q087150 | [Q432305, Q723726, Q272217, Q206480] |
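Here is a minimal sketch of how rows like these might be assembled from labeled pairs. It assumes that only pairs with label 1 yield a positive document and that negatives are drawn uniformly at random from all other question IDs; the post does not say how the negatives were actually chosen, and `build_triples` and its parameters are illustrative names.

```python
import random

def build_triples(pairs, num_neg=4, seed=0):
    """Turn (label, q1, q2) pairs into query / pos_doc / neg_doc rows.

    Assumption: negatives are sampled uniformly from every question ID
    other than the query and its positive document.
    """
    rng = random.Random(seed)
    all_ids = sorted({q for _, q1, q2 in pairs for q in (q1, q2)})
    rows = []
    for label, q1, q2 in pairs:
        if label != 1:  # only matched pairs provide a positive document
            continue
        candidates = [q for q in all_ids if q not in (q1, q2)]
        neg_docs = rng.sample(candidates, num_neg)
        rows.append({"label": 1, "query": q1, "pos_doc": q2, "neg_doc": neg_docs})
    return rows

# The five labeled pairs shown at the start of this section.
pairs = [
    (1, "Q397345", "Q538594"),
    (0, "Q193805", "Q699273"),
    (0, "Q085471", "Q676160"),
    (0, "Q189314", "Q438123"),
    (0, "Q267714", "Q290126"),
]
triples = build_triples(pairs)
print(triples[0])
```

With a fixed seed the sampling is reproducible, which makes it easier to compare runs.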
Replace each question ID with its sequence of words.
| | label | query | pos_doc | neg_doc |
|---|---|---|---|---|
| 0 | 1 | W04465 W04058 W05284 W02916 | W18238 W18843 W01490 W09905 | [W06579 W17705 W09745 W10938 W01490 W07863, W1... |
| 1 | 1 | W12908 W19355 W08041 W06040 W18399 W01773 W16319 | W12908 W06112 W08041 W17342 | [W13157 W16564 W08020 W08924 W08276 W11824 W04... |
| 2 | 1 | W16429 W14586 W03914 W09648 W02262 W18399 W06682 | W13522 W05733 W17917 W10691 W16319 | [W00022 W06756, W18830 W05733 W08276 W06179 W0... |
| 3 | 1 | W04182 W05733 W03914 W09400 W13868 | W04476 W11385 W05733 W18804 W16686 W19081 W18448 | [W12440 W19536 W17945 W18080 W15175 W19355 W17... |
| 4 | 1 | W17378 W14586 W01661 W03914 W04182 W12803 W02262 | W07777 W05733 W04476 W11385 W10628 W08815 W047... | [W18238 W05284 W09158 W04745 W03390, W17378 W0... |
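The replacement step can be sketched as a dictionary lookup per cell. This assumes a mapping from question ID to a space-separated word string is already available; the `question_words` table below uses made-up toy IDs, not the real data, and `replace_ids_with_words` is a hypothetical helper name.

```python
def replace_ids_with_words(row, question_words):
    """Swap every question ID in a row for its space-separated word string."""
    return {
        "label": row["label"],
        "query": question_words[row["query"]],
        "pos_doc": question_words[row["pos_doc"]],
        "neg_doc": [question_words[qid] for qid in row["neg_doc"]],
    }

# Toy lookup table with made-up IDs, for illustration only.
question_words = {
    "Q1": "W1 W2 W3",
    "Q2": "W4 W5",
    "Q3": "W6 W2",
    "Q4": "W7",
    "Q5": "W8 W9 W1",
    "Q6": "W3 W4",
}
row = {"label": 1, "query": "Q1", "pos_doc": "Q2",
       "neg_doc": ["Q3", "Q4", "Q5", "Q6"]}
word_row = replace_ids_with_words(row, question_words)
print(word_row)
```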
Encode the words as integer IDs.
| | label | query | pos_doc | neg_doc |
|---|---|---|---|---|
| 0 | 1 | [1, 2, 3, 4] | [5, 6, 7, 8] | [[723, 1649, 27, 151, 7, 25], [124, 773, 99, 2... |
| 1 | 1 | [45, 21, 46, 47, 48, 49, 30] | [45, 50, 46, 51] | [[39, 324, 837, 66, 287, 238, 1394, 53, 25], [... |
| 2 | 1 | [52, 26, 53, 54, 55, 48, 56] | [57, 58, 59, 60, 30] | [[951, 1383], [2317, 58, 287, 593, 25], [570, ... |
| 3 | 1 | [71, 58, 53, 72, 73] | [10, 74, 58, 75, 76, 77, 78] | [[333, 116, 594, 764, 698, 21, 613], [5, 28, 6... |
| 4 | 1 | [20, 26, 98, 53, 71, 99, 55] | [100, 58, 10, 74, 101, 102, 103, 48, 104] | [[5, 3, 382, 103, 40], [20, 111, 7, 30, 5, 293... |
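One plausible way to produce such an encoding is a vocabulary built in order of first appearance. The sketch below assumes IDs start at 1 so that 0 stays free for padding; that is consistent with the first row above but is not confirmed by the post, and `build_vocab` and `encode` are illustrative names.

```python
def build_vocab(sentences):
    """Assign each new word the next integer ID, starting at 1."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1  # 0 is left free for padding
    return vocab

def encode(sentence, vocab):
    """Map a space-separated word string to a list of integer IDs."""
    return [vocab[word] for word in sentence.split()]

# Query and pos_doc of the first row, taken from the word-level table.
sentences = ["W04465 W04058 W05284 W02916", "W18238 W18843 W01490 W09905"]
vocab = build_vocab(sentences)
query_ids = encode(sentences[0], vocab)
pos_ids = encode(sentences[1], vocab)
print(query_ids, pos_ids)
```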
Convert the data into the tensor format DSSM requires.
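The post does not show the exact tensor layout, so the following is only a sketch under two assumptions: every sequence is padded or truncated to a fixed length with ID 0, and the positive document is placed first, ahead of the four negatives. The negative IDs in the example are placeholders, since the real ones are truncated in the encoded table.

```python
def pad(ids, max_len, pad_id=0):
    """Truncate or right-pad a list of word IDs to exactly max_len entries."""
    return ids[:max_len] + [pad_id] * max(0, max_len - len(ids))

def to_sample(row, max_len):
    """Return (query, docs); docs[0] is the positive, docs[1:] the negatives."""
    query = pad(row["query"], max_len)
    docs = [pad(doc, max_len) for doc in [row["pos_doc"]] + row["neg_doc"]]
    return query, docs

row = {
    "query": [1, 2, 3, 4],
    "pos_doc": [5, 6, 7, 8],
    # Placeholder negatives, made up for illustration.
    "neg_doc": [[9, 10], [11, 12, 13], [14], [15, 16, 17, 18, 19, 20, 21]],
}
query, docs = to_sample(row, max_len=6)
print(query)      # shape: (max_len,)
print(len(docs))  # 1 positive + 4 negatives, each of length max_len
```

Rectangular nested lists like these can be handed directly to a tensor constructor in whichever framework is used.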
Building the model
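The model code itself is not shown in the post, so here is a small stand-in written in plain Python. It follows the general DSSM recipe (a shared encoder for query and documents, cosine similarity, then a softmax over one positive and four negatives scaled by a smoothing factor), but the encoder here is just a mean of word embeddings followed by one tanh layer. `TinyDSSM`, its layer sizes, and `gamma` are all assumptions for illustration.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

class TinyDSSM:
    """Bag-of-words scorer: mean embedding -> tanh layer -> cosine -> softmax."""

    def __init__(self, vocab_size, embed_dim=8, hidden_dim=4, gamma=10.0, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.embedding = init(vocab_size, embed_dim)  # row 0 is the padding ID
        self.weight = init(hidden_dim, embed_dim)     # shared by query and docs
        self.gamma = gamma                            # softmax smoothing factor

    def encode(self, ids):
        ids = [i for i in ids if i != 0] or [0]       # ignore padding positions
        dim = len(self.embedding[0])
        mean = [sum(self.embedding[i][k] for i in ids) / len(ids) for k in range(dim)]
        return [math.tanh(sum(w * x for w, x in zip(row, mean))) for row in self.weight]

    def forward(self, query, docs):
        """Return P(doc | query) for each doc; docs[0] is the positive."""
        q = self.encode(query)
        sims = [cosine(q, self.encode(d)) for d in docs]
        exps = [math.exp(self.gamma * s) for s in sims]
        total = sum(exps)
        return [e / total for e in exps]

model = TinyDSSM(vocab_size=30)
query = [1, 2, 3, 4, 0, 0]
docs = [[5, 6, 7, 8, 0, 0], [9, 10, 0, 0, 0, 0], [11, 12, 13, 0, 0, 0],
        [14, 0, 0, 0, 0, 0], [15, 16, 17, 18, 19, 20]]
probs = model.forward(query, docs)
loss = -math.log(probs[0])  # negative log-likelihood of the positive document
print(probs, loss)
```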
Running the DSSM model with SGD, one sample at a time
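To make "one sample at a time" concrete, the sketch below applies one parameter update per (query, docs) sample. It is deliberately simplified and is not the post's training code: the only parameters are word embeddings, each text is the mean of its word vectors, and the gradients of the softmax-over-cosine loss are written out by hand. The toy samples, learning rate, and `gamma` are made up.

```python
import math
import random

def mean_embedding(E, ids):
    dim = len(E[0])
    return [sum(E[i][k] for i in ids) / len(ids) for k in range(dim)]

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def sgd_step(E, query, docs, lr=0.05, gamma=5.0):
    """One SGD update on one sample; docs[0] is the positive. Returns the loss."""
    q = mean_embedding(E, query)
    ds = [mean_embedding(E, d) for d in docs]
    nq = norm(q)
    sims = [sum(a * b for a, b in zip(q, d)) / (nq * norm(d)) for d in ds]
    top = max(sims)  # subtract the max for a numerically stable softmax
    exps = [math.exp(gamma * (s - top)) for s in sims]
    total = sum(exps)
    probs = [e / total for e in exps]
    loss = -math.log(probs[0])

    grad = {}  # word ID -> accumulated gradient vector

    def accumulate(ids, g):
        for i in ids:  # each occurrence contributes 1/len(ids) of g
            row = grad.setdefault(i, [0.0] * len(g))
            for k, gk in enumerate(g):
                row[k] += gk / len(ids)

    for j, (d, s, p) in enumerate(zip(ds, sims, probs)):
        dl_ds = gamma * (p - (1.0 if j == 0 else 0.0))  # dLoss / dSimilarity_j
        nd = norm(d)
        g_q = [dl_ds * (dk / (nq * nd) - s * qk / nq ** 2) for qk, dk in zip(q, d)]
        g_d = [dl_ds * (qk / (nq * nd) - s * dk / nd ** 2) for qk, dk in zip(q, d)]
        accumulate(query, g_q)
        accumulate(docs[j], g_d)

    for i, g in grad.items():
        E[i] = [w - lr * gk for w, gk in zip(E[i], g)]
    return loss

rng = random.Random(0)
E = [[rng.uniform(-0.5, 0.5) for _ in range(8)] for _ in range(20)]
# Toy samples: (query, [pos_doc, neg_doc x 4]); IDs are made up.
samples = [
    ([1, 2, 3, 4], [[1, 2, 5], [6, 7, 8], [9, 10], [11, 12, 13], [14, 15]]),
    ([6, 7, 16], [[6, 7, 8], [1, 2, 3], [9, 10, 11], [12, 13], [14, 15, 17]]),
]
losses = []
for epoch in range(100):
    epoch_loss = 0.0
    for query, docs in samples:  # parameters are updated after every sample
        epoch_loss += sgd_step(E, query, docs)
    losses.append(epoch_loss / len(samples))
print(losses[0], losses[-1])
```

Updating after every sample keeps the loop simple at the cost of noisier steps than mini-batch training; in a real framework the hand-written gradients would be replaced by automatic differentiation.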