Aligning inline inspection datasets improves the utilization of inspection data. Researchers in China and abroad have established preliminary alignment methods, but these do not yet handle the complexity and diversity of the Chinese text used in inline inspection reports. Here, Chinese semantic similarity calculation is used to quantify the degree of matching between fields, select the matching fields from a large number of candidates, and thereby align data reported by different inspection companies. The method is an improvement on Synonym Forest and is applied to actual fields taken from inline inspection reports. The improved method can distinguish different fields and is well suited to aligning data from multiple inspections.
semantic similarity; inline inspection; data alignment; Synonym Forest; long-distance pipeline
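
To illustrate the general idea of matching fields by Synonym Forest similarity, the following is a minimal sketch and not the improved method described above. It assumes each field name already maps to a single 8-character hierarchical code in the Synonym Forest style (five levels plus a trailing flag), scores two fields by how many levels their codes share as a common prefix, and accepts the best candidate above a threshold. The codes in `CODES`, the functions `code_similarity` and `match_fields`, and the threshold of 0.8 are all hypothetical and chosen only for the example.

```python
# Level boundaries of an 8-character code such as "Dn03A01=":
# class (1 char), subclass (1), group (2), subgroup (1), atom group (2), flag (1).
# The trailing flag character is ignored in this simplified sketch.
LEVEL_ENDS = (1, 2, 4, 5, 7)


def code_similarity(code_a: str, code_b: str) -> float:
    """Return the fraction of the five hierarchy levels shared as a common prefix."""
    shared = 0
    for end in LEVEL_ENDS:
        if code_a[:end] != code_b[:end]:
            break
        shared += 1
    return shared / len(LEVEL_ENDS)


def match_fields(fields_a, fields_b, codes, threshold=0.8):
    """For each field in fields_a, pick the most similar field in fields_b.

    A field whose best score is below the threshold is reported as unmatched (None).
    """
    matches = {}
    for fa in fields_a:
        best = max(fields_b, key=lambda fb: code_similarity(codes[fa], codes[fb]))
        score = code_similarity(codes[fa], codes[best])
        matches[fa] = (best, score) if score >= threshold else (None, score)
    return matches


# Hypothetical codes, invented for illustration; not real Synonym Forest entries.
CODES = {
    "壁厚": "Dn03A01=",      # wall thickness (company A)
    "里程": "Dn05B02=",      # mileage (company A)
    "时钟方位": "Cb02A03=",  # clock position (company A)
    "管壁厚度": "Dn03A01=",  # pipe wall thickness (company B)
    "距离": "Dn05B03=",      # distance (company B)
}

COMPANY_A = ["壁厚", "里程", "时钟方位"]
COMPANY_B = ["管壁厚度", "距离"]

if __name__ == "__main__":
    for field, (partner, score) in match_fields(COMPANY_A, COMPANY_B, CODES).items():
        print(field, "->", partner, score)
```

In this toy data, the clock-position field has no counterpart in the second report, so it stays unmatched instead of being forced onto the nearest candidate. A practical version would also need word segmentation of multi-word field names and handling of words with several codes, which this sketch omits.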