Improved 3D measurement with a novel preprocessing method in DFP

Shadow and background are two common factors in digital fringe projection (DFP) that lead to ambiguity in three-dimensional measurement and therefore need to be handled carefully. Preprocessing is often needed to segment the object from invalid points. Existing modulation-based segmentation approaches normally perform well against a purely dark background but lose accuracy when the background is white or complex. In this paper, an accurate shadow and background removal technique is proposed, which segments the shadow with one threshold from the modulation histogram and segments the background with a threshold from the intensity histogram. Experiments are conducted to verify the effectiveness and reliability of the proposed method.


Background
Digital fringe projection (DFP) techniques are widely employed in flexible, non-contact and high-speed 3D shape measurement [1]. In a DFP system, a sequence of phase-shifted sinusoidal fringes is projected onto the object by the projector, and the fringes are distorted by the object surface and captured by a camera. The phase map can be retrieved from the deformed fringes, and the object height information is calculated from the phase map in a calibrated DFP system [2]. However, shadow and background are inevitable, since the projector and camera view the object from different viewpoints. Invalid points such as shadow and background should be identified and removed from the object.
Researchers have made great efforts to remedy the influence of invalid points, including the shadow and background. Skydan et al. [3] utilized multiple projectors to probe the object from different viewpoints to achieve shadow-free reconstruction. However, the increased hardware cost keeps this method from being commonly utilized. Zhang [4] proposed to apply a Gaussian filter to the fringes to remove random noise and to identify the invalid points by the monotonicity of the unwrapped phase. However, the Gaussian filter introduces errors into the object details. Chen et al. [5] applied a threshold to the least-squares fitting errors in temporal phase unwrapping for invalid point detection. However, this method is vulnerable to noise [6].
Huang and Asundi [6] proposed a compact framework combining modulation, rms error and monotonicity for shadow and background removal and error detection. Intensity modulation is very effective in measuring how informative the pixels are, and can be used to detect background and shadow. However, manually adjusting the threshold is time-consuming. In practice, the threshold selection is subject to measurement conditions such as the environmental illumination and object surface characteristics. Lu et al. [7] proposed a technique to remove shadow points by mapping the 3D results into projector coordinates, so that the modulation is not needed. However, this method can only detect shadow caused by the DFP system [8].
Otsu's method [9] is widely utilized for thresholding in image segmentation because it is automatic and efficient. However, it fails to provide an optimal threshold when the number of classes to be separated increases or when the intensity histogram is close to a unimodal distribution [10]. Ng [10] improved this technique through a weighting factor that considers the occurrence probability of the threshold point. Both Otsu's method and Ng's method aim at image segmentation based on the intensity histogram. The work in [8] applied the automatic thresholding method to the modulation histogram for object detection. However, that method can only deal with a dark background of low modulation: the background and shadow have similarly low modulation, while the object has an obviously higher modulation level, so only one threshold is needed to segment the object. When the background is a white board, or is complex with a higher or similar modulation level, it is difficult to segment the background from the object. In this situation, there are three classes in the modulation map, and two thresholds are needed to separate the object from the background and shadow, as shown in Fig. 1. The method in [8] cannot deal well with this situation.
In this paper, we apply the multi-thresholding technique to the modulation histogram and propose a preprocessing method to detect the valid points of the object. First, the shadow is segmented using one threshold from the modulation histogram. Second, we project one additional picture onto the object and the reference plane and calculate the intensity difference of the captured images; the histogram of the difference map is analyzed for background detection. We call this additional picture the coding map.
The rest of this paper is organized as follows: We introduce the related principles and existing methods in "Related work". In the "Methods" section, we introduce the details of how to implement our proposed object segmentation technique. In the "Experiments and results" section, we present and compare some segmentation results using our method and the expanded conventional method. The 3D shape reconstruction result is also presented in that section. In the end, we make a summary in "Conclusion".

Fig. 1 Comparison between dark background and white background. a One of the captured fringes on the object with a dark background, b modulation distribution of the captured fringes, c modulation histogram, d one of the captured fringes on the object with a white background, e modulation distribution of the captured fringes, f modulation histogram. In f, the lower threshold $t_1$ is for shadow and interior invalid points detection, and the upper threshold $t_2$ is for background detection

N-step phase shifting and modulation
Phase-shifting algorithms are widely utilized in stationary object measurement due to their high accuracy and flexibility [11]. They carry out point-by-point measurement and calculate the wrapped phase value from $-\pi$ to $\pi$. For the N-step phase-shifting method, sinusoidal fringes with the following intensity modulation are often used [4],

$$I_n(x, y) = I_a(x, y) + I_m(x, y)\cos\left[\varphi(x, y) + \frac{2\pi n}{N}\right],$$

where $n$ is the phase-shifting index and $N$ is the total number of phase-shifting steps. $I_n$ is the intensity map of the $n$th sinusoidal fringe, and $I_a$ and $I_m$ are the average intensity and modulation intensity, respectively. The wrapped phase $\varphi_w$ can be calculated as [6],

$$\varphi_w(x, y) = -\arctan\left[\frac{\sum_{n=1}^{N} I_n(x, y)\sin(2\pi n/N)}{\sum_{n=1}^{N} I_n(x, y)\cos(2\pi n/N)}\right].$$

The modulation $M$ is defined as,

$$M(x, y) = \frac{2}{N}\sqrt{\left[\sum_{n=1}^{N} I_n(x, y)\sin\frac{2\pi n}{N}\right]^2 + \left[\sum_{n=1}^{N} I_n(x, y)\cos\frac{2\pi n}{N}\right]^2}.$$

It shows how much useful information is contained in each pixel, and it is usually selected as the reliability map to guide the phase unwrapping and object segmentation [12]. If a proper threshold $t$ is found, the object can be separated from the background, shadow and less informative pixels. However, manually adjusting the modulation threshold is tedious and unstable, since the modulation varies with measuring conditions such as the incoherent light, the reflectivity of the object and background, and the occlusion caused by object step height.
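The per-pixel computation described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes fringes of the form $I_n = I_a + I_m\cos(\varphi + 2\pi n/N)$ with $n = 0, \ldots, N-1$ and $N \ge 3$ (summing over one full period gives the same result as $n = 1, \ldots, N$), and the function names `wrapped_phase_and_modulation` and `simulate_pixel` are hypothetical.

```python
import math


def wrapped_phase_and_modulation(intensities):
    """Return (wrapped phase, modulation) for one pixel.

    Assumption: intensities[n] = I_a + I_m * cos(phi + 2*pi*n/N), n = 0..N-1.
    """
    n_steps = len(intensities)
    if n_steps < 3:
        raise ValueError("at least three phase-shifted samples are required")
    # Sine- and cosine-weighted sums over the N samples.
    s = sum(i_n * math.sin(2 * math.pi * n / n_steps)
            for n, i_n in enumerate(intensities))
    c = sum(i_n * math.cos(2 * math.pi * n / n_steps)
            for n, i_n in enumerate(intensities))
    # atan2 keeps the quadrant, so the phase is wrapped to [-pi, pi].
    phase = math.atan2(-s, c)
    # For an ideal sinusoid this recovers I_m.
    modulation = 2.0 / n_steps * math.hypot(s, c)
    return phase, modulation


def simulate_pixel(i_a, i_m, phi, n_steps):
    """Generate ideal phase-shifted samples for one pixel (test helper)."""
    return [i_a + i_m * math.cos(phi + 2 * math.pi * n / n_steps)
            for n in range(n_steps)]
```

A shadow pixel, which receives almost no projected light, yields a modulation close to zero regardless of its average intensity, which is why the modulation is a natural quantity to threshold.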

Existing methods of threshold selection
Otsu's method is commonly utilized for quick segmentation of the object and background based on image intensity. For a given image, if we distribute the gray levels into $L$ bins ranging from 1 to $L$, $k_i$ represents the total number of pixels with gray level $i$ and $K$ is the total number of pixels of the given image, $K = k_1 + k_2 + \cdots + k_L$. The occurrence probability of gray level $i$ is calculated as,

$$p_i = \frac{k_i}{K}.$$

When a single threshold is applied, the pixels of the given image are divided into two classes (typically the object, and the background with shadow): class $C_0$ includes the pixels with levels $1, 2, \ldots, t$, and class $C_1$ includes the pixels with levels $t+1, t+2, \ldots, L$, where $t$ is the threshold to be determined. The occurrence probability of each class can be calculated as,

$$\omega_0 = \sum_{i=1}^{t} p_i = \omega(t), \qquad \omega_1 = \sum_{i=t+1}^{L} p_i = 1 - \omega(t),$$

and the class mean levels are,

$$\mu_0 = \sum_{i=1}^{t} \frac{i\,p_i}{\omega_0} = \frac{\mu(t)}{\omega(t)}, \qquad \mu_1 = \sum_{i=t+1}^{L} \frac{i\,p_i}{\omega_1} = \frac{\mu_T - \mu(t)}{1 - \omega(t)},$$

where $\omega(t) = \sum_{i=1}^{t} p_i$ and $\mu(t) = \sum_{i=1}^{t} i\,p_i$ are the zeroth-order and first-order cumulative moments of the histogram up to the $t$th level, respectively. The total average gray level of the whole image is calculated as,

$$\mu_T = \sum_{i=1}^{L} i\,p_i.$$

For any selection of $t$, it is easily verified that

$$\omega_0\mu_0 + \omega_1\mu_1 = \mu_T, \qquad \omega_0 + \omega_1 = 1.$$

According to the discriminant criterion analysis [9], Otsu showed that the optimal threshold $t^*$ can be calculated by maximizing the between-class variance,

$$t^* = \arg\max_{1 \le t < L} \sigma_B^2(t),$$

where the between-class variance $\sigma_B^2$ is defined as,

$$\sigma_B^2 = \omega_0(\mu_0 - \mu_T)^2 + \omega_1(\mu_1 - \mu_T)^2.$$

The optimal threshold $t^*$ is often calculated by an equivalent but simpler equation [13],

$$t^* = \arg\max_{1 \le t < L} \left\{\omega_0\mu_0^2 + \omega_1\mu_1^2\right\}.$$

Otsu's method works well on histograms with a bimodal distribution, but it is not robust for histograms that are unimodal or close to unimodal [10]. Ng [10] developed a valley emphasis method to improve Otsu's method.
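As an illustration of the single-threshold search described above, the following sketch maximizes the simplified criterion $\omega_0\mu_0^2 + \omega_1\mu_1^2$ over all candidate thresholds. It is a minimal example under stated assumptions: the histogram is passed as a list whose entry `hist[i - 1]` is the pixel count of gray level `i`, and the function name `otsu_threshold` is hypothetical.

```python
def otsu_threshold(hist):
    """Return the level t (1-based) that maximizes omega0*mu0^2 + omega1*mu1^2.

    hist[i - 1] is the number of pixels with gray level i, for i = 1..L.
    Class C0 holds levels 1..t and class C1 holds levels t+1..L.
    """
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    weighted_total = sum((i + 1) * k for i, k in enumerate(hist))

    best_t, best_score = None, -1.0
    count0 = 0  # number of pixels in C0 (integer, avoids rounding issues)
    sum0 = 0    # sum of gray levels over the pixels in C0
    for t in range(1, len(hist)):  # candidate thresholds t = 1..L-1
        count0 += hist[t - 1]
        sum0 += t * hist[t - 1]
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue  # one class is empty, so the criterion is undefined
        mu0 = sum0 / count0
        mu1 = (weighted_total - sum0) / count1
        score = (count0 * mu0 ** 2 + count1 * mu1 ** 2) / total
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Because the criterion differs from the between-class variance only by the constant $\mu_T^2$, both give the same threshold. When several thresholds tie, this sketch returns the lowest one.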
Ng's method adds a weighting factor, so that the threshold is calculated by considering two elements: a small occurrence probability at the threshold point and a large between-class variance. The threshold of Ng's method is calculated as,

$$t^* = \arg\max_{1 \le t < L} \left\{(1 - p_t)\left(\omega_0\mu_0^2 + \omega_1\mu_1^2\right)\right\}.$$

The above two methods for automatic threshold selection are intended for image segmentation based on the gray-level histogram. The work in [8] applies them to the modulation histogram for object segmentation. However, in that work the background is dark, so invalid points in the shadow and background have a low modulation level while the object has a higher modulation level, and one threshold is enough to segment the object. Figure 1a shows a captured fringe on the object with a dark background, Fig. 1b shows the modulation map of the captured fringes, and Fig. 1c shows the histogram of the modulation map. The modulation histogram falls into two classes, and it is easy to find the threshold $t_1$ that separates the valid points from the invalid points.
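The valley-emphasis weighting can be sketched by multiplying the simplified Otsu criterion by $(1 - p_t)$, which favors thresholds that fall in a sparsely populated bin. This is a minimal illustration of that idea rather than the code used in [8] or [10]; the function name `valley_emphasis_threshold` and the list-based histogram layout (`hist[i - 1]` is the count of level `i`) are assumptions.

```python
def valley_emphasis_threshold(hist):
    """Return t (1-based) maximizing (1 - p_t) * (omega0*mu0^2 + omega1*mu1^2)."""
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    weighted_total = sum((i + 1) * k for i, k in enumerate(hist))

    best_t, best_score = None, -1.0
    count0 = 0  # pixels in class C0 (levels 1..t)
    sum0 = 0    # sum of levels over class C0
    for t in range(1, len(hist)):
        k_t = hist[t - 1]
        count0 += k_t
        sum0 += t * k_t
        count1 = total - count0
        if count0 == 0 or count1 == 0:
            continue
        mu0 = sum0 / count0
        mu1 = (weighted_total - sum0) / count1
        between = (count0 * mu0 ** 2 + count1 * mu1 ** 2) / total
        # Valley emphasis: penalize thresholds that sit on a populated bin.
        score = (1.0 - k_t / total) * between
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

On a cleanly bimodal histogram the weighting moves the threshold off the edge of a peak and into the empty valley between the two modes.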
In practice, the modulation histogram does not necessarily fall into two classes, for example when a white board is used as the background for system calibration, as shown in Fig. 1d. Figure 1e shows the modulation map of Fig. 1d, and Fig. 1f shows the histogram of the modulation map. When the background is a white board, the modulation level of the background is high, and the modulation histogram in Fig. 1f has to be classified into three categories: the background has medium-to-high modulation, the object has medium modulation, and the shadow has low modulation. Two thresholds need to be calculated, one for shadow segmentation and one for background segmentation. In this situation, the conventional method cannot be utilized directly.

Methods
To segment the object from a white or complex background, we first apply the expanded Ng's method for multi-threshold calculation in the modulation histogram, and then propose our method for shadow and background detection. Figure 2 shows the flowchart of our method. The first threshold calculated from the modulation histogram is utilized for shadow segmentation. For the background segmentation, we project one coding image onto the object and calculate the intensity difference between the object and the background. The threshold in the intensity histogram is used for background segmentation. Details on how to segment the shadow and background are introduced as follows.

Expanded thresholding method
The work in [8] improved Ng's method and applied it for single thresholding in the fringe modulation histogram for object detection in digital fringe projection, but it only discussed the situation of a dark background, in which only one threshold is needed for object segmentation. For a DFP system with a white or complex background, we apply the multi-thresholding Ng's method to the modulation. For three classes $C_0$, $C_1$ and $C_2$ with occurrence probabilities $\omega_k$ and mean levels $\mu_k$, the expanded Ng's method can be described by [9],

$$\{t_1^*, t_2^*\} = \arg\max_{1 \le t_1 < t_2 < L} \left\{\left(1 - p_{t_1} - p_{t_2}\right)\sum_{k=0}^{2}\omega_k\mu_k^2\right\}. \tag{16}$$

Utilizing this equation, the two thresholds $t_1$ and $t_2$ in Fig. 1f can be calculated. Pixels with a modulation level smaller than $t_1$ are regarded as shadow, pixels with a modulation level larger than $t_2$ are regarded as background, and the object pixels have a medium modulation level. However, the multi-threshold calculation is less credible [9]. What is worse, when the background is complex, with modulation levels distributed over a large range, it is difficult to segment the background by modulation alone. In our method, only $t_1$ is utilized for shadow detection, and the background is segmented from the image intensity. Figure 3 shows the preliminary detection results, in which black pixels are shadow and invalid points.

Fig. 2 Flowchart of object valid points detection
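A two-threshold search of the kind described above can be sketched as an exhaustive scan over all pairs $t_1 < t_2$. The sketch assumes the multilevel valley-emphasis criterion $(1 - p_{t_1} - p_{t_2})\sum_k \omega_k\mu_k^2$ over three classes; the function name `two_threshold_valley_emphasis` and the list-based histogram layout are hypothetical, and the quadratic search is chosen for clarity rather than speed.

```python
def two_threshold_valley_emphasis(hist):
    """Return (t1, t2), t1 < t2, for a three-class valley-emphasis split.

    hist[i - 1] is the pixel count of level i (i = 1..L). The classes are
    levels 1..t1, t1+1..t2 and t2+1..L.
    """
    levels = len(hist)
    total = sum(hist)
    if total == 0:
        raise ValueError("empty histogram")
    # Prefix sums: cnt[t] = k_1 + ... + k_t, wsum[t] = 1*k_1 + ... + t*k_t.
    cnt = [0] * (levels + 1)
    wsum = [0] * (levels + 1)
    for i in range(1, levels + 1):
        cnt[i] = cnt[i - 1] + hist[i - 1]
        wsum[i] = wsum[i - 1] + i * hist[i - 1]

    def class_term(lo, hi):
        """omega * mu^2 for levels lo+1..hi, or None if the class is empty."""
        n = cnt[hi] - cnt[lo]
        if n == 0:
            return None
        s = wsum[hi] - wsum[lo]
        return s * s / (n * total)  # (n/total) * (s/n)^2

    best, best_score = None, -1.0
    for t1 in range(1, levels - 1):
        for t2 in range(t1 + 1, levels):
            terms = (class_term(0, t1), class_term(t1, t2),
                     class_term(t2, levels))
            if None in terms:
                continue
            weight = 1.0 - (hist[t1 - 1] + hist[t2 - 1]) / total
            score = weight * sum(terms)
            if score > best_score:
                best, best_score = (t1, t2), score
    return best
```

With 256 modulation bins the scan evaluates roughly 32,000 pairs, which is small enough that no faster recursion is needed for a single histogram.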

Intensity-based background segmentation
For background segmentation, we project an extra coding image with the intensity given by Eq. (17) onto the object and background and analyze the intensity of their difference to calculate a reliable $t_{in}$. In Eq. (17), 255 is the total gray-level range, and $N$ is the number of columns of the projected image. The coding image for projection is shown in Fig. 4. The captured coding image on the reference plane, $I_{flat}$, is shown in Fig. 5a, and the captured coding image on the object, $I_{obj}$, is shown in Fig. 5b. The intensity difference map $I_{diff}$ shown in Fig. 5c is calculated by subtracting $I_{flat}$ from $I_{obj}$,

$$I_{diff} = I_{obj} - I_{flat}. \tag{18}$$

Here $(x, y)$ is omitted for simplicity.
Since the extra projected image contains a lot of useful information for background detection, we call it the coding map. The histogram of the difference coding map $I_{diff}$ is shown in Fig. 6a. Utilizing the single-threshold criterion in [10], we can calculate a reliable intensity threshold $t_{in}$ for segmenting the background. The 150th row cross-section intensity of Fig. 5a-c is shown in Fig. 6b.
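The difference map and its histogram can be sketched as follows. This is an illustration under stated assumptions: the captured images are 8-bit and stored as nested lists of integers, so the differences lie in $[-255, 255]$, and the function names `difference_map` and `difference_histogram` are hypothetical.

```python
def difference_map(i_obj, i_flat):
    """Pixel-wise I_diff = I_obj - I_flat for two images of equal size."""
    return [[o - f for o, f in zip(row_obj, row_flat)]
            for row_obj, row_flat in zip(i_obj, i_flat)]


def difference_histogram(i_diff, lo=-255, hi=255):
    """Histogram of the difference map; bin b counts the value lo + b."""
    hist = [0] * (hi - lo + 1)
    for row in i_diff:
        for value in row:
            # Clamp so that out-of-range values land in the end bins.
            clamped = min(max(int(round(value)), lo), hi)
            hist[clamped - lo] += 1
    return hist
```

If a threshold routine returns a 1-based level `t` on this histogram, the corresponding intensity-difference threshold is `lo + t - 1`.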
With the multi-thresholding Ng's method applied to the modulation histogram, the object valid-point matrix $V_{valid}$ is computed from binary matrices $B$ of the same size as $M$, obtained by thresholding $M$ so that pixels with modulation between $t_1$ and $t_2$ are kept. Here $M$ is the modulation map, and $t_1$ and $t_2$ are the first and second thresholds of the modulation histogram calculated by Eq. (16); $\circ$ represents the Hadamard product of two matrices, and $\neg$ means negation. However, multi-threshold calculation is less credible [9], and the background may be complex. We therefore analyze the intensity difference of the coding map to find $t_{in}$ for background segmentation, while the lower threshold $t_1$ from the modulation is still used for shadow detection. The proposed object valid-point matrix $V_{pro}$ is calculated in the same manner from $M$ and $I_{diff}$, where $I_{diff}$ is the intensity difference map calculated from Eq. (18) and $t_{in}$ is the intensity threshold.
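The two valid-point masks can be sketched as element-wise combinations of binary tests. The decision rules follow the text (modulation below $t_1$ is shadow, modulation above $t_2$ is background, intensity difference above $t_{in}$ is background); how pixels exactly equal to a threshold are treated is an assumption here, and the function names are hypothetical.

```python
def valid_mask_modulation(mod, t1, t2):
    """Modulation-only mask: 1 where the pixel is neither shadow
    (mod < t1) nor background (mod > t2)."""
    return [[1 if (m >= t1 and not m > t2) else 0 for m in row]
            for row in mod]


def valid_mask_proposed(mod, i_diff, t1, t_in):
    """Proposed mask: shadow from modulation (mod < t1), background from
    the coding-map intensity difference (i_diff > t_in)."""
    return [[1 if (m >= t1 and not d > t_in) else 0
             for m, d in zip(row_m, row_d)]
            for row_m, row_d in zip(mod, i_diff)]
```

Each entry is the product of one binary test and the negation of another, which is the Hadamard-product form in scalar terms. A background pixel whose modulation happens to fall between $t_1$ and $t_2$ passes the first mask but is rejected by the second as long as its intensity difference exceeds $t_{in}$.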

Experiments and results
Experiments are carried out to test the proposed shadow and background removal technique. A DFP 3D shape measurement system, shown in Fig. 7, with a defocused projector projecting binary fringes of width $T = 30$ is employed to measure the 3D objects. Utilizing defocused binary fringes avoids the need for nonlinear gamma correction [14]. The projected fringes are deformed by the object and captured by a camera. The phase of the object surface is retrieved by the phase-shifting technique, and the height information is calculated after system calibration [15]. The hardware in the study includes a DLP projector (AAXA P4-X) with a native resolution of 480 × 854 pixels and a CCD camera (Point Grey FL3-U3-13S2M-CS) with a resolution of 1328 × 1048 pixels. The camera is fitted with a 6-mm focal-length lens (Kowa LM6JC). The projection distance is about 40 cm.

Fig. 4 The intensity distribution of the coding image: it is projected on the object and reference plane, and the difference of the captured images is used for calculating the threshold for background detection

Shadow and background segmentation
In this experiment, two different objects are tested and segmented; the results are shown in Fig. 8 for the first object and Fig. 9 for the second object. The calculated thresholds are shown in Table 1. Three different defocusing levels of the projector are utilized to produce different fringe contrasts and modulation levels. Figure 8a shows the modulation histogram of the captured fringe patterns, and Fig. 8b shows the histogram of the intensity difference for the captured coding image. Figure 8c shows the object segmentation by a single threshold. As can be seen, one threshold is not enough to segment the whole object when the background has a high modulation level; it only segments the shadow from the object. Figure 8d shows the object detected by the modulation thresholds $t_1$ and $t_2$. It can segment the shadow and background from the object, but part of the background is detected as valid points of the object. There are two reasons: first, multi-threshold calculation is not always credible [9], and second, when the background is complicated, with modulation levels distributed in both the second cluster and the third cluster, background segmentation based purely on modulation is prone to error. Figure 8e shows the object detected by our proposed method: the background is segmented based on the intensity difference histogram of the coding map shown in Fig. 8b, using the threshold $t_{in}$. The detected object is more accurate than that in Fig. 8c. Similar trends are shown in Fig. 8f-j for the slightly defocused projector and Fig. 8k-o for the strongly defocused projector, which provide different fringe contrasts and modulation levels. When the projector defocusing level increases, the modulation thresholds $t_1$ and $t_2$ become smaller, because defocusing generally depresses the fringe modulation level. The same experiments are also done on the second object, and similar results are shown in Fig. 9.

To demonstrate that our proposed method can work with a more complex background, we put a small statue near the measured object. Results are shown in Fig. 10. Figure 10a shows the modulation histogram of the captured fringes, Fig. 10b shows the histogram of the intensity difference for the captured coding map, and Fig. 10c shows the object with the small statue beside it. The object segmented by Ng's method based on modulation is shown in Fig. 10d, and the object segmented by our proposed method is shown in Fig. 10e. Our proposed method accurately segments the object from the background, while the modulation-based method cannot segment the object from the complex background. In most practical conditions, our proposed method segments the valid points of the object more accurately than segmentation based purely on modulation.

Fig. 6 a The histogram of Fig. 5c, according to which the threshold $t_{in}$ is calculated. Pixels with intensity difference larger than the threshold are regarded as the background, and the pixels with intensity difference smaller than the threshold are the object and shadow. b The 150th row cross-section intensity of Fig. 5a-c

3D reconstruction
After the phase map of the object is retrieved, the height information can be calculated by system calibration [15]. One commonly utilized method calibrates the camera and the projector separately to find the system parameters [16]. This kind of method is easy to understand, because each system parameter has a geometric meaning, but it is also time-consuming and error-prone [17]. Because the projector is regarded as an inverse camera, its calibration accuracy depends on the camera calibration process. In this work, we apply the calibration framework presented in [15] to calculate the height information of the object. For a general DFP system with arbitrary arrangements, the 3D height follows the governing equation given in [18,19], in which $z$ is the height at pixel $(i, j)$ and $\varphi$ is the phase value of the projected fringe at that pixel; $c_1$ to $c_9$ and $d_0$ to $d_9$ are constants related to system parameters. To determine the 19 coefficients, we need to know the height of some sample points on the calibration board, together with their corresponding phase $\varphi$ and pixel position $(i, j)$, and use a least-squares algorithm to find the coefficients.
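For orientation, height models of this kind are commonly written as a ratio of two polynomials in the pixel position whose coefficients are linear in the phase. The arrangement of terms below is an assumption, chosen only because it matches the coefficient count stated above (nine $c$ terms and ten $d$ terms, 19 in total); it should be checked against [18,19] before use.

```latex
z = \frac{1 + c_1\varphi + (c_2 + c_3\varphi)\,i + (c_4 + c_5\varphi)\,j
            + (c_6 + c_7\varphi)\,i^2 + (c_8 + c_9\varphi)\,j^2}
         {d_0 + d_1\varphi + (d_2 + d_3\varphi)\,i + (d_4 + d_5\varphi)\,j
            + (d_6 + d_7\varphi)\,i^2 + (d_8 + d_9\varphi)\,j^2}
```

Under this assumed form, multiplying both sides by the denominator gives one equation per sample point that is linear in the 19 unknowns, which is what makes a least-squares solution applicable.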
In our experiment, a 2D checkerboard with 12 × 16 black and white squares is utilized as the calibration object. The calibration includes obtaining the 3D coordinates and phase value of all calibration points on the checkerboard at ten different positions. Phase-shifted sinusoidal fringes and an extra white image are projected onto the calibration board and captured by the camera. The camera intrinsic and extrinsic parameters are calibrated with the captured clear checkerboard. We define the points in the world and camera coordinate systems as $[x_w, y_w, z_w]^{\mathsf T}$ and $[x_c, y_c, z_c]^{\mathsf T}$, respectively. Generally, $z_w$ is set to zero, so the relationship between the world and camera coordinate systems is expressed by,

$$[x_c, y_c, z_c]^{\mathsf T} = R\,[x_w, y_w, z_w]^{\mathsf T} + T, \tag{22}$$

where $R$ and $T$ represent the rotation and translation elements of the camera extrinsic parameters. Using Eq. (22), we can find all the calibration points in the camera coordinate system. We set the first calibration board position as the reference plane and its coordinate system as the world coordinate system. The work in [15] computes the reference plane equation in the camera coordinate system and calculates the distance of each calibration point to this plane as the point's height. In our experiments, all the calibration points are transformed to the world coordinate system according to their respective transformation matrices; then, $z_w$ is the point's height. The Levenberg-Marquardt algorithm is then utilized to verify the results.

The reconstructed 3D object is shown in Fig. 11. The object in Fig. 11a is preprocessed by object segmentation based on the modulation histogram, and that in Fig. 11b is preprocessed by our proposed method, with both the modulation and intensity histograms being analyzed. The modulation-based segmentation removes the shadow correctly, as does our proposed method. However, in Fig. 11a, part of the measurement platform is segmented as part of the object, although it should be removed as background, while our proposed method accurately removes the shadow and complex background from the object points.
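The coordinate change used in the calibration step can be sketched as follows. The sketch assumes the usual rigid-body relation $X_c = R X_w + T$ with $R$ a rotation matrix, so that the inverse is $X_w = R^{\mathsf T}(X_c - T)$; the function names `world_to_camera` and `camera_to_world` are hypothetical, and matrices are plain nested lists.

```python
def world_to_camera(point_w, rotation, translation):
    """X_c = R X_w + T for a 3x3 rotation matrix and a 3-vector T."""
    return [sum(rotation[r][c] * point_w[c] for c in range(3)) + translation[r]
            for r in range(3)]


def camera_to_world(point_c, rotation, translation):
    """X_w = R^T (X_c - T); valid because a rotation matrix is orthogonal."""
    diff = [pc - t for pc, t in zip(point_c, translation)]
    # Multiplying by the transpose means summing down the columns of R.
    return [sum(rotation[r][c] * diff[r] for r in range(3))
            for c in range(3)]
```

With $R$ and $T$ taken from the reference board position, the third component returned by `camera_to_world` is the height $z_w$ of the point above the reference plane.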

Conclusion
In this paper, we proposed a novel preprocessing method for object segmentation in DFP 3D shape measurement. We first applied the multi-threshold Ng's method to the modulation histogram and then proposed our method for shadow and background detection based on the modulation and intensity histograms. Experiments verified that our proposed method can improve 3D shape measurement with white and complex backgrounds.