Haarcascades to filter out non-Frontally Aligned Faces

I'm sure many of you have used OpenCV for a variety of work. Most of the time, in the case of face detection, we want to know whether the classifier has detected a face. Here, I actually want to know the reverse: where has it NOT detected a face?

Some images in my directory are not frontally aligned. To prepare the dataset for a GAN project I'm working on, what's the easiest way to detect those images and then remove them from my directory?

Here, some of the faces are frontally aligned, even if slightly off-angle:

In other cases, the face clearly is not frontally aligned:

In the detection code further down, (x, y, w, h) describe each bounding box: the top-left corner, the width, and the height. If a face is detected, the classifier returns an np.ndarray. Note that numpy.ndarray is a class, while numpy.array() is a function used to create an ndarray.

    import numpy as np

    arr = np.array([1, 2, 3])
    print(type(arr))  # <class 'numpy.ndarray'>
    print(arr)        # [1 2 3]

If no bounding box is returned, meaning that a face WAS NOT detected, then faces is a tuple (an immutable sequence of Python objects), not an ndarray. More precisely, it is an empty tuple.

    import os

    import cv2
    import numpy as np

    face_cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    
    
    def convertToRGB(image):
        return cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    #def is from another blog post I found^^
    
    UR_PATH = 'your/path/here'
    dirs = os.listdir(UR_PATH)
    files_not_FA = []

    for item in dirs: 
        fullpath = os.path.join(UR_PATH, item)

        #read the image from the fullpath and convert it
        image = cv2.imread(fullpath)
        if image is None: #cv2.imread returns None for files it cannot read
            continue
        frame_gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        frame_rgb = cv2.cvtColor(frame_gray, cv2.COLOR_GRAY2RGB)

        #plt.imshow(frame_rgb)
        #plt.title('my picture')
        #plt.show()

        faces = face_cascade.detectMultiScale(frame_rgb, scaleFactor=1.1,
                                              minNeighbors=8,
                                              minSize=(30, 30))

        if type(faces) == np.ndarray: #at least one face was detected
            for (x,y,w,h) in faces:
                #print("yes")
                #print(x, y, w, h)
                cv2.rectangle(frame_rgb,(x,y),(x+w,y+h),(0,255,0),2)
                #plt.imshow(convertToRGB(frame_rgb))
                files_not_FA.append(1) #placeholder, one per detected face
        elif type(faces) == tuple: #if no face, it returns an empty tuple
            #print("Not frontally aligned!")
            files_not_FA.append(fullpath)

    print(files_not_FA)

In my case, 793 images were flagged as not frontally aligned. Here's a sample of the output:

[1, 1, '081_-405.bmp', '341_-1526.bmp', '110_-562.bmp', 1, 1, '341_-2384.bmp', '341_-2400.bmp', 1, '347_-317.bmp', 1, 1, 1, 1, 1, '341_-1182.bmp', ...]
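The type comparison used above depends on the detector handing back an empty tuple when nothing matches. A slightly more defensive check, sketched here under the assumption that the result is always some sequence of boxes, tests the length instead, so it does not matter whether the empty result is a tuple or an empty array. The helper name `no_face_detected` is hypothetical, and the values passed to it are plain Python stand-ins rather than real detector output (`len()` behaves the same way on an ndarray with one row per box).

```python
def no_face_detected(faces):
    """Return True when a detection result holds no bounding boxes.

    `faces` is assumed to be a sequence with one (x, y, w, h) entry per
    detected face, or an empty sequence when nothing was detected.
    """
    return len(faces) == 0


# Stand-in values for illustration, not real detector output
empty_result = ()                    # what the post observes for "no face"
one_box = [[10, 20, 50, 50]]         # one bounding box

print(no_face_detected(empty_result))  # True
print(no_face_detected(one_box))       # False
```

With a check like this, the two branches in the loop could become a plain if/else on `no_face_detected(faces)`.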

At this point, we have a list that contains every file that needs to be removed. So, simply iterate through the directory again: if a file's path matches one of the paths flagged as NOT frontally aligned, delete it using os.remove(i), where i is the filepath.

    UR_PATH = 'your/path/here'
    dirs = os.listdir(UR_PATH)

    counter = 0
    for item in dirs:
        fullpath = os.path.join(UR_PATH, item)
        print(fullpath)

        #files_not_FA also holds the 1 placeholders; they never equal a path
        if fullpath in files_not_FA:
            counter += 1
            os.remove(fullpath)
    print(counter) #number of files deleted
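Since os.remove cannot be undone, it may be worth previewing the deletions first. The following is a minimal sketch of that idea, not part of the original workflow: the function name `remove_flagged` and its `dry_run` parameter are hypothetical, and it assumes the results list mixes full paths with the 1 placeholders as shown earlier.

```python
import os


def remove_flagged(directory, results, dry_run=True):
    """Delete files in `directory` whose full path appears in `results`.

    Non-string entries (such as the 1 placeholders) are ignored.
    Returns the paths that were removed, or that would be removed
    when dry_run is True.
    """
    # Keep only the path strings; a set makes each lookup cheap
    flagged = {entry for entry in results if isinstance(entry, str)}
    removed = []
    for item in sorted(os.listdir(directory)):
        fullpath = os.path.join(directory, item)
        if fullpath in flagged:
            if not dry_run:
                os.remove(fullpath)
            removed.append(fullpath)
    return removed
```

A possible usage pattern is to call `remove_flagged(UR_PATH, files_not_FA)` once, inspect the returned list, and then call it again with `dry_run=False` to actually delete the files.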

The directory now contains only the images that the classifier treated as frontally aligned (the sample below is drawn from several thousand).