Why post this topic when the wiki clearly says to create a new GitHub issue or post in the forum (not both)? A consolidated post listing the PDFExtractors available or in development would be more useful.
That link is not much use when the files need to be desensitised and they are full of   which can’t normally be seen in a text editor. I passed on the recommendation from the PDFBox developer to post-process extracted text to replace all   with " ".
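
For anyone who wants to try that post-processing step, here is a minimal sketch in Java. It assumes the invisible character is the non-breaking space (U+00A0), which I have not confirmed for every file. The class and method names (`NbspCleaner`, `normalizeSpaces`) are made up for illustration and are not part of PDFBox. The input is simply whatever string your extractor returned.

```java
public final class NbspCleaner {
    private NbspCleaner() {}

    // Hypothetical helper, not a PDFBox API.
    // Assumption: the invisible character is the non-breaking space, U+00A0.
    // Replaces every occurrence with an ordinary space and leaves all other text unchanged.
    public static String normalizeSpaces(String extractedText) {
        return extractedText.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Sample string standing in for text returned by an extractor.
        String sample = "Title:\u00A0Some\u00A0Paper";
        System.out.println(normalizeSpaces(sample)); // prints "Title: Some Paper"
    }
}
```

If the files turn out to contain other invisible whitespace as well, the same approach should work with one more `replace` call per character.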