PCI DSS requires the scope of an assessment to be checked for accuracy, and this check must be repeated every year. Even if the documented scope says that no cardholder data is stored, some cardholder details may have been inadvertently left in documents. These credit card details may be left over from activities that predate the work towards PCI DSS compliance, or company credit card procedures may have been breached.

There are some good tools out there that search for cardholder data on PCs and networks, but I was looking for something that could easily run standalone on individual PCs and search a hard drive relatively quickly. Dionach developed PANhunt, which does exactly that. It is a Python script that is easily converted to a standalone executable, which can then be run off a USB stick.

PANhunt uses simple regular expressions to look for Visa, MasterCard and American Express card numbers in document and email files such as Word documents, Excel spreadsheets, TXT files, XML and PST files. PANhunt also searches ZIP files recursively. PANhunt creates a report listing the masked PANs it finds. Some system files do generate false positives, but Windows system folders are excluded by default. The current release will not search Access databases, but it will list where they are located. The scripts and instructions can be found at https://github.com/Dionach/PANhunt.

Technically, searching across a C:\ drive for files with a certain extension is straightforward in Python. PANhunt treats documents as text files or, in the case of DOCX and XLSX, as ZIP files. Text files can be easily searched using regular expressions that match the different credit card types.

The PST format was more challenging. Microsoft has published the PST file format as an open specification here: https://msdn.microsoft.com/en-us/library/ff385210.
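
As a rough illustration of the text and ZIP search described above, here is a minimal sketch. It is not PANhunt's actual code: the regular expressions, the first-six/last-four masking format, the extension lists and the function names `mask_pan`, `find_pans`, `find_pans_in_zip` and `scan_tree` are all assumptions made for this example.

```python
import io
import os
import re
import zipfile

# Illustrative patterns (assumptions, not PANhunt's own expressions).
# Digit groups may be separated by a single space or hyphen.
PAN_PATTERNS = {
    "Visa": re.compile(r"(?<!\d)4\d{3}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}(?!\d)"),
    "MasterCard": re.compile(r"(?<!\d)5[1-5]\d{2}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}(?!\d)"),
    "AMEX": re.compile(r"(?<!\d)3[47]\d{2}[ -]?\d{6}[ -]?\d{5}(?!\d)"),
}


def mask_pan(pan):
    """Keep the first six and last four digits and mask the rest (assumed format)."""
    digits = re.sub(r"\D", "", pan)
    return digits[:6] + "*" * (len(digits) - 10) + digits[-4:]


def find_pans(text):
    """Return a list of (brand, masked PAN) pairs found in a string."""
    hits = []
    for brand, pattern in PAN_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((brand, mask_pan(match.group())))
    return hits


def find_pans_in_zip(data):
    """Search every member of a ZIP archive held in memory, recursing into nested ZIPs."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            member = archive.read(name)
            if zipfile.is_zipfile(io.BytesIO(member)):
                hits.extend(find_pans_in_zip(member))
            else:
                # latin-1 maps every byte to a character, so decoding never fails.
                hits.extend(find_pans(member.decode("latin-1")))
    return hits


def scan_tree(root, text_exts=(".txt", ".xml"), zip_exts=(".zip", ".docx", ".xlsx")):
    """Walk a directory tree and yield (path, hits) for files that contain PANs."""
    for folder, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower()
            if ext not in text_exts + zip_exts:
                continue
            path = os.path.join(folder, name)
            with open(path, "rb") as handle:
                data = handle.read()
            if ext in zip_exts and zipfile.is_zipfile(io.BytesIO(data)):
                hits = find_pans_in_zip(data)
            else:
                hits = find_pans(data.decode("latin-1"))
            if hits:
                yield path, hits
```

Calling `list(scan_tree("C:\\Users"))` would return the matching files with their masked hits. The sketch reads each file fully into memory and does nothing to exclude system folders, so it only shows the general shape of the approach.
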
There are some PST libraries out there in other languages such as Java and C#, but they didn’t provide everything needed, weren’t in Python, or just didn’t work. So I developed pst.py to parse PST files and provide access to the emails and attachments they contain. The script supports both the ANSI and Unicode PST formats.

A few interesting things I learnt about PST files:
- The published specification is wrong in at least two places about the ANSI format.
- Setting a PST password does not encrypt anything; it just sets a password property in the PST file, which a parser can ignore.
- Recent Microsoft Outlook client versions seem to encode PST data sections by default, so you can’t easily see email text in the raw PST file. The decoding algorithm looks complicated in the specification but reduces to quite a simple substitution algorithm.

Hopefully PANhunt will be useful for people who want to easily check if there are any credit card numbers stored on local PCs.
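
To illustrate the point in the PST notes above about the decoding reducing to a simple substitution, here is a minimal sketch of byte-for-byte substitution through a 256-entry lookup table. The table is a made-up permutation used purely for demonstration. The real table is defined in the Microsoft specification and is not reproduced here, and the names `TOY_ENCODE_TABLE`, `invert_table`, `encode` and `decode` are hypothetical.

```python
# Toy 256-entry permutation standing in for the real table from the
# specification (assumption: this is NOT the actual PST table).
# Multiplying by an odd number modulo 256 visits every byte value exactly once.
TOY_ENCODE_TABLE = bytes((i * 7 + 13) % 256 for i in range(256))


def invert_table(table):
    """Build the inverse lookup table so that inverse[table[i]] == i."""
    inverse = bytearray(256)
    for index, value in enumerate(table):
        inverse[value] = index
    return bytes(inverse)


TOY_DECODE_TABLE = invert_table(TOY_ENCODE_TABLE)


def encode(data):
    """Replace each byte with its entry in the encode table."""
    return data.translate(TOY_ENCODE_TABLE)


def decode(data):
    """Replace each byte with its entry in the decode table."""
    return data.translate(TOY_DECODE_TABLE)
```

With a substitution of this kind, `decode(encode(b"email body"))` gives back the original bytes. That is why encoded text is unreadable in the raw file but cheap to recover once the table is known.
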