Skip to content

Limit md5 import to allow for compatibility with fips environments#1288

Open
lratc wants to merge 3 commits into
apache:trunkfrom
lratc:fips_compatibility_mark_non_security_md5
Open

Limit md5 import to allow for compatibility with fips environments#1288
lratc wants to merge 3 commits into
apache:trunkfrom
lratc:fips_compatibility_mark_non_security_md5

Conversation

@lratc
Copy link
Copy Markdown

@lratc lratc commented Apr 23, 2026

In a FIPS-140 environment, the cassandra-python-driver may error as md5 is not compiled into the hashlib module.

ModuleNotFoundError: No module named 'md5'

As md5 is not available, the MD5Token class may not be used but the overall module is still usable.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the Cassandra driver’s MD5-based token hashing to explicitly mark MD5 usage as non-security-related, improving compatibility with FIPS-140 environments where MD5 may be disallowed for security purposes.

Changes:

  • Replace from hashlib import md5 with import hashlib.
  • Update MD5Token.hash_fn to call hashlib.md5(..., usedforsecurity=False).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread cassandra/metadata.py Outdated
Comment thread cassandra/metadata.py Outdated
if isinstance(key, str):
key = key.encode('UTF-8')
return abs(varint_unpack(md5(key).digest()))
return abs(varint_unpack(hashlib.md5(key,usedforsecurity=False).digest()))
Copy link

Copilot AI Apr 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hashlib.md5(..., usedforsecurity=False) is not supported by all Python implementations (e.g., PyPy may raise TypeError: md5() takes no keyword arguments), which would break token hashing at runtime. Consider feature-detecting support once (e.g., try calling with usedforsecurity=False and fall back to a call without the kwarg) so the driver remains compatible while still fixing FIPS on CPython/OpenSSL builds that honor the flag.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a bit concerned about this as well. The CPython issue in question (as well as the Python docs for the minimum support Python runtime) suggest that support for the usedforsecurity arg went in with Python 3.9. Assuming that's true we should be okay with all officially supported versions (since we only go back to 3.10)... but we would be introducing a breaking change for earlier Python versions. That's not critical since we don't claim support... but we also generally are pretty cautious about introducing changes known to break earlier versions.

Something like executing this functionality in a try block (and re-executing if we get the expected exception) might address this but... honestly, I'm a bit more worried about something else.

Those same 3.10 docs include the following:

md5() is normally available as well, though it may be missing or blocked if you are using a rare “FIPS compliant” build of Python.

That... isn't great. This potentially introduces a whole new class of errors; now we're not talking about the args md5() supports... instead we're talking about whether md5() exists at all.

I did some random Google searching to try to identify cases in which this might occur, especially since all of this functionality comes from OpenSSL (or an equivalent) in modern Python. Near as I can tell it requires some kind of custom build aimed at limiting exposed hash algorithms in order to meet the FIPS standard but nothing I found was super-clear on that point.

But since the whole point of this PR is better support for FIPS-compliant environments I'm starting to wonder if we shouldn't also handle the case in which md5() isn't present.

@bschoening any thoughts?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@absurdfarce could move the import into Class MD5Token and only import if it's instantiated.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You raise some very good points - on further inspection, the usedforsecurity=False isn't resolving the issue I was experiencing - not trying to import md5 at that point is what has resolved the issue. Moving the import into the class would be a much better solution.

Comment thread cassandra/metadata.py Outdated
from collections.abc import Mapping
from functools import total_ordering
from hashlib import md5
import hashlib
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would prefer we keep the narrow import scope on a security librarry.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I have changed the approach to use a similar pattern to the one used for murmur3 (I inverted the if logic to avoid nesting the if statements).

Comment thread cassandra/metadata.py Outdated
if isinstance(key, str):
key = key.encode('UTF-8')
return abs(varint_unpack(md5(key).digest()))
return abs(varint_unpack(hashlib.md5(key,usedforsecurity=False).digest()))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@absurdfarce could move the import into Class MD5Token and only import if it's instantiated.

lratc and others added 2 commits May 1, 2026 10:33
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@lratc lratc changed the title Mark non-security md5 usage to allow for compatibility with fips environments Limit md5 import to allow for compatibility with fips environments May 1, 2026
@lratc lratc requested a review from bschoening May 1, 2026 12:40
@bschoening
Copy link
Copy Markdown
Contributor

bschoening commented May 1, 2026

@lratc @absurdfarce what about this approach:

class MD5Token(HashToken):
    """
    A token for ``RandomPartitioner``.
    """

    @classmethod
    def hash_fn(cls, key):
        from hashlib import md5
        if isinstance(key, str):
            key = key.encode('UTF-8')
        return abs(varint_unpack(md5(key, usedforsecurity=False).digest()))

I don't think catching the exception is necessarily helpful.

Note, this suggest reworking import murmur3 along the same lines.

@absurdfarce
Copy link
Copy Markdown
Contributor

I feel like we need to have a larger conversation here.

As I understand things Apache Cassandra itself is not compliant with FIPS 140. It sounds like there is a variant out there which claims to be but I haven't looked into it at all. The Python client also doesn't claim FIPS 140 compliance. I mention all of this mainly to say that I feel like any attempt at supporting FIPS environments should be treated as a first-class effort rather than as a single PR.

It's also worth pointing out that the existing NoMurmur3 pattern may not be a good model to follow. The current driver code includes a Python impl of murmur3 which is used in case the C version isn't available for some reason (see this code for more on that). As a result cassandra.murmur3 will always be available which means this check is completely unnecessary.

More generally I question whether throwing a NoMurmur3/NoMD5 exception is the right approach. While those errors are the root cause of the problem they happen pretty late in the process. It seems like a more informative exception for the user would be to throw an exception from the functionality that uses those digests saying that MD5 isn't available so no information can be computed. MD5 is used in Metadata.get_replicas() so it seems like the exception should be triggered there... but that entails knowing that MD5/Murmur3/other hash algorithm isn't available at that point in the process. Which in turn gets back to my original point about treating FIPS compliance as a top-level feature to be added rather than something we try to patch via a single PR.

Thoughts?

@bschoening
Copy link
Copy Markdown
Contributor

bschoening commented May 3, 2026

I agree that full FIPS 140 compliance would be a broader conversation if we had reason to tackle that. However, I believe we can make meaningful incremental improvements here without overcomplicating the driver's architecture.

Specifically, adopting usedforsecurity=False is simply good practice for modern Python. Since this was introduced in Python 3.9, it serves as explicit documentation that our use of MD5 isn't for cryptographic security. This is particularly helpful for users running automated compliance scanners; it allows those tools to recognize the usage as a non-issue rather than flagging it as a vulnerability.

Regarding the placement of the import: since the RandomPartitioner is rarely used in modern deployments, we could further "ringfence" the dependency by moving the MD5 import directly into the MD5Token class. This keeps the impact localized and ensures that the majority of users—who aren't using this partitioner—never even touch that logic and would never see an exception.

We could remove the term "FIPS" from the Jira and just focus on conforming with Python 3.9+ security best practices.

@absurdfarce
Copy link
Copy Markdown
Contributor

I mean, we can... kind of do that, you're right @bschoening. But it's worth mentioning that the usedforsecurity flag was explicitly added to handle the FIPS case (there's quite a bit on that in the CPython ticket referenced above). And I'm not aware of any other use case which causes MD5 to outright not be available in hashlib; this only becomes an issue when the backing OpenSSL impl which handles hashlib ops now doesn't support MD5 and the only plausible case for that is a FIPS-aware OpenSSL build. So I agree that we can probably logically decouple usedforsecurity support from FIPS support... but that feels like a very thin slicing of functionality to me.

I guess I have less of a problem with ringfencing the MD5 impl into MD5Token... that seems like a pretty safe step to me. But this also gets to a problem I have with the original report (and maybe @lratc can shed some light here). Is the problem actually in the import of MD5 from hashlib or in the call to md5() later in the code? Without clarity on exactly that point it's hard for me to measure which solution makes the most sense here.

I've spent a fair portion of today trying to get a FIPS-aware environment up and running via Docker so that I can try to recreate what's going on and see where things actually fail. That's proven significantly harder than I expected (as this kind of thing always does) but I'm planning on resuming that effort more soon. Hopefully if I can get a locally reproducible case I can at least better understand the context of the problem.

@absurdfarce
Copy link
Copy Markdown
Contributor

Took quite a bit of doing but I finally got to a reproducible test case. I went down the path of trying to get a Docker image containing a FIPS-aware OpenSSL build but that kept falling apart for unrelated build reasons. Finally just settled on a local OpenSSL build with FIPS enabled and a local Python build referencing that OpenSSL build and after mucking about with some configs I got to something workable.

$ OPENSSLDIR=~/local/openssl-3.1.2-fips/ssl LD_LIBRARY_PATH=/home/me/local/openssl-3.1.2-fips/lib ~/local/python-3.12.11-fips/bin/python3
Python 3.12.11 (main, May 11 2026, 22:01:52) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from hashlib import md5
>>> foo = md5()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_hashlib.UnsupportedDigestmodError: [digital envelope routines] unsupported

So the import does appear to be fine... the problem comes about when we call md5() to get to a local hash object.

@absurdfarce
Copy link
Copy Markdown
Contributor

absurdfarce commented May 12, 2026

Also worth mentioning that the error shown above by my FIPS-based test is pretty different from what @lratc originally reported. The only way I've been able to get to a ModuleNotFoundError is by trying to import md5 directly:

$ OPENSSLDIR=~/local/openssl-3.1.2-fips/ssl LD_LIBRARY_PATH=/home/me/local/openssl-3.1.2-fips/lib ~/local/python-3.12.11-fips/bin/python3
Python 3.12.11 (main, May 11 2026, 22:01:52) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import md5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'md5'

There used to be an old "md5" module in Python 2.x but that was completely removed when the transition to 3.x happened. And unsurprisingly I can't find any usage of "import md5" in the current Python driver code base.

@lratc is it possible the error you cited came from some other part of your code base? Can you provide the Python driver version as well as the Python interpreter version you used to generate the error in your original report (i.e. the ModuleNotFoundError)?

@absurdfarce
Copy link
Copy Markdown
Contributor

absurdfarce commented May 12, 2026

Given the analysis above I'm not sure there's much benefit to hiding the md5 import in metadata.py behind any kind of logical gate. The evidence seems to suggest that import will succeed in all cases.

Huh, and this sure is interesting:

$ OPENSSLDIR=~/local/openssl-3.1.2-fips/ssl LD_LIBRARY_PATH=/home/me/local/openssl-3.1.2-fips/lib ~/local/python-3.12.11-fips/bin/python3
Python 3.12.11 (main, May 11 2026, 22:01:52) [GCC 14.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from hashlib import md5
>>> foo = md5(usedforsecurity=False)
>>> foo.update(b'abc')
>>> foo.digest()
b'\x90\x01P\x98<\xd2O\xb0\xd6\x96?}(\xe1\x7fr'
>>> foo = md5()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
_hashlib.UnsupportedDigestmodError: [digital envelope routines] unsupported

That seems to bolster your case @bschoening that just adding usedforsecurity=False goes a long way here.

Hmmmm, if that does indeed hold up then that might be the only change we need here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants