Transformer(*) is all you need for Credit Risk
This project aims to apply Transformer models to credit risk data in a more scalable and efficient manner. We aim to outperform traditional methods such as XGBoost, especially on alternative data.
Description
Context
In the field of credit risk, the use of alternative data such as transaction patterns, geolocation information, mobile usage, and behavioral signals has become increasingly important. However, these data sources are often high-dimensional, unstructured, and noisy, which makes them difficult to process efficiently with traditional modeling techniques like logistic regression or gradient boosting. From industry experience, I know that training and running these models typically takes only a fraction of the time required by the far more demanding tasks of data preparation and feature engineering. These steps not only consume significant resources but also reduce the dimensionality and depth of the data. Traditional models, designed primarily for structured tabular inputs, are limited in their ability to capture temporal dependencies, latent interactions, and evolving risk profiles within alternative data streams.
To address this challenge, we propose the application of Transformer-based methods. Originally developed for natural language processing, Transformers are well suited to handle sequential information and capture complex feature interactions at scale. Their self-attention mechanism allows the model to selectively prioritize the most relevant signals across both time and feature dimensions, making them particularly effective for heterogeneous and dynamic datasets.
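As a rough illustration of the self-attention idea described above, the following minimal sketch computes scaled dot-product attention over a short sequence of events using only the Python standard library. It is not our model architecture: the toy feature vectors are made up, and the learned query, key, and value projections of a real Transformer are replaced by identity mappings as a simplifying assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(events):
    """Scaled dot-product self-attention with identity projections.

    events: list of equal-length feature vectors, one per time step.
    Returns (outputs, weights), where weights[i][j] is how strongly
    step i attends to step j.
    """
    dim = len(events[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in events:
        # Similarity of this step to every step in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in events]
        row = softmax(scores)
        weights.append(row)
        # Each output is the attention-weighted average of all steps.
        outputs.append(
            [sum(w * value[d] for w, value in zip(row, events)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical toy sequence: three events with two made-up features each.
toy_events = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
]
outputs, weights = self_attention(toy_events)
```

Each row of `weights` sums to one, so every output vector is a weighted mix of the whole sequence. In this toy input the first event attends more to the similar second event than to the dissimilar third one, which is the "selective prioritization" behavior in miniature.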
Goal-Result
By leveraging this architecture, we pursue three objectives that can all be addressed with the same set of solutions.
First, to develop a scalable and adaptive credit risk modeling system capable of continuously incorporating new forms of alternative data without relying on extensive manual feature engineering.
Second, to enable truly dynamic credit pricing, where credit limits, interest rates, and repayment terms can be adjusted in near real time as a borrower’s risk profile evolves. This approach represents a shift away from static scorecards and rigid risk bands toward a more flexible, data-rich, and personalized framework for credit management, which in turn allows for better segmentation and for reinforcement learning (RL) methods down the line (the revolving credit problem).
Third, to preserve user privacy and ensure auditability through blockchain integration.
By committing only encrypted representations or cryptographic hashes of customer vectors and alternative data features to a distributed ledger, the system prevents unauthorized access to sensitive information while still providing regulators and counterparties with tamper-proof proofs of data integrity and model usage. This design directly addresses risks highlighted by incidents such as the CIC data breach, shifting control over credit data back to borrowers while enabling transparent and verifiable credit risk management.
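The commitment step described above could look roughly like the sketch below, which uses a salted SHA-256 hash from the Python standard library. This is an assumption-laden illustration, not our implementation: the function names `commit` and `verify`, the JSON serialization, and the example feature names are all hypothetical, and the ledger itself is out of scope here. Only the digest would be published; the salt and the raw features stay with the data holder.

```python
import hashlib
import hmac
import json
import secrets


def commit(features, salt=None):
    """Return (digest_hex, salt) for a salted SHA-256 commitment.

    features: a JSON-serializable dict of customer features (hypothetical).
    The salt keeps low-entropy feature sets from being guessed by brute force.
    """
    if salt is None:
        salt = secrets.token_bytes(32)
    # Sorted keys and fixed separators give a stable byte representation.
    payload = json.dumps(features, sort_keys=True, separators=(",", ":")).encode("utf-8")
    digest = hashlib.sha256(salt + payload).hexdigest()
    return digest, salt


def verify(features, salt, digest_hex):
    """Check that features and salt reproduce a previously published digest."""
    recomputed, _ = commit(features, salt)
    return hmac.compare_digest(recomputed, digest_hex)


# Hypothetical customer record; field names are made up for illustration.
record = {"customer_id": "c-001", "avg_monthly_spend": 412.5, "late_payments": 1}
digest, salt = commit(record)

# An auditor given the record and salt can confirm it matches the ledger entry.
is_valid = verify(record, salt, digest)

# Any change to the underlying data breaks the match.
tampered = dict(record, late_payments=0)
is_tampered_valid = verify(tampered, salt, digest)
```

A plain hash commitment like this proves integrity but not any property of the hidden data. Proving statements about the features without revealing them would need additional cryptographic machinery that this sketch does not attempt.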


Progress During Hackathon
We have come up with a model architecture and hope to deploy an MVP model at the hackathon. However, because this is a high-potential product, we would like to keep our thousands of little secrets to ourselves for now.
Tech Stack
Fundraising Status
Unfunded