Software

Open Source Software

The Center for Digital Finance & Technologies at Columbia Engineering champions open-source, reproducible, cutting-edge research. Many of the center's research projects have publicly available open-source software repositories on GitHub.

zkFuzz is a ZK circuit fuzzer designed to help you identify vulnerabilities in zero-knowledge proof circuits. It uses fuzzing with program mutation to uncover counterexamples that reveal under-constrained or over-constrained behavior in your circuits.
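To make "under-constrained" concrete, here is a minimal sketch in Python. It is not zkFuzz's algorithm or interface: the toy constraint, the field modulus `P`, and the function names are all assumptions made for illustration, and the search samples random inputs and scans outputs in a tiny field rather than mutating programs. The idea it shows is that a counterexample is an output that satisfies the circuit's constraints but differs from what the circuit was meant to compute.

```python
import random

P = 97  # small prime modulus, chosen only for illustration


def constraints_hold(x, y):
    """Toy circuit constraint: y*y == x*x (mod P)."""
    return (y * y - x * x) % P == 0


def intended_output(x):
    """What the hypothetical circuit author meant to compute: y = x."""
    return x % P


def find_under_constrained(trials=10, seed=0):
    """Sample random inputs and scan every output in the small field for
    one that satisfies the constraints but is not the intended output."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randrange(1, P)  # skip x = 0, where y = 0 is the only solution
        for y in range(P):
            if constraints_hold(x, y) and y != intended_output(x):
                return x, y
    return None


counterexample = find_under_constrained()
print(counterexample)
```

The constraint accepts both `y = x` and `y = -x mod P`, so a prover could supply the second value and still pass verification. That is the kind of behavior an under-constrained circuit permits.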

Smart contracts are software programs that enable diverse business activities on the blockchain. Recent research has identified new classes of "machine un-auditable" bugs that arise from source code not meeting underlying transaction contexts. Existing detection methods require human understanding of underlying transaction logic and manual reasoning across different sources of context (i.e., modalities), such as code and natural language specifying the expected transaction behavior. To automate the detection of "machine un-auditable" bugs, we present SmartInv, an accurate and fast smart contract invariant inference framework.

This project provides a Graph Neural Network (GNN)-based methodology that aggregates firm characteristics across a large supply chain network to explain cross-sectional expected returns. Each firm receives a pricing signal, nonlinearly constructed from the characteristics of neighboring firms within d hops on the network. The GNN model yields a portfolio sorted by ML-driven, firm-level estimated returns that are conditioned on both historical supply chain data and firm characteristics.
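The d-hop aggregation can be sketched as follows. This is a simplification and not the project's model: a real GNN learns its aggregation weights from data, whereas this sketch uses a fixed mean followed by `tanh` as a stand-in nonlinearity. The graph, the single scalar characteristic per firm, and the function names are hypothetical.

```python
import math
from collections import deque


def neighbors_within(graph, start, d):
    """Return every node reachable from start in 1..d hops (breadth-first)."""
    seen = {start}
    found = set()
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == d:
            continue  # do not expand beyond d hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, dist + 1))
    return found


def pricing_signal(graph, features, firm, d):
    """Mean neighbor characteristic within d hops, passed through tanh."""
    hood = neighbors_within(graph, firm, d)
    if not hood:
        return 0.0
    mean = sum(features[n] for n in hood) / len(hood)
    return math.tanh(mean)


# Hypothetical supply chain: A - B - C - D
graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
features = {"A": 0.2, "B": -0.1, "C": 0.5, "D": 0.4}

signal_1hop = pricing_signal(graph, features, "A", 1)  # uses B only
signal_2hop = pricing_signal(graph, features, "A", 2)  # uses B and C
```

Increasing `d` widens the neighborhood, so firm A's signal starts to reflect firms it has no direct relationship with.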

The Joint Diffusion Kalman Filter (JDKF) integrates diffusion maps, a nonlinear manifold learning technique, with Kalman filtering within a supervised learning framework. It estimates high-dimensional nonlinear observation dynamics and predicts response variables while accounting for dynamic correlations, without imposing restrictive parametric assumptions.
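For readers unfamiliar with the Kalman filtering half of this combination, the following sketch shows one predict/update cycle for a scalar state. It covers only the textbook filter and omits the diffusion map step entirely. The random-walk state model, the noise values `q` and `r`, and the measurements are assumptions chosen for illustration, not taken from the JDKF.

```python
def kalman_step(x_est, p_est, z, q=1e-3, r=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    x_est, p_est: previous state estimate and its variance
    z: new noisy measurement of the state
    q, r: assumed process and measurement noise variances
    """
    # Predict: under a random-walk model the mean carries over
    # and the uncertainty grows by the process noise.
    x_pred = x_est
    p_pred = p_est + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


x, p = 0.0, 1.0  # vague initial guess with large variance
for z in [1.1, 0.9, 1.05, 0.98]:  # hypothetical measurements near 1.0
    x, p = kalman_step(x, p, z)
```

After a few measurements the estimate `x` moves close to the measured level and the variance `p` shrinks well below its initial value.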

An intelligent financial analysis system that uses logical transduction for time-series analysis of financial datasets, together with a comprehensive financial analytics platform built with CrewAI, custom tools, and an isolated code execution environment. The platform can perform deep research, produce comprehensive reports with visualizations, and backtest complex strategies.

Financial reinforcement learning (FinRL®) is the first open-source framework for financial reinforcement learning, and it has since evolved into an ecosystem. FinRL has three layers: market environments, agents, and applications. For a trading task (on the top), an agent (in the middle) interacts with a market environment (at the bottom), making sequential decisions.
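The agent and environment interaction described above can be sketched as a simple loop. This is not FinRL's API: `ToyMarketEnv`, `RandomAgent`, `AlwaysLong`, and `run_episode` are hypothetical names, the price series is made up, and the agents do no learning. The sketch only shows the shape of sequential decision-making, in which the agent observes, acts, and receives a reward at each step.

```python
import random


class ToyMarketEnv:
    """Hypothetical single-asset market environment (bottom layer)."""

    def __init__(self, prices):
        self.prices = prices
        self.t = 0

    def reset(self):
        self.t = 0
        return self.prices[0]

    def step(self, action):
        """action: 0 = stay in cash, 1 = hold the asset over the next step."""
        reward = action * (self.prices[self.t + 1] - self.prices[self.t])
        self.t += 1
        done = self.t == len(self.prices) - 1
        return self.prices[self.t], reward, done


class RandomAgent:
    """Placeholder agent (middle layer) that picks actions at random."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, observation):
        return self.rng.choice([0, 1])


class AlwaysLong:
    """Baseline agent that always holds the asset."""

    def act(self, observation):
        return 1


def run_episode(env, agent):
    """Trading task (top layer): run one pass over the price series."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        action = agent.act(obs)
        obs, reward, done = env.step(action)
        total += reward
    return total


prices = [100.0, 101.0, 99.5, 102.0]
buy_and_hold = run_episode(ToyMarketEnv(prices), AlwaysLong())
random_total = run_episode(ToyMarketEnv(prices), RandomAgent())
```

A learning agent would replace `RandomAgent.act` with a policy that is updated from the observed rewards. The loop itself stays the same.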

1). Finance is highly dynamic. BloombergGPT trained an LLM on a mixture of finance data and general-purpose data, which took about 53 days at a cost of around $3M. It is costly to retrain an LLM like BloombergGPT every month or every week, so lightweight adaptation is highly favorable. FinGPT can be fine-tuned swiftly to incorporate new data (the cost falls significantly, to less than $300 per fine-tuning).

2). Democratizing Internet-scale financial data is critical, for example to allow timely model updates (monthly or weekly) through an automatic data curation pipeline. BloombergGPT has privileged data access and APIs, while FinGPT presents a more accessible alternative. It prioritizes lightweight adaptation, leveraging the best available open-source LLMs.

3). The key technology is reinforcement learning from human feedback (RLHF), which is missing in BloombergGPT. RLHF enables an LLM to learn individual preferences (risk-aversion level, investing habits, personalized robo-advisor, etc.), and it is the "secret" ingredient of ChatGPT and GPT-4.