📊 Project Overview

StructFormer is a Transformer-based model that automates adjustments to structured data. Given a validation error and the related lookup data, it learns to generate the corresponding SQL or CSV-based insert, update, or delete.

🚀 Trained with a SentencePiece tokenizer on domain-specific errors and adjustments
💡 Can be extended to other structured transformation tasks

🛠️ Technologies Used

  • Python 3.10
  • Keras 3 with PyTorch backend
  • SentencePiece Tokenizer
  • Transformer Encoder-Decoder
  • FastAPI (Planned)
  • LangChain (Planned for prompt-based data refinement)

🧭 System Architecture

[Figure: System Architecture Diagram]

✅ Implemented

  • Validation Error + Lookup ➝ SQL Adjustments
  • Sliding window + BOS/EOS prep for Transformer
  • Inference with custom greedy decoder
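The sliding window and BOS/EOS preparation listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names `sliding_windows` and `make_decoder_pair`, the special-token ids (`PAD_ID`, `BOS_ID`, `EOS_ID`), and the window and stride values are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical special-token ids; the real values depend on the trained tokenizer.
PAD_ID = 0
BOS_ID = 1
EOS_ID = 2


def sliding_windows(ids: List[int], window: int, stride: int) -> List[List[int]]:
    """Split a long token sequence into windows of at most `window` ids.

    Consecutive windows overlap when `stride` is smaller than `window`.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(ids) <= window:
        return [list(ids)]
    windows = []
    start = 0
    while True:
        windows.append(ids[start:start + window])
        if start + window >= len(ids):
            break  # the last window reached the end of the sequence
        start += stride
    return windows


def make_decoder_pair(target_ids: List[int], max_len: int) -> Tuple[List[int], List[int]]:
    """Build a teacher-forcing pair: decoder input starts with BOS, labels end with EOS."""
    body = list(target_ids[: max_len - 1])  # leave room for one special token
    decoder_input = [BOS_ID] + body
    labels = body + [EOS_ID]
    padding = [PAD_ID] * (max_len - len(decoder_input))
    return decoder_input + padding, labels + padding


if __name__ == "__main__":
    error_ids = list(range(10, 20))  # stand-in for a tokenized validation error
    print(sliding_windows(error_ids, window=4, stride=2))
    print(make_decoder_pair([5, 6, 7], max_len=6))
```

The labels are the decoder input shifted left by one position, so at each step the model is trained to predict the next token of the adjustment.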

🔜 Planned

  • Online fine-tuning via FastAPI backend
  • LLM fallback support via HuggingFace pipeline
  • Enterprise dashboard for per-record diff

📌 Roadmap

  • ✅ Tokenizer training with SentencePiece
  • ✅ Transformer model with custom layers
  • ✅ Inference with greedy decoding
  • ⏳ Model hosting on HuggingFace
  • ⏳ API-first interface with FastAPI
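For readers unfamiliar with the greedy decoding item in the roadmap, the loop below shows the general idea. It is a sketch under stated assumptions, not StructFormer's decoder: `greedy_decode`, `step_fn`, and `toy_step` are hypothetical names, and the toy scoring function stands in for a real model's forward pass.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    step_fn: Callable[[List[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_len: int,
) -> List[int]:
    """Generate tokens one at a time, always taking the highest-scoring id.

    `step_fn` is a hypothetical callable that maps the tokens generated so far
    (including BOS) to one score per vocabulary id for the next position.
    """
    tokens = [bos_id]
    for _ in range(max_len):
        scores = step_fn(tokens)
        next_id = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if next_id == eos_id:
            break  # stop as soon as the model predicts end-of-sequence
        tokens.append(next_id)
    return tokens[1:]  # drop the leading BOS


def toy_step(prefix: List[int]) -> List[float]:
    """Toy stand-in for a model: emits ids 3, 4, then EOS (id 2)."""
    script = [3, 4, 2]
    position = len(prefix) - 1
    target = script[position] if position < len(script) else 2
    scores = [0.0] * 5
    scores[target] = 1.0
    return scores


if __name__ == "__main__":
    print(greedy_decode(toy_step, bos_id=1, eos_id=2, max_len=10))
```

Greedy decoding is cheap and deterministic, but it commits to the single best token at every step, so it can miss a sequence that scores higher overall.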

📂 GitHub Repository

github.com/spsarolkar/StructFormer