How to Convert Bank Statements to JSON [2025]

Converting bank statements from PDF to JSON format might sound technical, but it’s easier than you think. Whether you’re a developer looking for code solutions or someone who just needs to get the job done without programming, this guide has you covered.

What is JSON and Why Do You Need It?

JSON (JavaScript Object Notation) is simply a way to organize data in a structured format that computers can easily read and process. Think of it like organizing your transactions into a digital filing cabinet where every piece of information has a clear label.
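
To make the "labeled filing cabinet" idea concrete, here is a minimal sketch of one transaction written as JSON and read back with Python's built-in json module. The field names (date, description, debit, credit, balance) are an assumption chosen to match the examples later in this guide; your own files may use different labels.

```python
import json

# One transaction as JSON text: every value sits next to a label (key).
raw = """
{
    "date": "01/05/2025",
    "description": "ATM Withdrawal",
    "debit": 200.00,
    "credit": 0.00,
    "balance": 4800.00
}
"""

# json.loads turns the text into a regular Python dictionary.
transaction = json.loads(raw)

# Values are looked up by their label rather than by position.
print(transaction["description"])
print(transaction["debit"])
```

Because each value is labeled, a program can pick out the description or the balance without guessing which column it is in.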

Why convert to JSON?

  • Import transactions into accounting software automatically
  • Analyze spending patterns with financial tools
  • Build custom reports and dashboards
  • Automate bookkeeping tasks
  • Integrate with business applications

For Non-Technical Users: Easy Solutions

Method 1: Free Online PDF to JSON Converter (Easiest Way)

This is the simplest method that requires zero coding knowledge. You just upload your file and download the converted result.

Using ilovepdf2.com (Free and Simple):

  1. Go to the converter at https://ilovepdf2.com/convert-pdf-to-json/
  2. Upload your bank statement
    • Click “Select PDF file” or drag and drop your PDF
    • The file uploads instantly
    • You can upload multiple statements at once
  3. Convert to JSON
    • Click the “Convert to JSON” button
    • Wait a few seconds for processing
    • The tool automatically extracts data from your PDF
  4. Download your JSON file
    • Click “Download JSON” when ready
    • Save the file to your computer
    • Open it with any text editor to view the data

Benefits of using this free tool:

  • No software installation needed
  • Works on any device (Windows, Mac, mobile)
  • Fast processing (usually under 30 seconds)
  • Handles multi-page statements
  • Secure connection (HTTPS)

Important tip: After downloading your JSON file, delete the uploaded statement from the website for privacy. Most converters delete files automatically after a few hours, but it’s better to remove them immediately.

Method 2: Other Trusted Online Converters

If you want alternatives or need additional features:

PDFTables.com

  • Good for complex table structures
  • Offers free trials
  • High accuracy for bank statements

Convertio.co

  • Supports 300+ formats
  • Batch conversion available
  • Clean, simple interface

Zamzar.com

  • One of the oldest converters
  • Reliable and safe
  • Email delivery option

How to use any online converter:

  1. Upload your PDF file
  2. Select “JSON” as output format
  3. Click “Convert”
  4. Download the result

Method 3: Excel as a Bridge (Two-Step Process)

This method gives you more control and lets you verify your data before converting.

Step 1: PDF to Excel

  1. Open your bank statement PDF
  2. Use one of these methods:
    • Adobe Acrobat: File → Export To → Spreadsheet → Microsoft Excel
    • Microsoft Word: Open PDF in Word, copy the table, paste into Excel
    • Google Docs: Upload PDF to Google Drive, open with Google Sheets
    • Online tool: Use ilovepdf2.com or any PDF to Excel converter
  3. Clean up the Excel file:
    • Remove headers, footers, and bank logos
    • Make sure you have columns for: Date, Description, Debit, Credit, Balance
    • Delete any empty rows
    • Format numbers correctly (remove currency symbols if needed)

Example of what your Excel should look like:

Date        Description     Debit    Credit   Balance
01/05/2025  ATM Withdrawal  200.00   0.00     4800.00
01/07/2025  Salary Credit   0.00     5000.00  9800.00
01/10/2025  Grocery Store   150.00   0.00     9650.00

Step 2: Excel to JSON

  1. Save your Excel file as CSV (File → Save As → CSV format)
  2. Go to an online “CSV to JSON converter” like:
    • ConvertCSV.com
    • CSVJSON.com
    • OnlineJSONTools.com
  3. Upload your CSV file
  4. Download the JSON output
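
If you would rather not upload the CSV to another website, the same step can be sketched locally in Python with only the standard library. This is a minimal sketch that assumes the CSV has exactly the column headers from the example above (Date, Description, Debit, Credit, Balance); the function names are illustrative, not part of any tool mentioned here.

```python
import csv
import io
import json

def to_number(cell):
    """Turn a cell such as '1,200.50' or '$200.00' into a float; empty cells become 0.0."""
    cleaned = (cell or "").replace(",", "").replace("$", "").strip()
    return float(cleaned) if cleaned else 0.0

def csv_to_transactions(file_obj):
    """Read rows from an open CSV file and return a list of transaction dictionaries."""
    transactions = []
    for row in csv.DictReader(file_obj):
        date = (row.get("Date") or "").strip()
        if not date:
            continue  # skip blank rows left over from the cleanup step
        transactions.append({
            "date": date,
            "description": (row.get("Description") or "").strip(),
            "debit": to_number(row.get("Debit")),
            "credit": to_number(row.get("Credit")),
            "balance": to_number(row.get("Balance")),
        })
    return transactions

# In-memory sample so the sketch runs without any file on disk.
sample_csv = io.StringIO(
    "Date,Description,Debit,Credit,Balance\n"
    "01/05/2025,ATM Withdrawal,200.00,0.00,\"4,800.00\"\n"
    ",,,,\n"
    "01/07/2025,Salary Credit,0.00,\"5,000.00\",\"9,800.00\"\n"
)
transactions = csv_to_transactions(sample_csv)
print(json.dumps({"transactions": transactions}, indent=4))
```

For a real file, open it with open(path, newline="", encoding="utf-8-sig") and pass the file object in; the utf-8-sig encoding drops the byte-order mark that Excel often writes at the start of CSV files.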

Method 4: Use Bank APIs (If Available)

Some banks let you download transaction data directly in digital formats without dealing with PDFs at all.

How to check if your bank offers this:

  1. Log into your online banking
  2. Go to “Statements” or “Transaction History”
  3. Look for download options
  4. Check if they offer JSON, CSV, or Excel formats (not just PDF)
  5. If available, select your date range and download

Banks that often provide this: Chase, Bank of America, Wells Fargo, Capital One, HSBC (availability varies by account type and region)

Method 5: Accounting Software with Import Features

Many accounting tools can automatically convert bank statements.

Popular options:

QuickBooks

  • Supports bank statement import
  • Exports data in various formats
  • Automatic categorization

Xero

  • Can import PDFs
  • Converts to structured data
  • Bank feed connections

Wave (Free)

  • No cost for basic features
  • Bank statement import
  • Receipt scanning

FreshBooks

  • Statement parsing capabilities
  • Automatic expense tracking
  • Invoice integration

Steps:

  1. Sign up for the accounting software
  2. Use their “Import Bank Statement” feature
  3. Upload your PDF
  4. Let the software extract the data
  5. Export in JSON format (if supported) or use their API

For Technical Users: Code Solutions

Method 1: Python with pdfplumber (Best for Most Cases)

pdfplumber detects tables from the lines and text positions in text-based (non-scanned) PDFs, which makes it a reliable first choice for most bank statements.

Installation:

bash

pip install pdfplumber pandas

Complete working code:

python

import pdfplumber
import json

def convert_bank_statement_to_json(pdf_path, output_json_path):
    transactions = []
    
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # Extract all tables from the page
            tables = page.extract_tables()
            
            for table in tables:
                # Skip the header row
                for row in table[1:]:
                    if row and len(row) >= 4:  # Make sure row has data
                        try:
                            # Clean and structure the data
                            transaction = {
                                "date": row[0].strip() if row[0] else "",
                                "description": row[1].strip() if row[1] else "",
                                "debit": float(row[2].replace(',', '').replace('$', '')) if row[2] and row[2].strip() else 0.0,
                                "credit": float(row[3].replace(',', '').replace('$', '')) if row[3] and row[3].strip() else 0.0,
                                "balance": float(row[4].replace(',', '').replace('$', '')) if len(row) > 4 and row[4] else 0.0
                            }
                            
                            # Only add if it looks like a valid transaction
                            if transaction["date"]:
                                transactions.append(transaction)
                                
                        except (ValueError, AttributeError, IndexError):
                            # Skip rows that don't match the expected format
                            continue
    
    # Create the final JSON structure
    bank_statement = {
        "account_info": {
            "statement_period": "Extract from PDF or set manually",
            "account_number": "Extract from PDF or set manually"
        },
        "transactions": transactions,
        "summary": {
            "total_debits": round(sum(t['debit'] for t in transactions), 2),
            "total_credits": round(sum(t['credit'] for t in transactions), 2),
            "transaction_count": len(transactions)
        }
    }
    
    # Save to JSON file with pretty formatting
    with open(output_json_path, 'w', encoding='utf-8') as json_file:
        json.dump(bank_statement, json_file, indent=4, ensure_ascii=False)
    
    print(f"✓ Successfully converted {len(transactions)} transactions")
    print(f"✓ Output saved to: {output_json_path}")
    
    return bank_statement

# How to use:
convert_bank_statement_to_json('my_statement.pdf', 'transactions.json')

What you’ll get (shown here with the account_info placeholders filled in by hand):

json

{
    "account_info": {
        "statement_period": "January 2025",
        "account_number": "****1234"
    },
    "transactions": [
        {
            "date": "01/05/2025",
            "description": "ATM Withdrawal - Main Street",
            "debit": 200.00,
            "credit": 0.0,
            "balance": 4800.00
        },
        {
            "date": "01/07/2025",
            "description": "Direct Deposit - Salary",
            "debit": 0.0,
            "credit": 5000.00,
            "balance": 9800.00
        }
    ],
    "summary": {
        "total_debits": 200.00,
        "total_credits": 5000.00,
        "transaction_count": 2
    }
}
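
The account_info placeholders in the script above can sometimes be filled from the text of the first page. The following is a rough sketch, assuming the statement prints labels such as "Account Number:" and "Statement Period:"; both label patterns are hypothetical, so adjust them to whatever wording your bank uses. With pdfplumber, the input text would come from pdf.pages[0].extract_text().

```python
import re

def extract_account_info(first_page_text):
    """Look for an account number and a statement period in plain text.

    The label patterns are assumptions about the statement layout,
    not something every bank prints.
    """
    account = re.search(
        r"Account\s+(?:Number|No\.?)\s*:?\s*([\dXx*\- ]*\d{4})",
        first_page_text,
        re.IGNORECASE,
    )
    period = re.search(
        r"Statement\s+Period\s*:?\s*(.+)",
        first_page_text,
        re.IGNORECASE,
    )
    return {
        "statement_period": period.group(1).strip() if period else "",
        # Keep only the last four digits so the full number never reaches the JSON file.
        "account_number": "****" + account.group(1)[-4:] if account else "",
    }

# Made-up first-page text for illustration.
sample_text = (
    "Example Bank\n"
    "Account Number: 000123451234\n"
    "Statement Period: 01/01/2025 - 01/31/2025\n"
)
info = extract_account_info(sample_text)
print(info)
```

Masking the account number at extraction time is a design choice that keeps the output safer to share; drop the masking if a downstream tool needs the full number.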

Method 2: Python with Camelot (For Complex Tables)

Use this when pdfplumber struggles with your PDF format.

Installation:

bash

pip install "camelot-py[cv]" pandas

Code:

python

import camelot
import json

def camelot_extract(pdf_path, output_json_path):
    # Extract all tables from all pages
    tables = camelot.read_pdf(pdf_path, pages='all', flavor='stream')
    
    print(f"Found {len(tables)} tables in the PDF")
    
    all_transactions = []
    
    for table_num, table in enumerate(tables):
        print(f"Processing table {table_num + 1}...")
        
        df = table.df
        
        # Process each row (skip header)
        for index, row in df.iterrows():
            if index == 0:  # Skip header row
                continue
            
            try:
                # Empty cells (including a missing balance) become 0.0
                transaction = {
                    "date": str(row[0]).strip(),
                    "description": str(row[1]).strip(),
                    "debit": float(str(row[2]).replace(',', '').replace('$', '')) if str(row[2]).strip() else 0.0,
                    "credit": float(str(row[3]).replace(',', '').replace('$', '')) if str(row[3]).strip() else 0.0,
                    "balance": float(str(row[4]).replace(',', '').replace('$', '')) if len(row) > 4 and str(row[4]).strip() else 0.0
                }
                
                if transaction["date"]:
                    all_transactions.append(transaction)
                    
            except (ValueError, KeyError, IndexError):
                # Skip rows that are not transactions or have too few columns
                # (a missing column label raises KeyError on a pandas row)
                continue
    
    # Create final JSON
    result = {
        "transactions": all_transactions,
        "summary": {
            "total_transactions": len(all_transactions),
            "total_debits": round(sum(t['debit'] for t in all_transactions), 2),
            "total_credits": round(sum(t['credit'] for t in all_transactions), 2)
        }
    }
    
    with open(output_json_path, 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4)
    
    print(f"✓ Extracted {len(all_transactions)} transactions")

# Usage:
camelot_extract('statement.pdf', 'output.json')

Method 3: Python with Tabula (Fast and Simple)

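Good balance between speed and accuracy. Note that tabula-py is a Python wrapper around the Java tool Tabula, so a Java runtime must be installed.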

Installation:

bash

pip install tabula-py pandas

Code:

python

import tabula
import json
import pandas as pd

def tabula_to_json(pdf_path, output_json_path):
    # Read all tables from PDF
    dfs = tabula.read_pdf(pdf_path, pages='all', multiple_tables=True)
    
    all_transactions = []
    
    for table_index, df in enumerate(dfs):
        print(f"Processing table {table_index + 1} with {len(df)} rows")
        
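        # The rename below only fits tables with 4 or 5 columns (Balance is optional);
        # assigning 5 names to a wider table would raise an error
        if 4 <= len(df.columns) <= 5:
            # Rename columns for clarity
            df.columns = ['Date', 'Description', 'Debit', 'Credit', 'Balance'][:len(df.columns)]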
            
            for _, row in df.iterrows():
                try:
                    # Clean and parse the data
                    debit_value = str(row['Debit']).replace(',', '').replace('$', '').strip()
                    credit_value = str(row['Credit']).replace(',', '').replace('$', '').strip()
                    balance_value = str(row['Balance']).replace(',', '').replace('$', '').strip() if 'Balance' in row else "0"
                    
                    transaction = {
                        "date": str(row['Date']).strip(),
                        "description": str(row['Description']).strip(),
                        "debit": float(debit_value) if debit_value and debit_value != 'nan' else 0.0,
                        "credit": float(credit_value) if credit_value and credit_value != 'nan' else 0.0,
                        "balance": float(balance_value) if balance_value and balance_value != 'nan' else 0.0
                    }
                    
                    # Only add valid transactions
                    if transaction["date"] and transaction["date"] != 'nan':
                        all_transactions.append(transaction)
                        
                except (ValueError, KeyError):
                    continue
    
    # Save as JSON
    result = {"transactions": all_transactions, "count": len(all_transactions)}
    
    with open(output_json_path, 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4)
    
    print(f"✓ Conversion complete: {len(all_transactions)} transactions saved")

# Usage:
tabula_to_json('bank_statement.pdf', 'output.json')

Method 4: Node.js Solution

For JavaScript developers or Node.js environments.

Installation:

bash

npm install pdf-parse

Code:

javascript

const fs = require('fs');
const pdfParse = require('pdf-parse');

async function convertBankStatementToJSON(pdfPath, outputPath) {
    try {
        // Read the PDF file
        const dataBuffer = fs.readFileSync(pdfPath);
        
        // Parse PDF content
        const data = await pdfParse(dataBuffer);
        const text = data.text;
        
        // Pattern to match transaction lines (adjust based on your bank's format)
        // This pattern looks for: Date Description Debit Credit Balance
        const transactionPattern = /(\d{2}\/\d{2}\/\d{4})\s+(.+?)\s+([\d,]+\.\d{2})\s+([\d,]+\.\d{2})\s+([\d,]+\.\d{2})/g;
        
        const transactions = [];
        let match;
        
        // Extract all matching transactions
        while ((match = transactionPattern.exec(text)) !== null) {
            transactions.push({
                date: match[1],
                description: match[2].trim(),
                // Use a global regex so every thousands separator is removed, not just the first
                debit: parseFloat(match[3].replace(/,/g, '')),
                credit: parseFloat(match[4].replace(/,/g, '')),
                balance: parseFloat(match[5].replace(/,/g, ''))
            });
        }
        
        // Create final JSON structure
        const result = {
            transactions: transactions,
            summary: {
                total_count: transactions.length,
                // toFixed returns a string; convert back so the JSON holds numbers
                total_debits: Number(transactions.reduce((sum, t) => sum + t.debit, 0).toFixed(2)),
                total_credits: Number(transactions.reduce((sum, t) => sum + t.credit, 0).toFixed(2))
            }
        };
        
        // Write to file
        fs.writeFileSync(outputPath, JSON.stringify(result, null, 4));
        console.log(`✓ Converted ${transactions.length} transactions to ${outputPath}`);
        
    } catch (error) {
        console.error('Error:', error.message);
    }
}

// Usage:
convertBankStatementToJSON('statement.pdf', 'output.json');

Method 5: For Scanned PDFs (OCR Required)

If your PDF is a scanned image, you need OCR (Optical Character Recognition).

Installation:

bash

pip install pytesseract pdf2image pillow
# Also install Tesseract OCR: https://github.com/tesseract-ocr/tesseract
# pdf2image additionally needs Poppler installed on your system

Code:

python

from pdf2image import convert_from_path
import pytesseract
import json
import re

def ocr_pdf_to_json(pdf_path, output_json_path):
    print("Converting PDF pages to images...")
    
    # Convert PDF to images (one image per page)
    images = convert_from_path(pdf_path, dpi=300)  # Higher DPI = better quality
    
    all_text = ""
    
    print(f"Processing {len(images)} pages with OCR...")
    for i, image in enumerate(images):
        print(f"  Processing page {i + 1}...")
        
        # Perform OCR on each image
        text = pytesseract.image_to_string(image, config='--psm 6')
        all_text += text + "\n"
    
    print("Extracting transactions from text...")
    
    # Pattern to find transactions (adjust based on your statement format)
    pattern = r'(\d{2}/\d{2}/\d{4})\s+([A-Za-z0-9\s\-\.]+?)\s+([\d,]+\.?\d{0,2})\s+([\d,]+\.?\d{0,2})'
    matches = re.findall(pattern, all_text)
    
    transactions = []
    for match in matches:
        try:
            transactions.append({
                "date": match[0],
                "description": match[1].strip(),
                "debit": float(match[2].replace(',', '')) if match[2] else 0.0,
                "credit": float(match[3].replace(',', '')) if match[3] else 0.0
            })
        except ValueError:
            continue
    
    result = {"transactions": transactions, "count": len(transactions)}
    
    with open(output_json_path, 'w', encoding='utf-8') as f:
        json.dump(result, f, indent=4)
    
    print(f"✓ Extracted {len(transactions)} transactions")

# Usage:
ocr_pdf_to_json('scanned_statement.pdf', 'output.json')
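
OCR text loses the column layout, so a line often contains only one amount followed by the running balance, and the pattern alone cannot tell a debit from a credit. One possible workaround, sketched below, is to compare each balance with the previous one. It assumes every transaction sits on a single line in the form "date description amount balance", that rows are in chronological order, and that you know the opening balance; the names LINE_RE and parse_ocr_lines are illustrative.

```python
import re

# date, free-text description, one amount, then the running balance
LINE_RE = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<amount>[\d,]+\.\d{2})\s+"
    r"(?P<balance>[\d,]+\.\d{2})\s*$"
)

def parse_ocr_lines(text, opening_balance):
    """Parse OCR text line by line, inferring debit/credit from the balance change."""
    transactions = []
    previous_balance = opening_balance
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # headers, footers, and OCR noise are skipped
        amount = float(match["amount"].replace(",", ""))
        balance = float(match["balance"].replace(",", ""))
        is_debit = balance < previous_balance  # money left the account
        transactions.append({
            "date": match["date"],
            "description": match["description"],
            "debit": amount if is_debit else 0.0,
            "credit": 0.0 if is_debit else amount,
            "balance": balance,
        })
        previous_balance = balance
    return transactions

# Made-up OCR output for illustration.
ocr_text = (
    "Date Description Amount Balance\n"
    "01/05/2025 ATM Withdrawal 200.00 4,800.00\n"
    "01/07/2025 Salary Credit 5,000.00 9,800.00\n"
    "Page 1 of 3\n"
)
parsed = parse_ocr_lines(ocr_text, opening_balance=5000.00)
for item in parsed:
    print(item)
```

Anchoring the pattern to whole lines (^ and $) is a deliberate choice here: it makes the parser stricter, so a misread line is dropped rather than turned into a wrong transaction.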

Quick Comparison: Which Method Should You Use?

For non-technical users:

  • ilovepdf2.com – Fastest and easiest, no installation needed
  • Excel bridge – Best if you want to verify data manually
  • Bank API – Most accurate if your bank supports it
  • Accounting software – Best for ongoing bookkeeping needs

For developers:

  • pdfplumber – Best for most bank statements (recommended first choice)
  • Camelot – Use when pdfplumber doesn’t work well
  • Tabula – Good for simple, well-structured tables
  • Node.js – If you’re already working in JavaScript
  • OCR (Tesseract) – Only for scanned/image PDFs (less accurate)

Common Issues and Solutions

Problem: Extracted data is messy or incorrect

  • Try a different tool (switch from pdfplumber to Camelot or use ilovepdf2.com)
  • Check if your PDF is an image (use OCR method)
  • Manually verify column positions in your code
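
A quick way to check whether a PDF is an image is to see how much text can be extracted from each page. The sketch below is one possible heuristic: the 20-character threshold and the "more than half the pages" rule are arbitrary assumptions, and read_page_texts relies on pdfplumber's extract_text() as used in the pdfplumber method earlier.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Return True when most pages contain almost no extractable text."""
    if not page_texts:
        return True
    empty_pages = sum(
        1 for text in page_texts
        if len((text or "").strip()) < min_chars_per_page
    )
    return empty_pages > len(page_texts) / 2

def read_page_texts(pdf_path):
    """Collect the extractable text of every page using pdfplumber."""
    import pdfplumber  # imported here so needs_ocr can be tried without the library
    with pdfplumber.open(pdf_path) as pdf:
        return [page.extract_text() for page in pdf.pages]

# Pages with no text layer yield empty strings or None.
print(needs_ocr(["", None]))
print(needs_ocr(["01/05/2025 ATM Withdrawal 200.00 0.00 4800.00"]))
```

For a real file, call needs_ocr(read_page_texts("my_statement.pdf")) and switch to the OCR method when it returns True.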

Problem: Missing transactions

  • Your PDF might have multiple tables per page
  • Try processing all pages instead of just page 1
  • Check if transactions are in a different format on some pages

Problem: Numbers have wrong decimal places

  • Add better number cleaning in your code
  • Check for currency symbols that need to be removed
  • Verify comma/period usage (some countries use different formats)
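
As an illustration of "better number cleaning", here is a small sketch of an amount parser. It assumes two conventions only: the US style (1,234.56) and the decimal-comma style used in much of Europe (1.234,56), selected with a flag, plus negatives written with a minus sign or parentheses. The function name parse_amount is hypothetical.

```python
import re

def parse_amount(raw, decimal_comma=False):
    """Convert a printed amount to a float.

    decimal_comma=True treats ',' as the decimal mark and '.' as the
    thousands separator (e.g. '1.234,56').
    """
    text = (raw or "").strip()
    if not text:
        return 0.0

    # Accounting style "(200.00)" and leading or trailing minus signs mean negative.
    negative = (
        (text.startswith("(") and text.endswith(")"))
        or text.startswith("-")
        or text.endswith("-")
    )

    # Drop currency symbols, spaces, and anything else that is not part of the number.
    digits = re.sub(r"[^\d.,]", "", text)
    if not digits:
        return 0.0

    if decimal_comma:
        digits = digits.replace(".", "").replace(",", ".")
    else:
        digits = digits.replace(",", "")

    value = float(digits)
    return -value if negative else value

print(parse_amount("$1,234.56"))
print(parse_amount("1.234,56 €", decimal_comma=True))
print(parse_amount("(200.00)"))
```

Making the decimal convention an explicit flag, rather than guessing it per value, avoids silently misreading amounts such as "1,234" that are valid in both styles.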

Problem: Can’t install Python libraries

  • Make sure Python is installed correctly (python.org)
  • Try using pip3 instead of pip
  • Use virtual environments to avoid conflicts
  • Consider using online tools instead

Problem: Online converter gives poor results

  • Your PDF might be too complex
  • Try the Excel bridge method for more control
  • Use Python methods for better customization

Validation Checklist

After conversion, always verify your JSON data:

  • ✓ Transaction count matches your PDF
  • ✓ Total debits and credits match statement totals
  • ✓ Dates are in correct format
  • ✓ No missing or duplicate transactions
  • ✓ Balances are accurate
  • ✓ Special characters display correctly
  • ✓ Currency amounts have correct decimal places
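
Several items on this checklist can be checked automatically. The sketch below assumes the JSON structure produced by the pdfplumber script earlier (a "transactions" list with date, description, debit, credit, and balance), dates in MM/DD/YYYY form, rows in chronological order, and a balance column that holds the running balance after each transaction. The function name validate_statement is illustrative.

```python
from datetime import datetime

def validate_statement(statement, expected_count=None,
                       date_format="%m/%d/%Y", tolerance=0.01):
    """Return a list of readable problems; an empty list means none were found."""
    problems = []
    transactions = statement["transactions"]

    if expected_count is not None and len(transactions) != expected_count:
        problems.append(
            f"expected {expected_count} transactions, found {len(transactions)}"
        )

    seen = set()
    for i, t in enumerate(transactions):
        # Identical rows usually mean a table was extracted twice.
        key = (t["date"], t["description"], t["debit"], t["credit"], t["balance"])
        if key in seen:
            problems.append(f"row {i}: possible duplicate")
        seen.add(key)

        try:
            datetime.strptime(t["date"], date_format)
        except ValueError:
            problems.append(f"row {i}: unexpected date {t['date']!r}")

        # Each balance should equal the previous balance minus debit plus credit.
        if i > 0:
            expected = transactions[i - 1]["balance"] - t["debit"] + t["credit"]
            if abs(expected - t["balance"]) > tolerance:
                problems.append(
                    f"row {i}: balance {t['balance']} does not follow from previous row"
                )
    return problems

sample = {
    "transactions": [
        {"date": "01/05/2025", "description": "ATM Withdrawal",
         "debit": 200.0, "credit": 0.0, "balance": 4800.0},
        {"date": "01/07/2025", "description": "Salary Credit",
         "debit": 0.0, "credit": 5000.0, "balance": 9800.0},
        {"date": "01/10/2025", "description": "Grocery Store",
         "debit": 150.0, "credit": 0.0, "balance": 9650.0},
    ]
}
problems = validate_statement(sample, expected_count=3)
print(problems or "no problems found")
```

The running-balance check is the most useful of the three: a missing or misread transaction almost always breaks the chain at the row where it happened.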

Security and Privacy Tips

  1. Protect your data: Use trusted websites like ilovepdf2.com with HTTPS
  2. Delete after download: Remove files from online converters immediately
  3. Use encryption: Store JSON files in encrypted folders or password-protected drives
  4. Local processing: Use Python/Node.js methods to keep data on your computer
  5. Backup originals: Always keep your original PDF statements safe
  6. Check privacy policy: Read what the converter does with your data
  7. Avoid public WiFi: Don’t upload bank statements on public networks

Step-by-Step Tutorial for Beginners

Let’s walk through the easiest method step by step:

Getting Started with ilovepdf2.com:

  1. Open your browser and go to https://ilovepdf2.com/convert-pdf-to-json/
  2. Locate your bank statement PDF file on your computer
  3. Upload the file by clicking “Select PDF file” or dragging it to the website
  4. Wait for processing (usually takes 10-30 seconds depending on file size)
  5. Download your JSON by clicking the download button
  6. Open the JSON file with Notepad, TextEdit, or any text editor to view your data
  7. Verify the data by checking a few transactions against your original PDF
  8. Use the JSON file in your accounting software, app, or script

That’s it! You now have your bank statement in JSON format ready to use.

Real-World Use Cases

For Small Business Owners: Convert monthly statements to JSON, then import into QuickBooks or Xero for automatic reconciliation and expense categorization.

For Freelancers: Extract transaction data to track business expenses, calculate tax deductions, and generate client invoices based on project-related payments.

For Personal Finance: Convert statements from multiple banks to JSON, then use budgeting apps or create custom spreadsheets to analyze spending patterns.

For Accountants: Batch convert client statements to JSON for faster data entry, automated categorization, and streamlined tax preparation.

For Developers: Use JSON data to build custom financial dashboards, automate reporting, or integrate with business intelligence tools.
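
As a small taste of the analysis these use cases describe, the sketch below groups transactions by month and totals the debits and credits. It assumes the transaction structure used throughout this guide and dates in MM/DD/YYYY form; monthly_totals is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime

def monthly_totals(transactions, date_format="%m/%d/%Y"):
    """Sum debits and credits per calendar month, keyed as 'YYYY-MM'."""
    totals = defaultdict(lambda: {"debit": 0.0, "credit": 0.0})
    for t in transactions:
        month = datetime.strptime(t["date"], date_format).strftime("%Y-%m")
        totals[month]["debit"] += t["debit"]
        totals[month]["credit"] += t["credit"]
    # Round at the end and return months in chronological order.
    return {
        month: {name: round(value, 2) for name, value in sums.items()}
        for month, sums in sorted(totals.items())
    }

transactions = [
    {"date": "01/05/2025", "description": "ATM Withdrawal", "debit": 200.0, "credit": 0.0},
    {"date": "01/07/2025", "description": "Salary Credit", "debit": 0.0, "credit": 5000.0},
    {"date": "01/10/2025", "description": "Grocery Store", "debit": 150.0, "credit": 0.0},
]
summary = monthly_totals(transactions)
print(summary)
```

The same grouping idea extends to categories or merchants by swapping the month key for another field.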

Beyond JSON: Other Formats You Might Need

Once you have your data in Excel format, you can also convert to:

  • CSV – For importing into databases and spreadsheet software
  • XML – For legacy system integration
  • SQL – For direct database insertion
  • API formats – For custom application integration
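
Once the transactions are in JSON, moving to CSV or a SQL database takes only a few lines with Python's standard library. This is a minimal sketch assuming the five-field transaction structure used in this guide; the table name "transactions" and the helper names are illustrative choices, and SQLite stands in for whatever database you actually use.

```python
import csv
import io
import sqlite3

FIELDS = ["date", "description", "debit", "credit", "balance"]

def to_csv(transactions):
    """Render the transactions as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(transactions)
    return buffer.getvalue()

def to_sqlite(transactions, connection):
    """Insert the transactions into a SQLite table, creating it if needed."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(date TEXT, description TEXT, debit REAL, credit REAL, balance REAL)"
    )
    # Named placeholders let SQLite handle quoting and escaping of the values.
    connection.executemany(
        "INSERT INTO transactions VALUES "
        "(:date, :description, :debit, :credit, :balance)",
        transactions,
    )
    connection.commit()

transactions = [
    {"date": "01/05/2025", "description": "ATM Withdrawal",
     "debit": 200.0, "credit": 0.0, "balance": 4800.0},
    {"date": "01/07/2025", "description": "Salary Credit",
     "debit": 0.0, "credit": 5000.0, "balance": 9800.0},
]

csv_text = to_csv(transactions)
print(csv_text)

connection = sqlite3.connect(":memory:")  # in-memory database for the demo
to_sqlite(transactions, connection)
row_count = connection.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
print(row_count)
```

Using placeholders instead of building INSERT statements as strings keeps descriptions containing quotes or commas from breaking the import.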

If you need your bank statements in Excel format first (which is often easier to work with), Your Bank Statement Converter makes the process simple. Convert your PDF statements to clean Excel files in seconds, then you can easily transform them to JSON or any other format you need.

Final Thoughts

Converting bank statements to JSON doesn’t have to be complicated. Whether you’re using the free online tool at ilovepdf2.com for quick conversions or implementing custom Python scripts for automated workflows, you now have multiple ways to get the job done.

Start with the easiest method that fits your needs, and don’t be afraid to try different approaches if the first one doesn’t work perfectly. Every bank formats its statements differently, so finding the right tool or method might take a little experimentation.

The time you invest in converting your statements to JSON will pay off with better financial insights, automated workflows, and more efficient data management.
