Blockchain Data Wizard
New York
Allium
Accurate, Fast, Simple blockchain data. We cover 40+ blockchains, 100+ schemas, in near realtime on a single platform
Blockchain data is hard, messy, and chaotic
When we started out in late 2021 our thesis was simple - blockchain data, despite it being public and free, was difficult to understand, clunky to access and troublesome to maintain. Answering a simple question like “Who are the biggest Ethereum token holders over time?” requires an engineering team to run their own RPC nodes, ingest the full history of the blockchain, clean the data, transform the data and finally summon a wizard to cast a complex SQL query.
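To make the token-holder question concrete, here is a minimal Python sketch of only the final step: ranking holders at chosen block heights. It assumes the hard parts (running nodes, ingesting history, decoding logs) are already done and that transfers arrive as clean records. The `Transfer` schema, the `MINT` placeholder address, and the function name are hypothetical and chosen for illustration; they are not Allium's schema or API.

```python
from collections import defaultdict
from typing import NamedTuple


class Transfer(NamedTuple):
    """One already-decoded token transfer (hypothetical schema)."""
    block: int
    sender: str
    receiver: str
    amount: int


MINT = "0x0"  # placeholder for the zero address used by mints and burns


def top_holders_over_time(transfers, snapshot_blocks, n=2):
    """Return {snapshot_block: [(address, balance), ...]} with the top n holders."""
    balances = defaultdict(int)
    ordered = sorted(transfers, key=lambda t: t.block)
    result = {}
    i = 0
    for snap in sorted(snapshot_blocks):
        # Apply every transfer up to and including the snapshot block.
        while i < len(ordered) and ordered[i].block <= snap:
            t = ordered[i]
            if t.sender != MINT:
                balances[t.sender] -= t.amount
            if t.receiver != MINT:
                balances[t.receiver] += t.amount
            i += 1
        # Rank by balance (descending), breaking ties by address.
        ranked = sorted(balances.items(), key=lambda kv: (-kv[1], kv[0]))
        result[snap] = [(addr, bal) for addr, bal in ranked[:n] if bal > 0]
    return result


transfers = [
    Transfer(1, MINT, "alice", 100),
    Transfer(2, "alice", "bob", 30),
    Transfer(5, "alice", "carol", 60),
    Transfer(7, "carol", "bob", 10),
]
snapshots = top_holders_over_time(transfers, [2, 7])
print(snapshots)
```

Even this toy version needs the full, correctly ordered transfer history before any ranking is meaningful, which is why the ingestion and cleaning steps dominate the real effort.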
Accessing data is hard because blockchains are optimized for Writes and not Reads
Why is it so hard? Blockchains have historically been optimized for Writes (getting data onto the blockchain) and less for Reads (getting data OUT of the blockchain). This is because optimization efforts were focused on increasing transaction throughput and building fault tolerant and scalable consensus algorithms. This neglect makes it hard to get data out efficiently and reliably at scale.
Parsing and interpreting blockchain data requires both deep domain expertise and data manipulation
To quote Tim Roughgarden, Columbia Professor, “Blockchains are (virtual) computers, not databases.” They are Turing machines that support general computations, and anyone can write and deploy their own smart contract for their own use case. This nearly infinite number of use cases leads to the fragmentation of data schemas for different purposes. Standardizing these schemas requires deep domain expertise to turn esoteric technical outputs into clear information for specific concepts like tokens, NFTs, stablecoins and DEXs.
Allium abstracts the complexity with a simple way to query blockchain data
Allium tames the chaos by ingesting, sanitizing, and standardizing all this data. As of this post, the data we’ve archived across 40+ blockchains is in the petabytes and growing exponentially.
Bloomberg and Google organized the world's public financial and webpage data; Allium is on a mission to do the same for blockchain data
This is one of the rare times in history where indexing a giant public dataset is sorely needed by all - similar to what Bloomberg did for financial data and what Google did for public webpage data. With this indexed data, we are fortunate to support trailblazers in this industry and play some role in the industry’s most exciting trends:
About our customers
We serve 2 groups of customers today with the same data but different platforms: Analysts who need to answer data questions about the blockchain (think BI) and Engineers who need highly reliable data queryable in near realtime (think Application backends). Our customers include the biggest institutions, such as Visa, Stripe and Grayscale, and also the biggest crypto companies, such as Phantom and Uniswap. Allium is one of the few companies in the industry that bridge the blockchain and non-blockchain worlds.
About the Role
We love engineers and wizards who love solving new problems every single day. While wizards are not engineers, they contribute and give the best product roadmap guidance to ensure the engineers build the right things in the right way.
Data Egress - How does one transport 100s of TBs of data around the world without breaking the piggybank? (https://www.databricks.com/blog/announcing-public-preview-delta-sharing-cloudflare-r2-integration)
Handle high traffic - How can we support the biggest applications in this industry and handle 100,000 QPS at peak traffic without going down? (https://www.allium.so/post/the-inside-story-of-the-jup-airdrop-at-phantom)
Botnets - This industry is in its early days, how does one catch botnets based on their behavioral patterns? (https://www.allium.so/post/initial-report-for-the-10-6k-botnet)
Fraud (Sybil) Detection - Is it possible to transfer the same fraud detection heuristics into this blockchain world? (https://www.allium.so/post/from-eligibility-to-sybil-detection-a-deep-dive-into-wormholes-multichain-airdrop)
Who is real? - What constitutes meaningful and organic transactions on the blockchain? (https://www.allium.so/post/visa-x-allium-making-sense-of-stablecoins)
Bring Your Own Transformation - How do we let our customers design their own APIs and transform their own realtime data streams? (https://docs.allium.so/allium-app/explorer-api/quickstart)
Data Governance - We pride ourselves on our data quality - How can we ensure our data is consistent across every copy and every region 24/7?
AI and LLMs - How does one design the LLM and AI experience on top of our data to lower the barrier of entry to crypto data? (https://docs.allium.so/allium-app/allium-explorer/allium-ai-assistant)
Data Transformation Holy grail: How can one unify streaming and batch transformation logic into a single code base?
More specific past work Allium data wizards have done:
Diving deep into the guts of Ordinals data to power research like this: https://x.com/0xren_cf/status/1813606266553446596
Sybil Detection (https://www.allium.so/post/from-eligibility-to-sybil-detection-a-deep-dive-into-wormholes-multichain-airdrop)
Creating Wallet360 (https://docs.allium.so/historical-data/supported-blockchains/evm/ethereum/wallet-360)
Deploy washtrading filters (https://docs.allium.so/historical-data/supported-blockchains/evm/ethereum/nfts/wash-trading-flag)
Powering Brevan Howard Digital's stablecoin industry reports
Designing the most intuitive DEX schemas for ALL DeFi researchers to use easily (https://www.allium.so/post/announcing-v3-polars-the-dex-analytics-portal-sponsored-by-the-uniswap-foundation)
Ensure Grayscale's State of Ethereum Report had the right staking and fees data: (https://www.grayscale.com/research/reports/the-state-of-ethereum)
Design chain level metrics (Fees, Activity..) for all chains: (https://docs.allium.so/historical-data/chain-metrics)
Hunt down and curate wallet entities and labels (https://docs.allium.so/historical-data/identity)
Account abstraction - Ensure we have all the right decoded logs to power https://www.bundlebear.com/overview/all
Some qualities
Sherlock & Enola Holmes level of curiosity to find peculiarities in the data and help the industry redefine the narratives
Ability to parse and understand new blockchain schemas fast and well
Proficient understanding of NFT, DEXs, Decoded Logs, and Smart Contracts to transform the data to the product and customer's needs quickly
Giant infrastructure budget per head
You will make mistakes, costly mistakes, but at Allium's expense. We have an internal leaderboard of the costliest infrastructure mistakes made, and we (try to) learn from them. We don't have fancy Michelin-starred meal budgets, but we have a huge infrastructure budget for you to get better at your craft. Why? We leverage every tool (no prereqs) out there because we meet our enterprise customers where they are at:
Every OLAP: Snowflake, Databricks, BigQuery, ClickHouse*
Every OLTP: Postgres, Aurora
Every Event bus: Kafka, SNS, PubSub
Every Cloud Provider: AWS, GCP, Azure (one day)
A copy of data in every region: US East, Central, West, Europe, Asia
Every data transformation and orchestration tool: Apache Beam, Materialize, Tinybird, dbt, SQLMesh, Temporal
Data governance tools: Datafold
What some ~cool people have to say about us:
Mario Gabriele from The Generalist's Future 50 Startup List: https://www.allium.so/post/allium-named-awardee-of-the-generalists-inaugural-future-50-startups
Tomasz Tunguz from Theory Ventures: https://tomtunguz.com/allium/
Bucky Moore from Kleiner Perkins: https://www.kleinerperkins.com/perspectives/allium-series-a/
Ok.. now for some tough love, here are the values we strive for at Allium:
Pro Athlete Mindset - Consistency. Day in and day out, in pursuit of excellence. A win yesterday does not guarantee (or even imply!) a win tomorrow. Anyone who supports a failing sports team (cough, Man United fans) knows the pain of inconsistency.
Figure It Out & Extreme Ownership - Every day is unexplored territory. There are new engineering frameworks, new legal docs, new compliance, new sales, new regulations, and new operational procedures every single day. If you don’t know it, learn it. If you can’t learn it, find someone or a product that does it. If you can’t find someone, find someone who can find someone.
High Agency - (One of) the biggest commonalities among successful people is their responsiveness: most successful billionaire CEOs still reply to emails within minutes (within working hours). And when you reply, respond fast with effective solutions - and even better, resolutions. If you’re looking for a superpower, you can’t go wrong with responsiveness. Of course, this doesn't apply when you're an engineer coding in flow, but in general a high-agency approach to problem solving gets one very far in life.
Leading from the Front - No one is going to listen to (and adopt) your suggestion unless you lead by example. It’s one thing to say “We need to do XYZ better” and another thing to build an MVP and say “This is the way we should do things”. The proof of work and momentum goes a long way.
Strong Opinions On the Future (loosely held) - It is okay to be wrong, but it is not okay to have no idea of how a better future should look. Alliumites take pride in trying to improve everything about the company all the time.
Sense of (allium) business smell - There are a number of folks who live to eat at Allium, but the Allium smell we are talking about is different: we love folks who naturally want to know why and how their work builds leverage for their teammates and relates to the business goals.
We invite engineers of all sorts of backgrounds (https://www.allium.so/about). We have engineers who learnt coding much later in life, engineers who learnt coding on the side, engineers who are still in school, and engineers who went to top schools (CMU, Stanford, UIUC, UPenn, Oxford, NUS, Cornell). All are welcome if they come in with a curious mind and an infectious work ethic.
Administrative Benefits
Medical, Dental, Vision, Life and AD&D insurance - US folks get 100% coverage for Gold plans, 80% for dependents
Note: The sun never sets on Allium - we hire from any geographical location as long as you are willing to overlap for 2 hours on NYC mornings, Mon-Thurs from 10am-12pm ET. We have people based in New York, Seattle, Singapore and Australia.
All applicants have to answer this pop quiz: "What is an Allium? What is your favorite Allium?". Bonus points for the right pronunciation.