Show HN: Airbyte Agents – context for agents across multiple data sources



mtricot

Comments (23)

swyx
(former employee here) congrats Michel! so glad to see you guys adapting to the AI age so well (and using the crap out of Devin!)
hmm, so airbyte agents could serve as a form of MCP gateway, or a key building block of an MCP gateway, which btw is how anthropic uses mcp themselves for all their internal apps https://www.youtube.com/watch?v=CD6R4Wf3jnY&t=1s&pp=0gcJCd4K...
i think my most sad/interesting observation about ai engineers is that many ai apps are super data hungry, but many don't have the necessary data engineering background to even know they need an airbyte or what tradeoffs to make in an etl pipeline. would love a "data engineering for ai engineers" type braindump session from someone from airbyte at AIE (https://ai.engineer/cfp)
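A rough sketch of the routing idea behind a gateway, assuming "gateway" here just means one entry point that namespaces tools per data source and dispatches calls to the right backend. This is not the MCP protocol and not Airbyte's API; the `Gateway` class and the lambda backends are hypothetical stand-ins for real connectors.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# A tool handler takes keyword arguments and returns JSON-like data.
ToolHandler = Callable[..., Any]


@dataclass
class Gateway:
    """Hypothetical single entry point fronting several data sources.

    Tools are registered as '<source>.<tool>' so an agent sees one flat,
    namespaced tool list instead of talking to one server per vendor.
    """

    tools: Dict[str, ToolHandler] = field(default_factory=dict)

    def register(self, source: str, name: str, handler: ToolHandler) -> None:
        self.tools[f"{source}.{name}"] = handler

    def list_tools(self) -> List[str]:
        return sorted(self.tools)

    def call(self, qualified_name: str, **kwargs: Any) -> Any:
        # Reject unknown tools explicitly rather than failing deep in a backend.
        if qualified_name not in self.tools:
            raise KeyError(f"unknown tool: {qualified_name}")
        return self.tools[qualified_name](**kwargs)


# Hypothetical in-memory backends standing in for real connectors.
gw = Gateway()
gw.register("zendesk", "search", lambda query: [{"id": 1, "subject": f"match for {query}"}])
gw.register("salesforce", "get", lambda record_id: {"id": record_id, "name": "Acme"})

print(gw.list_tools())  # ['salesforce.get', 'zendesk.search']
print(gw.call("salesforce.get", record_id="001"))  # {'id': '001', 'name': 'Acme'}
```

Namespacing by source keeps tool names unambiguous when two vendors both expose something called `search` or `get`.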
jessewmc
Looks interesting!
If I'm reading correctly, the indexing (Context Store) is neutral/unopinionated? How does it select fields for indexing?
Have you done any testing on guided indexing, or metadata layers on top of the data? My experience so far on similar work is that getting data in front of an agent isn't enough context to get useful, reliable answers often enough. That is, _what_ you index, and how you signpost it for agents, becomes really important (unless your data is super clean, I guess). This does look like a good foundation for that kind of tooling, though!
slurpyb
Your billing support email forwards to a Google group which rejects the email entirely. So I embedded my question inside the website's sales enquiry form and received multiple rounds of emails that couldn’t be further from human.
It’s not why we started using PostHog, but it definitely sealed the deal when you see how simple and reliable that experience is.
nerdright
This is such a great direction for Airbyte, and congrats on the launch! I think you're better positioned for this opportunity than most people realize, given your reputable brand and your uncanny expertise in ETL. It's honestly a natural progression for Airbyte given the current AI landscape. Kudos to you and the team!
(We use Airbyte at my company, although we self-host it.)
jscheel
I feel like we've been working in parallel here :) We are using PyAirbyte (hi aaronsteers) to let our users connect their data sources to our agents. We originally wanted to use the Airbyte white-label platform, but the team said it was being deprecated. I think this really drives home how crucial it is to have a clear model for accessing your data, and Airbyte has been great at that for quite a while.
tomrod
What actions does Airbyte Agents enable that weren't already available from Airbyte?
ritonlajoie
Hi Michel, congrats! I have nice memories of working with you on Lafayette Street!! Keep up the good work on Airbyte! :)
mtricot
Just want to call out a couple of nuances in our methodology. In general, we tried our best to do apples-to-apples comparisons where we could, and gave ourselves a discount where we couldn’t. Unsurprisingly, it’s a challenge to find MCP servers for various vendors (which is another reason we are trying to solve this). Here’s a video walkthrough of the benchmark harness: https://www.loom.com/share/9d96c8c64c1a4b7fad0356774fc54acc
Where the comparison wasn't valid or wasn't apples-to-apples:
Gong and Zendesk: no official native MCP exists, so we used the most popular community implementations we could find. We were only able to benchmark Gong Search, as the Gong MCP does not have a Get tool call.
While our Search testing yielded the same number of records on either path, vendor-specific search implementations mean the results aren’t identical. Contents are generally similar, so the ratios remain directionally correct.
The general test set:
2 scenarios (Retrieval and Search) across 4 connectors isn’t a huge test set. While we hope to extend it over time, we’ve made the harness public so anyone can contribute in the meantime. Let us know if you find any MCP with better results!
Where the vendor MCP wins or ties:
Salesforce showed our smallest win, at 16%. This is primarily because Salesforce, unlike many vendors, provides great search support out of the box with SOQL.
We see identical records for Get. As noted, Search returns different record sets with identical counts. Airbyte uses fewer tokens because the records returned by the Salesforce MCP contain mandatory metadata (type and url).
Where the vendor MCP is costly to context:
Zendesk is a great example of this. The extreme gap is because the Zendesk MCP (reminder: a community alternative) returns the entire API response in search results. This averages 9KB per record against our production Zendesk account!
Airbyte’s implementation provides filtering, which allows agents to retrieve the minimal data needed to achieve the outcome, explaining the drastic gap.
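To illustrate why field filtering shrinks what lands in the agent's context, here is a minimal sketch comparing a full, verbose record with a projection to a few fields. It assumes serialized JSON byte size is a rough proxy for context cost (the benchmark above counted tokens, which this does not), and the `project` helper and sample ticket are hypothetical, not Airbyte's or Zendesk's actual API or data.

```python
import json
from typing import Any, Dict, Iterable, List


def project(record: Dict[str, Any], fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the requested top-level fields; missing ones are skipped."""
    return {f: record[f] for f in fields if f in record}


def payload_bytes(records: List[Dict[str, Any]]) -> int:
    """UTF-8 size of the JSON an agent would receive (rough proxy for tokens)."""
    return len(json.dumps(records).encode("utf-8"))


# Hypothetical ticket, shaped loosely like a verbose API response.
full_ticket = {
    "id": 42,
    "subject": "Refund request",
    "status": "open",
    "description": "Customer asks for a refund on order 1234. " * 20,
    "via": {"channel": "email", "source": {"from": {}, "to": {}, "rel": None}},
    "custom_fields": [{"id": i, "value": None} for i in range(50)],
    "tags": ["billing", "refund"],
}

full = payload_bytes([full_ticket])
slim = payload_bytes([project(full_ticket, ["id", "subject", "status"])])
print(full, slim, f"{slim / full:.1%} of the original")
```

The agent (or tool schema) chooses the fields per task, so the unfiltered response only needs to be fetched when the task actually needs it.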
ecares
Did you find that some data model patterns were easier for some LLMs to detect? I'm curious how training might have made some agents better at graph navigation, for instance.
pjm331
sounds very similar to what I ended up doing on my internal system, especially anything to do with search: much better to just sync everything to a DB and give the agent access to the DB
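A minimal sketch of the "sync everything to a DB, let the agent query it" pattern, assuming SQLite and a single hypothetical `tickets` table. In practice the sync step would be fed by a connector (e.g. Airbyte) rather than a hard-coded list, and `sync` and `search_tool` are illustrative names, not part of any library.

```python
import sqlite3
from typing import Any, Dict, List


def sync(conn: sqlite3.Connection, records: List[Dict[str, Any]]) -> None:
    """Upsert records pulled from some source into a local table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tickets (id INTEGER PRIMARY KEY, subject TEXT, status TEXT)"
    )
    # INSERT OR REPLACE makes re-running the sync idempotent per primary key.
    conn.executemany(
        "INSERT OR REPLACE INTO tickets (id, subject, status) VALUES (:id, :subject, :status)",
        records,
    )
    conn.commit()


def search_tool(conn: sqlite3.Connection, term: str, limit: int = 10) -> List[Dict[str, Any]]:
    """The only tool exposed to the agent: a parameterized, read-only query."""
    rows = conn.execute(
        "SELECT id, subject, status FROM tickets WHERE subject LIKE ? ORDER BY id LIMIT ?",
        (f"%{term}%", limit),
    ).fetchall()
    return [{"id": r[0], "subject": r[1], "status": r[2]} for r in rows]


conn = sqlite3.connect(":memory:")
sync(conn, [
    {"id": 1, "subject": "Refund request", "status": "open"},
    {"id": 2, "subject": "Login issue", "status": "solved"},
    {"id": 3, "subject": "Second refund request", "status": "open"},
])
# Two matches, ids 1 and 3 (SQLite's LIKE is case-insensitive for ASCII by default).
print(search_tool(conn, "refund"))
```

Exposing a parameterized search function instead of raw SQL is a design choice here, not something the commenter specified; handing the agent a read-only connection with free-form SQL is the looser alternative.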
aayushkumar121
[flagged]
clauderodriguez
[flagged]
dailoxxxx
[flagged]
Dorrell
[flagged]
dubovskiyIM
[flagged]