The Privacy Insider Podcast
AI Doesn’t Need More Data; It Needs Context with Philip Rathle of Neo4j
We sit down with Philip Rathle, Chief Technology Officer of Neo4j, to explore a question that’s becoming urgent in the age of AI: What happens when powerful models operate without context, governance, or explainability?
As generative AI reshapes enterprise technology, graph databases are quietly becoming a foundational layer for accuracy, transparency, and data control. Philip shares why AI systems struggle without structured relationships, how graphs reduce hallucinations, and what this means for privacy teams navigating Customer 360, data subject requests, and regulatory pressure.
Episode Highlights:
- 00:00 Introduction.
- 02:30 From chemical engineering to data architecture.
- 05:45 What a graph database actually is, and why it’s simpler than it sounds.
- 10:30 Why relational databases struggle with complex, connected data.
- 17:45 The AI tailwind: hallucinations, explainability, and governance.
- 23:10 Customer 360 and resolving fragmented identities.
- 28:15 Handling data subject access and deletion requests with graphs.
- 30:45 The double-edged sword: when graph power becomes surveillance risk.
- 37:00 AI models, privacy controls, and why not everything belongs in an LLM.
- 41:30 Confessions from a CTO: privacy habits in real life.
Episode Resources:
Philip Rathle: The technology that we built happens to solve really the top problems that you end up with in AI applications when not using graphs, which are hallucinations, lack of explainability, lack of, let's say, data governance slash discernment slash judgment of knowing what data is appropriate to use for what purpose, and then having access to explicit context.
Arlo: Hello, my name is Arlo Gilbert. I'm the founder of Osano, a leading data privacy company. Today, I'm your host on The Privacy Insider. We're joined today by Philip Rathle. Philip is the CTO at Neo4j. Neo4j is the leading graph database. It's been downloaded hundreds of millions of times. It's used by 84 of the Fortune 100, and their last valuation was in excess of $2 billion. Philip has overseen the growth of this company from its early days. Philip, welcome to the show.
Philip Rathle: Thanks, Arlo. Pleasure to be here.
Arlo: So Philip, we have a lot of people on this show who represent the privacy side.
We've had regulators, we've had technologists, we've had ethicists and philosophers. Uh, we've even had leaders of technology and politicians join us, but we've never really had a deep technical mind join us on this show before. And you're working on really interesting technology that has a lot of overlap with privacy and AI.
And, uh, we were hoping today that we might learn a little bit about some of the technology that you're using and developing. But before we start with that, we always want to know: Who are you? How did you get here? I mean, you're the CTO of Neo4j, which is arguably the world's most popular and dominant graph database, and by virtue of that engagement, I'm imagining that you've got your fingers on the pulse of a lot of things that are happening.
So how did a guy like you get to this place in life?
Philip Rathle: It really started with, you know, my first job working in tech as a consultant and then somehow stumbling into data very early on, on the data warehousing side but also the operational side. Within a few years into my career, I was doing data modeling and DBA work. Eventually I got into solution architecture kinds of roles where I was sitting somewhere between the business and the technology, which is where I still like to be. And, you know, really seeing the power of data, even going back a couple of decades. For example, I was brought in as part of a SWAT team to do some work at MCI back in the day, if you remember them. Probably those of you who are old enough, like me, remember getting lots of telemarketing calls from MCI. Well, it turns out they had this massive marketing engine that marketed to 300 million households per month, which, by the way, is more households than exist in the United States. You know, lots of business implications around that, like people getting multiple calls. So that was an introduction to data quality and the reputational impacts and technical costs. And then underneath that, there was this nightly batch job that took 28 hours to run.
So, all right, it's a non-starter for a nightly batch job to run 28 hours, or even 24 hours. It probably should run a maximum of maybe three or four hours. And, uh, you know, I got very much into performance and scalability, but also data quality, and how ultimately the fuel for business capability becomes how well you understand your data.
Another implementation I was involved with early on was building United Airlines' first Customer 360 system, back when they'd outsourced their website to a third-party vendor who then owned the profile. And then they had their own, you know, Mileage Plus system, but they were also coming up with, like, an initial WAP phone and IVR, and, you know, how do you tie these together?
So again, there, from an operational standpoint, it was about having systems and infrastructure that would enable the business to treat customers as the same person, regardless of the touchpoint, with all the latest information, so that you're dealing with someone who just missed a flight knowing that they just missed a flight. And likewise on the analytics side, for doing customer lifetime value calculations and that kind of thing. So yeah, I fell into data really early on and stuck with it. I got into database tooling for about seven years and ended up heading up a product portfolio at a company called Embarcadero Technologies, which had, and has, one of the leading database modeling tools, ER/Studio, along with a number of tools for DBAs and developers working with data and databases.
And then, uh, yeah, 14 years ago or so now, I had an opportunity to pioneer the graph category as the first product hire and product leader at Neo4j.
Arlo: So I'm curious about two things on that. Um, first off, how did you end up getting connected with the Neo4j team? I mean, there's usually a story in the early startup days of, you know, they weren't a thousand people yet. Uh, they didn't have billions of dollars. What was that like?
Philip Rathle: There were 20-something people, and I met the founder, Emil, who's still our CEO, when he was raising his Series A. But, you know, as these things often happen, it was through the graph, through the professional graph in this case. A friend of mine who worked with me in product marketing happened to know the first CMO at Neo4j, who happened to know that the CEO was looking for a first product leader. You know, 20 people. The founder is still in charge of everything, like product and engineering and so on. And, uh, you know, he is looking for someone to hand off stewardship of the product to, post Series A, and of the product vision too, someone who could lead that and was also grounded in what the world was doing and needed of a database platform. So, uh, yeah, I stumbled into it really just through knowing the right people at the right time.
Arlo: That's amazing. It's always good fortune that turns out to be one of the predictors of amazing outcomes. And, uh, when you talk about your background, I mean, it sounds like there was a lot of data involved in there. What is it about data that you love? Because there are a lot of ways that you could spend time as a technologist and product leader.
Is it, you know, is it the structure? Is it the connections? Is there something about data that pulled you versus, I don't know, going into, you know, coding and being a developer or things like that?
Philip Rathle: Yeah, I definitely did some amount of coding in my time, but to me, um, I studied chemical engineering because I love physics and chemistry and math and, you know, the challenge. And what I realized coming into software is that, in the same way that you're flowing some materials through a set of, you know, reactors and crystallizers and distillation columns and whatever else, and you're transmuting these chemicals and their physical properties inside of each step, in the enterprise you have all these pipelines, and what all the programs and code are doing is effectively transmuting and operating on data. So I saw the purpose of code. Obviously this is just my perspective and, you know, you could argue the opposite, but in my mind the purpose of the code and the applications and everything else was actually to move data through so that you could then use that data to operate your business, from like a store-and-retrieve or OLTP standpoint, and to make better decisions, and then ultimately predictions. And, you know, we can get into AI from analytics, but, um, I really saw data as the core essence of, you know, where a lot of value then came from. Obviously you need to marshal up your data into applications that then will be used by users and create great experiences and so on.
That's definitely a part of it, and that's been the appeal to me of being in product management and technology more broadly. And then as technology moved on and we got into the early innings of AI, what previously was done as code got moved upstream and became a data problem.
This is what, like, supervised machine learning is: instead of creating rules or speculating, I'm actually gonna move that up into a data problem. So the data thesis and perspective I had in some ways became strengthened by AI, and then strengthened even more so as we get into generative AI, where all the magic that language models are doing just comes out of being trained on massive amounts of data.
Now, what we don't think about is that the data's often curated. You know, you've got companies like Scale AI and others who actually do all this manual labeling and so on. So it seems like it's just masses of uncurated data from the outside, but there's still very much a garbage in, garbage out dynamic. And so data still remains as important as ever, or more so.
Arlo: Well, let's talk about data. You know, it was 2013, 2014, uh, I was in San Francisco and, um, there was a meetup at the Medium headquarters in that triangular building. And, uh, I remember going there 'cause I saw that there was a meetup about graph databases and I was really curious about graph databases.
This was a new technology and there was a lot of question about whether graph as a technology was actually even a real database and a real thing that people would want to use in a commercial, scalable way. And I believe that early on, companies like Medium were some of the big early adopters of that technology.
But a lot of people hear the word graph and their mind goes to, you know, the paper they had in school, you know, you had your graph paper, or they heard about Facebook building a social graph to monitor all of us. And I thought it would be really helpful if you might take our audience through kind of a 101 of what a graph database is exactly, and how does that work?
Because I think most people understand the broad concept. At least conceptually, they understand what a tabular database is or a relational database is, right? It's like an Excel spreadsheet, and if you're familiar with pivot tables, you kind of understand joins. But beyond that, graphs are way more complicated, and I think it would be helpful to understand that.
Philip Rathle: Yeah. And my perspective is they're not more complicated. They're simpler. But let me get to it and explain why. Really, the idea with Neo4j, and what kind of gave birth to the technology, is that there are systems in the world that we need our software to be able to understand.
And this is even more true of AI. And those systems, how do they show up? Well, you could say that what gave birth to relational databases was the need for business process automation. And then what's the data problem underlying business process automation? I have data in warehouses and filing cabinets and paper forms. So paper forms have the characteristic of being human generated. They're pretty straightforward. You can easily come up with a set of rules to break them apart and remove your redundancy. These are all the normalization rules. And then to rehydrate it back into, you know, some form or subset of the form when I'm working through my system.
So that's a bit oversimplified, but not far from the truth. Increasingly, as we've mastered that and moved on to having to deal with this complex, dynamic, you know, world of business where everything's interconnected, and to build applications that do, you know, digital transformation and AI, I need to deal with networks of people, networks of computers, networks of spread of ideas, of influence, of geopolitics, networks of payments, networks of biology, networks of ecology.
A lot of things in the world show up as networks. Likewise, you have networks that are maybe more up and down, hierarchically shaped or shaped like trees. This is like an HR hierarchy, which, by the way, is not strictly top down. It is from a reporting structure perspective, but in terms of the way the org actually operates, you know, I report into a project, to maybe a project manager there, and I have a mentor, and then there's the historical dimension, and then there's my skills all around me, and so on and so forth.
But you could say that's, you know, broadly hierarchical. Permissions, asset ownership, supply chain: these things are all real-world systems, digital-world systems, that are more hierarchical. And then you have paths and journeys through that: customer journey, patient journey. And so the observation that our founders had, and that really resonated with me as a thesis, is: look, we're taking all this stuff that's shaped like networks and hierarchies and stuffing it into tables. And there's a huge impedance mismatch, you know, which is jargon for: there's a great distance between the way the data is shaped and shows up in the real world, and even the way that a business person will whiteboard their domain. They always do that as circles and lines that are connecting each other, which is a graph. You know, they're not gonna say, look, here's my supply chain, and draw out tables and then join tables and recursive joins. That's just not how we intuit data. It's not how it naturally shows up. Not that you can't put it into a relational database. Anything you can put into a graph, you can put in a relational database and vice versa. But the observation was about a world of business that's fast-moving, where I don't know from one minute to the next what problem I'm gonna need to solve next to be competitive, and what piece of data is gonna give me the most value. So you want a model where there's as little difference as possible between the business conception, the way a developer works with the data, and the way the data actually lives in the data structures in the database that's managing it. You also want a model where ideally I should be able to add to my data without going through this whole schema migration exercise and spending, you know, multiple weeks or months getting data modelers into a room to figure out how to adapt the model, and then doing a whole development effort around, you know, how do I change my application now that my schema has changed.
So schema flexibility is another core principle here that is, you know, a part of the core implementation in Neo4j and in many graph databases. And of course you have the option of adding constraints after the fact, once you understand the data and want to, you know, implement those at a lower level. And then also having a query language that is based on understanding how things are connecting, so expressing connectivity in a query language. So that instead of having, you know, 50- or 100-line queries that are doing 15-way joins in order to do, you know, some supply chain query that's multiple levels out, or some arbitrary, you know, Kevin Bacon type queries, which are actually useful in many business contexts, of what's the shortest path between these two points in a network, you're able to express those queries and then run them very efficiently. And those are all the, let's say, core ingredients of what makes graph databases different and also useful.
So relational databases are super useful and obviously by far, you know, by a 10x margin, the most commonly deployed with respect to all other database technologies combined. Notwithstanding, with any AI application that you build today, and I'm sure we'll get into this, you're gonna want to deal with fast-moving data, cut across the silos, and have a language that both humans and models and AI agents can easily generate and express queries in. And it turns out the graph database checks all of those boxes, particularly the way Neo4j's implemented it. So think of it, to bubble up, as a database management system built from the ground up for contemporary hardware, where memory is abundant and fairly cheap, with a very fast storage substrate rather than spinning disk.
And it's designed to store and work with and analyze, but also to be the, you know, real-time system behind agents for doing retrievals, in worlds where you've got one or more of these complex kinds of real-world systems that show up as networks.
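Editor's note: the "Kevin Bacon type" query mentioned above, the shortest path between two points in a network, can be sketched in a few lines. This is a minimal illustration in plain Python using breadth-first search, not how Neo4j implements it; the `shortest_path` function and the `SUPPLY_CHAIN` data are hypothetical names made up for the example.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search over an unweighted graph stored as an adjacency dict.

    Returns the list of nodes on one shortest path, or None if unreachable.
    """
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we arrived from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the back-pointers to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical multi-level supply chain, purely illustrative.
SUPPLY_CHAIN = {
    "Retailer": ["Distributor"],
    "Distributor": ["Retailer", "Assembler"],
    "Assembler": ["Distributor", "ChipMaker", "CaseMaker"],
    "ChipMaker": ["Assembler", "SiliconMine"],
    "CaseMaker": ["Assembler"],
    "SiliconMine": ["ChipMaker"],
}

path = shortest_path(SUPPLY_CHAIN, "Retailer", "SiliconMine")
print(" -> ".join(path))
```

The point of the sketch is that the question is phrased in terms of connections, with no fixed number of hops written into it, which is the property Philip contrasts with multi-way joins.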
Arlo: So then the takeaway, and what I'm hearing, is that relational databases, as we think about them, I mean, at their core, they weren't really designed for relationships. They were designed for storing tabular data, whereas graph databases actually were built by design to be about connecting the dots between different things.
Philip Rathle: That's right. Yeah. The relationship is a first-class object in the database, and what that technically translates into is I have nodes, which I can use to represent things, and I have relationships, which is, you know, what it sounds like. It's how does this thing relate to that thing?
And relationships have a type. They have a direction. They can have a number of properties. So you can have attribution of relationships, which is great, like a start date, end date, level of certainty if I'm dealing with identity, and so on and so forth. And this model has been, you know, let's say blessed by the International Standards Organization, the same body that came up with the SQL standard nearly 40 years ago. SQL-86 was adopted by ISO in 1987. For 30-plus years, there literally was no other ISO standard database model, or language to go with it, up until the graph model came around. And now there's something called GQL, which for all intents and purposes is more or less the same as Neo4j's Cypher query language, which was already the de facto language for graph databases. So it's something that has, you could say, a generational seal of approval from, you know, the key standards body that governs database standards.
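Editor's note: the model Philip describes (nodes, plus relationships that carry a type, a direction, and their own properties) can be sketched as a small data structure. This is a toy in-memory illustration in Python under our own assumptions, not Neo4j's storage format or API; the class names, the `USES_EMAIL` relationship type, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    rel_type: str  # every relationship has a type
    start: str     # direction runs from start node ...
    end: str       # ... to end node
    properties: dict = field(default_factory=dict)  # e.g. dates, certainty


class PropertyGraph:
    """Toy property graph: relationships are stored as first-class objects."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, node_id, labels=(), **properties):
        self.nodes[node_id] = Node(node_id, set(labels), properties)

    def relate(self, start, rel_type, end, **properties):
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must exist before relating them")
        rel = Relationship(rel_type, start, end, properties)
        self.relationships.append(rel)
        return rel

    def outgoing(self, node_id, rel_type=None):
        """Relationships leaving node_id, optionally filtered by type."""
        return [
            r for r in self.relationships
            if r.start == node_id and (rel_type is None or r.rel_type == rel_type)
        ]


g = PropertyGraph()
g.add_node("cust-1", labels=["Customer"], name="A. Example")
g.add_node("email-1", labels=["Email"], address="a@example.com")
# The relationship itself carries a start date and a level of certainty.
g.relate("cust-1", "USES_EMAIL", "email-1", since="2021-03-01", certainty=0.9)

for rel in g.outgoing("cust-1"):
    print(rel.rel_type, rel.end, rel.properties)
```

Because nothing here declares a fixed set of columns, a new label, relationship type, or property can be added without a migration step, which is the schema flexibility discussed earlier in the conversation.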
Arlo: That's great. Um, well, I'd love to talk about the privacy implications of that, but before we talk about that, I'm curious. With the growth of generative AI and all of this need to do lots of disparate retrievals, I'm assuming that this has been a bit of a renaissance for Neo4j. I mean, it was already a fast-growing company, but has this AI wave really had a significant impact on technology decisions and customer buying behavior? Are you seeing a lot of AI-first buyers?
Philip Rathle: It's been a massive tailwind. And of course, whenever there's a big generational platform shift like this, you know, first of all, the pendulum swings over and everyone starts using just the one piece of new technology to solve everything. So LLMs. And then very quickly, you know, you discover, all right, what's the right mix of new technologies?
So, like, vector embeddings is another one, with other existing technologies like graph databases, to, let's say, balance and create a, you know, one plus one plus one equals 10, let's say, in the case of LLMs, vectors, and graphs. So, yeah, the buying behavior has definitely shifted, and I think for everyone in enterprise tech, very heavily in the direction of AI. And lucky for us, the technology that we built happens to solve really the top problems that you end up with in AI applications when not using graphs, which are hallucinations, lack of explainability, lack of, let's say, data governance slash discernment slash judgment of knowing what data is appropriate to use for what purpose, and then being able to have access to explicit context.
Arlo: So we talk about AI and, you know, let's, let's shift over to privacy because that, that is a big piece of anything that you build these days. I mean, if you're gonna put the data in, then you now have all these obligations and various regulatory regimes that require you to be able to get the data out, to be able to cleanse it, to be able to control access to it.
How do graph databases, and Neo4j in particular, you know, contribute to privacy, and how can privacy professionals think about this technology that, for a lot of people, feels very abstract, right? Uh, if you're not a database person, then you're probably not super familiar with it, and anything new or anything unpredictable obviously creates fear in the mind of people who are involved in risk.
So how do people manage to govern, uh, a Neo4j instance? How do they implement privacy controls on a Neo4j instance?
Philip Rathle: Let me start with Customer 360 and, you know, unifying silos as a way to get into the answer to your question. One of the most common use cases that we've had over the years is, how do you solve the problem, which every company has, that each department, each division, has one or more silos, you know, usually many, many silos, that are either internally built applications or, you know, some third-party app.
And when you're engaging with the customer, what we all know as customers is we want companies to take into account the entirety of our relationship across all channels, across all the identities I've used across all the channels, you know, phone, email, text, social media, et cetera, and across all the different departments and lines of business. And so there's always been this problem of, you know, how do you get to a single identity? Where Neo4j's used a lot is to say, let's create a graph that has all the different identities, all the original identities, and how they relate to each other. You know what those are, though you might have to go hunting in a bunch of different systems to pull them up. But then you can put those into a graph where I can say, here are all the different aliases for what ultimately is one customer, and have a relationship there. And then for each alias, have a relationship with all the different identities, the multiple email addresses that people have used. Roll those up to households. And this very messy structure that's really problematic to deal with in relational databases becomes actually very easy to deal with in the graph. And then from that point, you can put the graph behind, like, a service layer that will spider across the graph, figure out what the identity is, and then, if there's some payload you need to grab from an existing system, you can do all that.
And this ends up being a happy medium between the two extremes, neither of which works. One is, let's just federate absolutely everything, which doesn't work. The other is, let's create one system to rule them all. And, you know, as the meme goes, if you had 14 systems and you create the one to rule them all, then you just have 15 systems at the end of it. And so what does that mean for privacy? From a privacy standpoint, there are things that I want as a consumer that you could say are almost in the opposite direction of privacy: I want the company to know all the things about me that I've told them, that I expect them to know. On the other hand, I want to be able to make sure that the company is not using information in ways that I don't want it to: marketing opt-out and whatnot. And so with this unified view that sits over and straddles your existing systems, you can then make sure that if there's a marketing opt-out over here, it bubbles up appropriately, and then isn't over-implemented in cases where maybe I want to opt out of certain things but I really, really am interested in getting information about other things. Another thing you can do in the graph: there's information which customers disclose. There's also information which you gather by virtue of their activity, which might end up in web logs and so on, where the customer hasn't explicitly identified themselves, but I might know who the person is, or, if they haven't yet disclosed who they are, I might know that all this activity adds up to a certain individual based on, you know, looking at cookies, trackers, MAC addresses, IP addresses, and so on and so forth, and resolving those in some way. So obviously there are respectful, appropriate, and legal ways to do this.
But for me, it's convenient as a consumer if, as I'm engaging with a website multiple times, they use what they know of my activity and serve up more and more useful information. And they don't need to know my name to do that. And the kinds of things that you can do with graphs on the analytics side are to take all these breadcrumbs, do some fancy analytics, come up with a suspected identity, and use it for marketing. We've seen companies get literally as high as, like, 600% uplift in engagement, which is completely unheard of
Arlo: Wow.
Philip Rathle: in the marketing world, where usually, like, low single digits
Arlo: Yeah.
Philip Rathle: are the norm for a successful marketing campaign. That's by using graphs in this way on the analytics side.
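Editor's note: the identity-resolution idea described above, where many aliases across silos roll up to one customer, amounts to finding connected groups in a graph of records and identifiers. The sketch below uses a union-find structure in plain Python; it is a simplified illustration, not a description of Neo4j's tooling, and the record IDs, identifier strings, and the `resolve_identities` function are hypothetical. Real systems would also weigh how certain each link is rather than treating every shared identifier as proof.

```python
from collections import defaultdict


def resolve_identities(records):
    """Group records that share any identifier, directly or transitively.

    `records` maps a record ID to the set of identifiers seen on it.
    Returns a sorted list of clusters, each a sorted list of record IDs.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link each record to every identifier it carries.
    for record_id, identifiers in records.items():
        find(("record", record_id))
        for ident in identifiers:
            union(("record", record_id), ("id", ident))

    clusters = defaultdict(set)
    for record_id in records:
        clusters[find(("record", record_id))].add(record_id)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical records from four silos.
RECORDS = {
    "crm-17": {"email:pat@example.com", "phone:555-0100"},
    "web-903": {"cookie:abc123", "email:pat@example.com"},
    "loyalty-4": {"phone:555-0100"},
    "support-88": {"email:sam@example.com"},
}

clusters = resolve_identities(RECORDS)
print(clusters)
```

Here the loyalty record and the web session never share an identifier directly; they end up in the same cluster only because each is linked to the CRM record, which is the "spider across the graph" step in miniature.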
Arlo: So when you use the graph in that manner, you know, it helps to serve the end user, but it largely is helping the business. When you think about the ability for a consumer to opt out, to delete, to redact their information, we call those the kindergarten rules of data privacy.
Right? You know, if you want something from somebody, ask permission. If they want it back, give it back to them. And if they want to know where you're keeping it, be honest. Would it be fair to say that with a graph database, it's a lot easier to be able to identify, here's the information I have about you, for the purpose of reporting out?
Hey, I happen to have information about your buying activity here, I might have some behavioral activity here, and because it's all in a similar graph, or it's connected, it's far easier for me to be able to surface the answer to: what data do you store about me?
Philip Rathle: Yeah, that's right. And there are two situations where we need to do that. One is people exercising the right to know what information is being stored, and the second is the right to be forgotten. And these both can be very expensive exercises if you don't have the right data infrastructure. If you have a graph, then you can essentially have the structure through which you can easily just work your way down the graph, just follow the dots, literally, and follow the path into what data is stored at a different level. So that's definitely a use: responding to those requests in a way that's faster and, let's say, in line with compliance timelines, and faster for the customer and so on, but is also way, way, way, way cheaper. Now, I won't say that that's necessarily the highest-value use, but it's one of the value-added uses. I'd say the highest value is making sure that you are using your data appropriately across channels based on what customers have requested. And, this is less privacy but kind of in the same ballpark of what enterprises need to do, respecting the organizational firewalls. So in banking, for example, the investment bank can't use certain information from corporate banking, for, you know, regulatory and anti-competitive reasons.
But the implementation is the same: understanding what data, at a fine-grained level, can and cannot be used by what party, for what purpose, for what person.
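Editor's note: "follow the dots" for an access or deletion request can be pictured as a traversal that starts at the data subject and collects every record reachable from it. The following is a minimal sketch in plain Python under our own assumptions: the node naming scheme (`system/record` for stored records), the `EDGES` data, and both function names are hypothetical and not taken from any product.

```python
def collect_personal_data(edges, subject):
    """Depth-first walk over directed edges, returning every node reachable
    from the data subject (excluding the subject node itself)."""
    seen, stack = set(), [subject]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(edges.get(node, ()))
    seen.discard(subject)
    return seen


def access_report(edges, subject):
    """Group reachable records by source system.

    Assumes (for this example only) that stored records are named
    'system/record', while aliases and other link nodes contain no slash.
    """
    report = {}
    for node in sorted(collect_personal_data(edges, subject)):
        if "/" in node:
            system, record = node.split("/", 1)
            report.setdefault(system, []).append(record)
    return report


# Hypothetical identity graph: customer -> aliases -> records in each silo.
EDGES = {
    "customer:42": ["alias:pat@example.com", "alias:555-0100"],
    "alias:pat@example.com": ["crm/contact-17", "web/session-903"],
    "alias:555-0100": ["billing/account-5"],
    "web/session-903": ["web/clickstream-903"],
    "customer:43": ["alias:sam@example.com"],
    "alias:sam@example.com": ["crm/contact-18"],
}

report = access_report(EDGES, "customer:42")
for system, records in sorted(report.items()):
    print(system, records)
```

The same list of reachable records could drive a deletion workflow. One caution a real implementation would need: nodes shared between people, such as a household, should not be followed blindly, or the report would pull in someone else's data.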
Arlo: Now conversely, so those are great examples of, you know, the pros of the graph. So, you know, marketing teams can quickly get better uplift through better identification of properties and users. Sounds like there's some real pros on the privacy side of being able to do the same thing for the purpose of understanding what you store about me.
What are the downsides when we think about graph databases? Again, I come back to this Facebook era where all we heard about was the knowledge graph. And so the word graph itself has some connotations around societal profiling and things like that. Are there any downsides or any challenges that you think graphs introduce into the privacy equation?
Philip Rathle: I don't see downsides insofar as good actors being use using the technology as a tool, because what you can do is actually very rich and I, and I'll add one more capability that I missed, is you can. Add weights to the relationships in the graph that reflect your level of certainty that this person is, this, this identity, or this is, is this other identity or this thing happened or this didn't happen. so that, that gives you a very, very rich set of tools. Then what the risk become is bad actors using this technology, um, which then means you need to defend against it. So, you know, there's this. Saying on the, InfoSec perspective, uh, cybersecurity that I was coined by, um, someone at Microsoft years ago, which is attackers think in graphs. And so if your defender is thinking in lists or tables, then you are at a disadvantage. Like it's, you know, bringing a, knife to a gunfight kind kind and so what, what are, what are some of those things? Um. people who are doing, again, this is outside the realm of privacy, but you, you're, think the listeners will get the, the analogy, what is money laundering?
Money laundering is nothing but sending data from multiple places to one through many intermediaries and taking advantage of the fact that the systems that most companies have, at least up until pretty recently knew nothing about being able to. You know, understand paths across multiple intermediaries.
And so that ends up being a big gaping hole for someone who takes a graph perspective on things. Likewise, from a privacy perspective, a good example is blockchains, like the Bitcoin blockchain for example, where everything is public. All the transactions between wallets are public. Of course, the average person doesn't know who these wallets belong to, and the reality is you could have a million transactions and they're all anonymous. But then the second you need to pull money out of a financial institution, your account's been KYCed, and all it takes is that one link to tie you back. And now it can spider through the graph, and all these million transactions become very well known and understood.
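To make the "one link" point concrete, here is a minimal sketch in plain Python. The wallet names, the transaction list, and the single KYC record are all made up for illustration, and the code only shows which wallets become reachable from an identified account. Reachability is not proof of ownership, so treat the result as "linked to", not "belongs to".

```python
from collections import deque

# Hypothetical transaction graph: each wallet maps to wallets it has paid.
transactions = {
    "wallet_a": ["wallet_b"],
    "wallet_b": ["wallet_c"],
    "wallet_c": ["exchange_account"],
    "wallet_x": ["wallet_y"],  # a separate cluster with no identifying link
}

# The single identifying link: an exchange account that went through KYC.
kyc_owner = {"exchange_account": "Alice"}


def build_undirected(edges):
    """Treat every transaction as a two-way connection between wallets."""
    graph = {}
    for src, dsts in edges.items():
        for dst in dsts:
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set()).add(src)
    return graph


def linked_wallets(edges, start):
    """Breadth-first search: every node reachable from start, excluding start."""
    graph = build_undirected(edges)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


# For each identified account, list the wallets it can be tied back to.
exposed = {
    owner: sorted(linked_wallets(transactions, account))
    for account, owner in kyc_owner.items()
}
print(exposed)
```

In this toy data the three wallets on the path to the exchange account become associated with one name, while the `wallet_x` and `wallet_y` cluster stays unlinked because nothing connects it to an identified account.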
Arlo: Fair enough. So graphs give you superpowers and you can choose to use them for good or for evil. And it does sound to me like the scary part is a well-orchestrated graph getting into the hands of a government who now all of a sudden can explore my shopping habits and who I talked to and how I voted in the last election, what I said on social media.
And so there are definitely some scary outcomes. It's not necessarily the graph database itself that caused that, but the graph database does enable that discovery at a new level.
Philip Rathle: That's a good thing for people to keep in mind, because to the degree that our personal information, our customers', our families', is exposed in the public world, you know, you can have an entire graph of social activity and that's in its walled garden until you have just the one link that connects it into some other data set, and now all of a sudden I know all this additional information. So, you know, the positive view when you're using this technology for good is, that's amazing from the perspective of data network effects and actually use case network effects, which is a term I came up with just observing that, you know, if I solve for fraud detection, I might have most of the data that I need to do better recommendations, and then anti-money laundering and entirely unrelated things. This is the beauty and power of the graph. On the other hand, what it means is things that you have probably assumed, and I've probably assumed up until now, remain in a certain domain and can't be connected, can very, very easily get stitched together and used. And so activities that seem very, very remote end up being very transparent to companies.
So one example is, you know, there's this ongoing debate of, you know, are my iPhone and the apps on my iPhone listening to me? 'Cause Facebook just, you know, served up this ad, and it's exactly what I was talking about over dinner last night. And one of the techniques that companies like Facebook use is, okay, let's assume they're not listening on the phone, which supposedly they're not: they know, based on location sharing information, that I was at dinner with the other people who were at the table. And if one of those people during dinner searches for a particular thing, then all the people who ate dinner presumably are good to target with that particular thing, especially if that thing is a product from an advertiser. So that is a hundred percent happening, and that's something that listeners can be aware of and guard themselves against as appropriate.
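The dinner-table inference can be sketched in a few lines of Python. This is an illustration of the general idea only, not a description of how any real ad platform works. The users, places, and search terms are invented, and "same place in the same hour" is an assumed, deliberately crude definition of being together.

```python
from collections import defaultdict

# Hypothetical location pings: (user, place, hour of day).
pings = [
    ("philip", "restaurant_1", 19),
    ("dana", "restaurant_1", 19),
    ("sam", "restaurant_1", 19),
    ("lee", "cafe_9", 19),
]

# Hypothetical search activity during that hour.
searches = {"dana": ["hiking boots"]}


def co_located(pings):
    """Map each user to the other users seen at the same place and hour."""
    by_slot = defaultdict(set)
    for user, place, hour in pings:
        by_slot[(place, hour)].add(user)
    companions = defaultdict(set)
    for users in by_slot.values():
        for user in users:
            companions[user] |= users - {user}
    return companions


def inferred_targets(pings, searches):
    """Propagate each user's search terms to everyone co-located with them."""
    companions = co_located(pings)
    targets = defaultdict(set)
    for user, terms in searches.items():
        for other in companions.get(user, ()):
            targets[other].update(terms)
    return {user: sorted(terms) for user, terms in targets.items()}


targets = inferred_targets(pings, searches)
print(targets)
```

Nobody's microphone is involved here: one person's search plus shared location is enough to attach an interest to everyone else at the table, while the person at a different place is unaffected.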
Arlo: That's right. We always joke in my house when, you know, we talk about something in public or in front of one of our smart devices that, you know, our ads are gonna change real soon. So as we think about the world of technology, do you hear much? I mean, there was a little while where privacy was really a top topic for a couple of years.
It was in the news constantly. A lot of that came out of the Cambridge Analytica scandal. I mean, you're deep in the heart of Silicon Valley building a very powerful tech company. Do you feel like privacy is something that is being considered by many technologists, or is it being kind of put off for another day?
How does that conversation rise to you?
Philip Rathle: I would say it's somewhere in between. It's a really big deal in Europe. Um, in fact, last year I was on a panel at Viva Tech with, um, Anu, who's, uh, chair of the EU Data Protection Board. And, you know, AI is a big area of concern because the assumption, I think both rightly and wrongly, is that every model is getting trained on everybody's information. I think the nuance there is foundation models have access to a certain amount of things, and then companies aren't necessarily training models on their data or trying to solve privacy-sensitive problems in that particular way. But there's definitely the fear of this among regulators and, you know, among the populace. And so the answer from that perspective is yes, very top of mind. I think where it becomes top of mind among execs that I've met with is, you know, one of the ways that you could get models to understand your business more is to train them on your data. But then if you're training them on your sensitive employee data or sensitive customer data, then anything the model's been trained on is fair game. And so people can easily use this to deliberately go and mine information about other people, and violate privacy laws and norms. And so, you know, therefore the model isn't the right place to do it in. But a lot of the rhetoric from, you know, the foundation model providers kind of suggests that, well, I should just trust the model and it'll do the right thing.
And so to me, the counterbalance to that is don't trust the model with everything. Like, look, we do have technology that can provide fine-grained access controls. And in the graph you have very fine-grained access control, you know, down to property level, which is kind of like column level, as well as controlling whether a relationship can be traversed or not.
So knowing that these two things are connected versus not, or knowing things about the relationship, or even down to this individual document: should it be accessible to a person based on, say, their clearance level versus the classification level of a particular document? So we definitely have tools to do this. Those tools don't exist at all in the LLMs themselves, and that's fine. That's just how enterprise tech has always worked: let's use a combination of technologies. And AI therefore becomes a composition problem of what's the right set of technologies to use and how do I delegate my different concerns? So in cases where I need more accuracy, or even a hundred percent accurate and explainable answer that can respect all the different privacy rules, graphs are super well suited to that when used as a knowledge layer that works as part of an AI system.
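The three kinds of control mentioned here (which properties can be read, which relationships can be traversed, and clearance versus classification) can be sketched in plain Python. This is not Neo4j's API or configuration syntax. The `Node` and `User` classes, the `visible_neighbors` function, and all of the data are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    classification: int = 0  # higher means more sensitive
    properties: dict = field(default_factory=dict)


@dataclass
class User:
    name: str
    clearance: int                   # compared against Node.classification
    readable_properties: set         # property-level control
    traversable_relationships: set   # relationship-level control


# Hypothetical graph: nodes plus (source, relationship type, target) edges.
nodes = {
    "alice": Node("alice", 0, {"name": "Alice"}),
    "bob": Node("bob", 0, {"name": "Bob", "salary": 120000}),
    "report": Node("report", 2, {"title": "Q3 plan"}),
    "memo": Node("memo", 0, {"title": "Lunch menu"}),
}
edges = [
    ("alice", "AUTHORED", "report"),
    ("alice", "AUTHORED", "memo"),
    ("alice", "REPORTS_TO", "bob"),
]


def visible_neighbors(user, start):
    """Return the neighbors of start that this user may see, with filtered properties."""
    results = {}
    for src, rel, dst in edges:
        # Relationship-level check: skip edges the user may not traverse.
        if src != start or rel not in user.traversable_relationships:
            continue
        node = nodes[dst]
        # Clearance check: skip nodes classified above the user's clearance.
        if node.classification > user.clearance:
            continue
        # Property-level check: keep only the properties the user may read.
        results[dst] = {
            key: value
            for key, value in node.properties.items()
            if key in user.readable_properties
        }
    return results


analyst = User("analyst", 1, {"title", "name"}, {"AUTHORED"})
hr = User("hr", 3, {"title", "name", "salary"}, {"AUTHORED", "REPORTS_TO"})

print(visible_neighbors(analyst, "alice"))
print(visible_neighbors(hr, "alice"))
```

The same starting point gives two different answers depending on who is asking. In an AI system, the point of putting this layer outside the model is that the model only ever receives what the asking user was allowed to see.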
Arlo: Yeah, it's interesting, you know, we hear the fears about AI traversing our data, and it's a conundrum, because in some ways these LLMs are capable of far more than we might assume they can do, but at the same time we tend to ascribe a lot of power to these models, assuming that they have tools and data and capabilities. And, you know, the general public often doesn't realize that when you talk to ChatGPT on your consumer app, you're not just talking to the model, you're talking to a model that is armed with access to lots of different things.
And so the model itself isn't the dangerous thing any more than the person is the dangerous thing. It's what does that model do, and what did the people who decided to use the model give that model access to?
Philip Rathle: A hundred percent. Yeah. Models use tools, and the foundation model providers in their consumer products, like ChatGPT, Gemini, Claude, Perplexity, and so on, can make callouts. They can make callouts to the web. Um, of course in an enterprise context, they can call out to tools via MCP, and that can include graph databases or any database for that matter, any microservice, you name it.
Arlo: Well, we've talked a lot about privacy on this show and a lot about graph databases, and I'm curious, you know, we all support data privacy and we try our best to adhere to best practices. Uh, you know, the graph database sounds like a powerful way to be able to exert some controls with those governance controls you were talking about.
But, you know, sometimes it's a case of do as I say, not as I do. Is there anything that you have or any behaviors that you engage in that maybe don't live up to the best privacy practices in the world that you're willing to confess to us today?
Philip Rathle: Sure. So I have a couple. One, of course, is recognizing that when you're not paying for the product, you are the product. And, you know, this is all the different social media platforms and, you know, being judicious there. But I'll call out the one that actually impacts my life more viscerally, which is giving out my phone number. Because I have a cell phone, I just have one, I use it for business and personal, and I travel a lot. And early on I had, you know, business cards that I would give out very liberally that had my phone number on them. And those would end up in some database and I'd end up with, you know, many, many calls per day from vendors selling all kinds of just random things to me, you know, at any time of day or night. Because, you know, I'll be in Sydney expecting a call from a driver at 5:00 AM as I'm trying to get to the airport, and it turns out it's someone from the Bay Area trying to sell me some router or, you know, some random thing. Um, and I've still not learned my lesson and, you know, taken the time to work out a set of virtual numbers, you know, some for one-time use, and not give my real one out at conferences.
So I'm, I'm definitely behind there. If anyone has any good tips, please reach out to me.
Um.
Arlo: Just don't, just don't reach out by phone.
Philip Rathle: Just don't reach out by phone. No, actually, this call, this call, I would welcome.
Arlo: Yeah. And you know, it's, it's funny, you, you, you mentioned the phone and it immediately takes me back to the examples that you're using of, we know all this information about you here, we know all this information about you here, and it just takes one piece of knowledge to be able to connect those. And that phone number might be one of those pieces.
Philip Rathle: A hundred percent.
Arlo: Well, Philip, thank you very much for joining. And, uh, folks, if you'd like to learn more about graph databases, I'd encourage you to go over to Neo4j.com. That's N-E-O, the number four, j.com. And, uh, on there they have some great documents. Uh, Philip has been part of building up a thing called the GraphRAG Manifesto, and they also have a GraphAcademy there.
Uh, I would encourage everybody, even if you're not intending to use a graph database, this is a seminal technology that has taken over our planet in many places that you don't realize it. And having an understanding of this technology is gonna be critical to being able to succeed as a privacy and governance professional if you're responsible for making sure that people govern these databases appropriately.
So, Philip, thank you for joining us today. This has been an illuminating conversation.
Philip Rathle: It's been a pleasure, Arlo.
Meet the host
Arlo Gilbert is the host of The Privacy Insider Podcast, CIO and cofounder of Osano, and author of The Privacy Insider Book. A native of Austin, Texas, he has been building software companies for more than twenty-five years in categories including telecom, payments, procurement, and compliance.