If your team is using Microsoft Copilot or similar AI tools to analyse qualitative datasets, there is a reasonable chance your conclusions are harder to defend than you think. Not because AI tools are useless (they are not), but because they struggle with large datasets and won't tell you.
This can lead to themes without evidence trails, outputs that change if you re-run them, and analysis you cannot reproduce when leadership questions it.
In this practical session, Allen + Clarke's AI team will demo where generic AI tools fail for thematic analysis and give you a six-step method for doing it properly with the tools you already have access to and are already using. We will also be honest about the limitations of this method, and what the answer looks like when your dataset is too large or too high-stakes for off-the-shelf tools to handle.
What you'll learn:
Perfect for: Government agencies, local councils, and regulated organisations handling large volumes of consultation submissions, stakeholder surveys, or open-text feedback – especially where leaders or the public will be scrutinising the results.
[Jason]
Kia ora and welcome to today's Allen + Clarke webinar. We're filming in both Australia and New Zealand today, so as I'm joining from Australia, I'd like to acknowledge the traditional owners of the lands we're meeting on and pay my respects to elders past, present and emerging. I also pay my respects to tangata whenua and tangata tiriti for those joining us in Aotearoa.
So I'm Jason, and for those of you joining us for the first time, Allen + Clarke is a consultancy that helps organisations make complex high-stakes decisions with evidence, defend them with confidence and build them to work for the people and communities affected. So we specialise in strategy, change management, evaluation and policy, and increasingly in helping organisations figure out how AI fits into that work responsibly. Over the last six months or so we've been working with technology and applied AI experts at Qamcom Research to develop new capabilities, and so with me today in the webinar we've got Adam, who leads our AI capability at Allen + Clarke, and Dr Matthew Walker, an AI expert with our technology partner.
So Matthew could you introduce yourself?
[Matthew]
Absolutely, it's been a fascinating, interesting challenge developing tools together. It's really aligned with the sort of things that we do at Qamcom. If you're interested in developing custom data-centric or AI-centric processes or applications then we'd love to talk.
[Adam]
Yeah and kia ora everyone, I'm Adam and I look after AI here at Allen + Clarke, and you may recognise me from other AI webinars that we've run. I spend my days figuring out how we at Allen + Clarke can use AI to assist experts both internally and externally. So I come at AI from a practical application angle rather than Matthew's deep expertise in actually building these systems.
[Jason]
And you fooled us before Adam, can you just confirm it's actually you this time and not an AI?
[Adam]
Yeah no, it's the real Adam this time, not AI Adam like we did previously.
[Jason]
So, calling in from our Melbourne office. Before we get into it, quick housekeeping: pop your questions into the chat as they come to you. We love taking them as we go, and after the session you'll get the recording, the slides and other content that we cover today.
So please don't feel like you have to take notes from anything on the slides. So today's session is how to use AI for thematic analysis of qualitative data. So Adam we've had 400 questions and challenges submitted at registration.
I've been reading through them and it's a very clear pattern. Do you want to take a guess at what the top two concerns were?
[Adam]
Yeah I'm pretty sure Matthew and I know because we would have cheated. We also took a quick squiz at the questions and challenges and I think my takeaway was that the main themes were around time and trust.
[Jason]
Surprisingly, that is correct. So about a fifth of you said your biggest challenge was not being able to trust the output, so accuracy, hallucinations, knowing whether what the AI gave you is actually right, were the sort of messages that came through there. And almost a quarter of you said time, so that you don't have enough of it to do the analysis justice.
So we've heard that from senior advisors, evaluators, policy analysts, like across the board from the people doing the work, was that time was a real critical factor. And what we'll talk about today is that those two things are actually in quite critical tension. So the less time you have, the more you're tempted to trust the AI output without checking it, and the less you check it, the less defensible it becomes.
So that tension is what we're going to cover today and it's what the session is all about. So to kick us off, like why do Copilot and ChatGPT fall short? So I've supported too many survey and submissions analysis projects to count, but what I found is that the AI tools struggle because we're trying to retrofit them to tackle the general issues with any large-scale qualitative analysis.
And so on this slide you'll see the eight qualitative data analysis pains that many of you will have experienced when doing this kind of work, which also helps to explain why off-the-shelf AI tools fail to alleviate them. So we'll touch on the top few. So the top one there, defensibility.
So this is the one that keeps people up at night and it's the main thing that leaders today highlighted. So your analysis has to withstand ministerial or leader scrutiny, OIA requests, FOI requests, legal challenge, public release. If you used AI and it's a black box, if you can't show which specific submissions support which theme, you've got a career-level problem.
So the tool will give you a really clean, very professional sounding, very confident summary, but if you ask it where did this come from, it's normally got nothing, and so it's a claim that's made with no evidence behind it. The second one there, again people talked about is accuracy. So regardless of whether or not the analysis is AI assisted or not, this is critical.
So the issue with off-the-shelf tools is that you must enforce codebook discipline across all the data equally, whereas the tools can define themes differently every time you prompt them. So if you run the same dataset twice with a slightly different instruction, you'll get different themes, different proportions, and sometimes entirely different findings. So there's no anchor, there's no consistency.
So if you've tried this, you've probably seen across sessions, across batches, across your team, you can't actually replicate the results because each time you do it, it will do a new instance and it will do something slightly different. So you can't reproduce it and you can't defend it.
[Matthew]
Security and sovereignty, that's up next. We put this next because it's heightened by the use of AI. Overseas tools don't meet New Zealand or Australian data expectations, and getting new tools through stringent IT requirements takes time that most teams don't have.
ChatGPT, Claude, they process data offshore. Enterprise Copilot stays in your tenant, but what does that mean? It's not processed in Australia or New Zealand.
Most teams we've talked to, they haven't actually checked where their data goes, not because they don't care, but because the tool is there and the deadline's real. The rest are about the squeeze. You have to resource based on how many responses you think you'll get, and most don't come until the last few days.
This puts massive pressure on deadlines, quality and depth of insight, and off-the-shelf AI doesn't solve this in the way people expect. It doesn't meaningfully reduce your analysis workload.
[Jason]
And we've seen that play out in some of the conversations we've had with people across the government sector. So today we're going to walk through three options for addressing those eight pains. So no matter what tools you have access to, we know everyone here has different types of tool depending on the organisation and what sort of permissions you've managed to wrangle out of your IT team, but we want to make sure that when you leave today there is something that you can use.
So the first one that we'll talk through is our six-step method. So this is a structured process that you can apply on whatever tool you've already got. So whether it's Copilot, ChatGPT, Claude, DeepSeek, whatever your organisation has approved.
Sorry, if you don't know why we're laughing, you can go and Google it. How many organisations do you reckon have DeepSeek, Jason? Not many would be my guess, but if you don't know why that's a joke, you can go and Google DeepSeek.
Very capable, but some slight concern. That's a nerd joke. That's an AI nerd joke right there.
So whichever one you're using, it doesn't solve all eight pains, but it does directly address defensibility and accuracy, and so it's something you can start using this week. And so for those of you that wrote in saying that your challenge is knowing where to start, so this six-step method is your starting point. So step one, Matthew.
[Matthew]
This is the most important step. It happens before the AI even touches a single response. Your team needs to define what you're looking for.
A codebook is a structured list of categories, each with a clear name, a description of what it covers and what it doesn't cover, and at least one example. It's built from your research questions and policy content. It's not from the data itself.
So this is the intellectual work of the analyst. The AI applies it. You own it.
[Jason]
Yeah, and that's exactly right. And one of the principal analysts wrote in saying that they had concerns that they might miss a theme, so they always do a framework first, and when we read that, we said: that instinct is absolutely right. So the codebook is the thing that makes everything else reviewable.
Without it, every subsequent output is just, you know, stuff that the AI said. So with a codebook and taking this approach, you can actually check whether the coding makes sense against the defined standard, and that becomes really important as you work through. One of the other benefits, and this is the same for human and AI assisted, is that it forces your team to actually agree on what the analysis must answer before the time pressure forces shortcuts.
So that alignment conversation, you know, have it at the start, and then it saves an enormous amount of time later on. So jumping to what everyone's interested in: how do you build one? So first, review the consultation questions, really important, and have your policy context discussion with your lead analyst.
So really do spend some time on this, half a day. If you have responses to your survey or your, you know, consultation document already, read a sample of them, 15, 20, like start to read them, understand the language that people are using, you know, surface the types of things they're talking about, and then use that to then work through the process further. If you don't have submissions already, if you're doing this in advance, which is often something that we do recommend, get ahead of it, you sort of look to the end product.
What do you actually need? What decisions you need to make? Who needs them?
And then work backwards from that, and you start building your code books around those key questions that you're going to need to answer from the data. So, you know, as you work through, so take each code, give it a label, you know, one paragraph description, it has to be clear, people have to be able to apply them, whether it's the AI or a human. We like to do inclusion criteria, so what's in, exclusion criteria, what's out, it just becomes a really obvious way to find where the edge is, and then ideally include a verbatim example.
So whether you make that up or have ChatGPT create some mock responses, do put in an example of what you're trying to find; that really helps whether it's a human or an AI. And then finally, really, really crucially, test it. So, if you're going to use AI, have two analysts use the six-step method to code 10 submissions independently and then compare.
So disagreements with the outputs show whether the definitions need tightening. And again, that's the same whether it's humans or AI, it really is a similar process around how you enforce the discipline.
[Adam]
Yeah, that's right. Unfortunately, I have to be the party pooper and talk about the limitations here. So the codebook obviously captures what you anticipated.
And obviously, for open consultations, there'll be things in the data that you didn't expect. So, if you're using AI, you need to build in an explicit "does not fit" or "not applicable" category, so that your AI surfaces these for human review, and so you can identify those emerging things.
Otherwise, what you'll find is that the AI will silently miscode or not code them at all. And these are kind of important and valuable insights that you can get. So yeah, I'll hand over back to Matthew for step two.
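For readers who want to see the shape of this on the page, a codebook is just structured data. A minimal sketch in Python follows; the code names, criteria, and example quotes are illustrative assumptions rather than the webinar's actual demo codebook, and it includes the explicit "does not fit" catch-all Adam describes:

```python
# A minimal sketch of an AI-first codebook as structured data.
# Names, descriptions, and examples are invented for illustration.

CODEBOOK = [
    {
        "code": "Support",
        "description": "Submitter expresses overall support for the proposed changes.",
        "include": "Explicit statements of agreement or endorsement.",
        "exclude": "Support for one clause while opposing the rest.",
        "example": "We welcome the bill and urge its swift passage.",
    },
    {
        "code": "Oppose",
        "description": "Submitter expresses overall opposition to the proposed changes.",
        "include": "Explicit statements of disagreement or rejection.",
        "exclude": "Concerns about implementation detail without opposing the bill.",
        "example": "We do not support this bill in its current form.",
    },
    {
        # The explicit catch-all: surfaces unexpected themes for human
        # review instead of letting the AI silently miscode them.
        "code": "Does not fit",
        "description": "Response does not match any defined code.",
        "include": "Anything the codes above do not clearly cover.",
        "exclude": "Responses that fit an existing code.",
        "example": "Comment about an unrelated policy area.",
    },
]

def render_codebook(codebook):
    """Format the codebook as plain text for pasting into a prompt."""
    blocks = []
    for c in codebook:
        blocks.append(
            f"CODE: {c['code']}\n"
            f"  Description: {c['description']}\n"
            f"  Include: {c['include']}\n"
            f"  Exclude: {c['exclude']}\n"
            f"  Example: {c['example']}"
        )
    return "\n\n".join(blocks)
```

Keeping the codebook as data rather than prose means the same definitions can be rendered into every prompt, which is what makes the coding consistent across batches.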
[Matthew]
Yeah, so for our next step in the process, there's one analytical question at a time. For surveys, this is probably more clear. But for consultation on a bill, you'll have analytical questions like what's the submitter's position on the bill?
Do they oppose it? Do they support it? What specific clauses do they comment on?
The temptation might be to throw all those questions into one prompt. It sounds super efficient. But the problem is the model gets confused.
It mixes up what it's trying to do. A concern gets coded as a recommendation, for example. The output looks entirely plausible, but it's not clean.
So this step requires you to run one analytical question at a time. The first question might be, what's the submitter's position on the bill? The second question might be, do they talk about clause 43?
Each question gets its own master or system prompt, its own codebook subset, and then its own clean output.
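To make step two concrete, here is a small Python sketch of assembling one prompt per analytical question, each with its own codebook subset. The question wording and code names are assumptions for illustration:

```python
# Sketch of step two: one analytical question at a time, each with its
# own system prompt and its own codebook subset. Question wording and
# code names are illustrative assumptions.

QUESTIONS = {
    "position": {
        "question": "What is the submitter's position on the bill?",
        "codes": ["Support", "Oppose", "Conditional", "Unstated"],
    },
    "clause_43": {
        "question": "Does the submitter comment on clause 43?",
        "codes": ["Comments on clause 43", "No comment on clause 43"],
    },
}

def build_prompt(question_key, codebook_text):
    """Assemble one clean prompt for a single analytical question."""
    q = QUESTIONS[question_key]
    return (
        "You are coding consultation submissions against a codebook.\n"
        f"Analytical question: {q['question']}\n"
        f"Use only these codes: {', '.join(q['codes'])}\n\n"
        f"Codebook:\n{codebook_text}"
    )
```

Because each question gets its own prompt, the model never has to juggle positions, clauses, and recommendations in the same pass, which is the confusion the speakers warn about.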
[Jason]
And the limitation here is real. So submitters don't write in neat boxes. A long-form submission on the bill might address position, clauses, concerns and recommendations all in a single paragraph, let alone through the 50 pages that they might have submitted.
And so that is why you need to process one question at a time, because you have to go back over that data each time to extract the answer to the question you're working on.
[Matthew]
The next step is working in small batches. This step directly addresses the biggest issue many of you raised: accuracy. The issue you're experiencing is linked to a combination of context and token limits.
It's about putting too much in for the AI to think about in one go. Just like humans, the more information you give the AI, the more it struggles to accurately analyse it. There are a few research papers in this space.
But the basic idea is there's a "lost in the middle" effect. The AI pays a lot of attention to what's happening at the beginning. It pays a lot of attention to what happens at the end.
But the stuff in the middle, just like a human, the AI can sometimes just not pay a lot of attention to. So you might be tempted to load all 200 responses into your prompt. That's just going to overload it.
Don't do that. Instead, you need to break down your data set into smaller batches. In practice, this means for open text survey questions, you can safely process maybe 20 responses at a time.
For responses that run into pages, you want to limit that down even further. And we've found that the best solution in all cases is to process just one response at a time.
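The batching rule Matthew describes can be sketched in a few lines of Python. The batch size of around 20 comes from the talk; the 500-word cutoff for treating a response as "long" is an assumption for illustration, not a tested limit:

```python
# Sketch of step three: short open-text answers in small batches,
# long page-length submissions one at a time. The 500-word threshold
# for "long" is an illustrative assumption.

def make_batches(responses, short_batch_size=20, long_word_limit=500):
    """Yield lists of responses sized to avoid the lost-in-the-middle effect."""
    batch = []
    for text in responses:
        if len(text.split()) > long_word_limit:
            if batch:
                yield batch          # flush accumulated short responses
                batch = []
            yield [text]             # long submissions go one at a time
        else:
            batch.append(text)
            if len(batch) == short_batch_size:
                yield batch
                batch = []
    if batch:
        yield batch                  # remainder
```

For example, 45 short survey answers followed by one 600-word submission would come out as batches of 20, 20, 5, and then the long submission on its own.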
[Adam]
Yeah. And, you know, processing one response at a time comes with serious limitations, obviously. Because, obviously, we're trying to reach that accuracy in our processing.
But processing one response at a time means that for your team, it's a very repetitive logistical task. Again, annoyingly, at the moment, one response at a time is really the only way to ensure your AI is accurately and consistently applying that codebook. And, obviously, we're aiming for kind of human level or above coding accuracy.
So that's why we kind of hammer that message home.
[Matthew]
So next on the list, step four: requiring the AI to produce structured output. This tip is for everyone who told us their challenge was spending so much time on QA that using AI was a waste of time. That's a direct quote from a principal evaluator, and it's absolutely right.
What you want to do here is require the output in Excel, for example. You want columns: submission ID, the code assigned, a key quote from the text, and its rationale. The key quote and the rationale are as close as you're going to get to an audit trail when you're dealing with these generic tools.
Set up your tracking spreadsheet before the first batch runs, not after. By the end, you'll have your coded dataset across all questions with rationale notes and source quotes attached to every coding decision.
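Step four's tracking sheet can also be set up programmatically. A minimal Python sketch follows, writing the four columns Matthew lists to a CSV file that opens in Excel; the sample row values are invented for illustration:

```python
# Sketch of step four: every coding decision becomes a row with a
# submission ID, code, key quote, and rationale. Sample values are
# illustrative assumptions.

import csv

COLUMNS = ["submission_id", "code", "key_quote", "rationale"]

def write_coding_sheet(rows, path):
    """Write coded decisions to a QA-ready tracking sheet (CSV)."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        for row in rows:
            writer.writerow(row)

rows = [
    {
        "submission_id": "SUB-001",
        "code": "Oppose",
        "key_quote": "We do not support this bill in its current form.",
        "rationale": "Explicit statement of opposition to the whole bill.",
    },
]
```

Creating the sheet before the first batch runs, as Matthew suggests, means every batch appends to the same audit trail rather than leaving you to reconcile formats afterwards.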
[Adam]
Yeah, and I'll just, before you move on, Jason, I'll just jump in quickly here with another limitation. We did promise to be honest, so here we are. You know, whilst having that rationale and the quote column in there makes QA much easier and faster, the limitation is that those don't necessarily mean that the code is absolutely accurate.
So, to explain that, and Jason will tell me to shut up soon because I'll go on, but you'll all be familiar with the word hallucination. It essentially is where an AI makes a mistake or produces content that is false, but it's confidently false. For leading AI models at the moment, these rates are reported to be as low as kind of 1% to 2%, and the long-term trend is that these are going down towards zero.
[Jason]
And the two people that want to talk about that further can reach out to Adam and Matthew. But, you know, what does that mean for us?
[Adam]
I'll just quickly explain what that means, though, for this. Thank you, Jason. I hear you.
Hurry up. Got it. So, what that actually means for your work is that it's very likely that a small portion of your codes will be misattributed or essentially hallucinated.
The challenge here is because the hallucination rate is so low, that 1% to 2% mark, in order to find those hallucinated codes, you essentially have to recode the whole dataset. So, obviously, I'm sorry to say that that is a limitation you can't resolve with off-the-shelf tools. But I would highlight that a traditional kind of human coder approach will, of course, have an error rate as well.
So, I wouldn't suggest that this is a deal breaker for the use of AI.
[Jason]
And we are on a tight time today, so we won't go through the remaining two steps. And that does not mean that they're not important. In fact, these are probably the most important steps of the whole process.
But the people registered today, like you guys, are mostly qualitative data experts, so you'll be adept at those steps from your normal practice. We do have some tips for you, though, which are in the slide pack you can download afterwards, so be sure to grab those from the webpage and we'll send you a link afterwards. So, that's the six-step method.
So, codebook first, one question at a time, small batches, structured outputs, spot check as you go, human synthesis at the end. So, done properly, this works and will produce substantially better and more defensible output than anything you'd get from dumping your data into Copilot and asking for themes. So, we've got a quick demo now of how the first four steps all link together, which we'll play now.
[Adam in Voice Over]
Hello everybody, this is a quick demo of the first four steps of our six-step method in action. I'm not going to cover step five, QA or spot check as you go, and step six, human synthesis. You're all qual data experts, so you should know what to do there.
So, what I've got here is the prompt, and I'll just quickly talk you through it. So, this first section here is the system prompt. This tells your AI tool how to behave and what the task is.
The next section of your prompt is that all-important step one from our six-step method, the codebook. So, each one of these is a code. You can see here this one's "support", and each code goes through our recommended structure. Then at the end here is the all-important step four from our six-step method.
This tells your AI the format that you want the output in, and that's what gives it that nice Excel format. You can download this prompt from the webpage after the webinar. So, we're just going to copy that, and we're going to drop it into Copilot here.
Now, what we're going to do is we're going to grab a couple of submissions, and we'll drop them in. So, these are PDFs. Now, again, we're doing it in small batches.
That's that all-important step three to process your data in small batches. If you are using Copilot, you could, of course, instead of attaching the files, you could link it through to a SharePoint folder which has your submissions in it. Again, you need to be careful about the batch size, so just be wary of that.
We're just doing this as a demo in the chat, so we're going to push go on that one. Now, to speed up the demo, I've already run this data just before. So, you can see here, here is our output Excel sheet.
So, you've got the response ID, the text, so it's extracted the text from the PDFs. It's assigned a code based on our codebook. This is the reference text, so this should be a direct quote from the response text that led it to apply this code, and then it gives you its rationale.
Why did it apply that code, and obviously, why is this reference text important? So, that is just a quick demo of those first four steps. Obviously, we've only applied one question, which is, you know, does the submission support or oppose, et cetera, the proposed changes?
So, those are the four steps in action, and now I'll pass back to the studio.
[Jason]
And we're back. So, you can download the prompt and the demo codebook from the web page. And one thing to give attention to when you build the codebooks is: do you want the themes or the codes to be mutually exclusive or not? If you want clean, stacked counts, you have to make them mutually exclusive.
If you can have the same data in multiple places, then go for it, let them bleed over, but do think carefully about how you want to set them up. But what that clip did show you is the first four steps and how they get applied. But, you know, key takeaway: look at what doing it properly actually involves. So, for 200 submissions across four analytical questions, you're looking at dozens of batches, continuous spot checking, codebook refinement, a lot of data logistics, and then the synthesis on top.
So, with two analysts and two weeks, achievable, but unless your submissions run to many pages in length, this full process is unlikely to deliver much speed or quality benefit over a human-led process. So, we've confirmed that forcing the AI not to make mistakes is hard, and managing the process to ensure accuracy and defensibility takes a lot of effort. So, where does that leave us?
[Adam]
Yeah. Obviously, if you've been to our webinars before, we're not going to just leave you hanging there. So, that's where the next option on our list comes into it.
So, there'll be some people out there watching who are probably a bit more AI-experienced and thinking, hey, there's definitely ways I can speed that six-step method up, and there absolutely are. Essentially, you can take away the admin-heavy data management process, which will work up to about a thousand responses. And that's the second option, which we call the DIY processor.
So, to build that, you essentially have to have access to the full feature set of the most advanced AI tools, such as Claude. So, as I said, you can build your own tool that will do steps three and four, which is three being the small batch processing and four being the structured outputs for you. So, in a moment, we'll demo what that looks like.
But for reference, what we've built and what we'll show you is something that was put together in less than an hour, purely for the purposes of this webinar, so we can show you the upper limit of what today's AI tools are capable of, again, if you're using the leading products. Tons of you will ask if we can give it away.
We did debate that, but we decided not to, because this is literally Adam smashing his keyboard for an hour. So it's not been QA'd or tested, and it's not been used on actual work. That's the reason we're not giving it away, because we'd hate for somebody to take it, think it's a finished product, use it, and then find that there's bugs and other such things in it.
So, that's enough sandbagging. Let's see that demo.
[Adam in Voice Over]
Okay, everyone, here is the quick demo of the DIY processor in action, and this is kind of the upper limit of what you can achieve in AI tools at the moment as kind of an average user.
So, you can see this is our six-step method. You've seen just before how to do steps one to four in Copilot, and what you can imagine happening is you're going to end up having to put that prompt in, add more submissions, run that, et cetera, and it's going to be quite a repetitive process for you. So, here's where the DIY processor comes in.
So, it can really help you with steps three and four, because it can automate those steps. So, we'll jump over to the processor now, and I'll quickly show you how it works. So, this is the actual processor that we quickly whipped up for this webinar.
So, it only takes about an hour to put this together. Now, of course, we've not QA'd this or anything like that. This is just a quick demo to show you what's possible.
Obviously, you would QA it, test it, et cetera, before you started using it on real work. Anyway, so, we've got to add our codes. You can see here you could manually add the codes in using our recommended codebook structure for AI.
I've already prepared a codebook here just to speed up the demo. So, here is our demo codebook, and you can see it's loaded in those same codes as we used in the earlier Copilot first-four-steps demo. So, it's loaded them in.
Great. So, now we're good to go. So, we'll click next.
Now, it's looking for the data. So, we will chuck the data in here, and I've got a couple of data sets here. So, we're just going to choose this one, which is a messy one.
Now, we've got to tell it what columns to look at. So we choose that, and then it just gives us a quick preview of what the data looks like. So we're going to confirm that, and now we're going to start processing.
So, while that's processing: just like with our Copilot example that I showed you before, the processor gives you exactly the same structured output. The difference here is that it's coded the 21 submissions that we have and put them all into one Excel file for me. So it's quick and clean, and if we jump back to the processor and check on the progress, you can see it's already completed all 21 of those codes for us.
So, that's how quick it is, and obviously much faster than manually doing it in Copilot. So, you know, let's just quickly check that it's done a reasonable job. So, this was our Copilot first-four-steps demo that we did just before.
So, the first three are opposed, conditional, and unstated. So, the submissions are exactly the same, and then what have we got here? Opposed, conditional, unstated.
So, they match. So, we can see our processor is working pretty well, matching the same as those Copilot ones.
That's the processor in action. Back to you guys in the studio.
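For the technically curious, the skeleton of a DIY processor like the one just demoed is essentially a batch loop feeding an AI call and pooling structured rows. A hedged Python sketch under that assumption follows, where `call_model` is a placeholder stub standing in for whatever AI API your organisation has approved, not a real service call:

```python
# Minimal sketch of a DIY processor: automate steps three (small
# batches) and four (structured output) by looping batches through a
# model call and pooling the rows. `call_model` is a stub; a real
# implementation would send the prompt and batch to an approved AI API
# and parse its structured reply.

def call_model(prompt, batch):
    """Stub: returns one placeholder structured row per response."""
    return [
        {"code": "Unstated", "key_quote": text[:40], "rationale": "placeholder"}
        for text in batch
    ]

def process(prompt, responses, batch_size=20):
    """Run every batch through the model and pool rows into one dataset."""
    all_rows = []
    for start in range(0, len(responses), batch_size):
        batch = responses[start:start + batch_size]
        for offset, row in enumerate(call_model(prompt, batch)):
            # Assign stable IDs across batches so the audit trail
            # points back to the right submission.
            row["submission_id"] = f"SUB-{start + offset + 1:03d}"
            all_rows.append(row)
    return all_rows
```

The value of wrapping the loop this way is exactly what the demo shows: the repetitive paste-a-batch, collect-the-output cycle disappears, while the codebook, prompt, and structured columns stay under human control.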
[Jason]
It's pretty mind-blowing to see what we can create in just an hour. Matthew is our resident AI expert. So 12 months ago, how long would it have taken to make something like that, even with a bit of expertise?
[Matthew]
Yeah, I mean, Adam, you gave yourself lots of space there, but I was really impressed with what you produced. Before 2020, it wasn't even a thing, right? AI couldn't understand human text.
But even, I would say, before November last year, it would have taken weeks or months to produce what you produced. I think it's crazy.
[Jason]
And again, a great reminder of where we're at in the AI development process and how fast it is moving. So yeah, we often talk about November last year as a really big delineator, but even February this year, I felt like we had a massive jump forward. So those jumps are happening, but a lot of it's around the application rather than the technology as well.
So Adam is obviously more on the power user end of the spectrum, but he did vibe code that in an hour, which is pretty amazing. And so a reminder to you, non-power users, that the point of that is not to show that you must build this, but it's to show the upper limits of what you can do with off-the-shelf tools. So just to sum up where we are, so if you're working with off-the-shelf tools, the six-step method gives you a structured process you can apply immediately.
And if you go down the DIY processor route, you can sort of automate the most repetitive parts of that process today. Both of those do have a real ceiling. The six-step method requires your team to manage the entire process.
The DIY processor caps out at about 200 submissions per run, or about 1,000 total. Neither totally solves accuracy, auditability or defensibility. And of course, they do not solve for data sovereignty.
That data in that instance went through Claude, so Europe, you know, America maybe. So there's still some issues and limits with it. So what happens when you expect between 1,000 and a million submissions?
You know, a tight deadline? Findings that are going to the CEO or minister and will face public scrutiny?
[Matthew]
Well, that's exactly why we built Thematic. A million submissions is going to stress me out, but this is the third option. The reason we're showing it to you is you can see what's possible when a system is purpose-built for this work.
Thematic is our human-led, AI-assisted analysis system. It's built specifically for high-stakes, high-volume qualitative data, where findings must be defensible.
[Jason]
And so we worked with Qamcom to build this because we had the exact same problems you do. So we handle large volumes of unstructured qualitative data for our clients, so consultation submissions, stakeholder surveys, open-text feedback, and we're experiencing those same eight pains.
So defensibility under scrutiny, always. Accuracy, especially at scale, massive. Sovereign data requirements, increasingly something that's front and centre in many parts of the public sector. And, always a feature, immovable deadlines.
So all those things meant that we tried to make off-the-shelf tools work, but we hit that same ceiling that you're hitting or that we've demonstrated today. And so we then actually went to build a system that addresses all eight of those qualitative analysis pains from the ground up. And so we can now show you how that works.
[Voice Over]
When you've got thousands of qualitative responses and tight deadlines, traditional coding can't keep up. Nuance gets lost and findings become harder to defend. That's the problem Thematic was built to solve.
Thematic is a human-led AI-assisted analysis system designed to turn large volumes of qualitative text into decision-ready insight. The entire system is built to stand up to the expectations of ministers, CEOs, and the public, which is why every theme links directly back to the source text, providing an audit trail that you can stand behind. Not only is Thematic accurate, but it accelerates the entire qualitative coding process from data extraction and sorting to coding and data delivery.
So you can do the important part, understanding what the data means and making sound decisions. Here's how it works. We upload a range of files containing your qualitative data, such as PDF submissions, survey response spreadsheets, or interview transcriptions.
Thematic then automatically extracts the text from your data, retaining important formatting such as headings, bold, and strikethrough text. Attachments are automatically associated with submissions. Submitter names, document codes, and other information are also collated.
Next up are codebooks. Our expert team work alongside yours to help you develop the AI-first codebooks that Thematic uses. You can upload an unlimited number of codebooks, ensuring you uncover every theme from your dataset.
What normally takes days, weeks, or months of manual coding, Thematic can process in minutes or hours. For your security, data is processed and stored in-country. New Zealand data is kept in New Zealand and Australian data is kept in Australia.
And it's permanently deleted upon project completion, meeting data sovereignty requirements and reducing procurement and reputational risk. If you're working with large-scale qualitative data, Thematic can accelerate your analysis without sacrificing rigour, nuance, or accountability. Thematic, talk to us about your dataset.
[Matthew]
Thematic uses the six-step method we spoke about as the base. It's codebook-led, human-verified structured outputs. The difference is scale, rigour, and accountability.
Every theme accurately traces back to the source text due to the bespoke processing methods. The QA is embedded throughout and includes line sampling. And your data stays in-country.
So if you're in New Zealand, New Zealand data stays inside New Zealand, and Australian data stays inside Australian infrastructure. The system is configured for every project and it's decommissioned at completion. And it's already being used on high-stakes government work right now.
[Adam]
Yeah, so look, we've covered a lot of ground. I know there are tons of questions, so we're going to get to those in just a sec. But we don't want to leave you with just those three options.
We want to help you decide when you should use each of those options, and on the slide is a table to help you do just that. Essentially, if your dataset is a few hundred responses, you've got analyst time, and the risk is lower, then the six-step method is your starting point.
If your dataset is a few hundred to a thousand responses, and/or you need to move faster, then I'd definitely recommend creating a DIY process for yourself or your organisation. It'll automate that repetitive part that we showed earlier. And of course, if you're dealing with thousands, or a million, eh, Matthew, submissions, really tight deadlines, and your coding faces serious scrutiny, that's where purpose-built tools need to be used.
So we obviously have one, Thematic, but you could build one yourself. That just helps ensure you're addressing those eight pains well. So that's kind of it.
But before we leave you with that, I just want to say: if you're not sure what your situation is, or exactly when you should be using which option, we're happy to talk. I've done tons of these webinars. I'm a giant AI nerd.
People message me, ring me all the time to have a quick chat. I know Matthew is exactly the same. So is Jason.
So if you want to chat about your dataset or about how you might use AI in your workplace, just give us a shout. We'd love to chat. I think the team are popping a button on your screen now.
Of course, while we do have a solution, it's not about trying to sell you something. It's more just about trying to help you.
[Jason]
So we'll get to some of your questions now. If we flick to the first one: somebody's sent in the question, how do I know that the AI's output is accurate and not hallucinated?
We touched on this briefly, but Adam, if you give us a quick rundown.
[Adam]
Yeah, look, the best way is QA and blind sampling. As we touched on earlier, though, the challenge is that blind sampling helps check whether your codebook is well structured, but it won't help you find those hallucinations if you're using off-the-shelf tools. If you're using a custom-built solution, that will have taken these issues into account.
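As a minimal sketch of what that blind sampling might look like in practice: draw a random sample of AI-coded responses, strip the AI's codes so the human re-codes without being anchored, then compare. This is our own illustration, not code from Thematic, and the `id`/`text`/`code` record structure is an assumption.

```python
import random

def draw_blind_sample(coded_rows, n, seed=0):
    """Pick n coded responses for human QA. The 'blind' copies omit the
    AI's code so the human re-codes without seeing the AI's answer."""
    rng = random.Random(seed)  # fixed seed keeps the audit reproducible
    sample = rng.sample(coded_rows, n)
    blind = [{"id": r["id"], "text": r["text"]} for r in sample]
    return sample, blind

def agreement_rate(sample, human_codes):
    """Fraction of the sample where the human's code matches the AI's."""
    matches = sum(1 for r in sample if human_codes[r["id"]] == r["code"])
    return matches / len(sample)
```

The fixed seed means the exact same sample can be re-drawn later if the QA is ever challenged.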
[Jason]
Thanks very much. So next question, how do I stop the analysis losing nuance or eclipsing minority voices? I'll answer that one.
So this is something that we care deeply about when we run any submissions process, whether it's manual or AI. And the risk is real. So AI tools optimize for frequency.
So they surface what appears the most often. But in qualitative data, significance is definitely not the same as frequency. And so a single submission from a directly affected community could be more important relatively than 200 submissions from a well-organized campaign.
So the six-step method approaches this in two ways. The first is that it has things like "does not fit" flags, ways to surface the information that doesn't align with what you might have been expecting. And the second is human synthesis.
So really, really important. So we want your analysts to apply judgment. So we don't want to outsource the full process to the AI.
We want to speed up the human subject matter experts to make those decisions, to be served that data, to then make that decision around weighting.
[Matthew]
Yeah, absolutely. You want to be superpowering the humans. You don't want to be replacing them.
[Jason]
Moving on to some live questions we've had in the chat. Please do keep them going. We'll answer as many as we can.
So Chloe has said, you said something like, does the submission mention section 12 on enforcement? Will AI only pick it up if section 12 is written in the submission? What if they say S12 or something more vague?
[Matthew]
The AI is absolutely capable of pulling out S12, given the context of a submission analysis. That's certainly not the problem. Yeah.
[Adam]
I think that comes down to the codebook, right, Matthew? Your code for that obviously wouldn't just say, "does the submitter mention section 12?" You would go on to describe what section 12 is, or the theme of section 12.
And then you would have, again, inclusion, exclusion criteria, and an example. And those things combined mean that the submitter doesn't have to say section 12. They just have to talk generally about the theme of section 12.
And the AI tool will go, oh, yeah, that's section 12. Bam, section 12.
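A minimal sketch of how a codebook entry like that, with a description, inclusion and exclusion criteria, and an example, might be represented and turned into a coding prompt. The field names and the example code are our own illustration, not the exact format used in the six-step method.

```python
# A hypothetical codebook entry; the structure is illustrative only.
codebook = [
    {
        "code": "SEC12_ENFORCEMENT",
        "description": "The submitter discusses enforcement powers "
                       "(the theme of section 12), even if the section "
                       "is never named.",
        "include": "Mentions of penalties, inspections, compliance "
                   "action, or 'S12'/'section 12' by name.",
        "exclude": "General complaints about the proposal with no "
                   "reference to enforcement.",
        "example": "Council lacks the resources to enforce these rules.",
    },
]

def build_prompt(entry, submission_text):
    """Assemble a single coding prompt from one codebook entry."""
    return (
        f"Code: {entry['code']}\n"
        f"Description: {entry['description']}\n"
        f"Include: {entry['include']}\n"
        f"Exclude: {entry['exclude']}\n"
        f"Example: {entry['example']}\n\n"
        f"Submission:\n{submission_text}\n\n"
        "Does this submission match the code? Answer YES or NO, "
        "then quote the supporting text."
    )

prompt = build_prompt(codebook[0], "We support S12 but fear weak enforcement.")
```

Because the description and criteria carry the meaning, the submitter never has to use the words "section 12" for the theme to be picked up.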
[Jason]
Or you could set up a codebook that says, I only want it when someone explicitly says "section 12". We have done that sort of thing as well.
[Adam]
Just one more point on that. One of the cool things about using AI, and I know Jason loves this, so I'll mention it, is that there is no real upper limit to the number of themes that you can extract from your data. The upper limit really comes down to how much time do you have to put together a codebook?
So in one example where we've used Thematic, our custom tool, I think we had 180 or so codes, didn't we, Jason? And as you can imagine, if you gave that to a human coding team, that would just be an impossible volume of work. But for the AI system, you can go: look, I've discovered this emergent theme, create a codebook, push the button, and it will code that data.
[Jason]
Yeah. So a question here from Sean. Do you have any advice for how organizations running consultation processes should design their data collection instruments to enable more efficient analysis via AI?
Who wants to take that one?
[Matthew]
One of the things that I would really like to see more of is open text questions. So in surveys, for example, you might see, do you agree or disagree? And people might be able to select yes or no.
But that just drops all the nuance. For me, I think we're at a point where we're beyond the yes, no kind of questions. I'd like to see much more rich text because that's the sort of thing that's going to give us understanding about why people have these positions.
[Jason]
Yeah. And I think the way the tools and the six-step process are designed, they should be able to take unstructured qualitative text. So yes, headings will make it easier, but people often don't follow headings.
And as everybody on screen knows, a peak body will ignore your survey instrument and send you an email no matter what. So there's always going to be some sort of edge cases. So yes, you can make it easier.
But I think it's more about designing a process that isn't limited by where a human has decided to write their information, and making sure it can be extracted no matter what.
[Adam]
Yeah, I'll just quickly add to that. Matthew, you can probably talk about this a bit better. The system that we've built really just kind of handles the data for you.
So you kind of just dump it in and it figures out and extracts the data. So that might be PDFs, poorly formatted PDFs, might be submissions with attachments, et cetera.
[Jason]
Yeah. And the admin part: removing some of the admin overhead is one of my favourite features of the tool. Question from Dan: is AI more likely to hallucinate numbers or direct quotes?
[Matthew]
Oh, it's a hard question. I wouldn't say there was one specific type of text that it's going to hallucinate. But the thing is, you want to be able to detect it.
You want to know when it has hallucinated.
[Jason]
Russell said: I find these off-the-shelf AIs take the prompt as a strong suggestion, even when I make some of the rules absolute. So is there a way to further underscore to the model "absolutely must not"?
[Adam]
Hmm, that's interesting, because I guess it depends on your setup. One way, if you're using Copilot or something like that, is if you have access to agents: you can enforce system rules in your agent setup. Sorry, Russell, that's a terribly complex answer.
If you go back to one of our webinars (I'm racking my brain, what is it called? It's called 4 AI Tools, and it's on our website), there's a section in it that talks about agents in Copilot and projects in ChatGPT, etc. So I'd take a quick look at that, because you can essentially install system prompts using that feature.
[Matthew]
I would have suggested that you could also use AI to solve this problem. So if you've got a prompt, you've asked a question, it's produced an answer. Why don't you use AI to check whether or not it's followed your rules?
You can get a binary response. These are the rules I've specified. Does this answer satisfy the rules?
Yes or no? And if it doesn't, do it again.
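Matthew's check-then-retry idea can be sketched as a small loop. Here `ask_model` and `check_rules` are injected callables standing in for whatever AI tool you have access to; they are assumptions, not a specific vendor API, and the toy stand-ins at the bottom exist only so the loop can be demonstrated offline.

```python
def generate_with_rule_check(prompt, rules, ask_model, check_rules, max_tries=3):
    """Re-run generation until a checker (e.g. a second AI pass asked
    for a binary YES/NO) confirms the answer satisfies the rules.

    ask_model(prompt) -> str            # your LLM call
    check_rules(answer, rules) -> bool  # your compliance check
    """
    for attempt in range(1, max_tries + 1):
        answer = ask_model(prompt)
        if check_rules(answer, rules):
            return answer, attempt
    raise RuntimeError(f"No rule-compliant answer after {max_tries} tries")

# Toy stand-ins: the first "model" response breaks the rule, the second
# complies, so the loop accepts on attempt two.
rules = ["must not mention submitter names"]
responses = iter(["Theme A (submitted by J. Smith)", "Theme A"])
ask_model = lambda p: next(responses)
check_rules = lambda a, r: "Smith" not in a

answer, attempts = generate_with_rule_check("summarise", rules, ask_model, check_rules)
```

In a real pipeline the checker would itself be an LLM call with the rules in its prompt; keeping it as a separate callable makes it easy to swap that in.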
[Jason]
In Getting Started with AI, we talked a lot about using AI to prompt the AI. The AI knows a lot about the AI. So ask it the question of how to put some absolute rules in, see what you can work out, and test it.
But do test it. Anita says: do you ever differentiate between analytic codes and more operational flags, e.g. for responses requiring follow-up or further review? If so, any hot tips on doing this?
[Matthew]
That could be a codebook in itself. Does this response require follow-up? Have they asked for interaction with somebody?
Have they said, I would like to talk to someone?
[Jason]
Or even if it's something complex, like we talked briefly about te reo Māori earlier: you might want to flag all the submissions that use a certain amount of te reo Māori for review by your expert team. So you can set up a codebook just to flag that this one meets the threshold for checking, or that this one talks about international jurisdictions,
therefore Andy wants to read that one. So you can create a codebook to either create the data or flag the data, whatever you want. The codebook is just a tool to force the AI to do one thing.
That's the secret sauce. Yeah. Question from Jade.
How should researchers approach informed consent when AI is used in research analysis? And what practical steps can research teams take to explain AI use in plain language so that participant consent remains meaningful and genuinely informed?
[Adam]
Oh, jeepers. That one's probably best answered by you, Jase.
[Jason]
I think it's an emerging area. One of the things we have discussed is that whole data sovereignty part. So we've had engagements where the engaged parties did not want their data to be sent overseas.
And so it really depends a lot on what you're doing and what sort of data you're collecting. Is it really sensitive patient data that you probably shouldn't send to the USA, for example? I think it's a really sticky question.
We can have one of our ethics and AI consent people have a discussion with you; it's a really important thing to discuss. And we should probably think about what we can put out in that space in the future. It's connected to security as well.
[Adam]
That might be a good topic for us to tackle in another session or something, Jase.
[Jason]
Yeah, and the public's expectations are shifting so fast as well. A year or two ago, if you said AI was going to process my submission, you'd have had a bit of an uproar. These days, it's: how can you show me that it's going to be accurate?
The permission settings are shifting, I guess, but being super transparent about what's happening with the data and where it's going is super, super important. There you go, Jase, you might have sparked the first webinar from a question. That's good.
So that brings us to the end of our scheduled time. So we are keen to keep answering questions for those that can stay. But before we continue, I wanted to let you know about our next webinar, which is a restream, Mastering Business Change, Approaches to Navigate Change and Thrive.
So technological, government, or customer change is constant. So how do you effectively lead your organisation or team through this? And as a team member, how can you be prepared?
So join us for that restream and hear from our change experts as they explore the capabilities and approaches needed to understand and capitalise on change. On screen there should be a button to register now, so you can do that. Otherwise, we'll go back to answering some more questions for those that don't have to dart back to work.
We've got a question here from Greg: can these techniques be used to theme submissions in more depth, e.g. opposes due to cost, opposes due to practicalities, opposes due to morals, etc.? And can submissions be assigned to multiple themes? The short answer: yes, absolutely.
Longer answer: we sort of skipped it in the video, but as mentioned briefly, you can set codebooks up to be mutually exclusive or to overlap, and it really depends on what you're trying to do. So all of those can absolutely be set up as codebooks. But you could also set up an overarching codebook that asks, do they oppose this proposed change? Then either take that dataset, or do a second pass that asks, did they oppose due to practicalities, did they oppose due to cost?
So yes, absolutely: using that codebook structure, you can build a codebook that asks exactly that question and extracts that information as you work down. Or, if your dataset isn't huge, run that first codebook (does it oppose?) and then work manually.
It's not that hard once you've got the people that oppose the proposal; if there are only 50, you can just work through those manually. It really depends on your dataset and what you're trying to do.
[Matthew]
But yes, it's possible; you work out the best way. If the dataset is large, then your first codebook finds all of those who oppose and reduces the total dataset. Then the next codebook (was the opposition due to practicalities?) runs on that smaller dataset.
[Jason]
Yeah. And a question from Tomosi, or Tomochi, sorry if I've mispronounced your name: with the six-step method, before doing analysis, is it important to train the AI with the evaluand context? And if so, what info is useful, and how much information do you provide the AI? Short answer is yes.
[Matthew]
Absolutely. The question of what and how much is actually quite an interesting one. Especially on government data, we've found that many clients are not keen for us to train AI on their data. So it's not a requirement to train AI on a specific dataset; in fact, the general AI solutions may be sufficient just out of the box. But yeah, we could definitely talk about fine-tuning, and I think there are two parts to that question.
[Jason]
So there's what Matthew's talking about, actually training the instance of the AI to understand really deeply what you're talking about, and there's also, if you're just using an off-the-shelf Claude or Copilot, giving it context about what you're doing. So yes, absolutely give it context about what you're doing. You can go further and actually train it: you can provide instances, as in, these types of submissions are what you're looking for,
these are the nuances, this is what councils say, this is what peak bodies say. That's the hardcore end of the spectrum. But if you're just using the off-the-shelf tools, yes, give it context. How much? Too much overwhelms it, not enough misses the point, and finding that sweet spot is actually quite tricky; you do have to test a little bit.
[Adam]
Yeah, and I'll just quickly add to that: if you download the example prompt that was used in the demo of the six-step method, you can see that within that structure we've added a little bit of context around the work. So I'd recommend taking a quick squiz at that.
[Jason]
Message from Laura: do you have a reference mapping the Microsoft 365 tools that can be used in each of the six steps, e.g. Power Automate, Copilot, etc.?
[Adam]
We don't currently, but we could put one together. Full disclosure: we have Copilot internally, but we don't actually use it. We use Claude, because it's better. Sorry. But we could absolutely put something together. I assume, Laura, that you must be a reasonable power user if you're talking about things like Power Automate, etc., because you start getting into the use of Graph and some other nerdy Microsoft things. So why don't you give me a shout and we can have a quick chat about it.
[Jason]
Yeah, and one thing we've done previously, for example: we've had a whole bunch of PDFs and used a script to convert them into a single Excel sheet. Things like that we've done inside the Microsoft suite with a power user like Adam. Question from Chloe: should we wait until AI has evolved a couple more generations before we start using it broadly for this type of mahi? It's sounding, on balance, like it's not at the human analysis level yet.
[Matthew]
I would say it's at the human analysis level already.
[Jason]
It definitely is; the capability is there. The issue is the size of the context window, and that's the challenge. That's why you have very smart people like Matthew build it for you, as we spoke about. With the off-the-shelf tools, that context window limit, and whether that's going to change in the short term, we don't know. There are ways to design around it. But it's not the capability of the tool to be as good as a human that's the limit; the limit is the way that you force it to do the work.
[Matthew]
Yeah, and what you need maybe also depends on the type of work you've actually got. If you've got tens of thousands of documents, your humans are going to get tired pretty quickly. I'd say the AI in that situation is ideal.
[Adam]
If you think about the DIY processor, for example, Chloe: that could punch out 600 codes in four hours. There's no way you could have a team with one analyst do that. And of course, while the DIY processor is doing its thing, you're doing other work. So comparatively, there really isn't much comparison in terms of speed. And I would say our comments around hallucinations and that sort of stuff sit well within what you would expect the error rate for humans to be.
[Jason]
Yeah, and I think we don't know what's coming down the pipeline in terms of the next leaps. A lot of the changes at the moment take the existing capability and make it usable in different ways. So, putting more tools into Claude, for example, co-work, all those other new things which take that technology and allow you to apply it in new ways. You know, set up agents and then have agents running agents, which gives you agents all the way down, at which point you have 15 Matthews doing things. So those aren't capability increases so much as shifts in the context you apply it in, and that's moving really, really fast.
Good question from Ian: manually entering data into Copilot is quite unfriendly. Do the LLMs have libraries available, or Python, so I can use programming to automate?
[Matthew]
Absolutely, Python especially. It's extremely easy: just ask the AI to write some Python code for you.
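For Ian's question, a batch loop like this is roughly what "use programming to automate" looks like in practice: read the responses from a file, run each through a coding call, and write the results back out. The CSV column names and the injected `call_llm` callable are assumptions; swap in whichever API or local model your IT setup allows.

```python
import csv

def code_responses(in_path, out_path, call_llm):
    """Run every open-text response through an LLM coding call and
    write the results to a new CSV.

    call_llm(text) -> str  # whatever API or local model you can use;
                           # injected rather than hard-coded to a vendor
    """
    with open(in_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    for row in rows:
        # One call per response; a real run might batch or rate-limit.
        row["code"] = call_llm(row["response"])
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

The point is that the repetitive copy-paste into a chat window disappears: the loop does the data entry, and the human time goes into the codebook and the QA.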
[Jason]
Yeah, but we do have to talk about that: a lot of these things are more available to you if you have looser IT requirements and can run Python or React or Bash and all these other things. I'm not sure how specific government agencies are configured, but there are ways to do it if you are a power user and you have access to those tools.
[Adam]
Yeah, exactly. We showed the DIY processor just doing a part of that, but there's no reason why, with another hour, the DIY processor couldn't have handled the data entry, the data ingest, as well.
[Jason]
Ronald says: any ideas about using this as a game changer for responsive organisations, using the AI to monitor unstructured text, e.g. your client feedback, for ongoing improvement of services rather than just a one-off consultation? That's a great question, right?
That's something we've been turning our minds to; we've literally had discussions in the last couple of weeks. Things like: if you're a regulator and you want to understand how a change worked, how can you remove the fear of engaging that comes from the time, the cost, the use of analyst time? How do you use these tools to superpower the humans but de-risk some of that ongoing engagement with people? So absolutely, I'm on board with the use of AI to unlock better engagement between government, business, industry, and communities, the people that have been on the receiving end of a regulatory change.
And Matthew mentioned earlier that we have a survey webinar which talks a lot about how, if you're trying to get comparable data, you do need to follow the rigour of how you make sure that happens. That's still going to be a thing. But what these tools unlock, and we do have some other technology coming down the pipeline around this, is that qualitative "why" at scale. So how do you remove some of the fear that can come from the cost of engaging on an ongoing basis?
So yes, great question. Absolutely, it can be a game-changer.
Now we're going to go back to a question that someone wrote in: how do I write effective prompts for this kind of analysis? Who wants to take that one?
[Adam]
I can take that. If you follow the six-step method, prompting actually becomes the least important part, because, as we keep banging on about, the codebook is doing a lot of the heavy lifting. And if you're asking specifically what a prompt for the six-step method looks like:
there'll be a link coming out here to the web page where there is literally a Word document you can download, and that is the exact prompt that was used in that demo of the six-step method. Obviously it's cut down, made up for the purposes of the demo, but it gives you the kind of structure, etc., that we'd recommend.
[Jason]
And the importance of prompts is shifting. If you did this a year ago, the prompt was this really important part of the process; with the capabilities of the newer tools, it's far less important, because the machine can pick up nuance and context and understand human text to a much greater level.
Next question: how can AI handle data with te reo or multilingual content? And it's a question you've turned your mind to at great length.
[Matthew]
Well, we did this together, and I really enjoyed it. We pulled out submissions that Allen + Clarke had already found were in te reo, and we asked Thematic to detect whether each was in English or in te reo. If I remember rightly, we got 96% recall, so I'd say that was a pretty strong result.
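For readers who want to sanity-check a figure like that, recall is straightforward to compute once you have gold labels and predictions. A generic sketch with toy data, not Thematic's internal evaluation code:

```python
def recall(gold, predicted, positive="te reo"):
    """Of the items the human team labelled `positive`, what fraction
    did the automated detector also label `positive`?"""
    relevant = [i for i, g in enumerate(gold) if g == positive]
    found = sum(1 for i in relevant if predicted[i] == positive)
    return found / len(relevant)

# Toy labels: 3 te reo submissions in the gold set, detector finds 2.
gold = ["te reo", "english", "te reo", "te reo", "english"]
pred = ["te reo", "english", "english", "te reo", "english"]
r = recall(gold, pred)  # 2/3, i.e. about 67% recall
```

High recall on the minority-language class matters here because a missed te reo submission never reaches the expert review pathway at all.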
[Jason]
A strong capability, and we then ran it through the full process: the machine, 96%. And it's slightly more complex than you may think. Is it a whole submission, half a submission, or just, say, a whakataukī or a mihi and then English, and how do you want to deal with that? A lot actually goes into whether something is in te reo or another language or not,
which we can discuss at great length if you would like, though I'm assuming most of you don't want to. Please do reach out if you have that question. But yes, the machines can read other languages; they can read te reo.
Oh, one thing we do often do: if you're in a ministry, for example, you probably have some experts in engaging with multiple stakeholders from unique perspectives. We often flag that content to go into a different pathway, and that's something we talked about with the flags earlier in one of the questions. If something meets a certain threshold, maybe it's from a certain type of submitter, or maybe it's got a certain amount of the text in te reo, you can ask it to flag that this needs to be escalated and raised to a separate team. So you can do stuff like that too. But it can read essentially any language; there isn't really any limitation. Your analyst may not be able to read the output, but the machine can read the te reo or whatever goes into it.
Next question: how do I avoid bias in the analysis?
[Adam]
Who's going to take that? Do you want to take it, Matthew, or I can go for gold? So, bias in the thematic analysis we've described comes from two places: the codebook and the coder. Actually, three places: the codebook, the coder, and the AI tool itself.
Of course, what that means is that AI doesn't eliminate bias; it can amplify it, especially if your codebook has blind spots. The protections are built into the six-step method, and they come down to the sampling of data, your QA process, etc.
[Matthew]
So Yeah, just like humans The AI has biases built in and we have to you know, kind of build human in the loop process to to manage that You could also look at using different AI solutions Yeah, there might be bias that's specific inside Claude and that bias might be different inside Chat GPT it might be different in the Chinese model deep seek. Maybe look at applying many models and See if you can see the bias Exposed that way.
[Adam]
Yeah, great.
[Jason]
Great addition. And a question here: what about unstructured or messy data, transcripts, mixed formats, inconsistent responses? I think the AI is very, very capable of handling messy data.
[Matthew]
People are putting their ideas together, and even when it's not perfect prose, it's still able to pull out the critical information.
[Jason]
Yeah, and on the mixed formats, a lot of it depends on the tool, but with the standard sorts of PDFs, Excel, and Word, there's no real issue with the machines understanding or reading any of those. It can be more efficient for you to standardise them, like we talked about, using Python to turn everything into Excel, for example, if you're using an off-the-shelf tool. But the machines are generally capable now of reading all of those things; it's more a data management issue for you, about how you want to manage and keep track of everything, and standardising does make that a lot easier.
[Matthew]
Tables can be a challenge.
[Jason]
Yeah, if you've got data with lots of tables, there's an interesting conversation about how tables make sense to humans but do not make sense to an LLM. You know: read across until you get to here, then magically go down here, then up here. There's a thing you have to train LLMs to do. But again, that's a very technical topic that you can reach out about if you want to discuss it.
So that's probably a good place to wrap up. Thank you all so much, those who stayed on, and for all your thoughtful questions. These discussions make these sessions really valuable for everyone, because if you're thinking it, someone else probably is too. If any of today's discussion has sparked ideas for your own work, or if you have any further burning questions about AI, we're always happy to continue the conversation; just click on the screen and we will get in touch. Thank you so much for joining us.
We really appreciate it, and have a great rest of your day, everyone.
[Adam]
Thanks very much. Thanks, everyone. See you later.