How to Efficiently Transcribe Audio Recordings and Generate Action Items with Word and OpenAI APIs
- Mark Stankevicius
- Apr 15, 2024
- 13 min read
Updated: Apr 26, 2024

Introduction to automated action item generation
As we explore the applications of Generative AI in daily work, more and more methods are being developed to integrate Generative AI to enhance productivity. This technology helps facilitate task completion in less time and with greater precision. Generative AI can also support adherence to business templates and processes by providing guidance when we need to comply with standards, whether for corporate, governmental, or industry regulations. We can guarantee alignment with specific guidelines using Generative AI for document creation tailored to our business requirements.
ChatGPT, an AI language model developed by OpenAI, is a quick and efficient method for generating automated action items from meeting minutes or discussions. Utilizing the power of Microsoft Word's built-in generative AI and OpenAI generative AI, you can quickly create complete action items in Microsoft Word from an audio recording of a meeting.
In this step-by-step tutorial, I will guide you through:
Setting up a Gen AI Action Item generator to create action items from a meeting audio recording.
Using Generative AI built into Word to transcribe the meeting from the audio recording.
Generate the action items from the meeting transcript using OpenAI Generative AI APIs (chat/completions).
Automated insertion of the action items into a Microsoft Word document.
This tutorial follows a similar approach to another article I wrote, which uses status meeting minutes text (not audio) as input: 'How to Use OpenAI APIs to Generate Action Items in Word'. However, there are some nuanced differences in the methods employed. This article uses an audio recording as input vs meeting minute text, and the approach used to find and extract the text from the document uses the entire document text as input vas using text start and end delimiters.
Introduction to the OpenAI (ChatGPT) API
The ChatGPT API empowers developers to incorporate the model’s capabilities into their applications, enabling programmatic text generation. Accessing the API is straightforward, allowing for creating prompts that ChatGPT can respond to with precision. This can be done with other AI Chatbots and the APIs they expose, but for this tutorial, I will be using OpenAI APIs.
ChatGPT's capabilities are based on its ability to predict the next word or phrase given a prompt. By leveraging this generative AI capability, we can provide ChatGPT with a meeting transcription and, using the correct prompt, let it generate a concise list of action items. With the current state of the art in Generative AI models, such as generative text AI, we will still need to review and modify the action items to suit our needs, but it very quickly gives us a starting point.
What is OpenAI-API? OpenAI provides several ways to use the power of generative AI. One approach is to programmatically call the OpenAI functions, which requires an API. For this particular use case, the OpenAI chatcompletion API suits our needs to review and understand the meeting minutes and generate the action items. By understanding how the OpenAI API works, we can seamlessly integrate the AI capabilities into a Word Macro to create a ChatGPT-generated list of action items.
Prereqs for these Step-by-Step instructions
There are several requirements for you to automate the calling of OpenAI APIs in a Word document. They are the following:
An OpenAI API account: You will need access to the OpenAI APIs to call them directly from the Word Macros, which doesn’t require a ChatGPT Plus account but does necessitate an OpenAI API plan. You can find more details here: https://openai.com/product. The pricing model is quite affordable, based on the number of tokens utilized. Create an account before you begin the steps in this document.
Familiarity with programming with Visual Basic (VB) macros in Microsoft Word: The OpenAI API will be called from a VB macro in the Word document. Therefore, some familiarity with programming would be helpful to ensure you can set up the environment and correctly copy the code for the ChatGPT Action Item generator.
Access to Microsoft Word: I used “Microsoft® Word for Microsoft 365 MSO (Version 2402)” in these step-by-step instructions. Some steps may vary depending on your version, but the general approach will apply.
A Microsoft Word version that supports the “Transcribe” feature: At the time of writing this blog, only some versions of Microsoft Word supported the “transcribe” capability. Word for Web and Word for Windows both support “Transcribe”. Word for Mac, Android, and iPhone do not currently support this capability. Check the Microsoft website for the latest status on what versions support “Transcribe”.
Macro-enabled Word document: For security reasons, the default Word document type does not allow macros to be invoked. You must use a “Microsoft Word Macro-Enabled Document,” which allows you to create and add macros.
Important Note: Enabling macros in Word documents could introduce malicious code. Some Malware is distributed via VB macros. In addition to the code I have written and provided, I use two other VBA macro libraries: VBA-JSON and VBA-Dictionary. I scanned both libraries with a VB malware scanner to ensure they do not introduce malware into the word document.
Step-by-step tutorial on programmatic Action Item generation in Word from audio recording
Now that we have a basic understanding of automated action item generation and the tools involved, let's dive into the step-by-step tutorial on generating action items programmatically in Word.
In this automated action item generator, the input is an audio recording of the meeting. I’m using a fictitious recording of a status meeting you might typically hear when reviewing a project’s status. The workflow would be:
Hold the status meeting to discuss the project status.
Record the meeting discussion as it occurs, and generate an audio output file.
Nowadays, many meetings happen virtually or with remote participants, so it’s often easy to record the discussion using meeting technology. If you don’t hold the meeting virtually, you will need a method of recording all the meeting discussions.
Using Generative AI use the recording to transcribe and generate a list of action items discussed.
Review the output of the automated action item generation and update any missing or incorrect information.
The approach I will describe starts with an empty Word document. It assumes that you will use a new document for each meeting.
The VBA macro code instructs the OpenAI API on what action to take. I have documented the approach in the code using comments. The details must be specified in the OpenAI prompt to ensure the chat API performs the correct prompt action.
Step 1: Check that you have “Transcribe” feature in Word
The power of this approach starts with the generative AI capability built into Word. The feature provides speech-to-text generative AI. You could also use the speech-to-text OpenAI feature, but since this was already a part of Word, I decided that it's best to keep it simple and use what is already provided by Word.
If you want to use the speech-to-text capability of OpenAI it's available with OpenAI's Whisper API. I will not be covering the details of using the Open AI Whisper API in this post.
On the “Home” tab of Word, you should see “Dictate” and a microphone icon. It usually is on the right-hand side. Under the “Dictate” button is the “Transcribe” feature. It looks like the image shown below:

If you do not see this feature, check “File” -> “Options” -> “Customize Ribbon” to see if the “Transcribe” feature is available but not currently shown on the “Home” tab. You may just need to check the box to display it on the tab.
If it’s not available, then you will have to obtain a version of Microsoft Word that provides this capability before continuing.
Step 2: Transcribing the audio recording to text
Now that we have the "Transcribe" feature enabled in Word, we can transcribe an audio recording of a meeting into text. I have provided an audio recording of a fictitious status meeting that you can use to try out the transcribe feature if you don't have your own file.
The audio file is located here: Sample Audio file of status meeting discussions.
I created this audio file using a generative AI tool called Speechify. I was impressed by how well it made the audio sound like an honest discussion, and I especially liked the ability to use different voices.
If you're interested, I wrote the following script and used it as input for the Speechify app. The script is located here: Sample status meeting script.
1) On the Word top Ribbon under the "Home" tab use "Dictate" -> "Transcribe".

2) Upload an audio file to transcribe using the "Upload audio" and choose a file to upload. You may also want to change the language used in the audio file so that it properly understands the audio and is able to do a correct transcription.

3) Transcription of the audio to text may take some time, but the Word UI will indicate the progress as the audio is being transcribed as shown in the image below.

4) Once the transcription is complete, you can add it to the Word document. However, I also found it useful to edit the speaker's name. You will notice that it identifies the various speakers in the audio and just names them Speaker 1, 2, etc. I found that Word did an excellent job of matching the spoken sentences to the correct speaker throughout the audio. By updating the speaker's name to the actual name of the person, the final action items that OpenAI generates will also be able to do a better job of assigning the owner. To edit the name, hover over the name of the speaker, and a pencil icon will appear that allows you to update the name.

5) Notice that you can also easily change all the instances of a speaker throughout the transcription by checking the box beside the "Change all" as shown in the image below.

6) Once you have updated the transcription with the proper names, you add it to the Word document using the "Add to document" -> "With speakers" at the bottom of the transcription.

7) This will add the transcription into the Word document as shown in this example:

Now that the meeting transcription is added to the Word document we can proceed with the remaining steps to setting up the VBA macro code to call the OpenAI API.
Step 3: Setting up Word for Developer enablement to code VBA macros
To interact with the OpenAI API, we must use a Word document with macro support. A Word Macro is a set of instructions that automates repetitive tasks in Word. In this case, we will use a Macro programmed in Visual Basic to invoke the OpenAI API, retrieve the generated action items, and insert them into the Word document.
To set up the Visual Basic Word macro, follow these steps:
1) Open Word and navigate to the "Developer" tab.
2) If you do not see the "Developer" tab in Word, add it to your view. It is not shown by default in the tabs in Word.
3) To add the “Developer” tab, navigate to “File” -> “Options” -> “Customize Ribbon”

4) Check the “Developer” Tab box and click “OK”

5) You should now see the “Developer” tab on your Ribbon

6) The “Developer” tab opens the Visual Basic Editor, which allows you to create Visual Basic macros to call OpenAI APIs.
Step 4: Setting up the VBA code to call the OpenAI API
With the Developer tab now in place, we're poised to integrate the OpenAI API in the subsequent phases of our setup.
1) Open Word and navigate to the "Developer" tab.
2) Click on "Visual Basic” to open the Microsoft Visual Basic for Applications (VBA) editor.

This will open the VBA editor in another window and allow you to create the macro needed to call the API.
3) On the top toolbar of the VBA editor, select “Insert” -> “Module” to open an empty module creation window.

You will want to copy the below VBA code into this empty module window. Copy and paste all the VB code into the new module.
Check to ensure you have copied all the VBA code from the Github location into the new module in the VBA editor.
4) Rename the module in the VBA editor window by choosing “View” -> “Properties Window”, which will open the properties for this new module. Notice the module is initially named “Module1”. In the Properties window, choose a more meaningful name. I have chosen “ActionItemsFTransc”

5) The “JsonConverter” module is also used in this code and must be inserted and referenced for the macro code to run properly. It is used to parse the response returned from the API. If you do not import this module, you will receive a “Run-time error ‘424’ Object required”. Which indicates that the module cannot be found. The instructions for installing this module can be found on the Readme page of that module's GitHub URL. Import it using “File” -> “Import File” on the VBA editor toolbar.
6) The JsonConverter library also depends on “VBA-dictionary”. I also used the VBA Dictionary and imported it into the VBA editor. Download the zip file and extract it to a local folder. Then, import the “dictionary.cls” file into your VBA project using steps similar to those above. Once you have imported the JsonConverter and the Dictionary file, you will see the following in the VBA editor.

There are a few things to note in the VBA macro:
The default model (LLMs - Large Language Models) used in the text generation is "gpt-4-turbo-preview" since it gave me the best results for the action items. You can change this to use other models of your choice. I suggest starting with gpt-3.5, and once you have everything set up and working, you can change it to other models that may give you better results in the action items generated.
The OpenAI API called to generate the text is the “chat/completions” API. This API is compatible with a specific list of language models, which changes over time. I recommend checking the OpenAI website to ensure you use a compatible model.
The crucial line of code that instructs OpenAI on the desired behavior is the line of code assigning a value to the “systemPrompt” variable. This directive is a key part of the ChatGPT action item prompt, and I’ve found it to be effective in guiding OpenAI in producing the most accurate set of action items. You can modify these instructions to try different text and see the resulting set of action items. You will notice that in the system prompt I also give guidance to OpenAI on the formatting of the action items to a format I desire for use in the Word document.
The code has error handling to ensure the meeting minutes are found. If something is missing in the document and the text delimiters cannot be found, messages will be sent to the Word UI.
Step 5: Setting up the API key to call the OpenAI API
Keep going; your determination is about to be rewarded as you approach the final stages before you can create and execute the macro. Just a few more steps remain, and you'll be all set to give it a try.
1) Copy your API key from the OpenAI API account site into the code. The Macro will make an API call to the OpenAI endpoint, retrieve the generated content, and insert it into your Word document. Login to your OpenAI API account that you created in the Prereqs step and access the API key tab in the menu or use this API Key link.

2) In the API keys UI, click the “Create new secret key” button to create a new API key for the API call.

3) Once the API key is created, copy it into the indicated line of macro code shown below. Carefully replace the placeholder with your unique key to ensure secure access to the OpenAI API from your personal account. Remember, this key is confidential and should not be shared, as it is your exclusive access to your account.

Step 6: Running and testing the Macro call to ChatGPT API chat completions
1) Now that you have the VBA macro created and using your API key, you can test the macro and ensure it is set up properly. Ensure you have added the meeting transcription to the document. If the document is ready, go to the quick access toolbar and click View Macros.

2) If you do not see the “View Macros” on the quick access toolbar, you will need to choose “File” -> “Options” -> “Quick Access Toolbar” and add “Macros”.
3) Once you can view the macros, a window with the macros in your document will open, as indicated below. Select macro “ActionItemsFTrans” and then click “Run” (you may have named your macro differently than shown), to run the macro on the currently open document.

4) If the macro call to ChatGPT is successful, a list of action items will be generated and inserted into the Word document after any existing text. I have included a screenshot of a partial example below:

If no text is inserted, check for any error messages indicating possible problems. If there are errors, check the steps above to ensure that all actions have been implemented properly. You can also refer to the section Error resolution for a description of possible errors.
5) The OpenAI chat completions request body has a couple of parameters you can modify to try different outputs and determine which provides the best results. These are the "temperature", "top_p", “frequency_penalty”, and “presence_penalty” parameters. A detailed description of these parameters can be found in the OpenAI API documentation for Chat completions API. To view the values I used, check the VBA code to see the setting that provided a good set of action items.
Step 7: Checking the action items generated for accuracy
Although the AI Action item generator powered by ChatGPT offers a foundation, it's important to carefully review the action items to ensure their accuracy. Through my testing, I discovered that this approach occasionally missed some action items, inaccurately estimated due dates, or misrepresented dependency details. Sometimes, the discussion may not include enough detail for ChatGPT to capture the action items accurately. This will require you to add missing details or correct due dates based on your additional knowledge.
Over time, you may find how you discuss items in the status meeting will change to ensure that the recording of the meeting has all the relevant details.
Despite the need for revisions, utilizing this tool as a starting point can significantly reduce the time required for updates compared to creating action items from scratch.
Conclusion and future possibilities for automated action item generation
Automated action item generators are just one of many tools available to simplify day-to-day tasks and enhance productivity. Combining generative AI in Word with OpenAI offers a powerful approach to leveraging AI and streamlining your work. This serves as a starting point and encourages you to explore other tasks that can be similarly automated, reducing your workload.
As AI technology advances, the accuracy of generated action items will continue to improve. With additional training on your business processes or templates, this automation process will become even more efficient.
By automating action item creation using the OpenAI API in MS Word, you can focus on more complex tasks that require in-depth investigation and analysis, tasks that cannot be automated with Generative AI at present. Follow this step-by-step tutorial to harness the power of AI, enhance your overall productivity, and contribute to the success of your business.
Error resolution
There are a few possible problems you may encounter when running the generator. Below are some error conditions and possible resolutions:
Truncation of the text: If the action items generated look truncated, it’s likely due to the number of tokens specified in the request. The sample code limits it to 1000 tokens, and if your list of action items is extensive, you will need to increase this limit. To make this change, increase the number in the parameter “max_tokens” on the requestData.
If you encounter the error below, there is a problem with the JSON used as input to the chat/completions request. A possible cause may be that the text extracted from the Word document has hidden ASCII characters. The function VBA “CleanText” removes some hidden ASCII characters, such as line feeds, but your text may have other characters causing issues in the JSON string. To determine if this is the cause of the problem, you can display the hidden characters in the document with “File” à “Options” à “Display” and then check the “Show all formatting marks”. This will display the formatting in the document. If you have additional formatting characters that are not already handled by the “CleanText” function you will need to add code to remove the additional ASCII characters.

If you see the following error, it indicates that you have inserted the incorrect API key or that the API key is missing. Check that the API key is inserted into the proper location as indicated in Step 5.3 above and that it is the exact key from your OpenAI account. The variable “OpenAI_Api_Key” must contain your API key.

Comments