- Speech to Text
- Text to Speech
- OpenAI API
- Authentication
- Realtime Database
- Classes (and Layouts)
- Libraries

It works using the Android Speech Recognition service to transform the user's voice into text. The text is then sent to the OpenAI API endpoint to retrieve the generated response.

- Check the `RECORD_AUDIO` permission at runtime
  - If not granted, request permission
  - If granted, continue
- Create a `SpeechRecognizer` instance
- Set a `RecognitionListener` to receive speech recognition events
- Call `startListening()` to begin recognition when `micButton` is pressed
- Receive results in `RecognitionListener.onResults()`
- Send the transcript (if not empty) to the OpenAI API endpoint as part of the whole chat conversation
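
The steps above could be wired together roughly as follows. This is a minimal sketch, not the app's actual code: `SpeechSketchActivity`, `onTranscript()` and `REQUEST_RECORD_AUDIO` are hypothetical names, and the call to `startListening()` is assumed to come from the mic button's click listener.

```kotlin
import android.Manifest
import android.content.Intent
import android.content.pm.PackageManager
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

// Hypothetical activity used only for illustration.
class SpeechSketchActivity : AppCompatActivity() {

    private lateinit var speechRecognizer: SpeechRecognizer

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)

        // Check RECORD_AUDIO at runtime and request it if it is missing.
        val granted = ContextCompat.checkSelfPermission(
            this, Manifest.permission.RECORD_AUDIO
        ) == PackageManager.PERMISSION_GRANTED
        if (!granted) {
            ActivityCompat.requestPermissions(
                this, arrayOf(Manifest.permission.RECORD_AUDIO), REQUEST_RECORD_AUDIO
            )
        }

        // Create the recognizer and attach a listener for its events.
        speechRecognizer = SpeechRecognizer.createSpeechRecognizer(this)
        speechRecognizer.setRecognitionListener(object : RecognitionListener {
            override fun onResults(results: Bundle?) {
                // The first entry is the most likely transcript.
                val transcript = results
                    ?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                    ?.firstOrNull()
                if (!transcript.isNullOrBlank()) onTranscript(transcript)
            }

            // Remaining callbacks are left empty in this sketch.
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onReadyForSpeech(params: Bundle?) {}
            override fun onBeginningOfSpeech() {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEndOfSpeech() {}
            override fun onError(error: Int) {}
            override fun onPartialResults(partialResults: Bundle?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })
    }

    // Assumed to be called when the mic button is pressed.
    private fun startListening() {
        val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH)
            .putExtra(
                RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
            )
        speechRecognizer.startListening(intent)
    }

    // Hypothetical hook: append the transcript to the chat and send it.
    private fun onTranscript(text: String) {}

    override fun onDestroy() {
        // Release the recognizer together with the activity.
        speechRecognizer.destroy()
        super.onDestroy()
    }

    private companion object {
        const val REQUEST_RECORD_AUDIO = 1
    }
}
```

The `RECORD_AUDIO` permission also has to be declared in the manifest for the runtime request to work.
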

The size of `micButton` follows the RMS level of the user's voice, which is received in `RecognitionListener.onRmsChanged()`.

    override fun onRmsChanged(rmsdB: Float) {
        // Resize micButton given the RMS value; readings outside the 1x-4x range are ignored
        val scale = rmsdB / 2
        if (scale < 1 || scale > 4) return
        binding.micButton.scaleX = scale
        binding.micButton.scaleY = scale
    }

RMS stands for Root Mean Square and is a measure of the average power or intensity of a signal.
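
To make the definition concrete, here is a small plain-Kotlin sketch that computes the RMS of a buffer of samples. The `rms` helper is hypothetical and not part of the app, which receives a ready-made level (in decibels) from `onRmsChanged()` instead of computing it.

```kotlin
import kotlin.math.sqrt

// Hypothetical helper: root mean square of a buffer of audio samples.
fun rms(samples: FloatArray): Double {
    // An empty buffer carries no signal.
    if (samples.isEmpty()) return 0.0
    // Mean of the squared samples, then the square root of that mean.
    val meanSquare = samples.fold(0.0) { acc, s -> acc + s.toDouble() * s } / samples.size
    return sqrt(meanSquare)
}
```

For example, `rms(floatArrayOf(1f, -1f, 1f, -1f))` evaluates to `1.0`: the sign of each sample is lost when squaring, so only the magnitude counts.
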

Converso also uses Android Text to Speech to read the assistant's messages out loud.

- Create a `TextToSpeech` instance
- Call `setLanguage()` to select the voice, e.g. `Locale.US` (the voice data may need to be downloaded if not installed)
- Queue responses by calling `speak()`
- Shut down properly in `onDestroy()`
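
The lifecycle above might look like this when wrapped in a small helper. This is a sketch under assumptions: `AssistantSpeaker` is a hypothetical class, and it uses `QUEUE_ADD` so that responses are queued, whereas `QUEUE_FLUSH` would interrupt whatever is still being spoken.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Hypothetical wrapper around TextToSpeech, for illustration only.
class AssistantSpeaker(context: Context) : TextToSpeech.OnInitListener {

    // onInit() is invoked asynchronously once the engine is ready.
    private val textToSpeech = TextToSpeech(context, this)
    private var ready = false

    override fun onInit(status: Int) {
        if (status != TextToSpeech.SUCCESS) return
        // setLanguage() reports whether the voice for this locale is usable.
        val result = textToSpeech.setLanguage(Locale.US)
        ready = result != TextToSpeech.LANG_MISSING_DATA &&
            result != TextToSpeech.LANG_NOT_SUPPORTED
    }

    // Adds the text to the end of the playback queue.
    fun speak(text: String) {
        if (ready) textToSpeech.speak(text, TextToSpeech.QUEUE_ADD, null, null)
    }

    // Call from the activity's onDestroy() to release the engine.
    fun shutdown() {
        textToSpeech.stop()
        textToSpeech.shutdown()
    }
}
```
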

It should only speak messages from the assistant, so we check the message's `role`.

    // Speak message if it comes from the assistant
    if (message.role == "assistant") {
        textToSpeech.speak(message.content, TextToSpeech.QUEUE_FLUSH, null, null)
    }

Where `Message` is a data class that represents a chat message in the same format used by the OpenAI API.

    data class Message(val role: String, val content: String)

Where `role` can take 3 different values: `user`, `assistant` or `system` (`system` is used for context, e.g. "Keep responses short") and `content` is just the text.

A data class is a class specifically designed for storing data, so the compiler automatically generates common methods like `equals`, `hashCode`, and `toString`.
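
A short plain-Kotlin sketch of what those generated methods give you. The variable names are illustrative only; `Message` is redeclared here so the snippet stands on its own.

```kotlin
// Same shape as the Message class described above.
data class Message(val role: String, val content: String)

val first = Message("user", "Hello")
val second = Message("user", "Hello")

// equals() and hashCode() compare property values, not object identity.
val sameContent = first == second
val sameHash = first.hashCode() == second.hashCode()

// toString() lists every property declared in the primary constructor.
val printed = first.toString()

// copy() is generated too, which helps when only some fields change.
val reply = first.copy(role = "assistant", content = "Hi!")
```

Here `sameContent` and `sameHash` are both `true`, and `printed` is `Message(role=user, content=Hello)`.
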

The crown jewel of this project is the super easy-to-use OpenAI API, which takes the whole chat conversation (with the new user message at the end) and retrieves the generated response from a GPT model, commonly `GPT-3.5-turbo` or `GPT-4`.

Given an API key, which can be edited in the app:

- Create an `OkHttpClient` instance
- Build a request with the API key in the `Authorization` header
- Send the request to the OpenAI API endpoint
- Deserialize the JSON response with Gson using ResponseJson.kt
- Add the new assistant message to the chat RecyclerView using the MessageAdapter

`messages` is an `ArrayList<Message>` that contains the whole chat and is sent to the endpoint as part of the request body.

    // Member of MainActivity; imports needed at the top of the file:
    import android.widget.Toast
    import com.google.gson.Gson
    import java.io.IOException
    import okhttp3.Call
    import okhttp3.Callback
    import okhttp3.Headers
    import okhttp3.MediaType.Companion.toMediaType
    import okhttp3.Request
    import okhttp3.RequestBody.Companion.toRequestBody
    import okhttp3.Response

    private fun sendChatToOpenAIAndRetrieveResponse() {
        val headers = Headers.Builder()
            .add("Authorization", "Bearer ${binding.OpenAIAPIKeyEditText.text}")
            .add("Content-Type", "application/json")
            .build()
        // The whole conversation is sent on every request
        val requestData = mapOf(
            "model" to "gpt-3.5-turbo",
            "messages" to messages
        )
        val jsonMediaType = "application/json; charset=utf-8".toMediaType()
        val requestBody = Gson().toJson(requestData).toRequestBody(jsonMediaType)
        val request = Request.Builder()
            .url("https://api.openai.com/v1/chat/completions")
            .headers(headers)
            .post(requestBody)
            .build()
        okHttpClient.newCall(request).enqueue(object : Callback {
            override fun onFailure(call: Call, e: IOException) {
                // if the request fails, show the error message to the user
                runOnUiThread {
                    Toast.makeText(this@MainActivity, e.message, Toast.LENGTH_LONG).show()
                }
            }

            override fun onResponse(call: Call, response: Response) {
                // use {} closes the response body on every path
                response.use {
                    if (response.isSuccessful) {
                        val responseBody = response.body?.string()
                        val responseJson = Gson().fromJson(responseBody, ResponseJson::class.java)
                        // firstOrNull() avoids an exception when choices is empty
                        val responseMessage = responseJson?.choices?.firstOrNull()?.message?.content
                        if (responseMessage != null) {
                            runOnUiThread {
                                addMessageToChatRecyclerView(Message("assistant", responseMessage))
                            }
                        }
                    } else {
                        // if the response fails, show the error message to the user
                        runOnUiThread {
                            Toast.makeText(this@MainActivity, response.message, Toast.LENGTH_LONG).show()
                        }
                    }
                }
            }
        })
    }

It works thanks to the Firebase Auth service, which provides an easy way to authenticate users with an email and password.

- Initialize a `FirebaseAuth` instance
- Check if the user is logged in
- Go to AuthActivity if not
- Let the user create an account or sign in
- Redirect to MainActivity

In AuthActivity, if the user clicks the `signInButton` and the credentials are valid, we call `signIn()`.

    private fun performAuth(signIn: Boolean) {
        val email = binding.emailEditText.text.toString()
        val password = binding.passwordEditText.text.toString()
        if (isEmailValid(email) && isPasswordValid(password)) {
            toggleButtons(false) // disable buttons to avoid multiple requests
            if (signIn) signIn(email, password)
            else signUp(email, password)
        }
    }

Where `isPasswordValid()` uses this regex:

    Regex("^(?=.*[A-Z])(?=.*[0-9]).{8,}$")

    ^            # start of string
    (?=.*[A-Z])  # an uppercase letter must occur at least once
    (?=.*[0-9])  # a digit must occur at least once
    .{8,}        # any characters, at least eight of them
    $            # end of string

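
A minimal sketch of how that pattern behaves on a few inputs. The body of `isPasswordValid()` is not shown in this README, so the one-liner below is an assumption about its shape; only the regex itself comes from the source.

```kotlin
// Same pattern as above: one uppercase letter, one digit, 8 or more characters.
val passwordRegex = Regex("^(?=.*[A-Z])(?=.*[0-9]).{8,}$")

// Assumed implementation; the app's real function may differ.
fun isPasswordValid(password: String): Boolean = passwordRegex.matches(password)
```

With this definition, `isPasswordValid("Password1")` is `true`, while `"password1"` (no uppercase letter), `"Password"` (no digit) and `"Pass1"` (too short) are all rejected.
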
Each time the user opens the app, it goes to MainActivity and checks whether they are logged in. If not, they are redirected to AuthActivity as follows:

    if (savedInstanceState == null) {
        auth = Firebase.auth
        database = Firebase.database
        // go to AuthActivity if user is not logged in
        if (auth.currentUser == null) {
            startActivity(Intent(this, AuthActivity::class.java))
            finish()
        }
    }

Where `savedInstanceState` is `null` only when the activity is created for the first time, so this block is skipped when the activity is recreated, e.g. after a device rotation.

Using the Firebase Realtime Database we can store the token usage of each user and limit the number of requests per day.

`tokenUsage` is incremented for each assistant response:

    tokenUsage += responseJson.usage.total_tokens

And then the database is updated:

    val userRef = database.getReference("users/${auth.currentUser?.uid}/token_usage")
    userRef.get()
        .addOnSuccessListener {
            userRef.setValue(tokenUsage)
        }
        .addOnFailureListener {
            // log the exception itself: it.message can be null, so `!!` could crash here
            Log.e("MainActivity", "Error updating tokenUsage", it)
        }

- AuthActivity: Login/Sign up screen.
- MainActivity: Main screen where you can chat with the assistant or edit your OpenAI API Key and System Prompt.
- Message: Data class formatted as the OpenAI API expects.
- MessageAdapter: Chat's RecyclerView Adapter.
- MessageViewHolder: Chat's RecyclerView ViewHolder.
  - Inflates `user_message.xml` or `assistant_message.xml` depending on the message's `role`.
- ResponseJson: Classes to represent the JSON response from OpenAI.
- User: Data class to represent a user in the database.
- OkHttp: HTTP client
- Gson: JSON serialization/deserialization
- Material Design: UI components
- Firebase
- Auth: Authentication
- Realtime Database: Database