Retrieves text embeddings from OpenAI's models (text-embedding-3-small or text-embedding-3-large). Supports optional dimension truncation and batch processing via the OpenAI Batch API.
Usage
embed(
texts,
org = "openai",
size = c("small", "large"),
dimensions = NULL,
batch = FALSE,
timeout = 60,
...
)
Arguments
- texts
Character vector. The text(s) to embed. Required.
- org
Character string. The LLM provider. Currently must be "openai". Argument exists for future expansion.
- size
Character string. The embedding model size. Allowed: "small" (default), "large". Maps to "text-embedding-3-small" and "text-embedding-3-large".
- dimensions
Integer or NULL. The desired number of dimensions for the output embeddings. If NULL (default), the model's full dimensions are used. If set, must be a positive integer (OpenAI may have specific constraints).
- batch
Logical. Use batch processing via the OpenAI Batch API? Default is FALSE.
- timeout
Numeric. Request timeout in seconds. Applies to synchronous API calls or the batch initiation steps. Default is 60.
- ...
Currently unused. For future expansion.
Value
If batch = FALSE: A list where each element is a numeric vector representing the embedding for the corresponding input text. Returns NULL on API error.
If batch = TRUE: The OpenAI batch job ID (character string). Returns NULL if batch initiation fails.
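As a minimal sketch of working with the non-batch return value, the list of numeric vectors can be stacked into a texts-by-dimensions matrix. The `fake_embeddings` object below is a stand-in for a successful embed() result, not actual API output:

```r
# Stand-in for the list of numeric vectors returned by embed()
fake_embeddings <- list(rnorm(256), rnorm(256))
if (!is.null(fake_embeddings)) {
  # Stack per-text vectors into a matrix: one row per input text
  emb_mat <- do.call(rbind, fake_embeddings)
  dim(emb_mat)  # 2 rows, 256 columns
}
```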
Details
For non-batch requests (batch = FALSE), the function sends the texts to the OpenAI embeddings endpoint. Note that the standard API endpoint itself can handle multiple texts in a single request (up to API limits). If more than 50 texts are provided with batch = FALSE, a message suggests using batch mode for potential cost savings.
For batch requests (batch = TRUE), the function prepares and uploads an input file to OpenAI and initiates a batch job targeting the embeddings endpoint. It returns the batch job ID. Use check_batch() and workspace_batch() to monitor the job and retrieve results later. Note that workspace_batch() returns a list of numeric vectors for completed embedding batch jobs.
Currently, only org = "openai" is supported.
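The batch round trip might look like the sketch below. How check_batch() reports job state is an assumption here (its exact return values are not documented in this page):

```r
some_texts <- c("First text to embed.", "Second text to embed.")
batch_id <- embed(texts = some_texts, batch = TRUE)
if (!is.null(batch_id)) {
  check_batch(batch_id)  # inspect job status
  # Later, once the job has completed:
  # results <- workspace_batch(batch_id)  # list of numeric vectors
}
```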
Examples
if (FALSE) { # \dontrun{
# Ensure API key is set
# Sys.setenv(OPENAI_API_KEY = "YOUR_OPENAI_KEY")
my_texts <- c("The quick brown fox jumps over the lazy dog.",
              "R is a language for statistical computing.")
# --- Synchronous (Non-Batch) Example ---
embeddings_list <- embed(texts = my_texts, size = "small")
if (!is.null(embeddings_list)) {
print(paste("Number of embeddings received:", length(embeddings_list)))
print(paste("Dimensions of first embedding:", length(embeddings_list[[1]])))
}
# Example with dimension truncation
embeddings_short <- embed(texts = my_texts, size = "small", dimensions = 256)
if (!is.null(embeddings_short)) {
print(paste("Dimensions of truncated embedding:", length(embeddings_short[[1]])))
}
# Example triggering the batch-suggestion message (more than 50 texts)
long_texts <- rep("Test text.", 60)
embeddings_long_warn <- embed(texts = long_texts) # Will show message
# --- Batch Example ---
batch_texts <- c("Embed this first.", "Embed this second.", "And this third.")
embedding_batch_id <- embed(texts = batch_texts, batch = TRUE)
if (!is.null(embedding_batch_id)) {
print(paste("Embedding batch job created with ID:", embedding_batch_id))
# Use check_batch(embedding_batch_id) and
# workspace_batch(embedding_batch_id) later...
}
} # }