Title: | Text Analysis Through the 'Receptiviti' API |
---|---|
Description: | Send text to the <https://www.receptiviti.com> API to be scored by all available frameworks. |
Authors: | Receptiviti Inc. [fnd, cph], Kent English [cre], Micah Iserman [aut, ctr] |
Maintainer: | Kent English |
License: | MIT + file LICENSE |
Version: | 0.1.9 |
Built: | 2024-11-13 10:18:14 UTC |
Source: | https://github.com/receptiviti/receptiviti-r |
The main function to access the Receptiviti API.
receptiviti(text = NULL, output = NULL, id = NULL, text_column = NULL, id_column = NULL, files = NULL, dir = NULL, file_type = "txt", encoding = NULL, return_text = FALSE, context = "written", custom_context = FALSE, api_args = getOption("receptiviti.api_args", list()), frameworks = getOption("receptiviti.frameworks", "all"), framework_prefix = TRUE, as_list = FALSE, bundle_size = 1000, bundle_byte_limit = 7500000, collapse_lines = FALSE, retry_limit = 50, clear_cache = FALSE, clear_scratch_cache = TRUE, request_cache = TRUE, cores = detectCores() - 1, use_future = FALSE, in_memory = TRUE, verbose = FALSE, overwrite = FALSE, compress = FALSE, make_request = TRUE, text_as_paths = FALSE, cache = Sys.getenv("RECEPTIVITI_CACHE"), cache_overwrite = FALSE, cache_format = Sys.getenv("RECEPTIVITI_CACHE_FORMAT", "parquet"), key = Sys.getenv("RECEPTIVITI_KEY"), secret = Sys.getenv("RECEPTIVITI_SECRET"), url = Sys.getenv("RECEPTIVITI_URL"), version = Sys.getenv("RECEPTIVITI_VERSION"), endpoint = Sys.getenv("RECEPTIVITI_ENDPOINT"))
receptiviti_status(url = Sys.getenv("RECEPTIVITI_URL"), key = Sys.getenv("RECEPTIVITI_KEY"), secret = Sys.getenv("RECEPTIVITI_SECRET"), verbose = TRUE, include_headers = FALSE)
text |
A character vector with text to be processed, path to a directory containing files, or a vector of file paths.
If a single path to a directory, each file is collapsed to a single text. If a path to a file or files,
each line or row is treated as a separate text, unless collapse_lines is TRUE. |
output |
Path to a .csv file to write results to. |
id |
Vector of unique IDs the same length as text, to be included in the results. |
text_column, id_column |
Column name of text/id, if text is a matrix-like object (such as a data.frame) or a path to a .csv file. |
files |
A list of file paths, as an alternative to text. |
dir |
A directory to search for files in, as an alternative to text. |
file_type |
File extension to search for, if text is the path to a directory or dir is specified. |
encoding |
Encoding of file(s) to be read in. If not specified, this will be detected, which can fail, resulting in mis-encoded characters; for best (and fastest) results, specify encoding. |
return_text |
Logical; if TRUE, the originally entered text is included as a column of the results. |
context |
Name of the analysis context. |
custom_context |
Name of a custom context (as listed by receptiviti_norming()), or TRUE if context is the name of a custom context. |
api_args |
A list of additional arguments to pass to the API; defaults to getOption("receptiviti.api_args", list()). |
frameworks |
A vector of frameworks to include results from. Texts are always scored with all available frameworks;
this just specifies what to return. Defaults to getOption("receptiviti.frameworks", "all"). |
framework_prefix |
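Logical; if FALSE, the framework prefix (e.g., summary. in summary.word_count) is removed from column names, which may result in duplicate column names. |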
as_list |
Logical; if TRUE, returns a list with a separate data.frame for each framework. |
bundle_size |
Number of texts to include in each request; between 1 and 1,000. |
bundle_byte_limit |
Memory limit (in bytes) of each bundle, under 1e7 (10 MB, the API's limit). |
collapse_lines |
Logical; if TRUE, and text contains paths to files, each file is treated as a single text. |
retry_limit |
Maximum number of times each request can be retried after hitting a rate limit. |
clear_cache |
Logical; if TRUE, clears any existing results from the cache before it is used. |
clear_scratch_cache |
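Logical; if TRUE, removes the temporary bundle files written when in_memory is FALSE once the requests have been made. |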
request_cache |
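Logical; if FALSE, always makes a fresh request, rather than reusing the response to an identical request made earlier in the R session. |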
cores |
Number of CPU cores to split bundles across, if there are multiple bundles. See the Parallelization section. |
use_future |
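Logical; if TRUE, uses a future back-end to process bundles, in which case parallelization is controlled by an externally specified plan (e.g., future::plan("multisession")). See the Parallelization section. |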
in_memory |
Logical; if FALSE, writes bundles to temporary files and reads each one in only when it is requested. |
verbose |
Logical; if TRUE, shows status messages. |
overwrite |
Logical; if TRUE, overwrites an existing output file. |
compress |
Logical; if TRUE, compresses the output file. |
make_request |
Logical; if FALSE, no request is made, so results can only come from the caches. |
text_as_paths |
Logical; if TRUE, ensures text is treated as a vector of file paths. |
cache |
Path to a directory in which to save unique results for reuse; defaults to Sys.getenv("RECEPTIVITI_CACHE"). |
cache_overwrite |
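Logical; if TRUE, writes results to the cache without reading from it, so fresh results are cached without clearing the cache. |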
cache_format |
Format of the cache database (e.g., "parquet"); defaults to Sys.getenv("RECEPTIVITI_CACHE_FORMAT", "parquet"). |
key |
API Key; defaults to Sys.getenv("RECEPTIVITI_KEY"). |
secret |
API Secret; defaults to Sys.getenv("RECEPTIVITI_SECRET"). |
url |
API URL; defaults to Sys.getenv("RECEPTIVITI_URL"). |
version |
API version; defaults to Sys.getenv("RECEPTIVITI_VERSION"). |
endpoint |
API endpoint (path name after the version); defaults to Sys.getenv("RECEPTIVITI_ENDPOINT"). |
include_headers |
Logical; if TRUE, the verbose message from receptiviti_status() includes the HTTP headers of the response. |
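Because the defaults for key, secret, url, version, endpoint, cache, frameworks, and api_args are expressions evaluated when the function runs, credentials and preferences can be set once per session rather than passed to every call. The sketch below uses placeholder credential values (not real ones) and example framework names taken from the column names mentioned in this manual; the same environment variables could instead be placed in a user-level .Renviron file, R's standard mechanism for persistent environment variables.

```r
# Placeholder credentials only; substitute your own key and secret.
Sys.setenv(
  RECEPTIVITI_KEY = "your_key",
  RECEPTIVITI_SECRET = "your_secret"
)

# The default for the frameworks argument is read from this option
options(receptiviti.frameworks = c("summary", "liwc"))

# These mirror the default expressions in the receptiviti() signature
key <- Sys.getenv("RECEPTIVITI_KEY")
frameworks <- getOption("receptiviti.frameworks", "all")
print(key)
print(frameworks)
```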
A data.frame with columns for text (if return_text is TRUE; the originally entered text), id (if one was provided), text_hash (the MD5 hash of the text), a column each for relevant entries in api_args, and scores from each included framework (e.g., summary.word_count and liwc.i). If as_list is TRUE, returns a list with a named entry containing such a data.frame for each framework.
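To make the relationship between the flat data.frame and the as_list form concrete, here is a minimal sketch of a hypothetical helper (split_by_framework is not part of the package, and this is not how the package builds its list). It assumes score columns are exactly those named "framework.measure" and treats every other column as an identifier; NA stands in for real scores.

```r
# Hypothetical helper: split a flat results data.frame into one
# data.frame per framework, keeping the non-score columns in each.
split_by_framework <- function(results) {
  # assume score columns are the ones with a "framework." prefix
  score_cols <- grep(".", names(results), fixed = TRUE, value = TRUE)
  id_cols <- setdiff(names(results), score_cols)
  frameworks <- unique(sub("\\..*$", "", score_cols))
  sapply(frameworks, function(fw) {
    cols <- score_cols[startsWith(score_cols, paste0(fw, "."))]
    out <- results[, c(id_cols, cols), drop = FALSE]
    # strip the "framework." prefix from the score column names
    names(out) <- c(id_cols, substring(cols, nchar(fw) + 2))
    out
  }, simplify = FALSE)
}

# NA placeholders; only the column layout matters here
flat <- data.frame(
  text_hash = c("hash_1", "hash_2"),
  summary.word_count = NA_real_,
  liwc.i = NA_real_
)
by_framework <- split_by_framework(flat)
names(by_framework)       # "summary" "liwc"
names(by_framework$liwc)  # "text_hash" "i"
```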
Cache
If the cache argument is specified, results for unique texts are saved in an Arrow database in the cache location (by default, Sys.getenv("RECEPTIVITI_CACHE")), and are retrieved with subsequent requests. This ensures that the exact same texts are not re-sent to the API. This does, however, add some processing time and disk space usage.
If cache is TRUE, a default directory (receptiviti_cache) will be looked for in the system's temporary directory (which is usually the parent of tempdir()). If this does not exist, you will be asked if it should be created.
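The default location described above can be previewed from R. This sketch assumes the system's temporary directory is the parent of tempdir(), as stated above; it only builds and checks the path, and setting RECEPTIVITI_CACHE makes that path the session's default cache argument.

```r
# Assumed default cache location: "receptiviti_cache" in the parent
# of R's per-session temporary directory.
default_cache <- file.path(dirname(tempdir()), "receptiviti_cache")
print(default_cache)
dir.exists(default_cache)  # TRUE only if the cache has already been created

# Use this path as the default cache for the rest of the session
Sys.setenv(RECEPTIVITI_CACHE = default_cache)
```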
The primary cache is checked when each bundle is processed, and existing results are loaded at that time. When processing many bundles in parallel, and many results have been cached, this can cause the system to freeze and potentially crash. To avoid this, limit the number of cores, or disable parallel processing.
The cache_format argument (or the RECEPTIVITI_CACHE_FORMAT environment variable) can be used to adjust the format of the cache. You can use the cache independently with arrow::open_dataset(Sys.getenv("RECEPTIVITI_CACHE")).
You can also set the clear_cache argument to TRUE to clear the cache before it is used again, which may be useful if the cache has gotten big, or you know new results will be returned. Even if a cached result exists, it will be reprocessed if it does not have all of the variables of new results, but this depends on there being at least 1 uncached result. If, for instance, you add a framework to your account and want to reprocess a previously processed set of texts, you would need to first clear the cache. Either way, duplicated texts within the same call will only be sent once.
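The "sent only once" behavior for duplicated texts can be illustrated with base R's unique() and match(). This is a hypothetical sketch, not the package's code: score_unique and fake_scorer are made-up names, and nchar() is a placeholder for an API request, not a Receptiviti measure.

```r
# Hypothetical helper: score each unique text once, then map the
# scores back onto every original position.
score_unique <- function(texts, score_fn) {
  unique_texts <- unique(texts)
  unique_scores <- score_fn(unique_texts)  # only unique texts are "sent"
  # match() gives each original text the index of its unique copy
  unique_scores[match(texts, unique_texts)]
}

texts <- c("first text", "second text", "first text")
calls <- 0
fake_scorer <- function(x) {
  calls <<- calls + length(x)  # count how many texts were "sent"
  nchar(x)                     # placeholder score
}
scores <- score_unique(texts, fake_scorer)
print(scores)  # 10 11 10
print(calls)   # 2
```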
The request_cache argument controls a more temporary cache of each bundle request. This is cleared when the R session ends. You might want to set this to FALSE if a new framework becomes available on your account and you want to process a set of texts you already processed in the current R session without restarting.
Another temporary cache is made when in_memory is FALSE, which is the default when processing in parallel (when cores is over 1 or use_future is TRUE). This contains a file for each unique bundle, which is read in as needed by the parallel workers.
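A minimal sketch of that idea, assuming bundles are stored one per file (the .rds format and the process_bundle name are illustrative assumptions, not the package's implementation): each worker receives only a file path and loads its own bundle, and the scratch directory is removed afterwards, as clear_scratch_cache = TRUE describes. vapply() is used here for simplicity; parallel workers would receive the same paths.

```r
# Write each bundle to its own scratch file.
bundles <- list(c("text a", "text b"), c("text c"))
scratch <- tempfile("bundles_")
dir.create(scratch)
paths <- file.path(scratch, paste0("bundle_", seq_along(bundles), ".rds"))
for (i in seq_along(bundles)) saveRDS(bundles[[i]], paths[i])

# A worker receives a path rather than the texts themselves.
process_bundle <- function(path) {
  texts <- readRDS(path)
  length(texts)  # stand-in for sending the bundle to the API
}
counts <- vapply(paths, process_bundle, integer(1), USE.NAMES = FALSE)

# Remove the scratch files afterwards.
unlink(scratch, recursive = TRUE)
```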
Parallelization
Texts are split into bundles based on the bundle_size and bundle_byte_limit arguments. Each bundle represents a single request to the API, which is why bundles are limited to 1,000 texts and a total size of 10 MB. When there is more than one bundle and either cores is greater than 1 or use_future is TRUE (and you've externally specified a plan), bundles are processed by multiple cores.
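A greedy version of that splitting can be sketched as follows. bundle_texts is a hypothetical helper, and measuring size with nchar(type = "bytes") is an assumption; the package may count bundle size differently (for example, including request overhead). A new bundle starts whenever adding the next text would exceed either the count limit or the byte limit.

```r
# Hypothetical greedy bundler: at most bundle_size texts and at most
# byte_limit bytes of text per bundle.
bundle_texts <- function(texts, bundle_size = 1000, byte_limit = 7500000) {
  bytes <- nchar(texts, type = "bytes")
  if (any(bytes > byte_limit)) stop("a single text exceeds byte_limit")
  bundles <- list()
  current <- character(0)
  current_bytes <- 0
  for (i in seq_along(texts)) {
    # close the current bundle if this text would break either limit
    if (length(current) == bundle_size ||
        current_bytes + bytes[i] > byte_limit) {
      bundles[[length(bundles) + 1]] <- current
      current <- character(0)
      current_bytes <- 0
    }
    current <- c(current, texts[i])
    current_bytes <- current_bytes + bytes[i]
  }
  if (length(current) > 0) bundles[[length(bundles) + 1]] <- current
  bundles
}

# Tiny limits so the byte limit visibly splits the texts
b <- bundle_texts(c("aaaa", "bb", "cccc", "ddd"), bundle_size = 3, byte_limit = 6)
lengths(b)  # 2 1 1
```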
If you have texts spread across multiple files, they can be most efficiently processed in parallel if each file contains a single text (potentially collapsed from multiple lines). If files contain multiple texts (i.e., collapse_lines = FALSE), then texts need to be read in before bundling in order to ensure bundles are under the length limit.
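The difference between the two modes can be seen with a small temporary file. This is an illustration rather than the package's reader: with one text per line, the number of texts in a file is unknown until it is read, whereas a collapsed file is always exactly one text. Joining lines with a space is an assumption made for this sketch.

```r
path <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), path)

# like collapse_lines = FALSE: each line is a separate text
per_line <- readLines(path)
# like collapse_lines = TRUE: the whole file is one text
# (the space separator is an assumption for this sketch)
collapsed <- paste(readLines(path), collapse = " ")

length(per_line)   # 2
length(collapsed)  # 1
unlink(path)
```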
Whether processing in serial or parallel, progress bars can be specified externally with progressr::handlers(); see examples.
## Not run:
# check that the API is available, and your credentials work
receptiviti_status()
# score a single text
single <- receptiviti("a text to score")
# score multiple texts, and write results to a file
multi <- receptiviti(c("first text to score", "second text"), "filename.csv")
# score many texts in separate files
## defaults to look for .txt files
file_results <- receptiviti(dir = "./path/to/txt_folder")
## could be .csv
file_results <- receptiviti(
  dir = "./path/to/csv_folder",
  text_column = "text", file_type = "csv"
)
# score many texts from a file, with a progress bar
## set up cores and progress bar
## (only necessary if you want the progress bar)
future::plan("multisession")
progressr::handlers(global = TRUE)
progressr::handlers("progress")
## make request
results <- receptiviti(
  "./path/to/largefile.csv",
  text_column = "text", use_future = TRUE
)
## End(Not run)
Retrieve the list of frameworks available to your account.
receptiviti_frameworks(url = Sys.getenv("RECEPTIVITI_URL"), key = Sys.getenv("RECEPTIVITI_KEY"), secret = Sys.getenv("RECEPTIVITI_SECRET"))
url, key, secret |
Request arguments; same as those in receptiviti(). |
A character vector containing the names of frameworks available to your account.
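One use of this vector is to validate a frameworks argument before making a request. The sketch below uses a hypothetical helper (check_frameworks is not part of the package) that takes the available names as an argument, so it runs without credentials; in practice that vector would come from receptiviti_frameworks(). The framework names used are only examples.

```r
# Hypothetical helper: keep requested frameworks that are available,
# warning about any that are not.
check_frameworks <- function(requested, available) {
  allowed <- c(available, "all")  # "all" is the default frameworks value
  missing <- setdiff(requested, allowed)
  if (length(missing) > 0) {
    warning("not available to this account: ", paste(missing, collapse = ", "))
  }
  intersect(requested, allowed)
}

# example with a made-up set of available frameworks
check_frameworks(c("liwc", "not_a_framework"), available = c("summary", "liwc"))
```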
## Not run:
# see which frameworks are available to your account
frameworks <- receptiviti_frameworks()
## End(Not run)
Custom norming contexts can be used to process later texts by specifying the custom_context argument in the receptiviti function (e.g., receptiviti("text to score", version = "v2", custom_context = "norm_name"), where norm_name is the name you set here).
receptiviti_norming(name = NULL, text = NULL, options = list(), delete = FALSE, name_only = FALSE, id = NULL, text_column = NULL, id_column = NULL, files = NULL, dir = NULL, file_type = "txt", collapse_lines = FALSE, encoding = NULL, bundle_size = 1000, bundle_byte_limit = 7500000, retry_limit = 50, clear_scratch_cache = TRUE, cores = detectCores() - 1, use_future = FALSE, in_memory = TRUE, url = Sys.getenv("RECEPTIVITI_URL"), key = Sys.getenv("RECEPTIVITI_KEY"), secret = Sys.getenv("RECEPTIVITI_SECRET"), verbose = TRUE)
name |
Name of a new norming context, to be established from the provided text. If not specified, the names of available norming contexts are returned. |
text |
Text to be processed and used as the custom norming context. Not providing text will return the status of the named norming context. |
options |
A list of options to set for the norming context. |
delete |
Logical; if TRUE, deletes the named norming context. |
name_only |
Logical; if TRUE, returns only the names of available norming contexts. |
id, text_column, id_column, files, dir, file_type, collapse_lines, encoding |
Additional arguments used to handle text; same as those in receptiviti(). |
bundle_size, bundle_byte_limit, retry_limit, clear_scratch_cache, cores, use_future, in_memory |
Additional arguments used to manage the requests; same as those in receptiviti(). |
key, secret, url |
Request arguments; same as those in receptiviti(). |
verbose |
Logical; if TRUE, shows status messages. |
Nothing if delete is TRUE.
Otherwise, if name is not specified, a character vector containing names of each available norming context (built-in and custom). If text is not specified, the status of the named context in a list. If texts are provided, a list with these entries:
initial_status: Initial status of the context.
first_pass: Response after texts are sent the first time, or NULL if the initial status is pass_two.
second_pass: Response after texts are sent the second time.
## Not run:
# get status of all existing custom norming contexts
contexts <- receptiviti_norming(name_only = TRUE)
# create or get the status of a single custom norming context
status <- receptiviti_norming("new_context")
# send texts to establish the context
## these texts can be specified just like
## texts in the main receptiviti function
## such as directly
full_status <- receptiviti_norming("new_context", c(
  "a text to set the norm",
  "another text part of the new context"
))
## or from a file
full_status <- receptiviti_norming(
  "new_context", "./path/to/text.csv",
  text_column = "texts"
)
## or from multiple files in a directory
full_status <- receptiviti_norming(
  "new_context",
  dir = "./path/to/txt_files"
)
## End(Not run)