- +1
-
A vote in favor of something.
⊗ Español, Français - 68-95-99.7 rule
-
Expresses the fact that, for a normal distribution, about 68% of values lie within one standard deviation of the mean, 95% lie within two, and 99.7% lie within three. Conversely, only about 0.3% of values lie more than three standard deviations above or below the mean.
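As a quick numerical check, the sketch below computes the fraction of a normal distribution that lies within k standard deviations of the mean. It uses the standard identity P(|Z| < k) = erf(k / √2). The function name fraction_within is our own, chosen for illustration.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```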
⊗ Afrikaans, Arabic, Français, Português - abandonware
-
Software that is no longer being maintained.
⊗ Afrikaans, Arabic, Español, Français, Português - absolute error
-
The absolute value of the difference between the observed and the correct value. Absolute error is usually less useful than relative error.
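The sketch below shows why relative error is often more informative. The same absolute error of 1 is large when the correct value is 2 but tiny when it is 1000. The helper names are hypothetical, and relative_error assumes the correct value is non-zero.

```python
def absolute_error(observed: float, correct: float) -> float:
    return abs(observed - correct)

def relative_error(observed: float, correct: float) -> float:
    # Assumes correct != 0.
    return abs(observed - correct) / abs(correct)

print(absolute_error(3, 2), relative_error(3, 2))              # 1 0.5
print(absolute_error(1001, 1000), relative_error(1001, 1000))  # 1 0.001
```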
⊗ Afrikaans, Arabic, Français, Português - absolute path
-
A path that points to the same location in the filesystem regardless of where it is evaluated. An absolute path is the equivalent of latitude and longitude in geography.
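Python's pathlib can tell absolute and relative paths apart. In the sketch below, the directory and file names are made up for illustration. PurePosixPath is used so that no files need to exist.

```python
from pathlib import PurePosixPath

absolute = PurePosixPath("/home/alice/data/results.csv")
relative = PurePosixPath("data/results.csv")

print(absolute.is_absolute())  # True
print(relative.is_absolute())  # False

# A relative path only names a concrete location once it is joined to a
# starting directory, so the same relative path can mean different files.
print(PurePosixPath("/home/alice") / relative)  # /home/alice/data/results.csv
print(PurePosixPath("/tmp") / relative)         # /tmp/data/results.csv
```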
→ relative path
⊗ Afrikaans, Arabic, Español, Français, Português - absolute row number
-
The sequential index of a row in a table, regardless of what sections of the table are being displayed.
⊗ Afrikaans, Español, Português - abstract method
-
In object-oriented programming, a method that is defined but not implemented. Programmers will define an abstract method in a parent class to specify operations that child classes must provide.
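In Python, one way to declare an abstract method is with the standard abc module. The class names in the sketch below are illustrative only. The parent cannot be instantiated until a child class provides the missing method.

```python
from abc import ABC, abstractmethod

class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Child classes must provide this."""

class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2

print(Square(3).area())  # 9
try:
    Shape()  # fails: area() has no implementation in Shape
except TypeError as exc:
    print("TypeError:", exc)
```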
⊗ Afrikaans - abstract syntax tree (AST)
-
A deeply nested data structure, or tree, that represents the structure of a program. For example, the AST might have a node representing a while loop with one child representing the loop condition and another representing the loop body.
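Python's standard ast module exposes exactly this kind of tree. The sketch below parses a small while loop, which is made up for illustration, and inspects the node for the loop. That node has a test child holding the condition and a body child holding the statements.

```python
import ast

tree = ast.parse("while x > 0:\n    x = x - 1\n")
loop = tree.body[0]

print(type(loop).__name__)                     # While
print(ast.dump(loop.test))                     # the loop condition: a Compare node
print([type(s).__name__ for s in loop.body])   # ['Assign']: the loop body
```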
⊗ Afrikaans - actual result (of test)
-
The value generated by running code in a test. If this matches the expected result, the test passes; if the two are different, the test fails.
⊗ Afrikaans - affordance
-
A property of something that suggests how it can be used, such as a handle or button.
⊗ Afrikaans - aggregation function
-
A function that combines many values into one, such as sum or max.
⊗ Afrikaans, Español, Français, Português - aggregation
-
To combine many values into one, e.g., by summing a set of numbers or concatenating a set of strings.
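The sketch below shows a few aggregations in Python using made-up values. The built-ins sum and max each reduce a list to a single number, and str.join concatenates strings. functools.reduce expresses the general idea: combine values pairwise until only one is left.

```python
from functools import reduce

numbers = [3, 1, 4, 1, 5]
words = ["data", "frame"]

print(sum(numbers))     # 14
print(max(numbers))     # 5
print("-".join(words))  # data-frame

# reduce() combines the running result with each value in turn.
product = reduce(lambda acc, n: acc * n, numbers, 1)
print(product)          # 60
```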
⊗ Afrikaans, Español, Français, Português - agile development
-
A software development methodology that emphasizes lots of small steps and continuous feedback instead of up-front planning and long-term scheduling. Exploratory programming is often agile.
⊗ Afrikaans, Português - aliasing
-
To have two or more references to the same thing, such as a data structure in memory or a file on disk.
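In Python, assigning one variable to another creates an alias rather than a copy. The sketch below shows that a change made through one name is visible through the other, and that an explicit copy breaks the link.

```python
first = [1, 2, 3]
second = first          # alias: both names refer to the same list
second.append(4)
print(first)            # [1, 2, 3, 4]
print(first is second)  # True

third = list(first)     # a copy, not an alias
third.append(5)
print(first)            # [1, 2, 3, 4]  (unchanged)
print(third)            # [1, 2, 3, 4, 5]
```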
⊗ Afrikaans - anchor
-
In a regular expression, a symbol that fixes a position without matching characters. ^ matches the start of the line, while $ matches the end of the line and \b matches a break between word and non-word characters.
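The anchors can be tried with Python's re module. The sketch below uses a made-up string and reports where each pattern matches. The anchors restrict where "cat" may match without consuming any characters themselves. The helper positions is our own.

```python
import re

text = "cat concatenate cat"

def positions(pattern: str) -> list:
    """Start index of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

print(positions(r"cat"))      # [0, 7, 16]  includes the one inside "concatenate"
print(positions(r"^cat"))     # [0]         only at the start of the line
print(positions(r"cat$"))     # [16]        only at the end of the line
print(positions(r"\bcat\b"))  # [0, 16]     whole words only
```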
⊗ Afrikaans, Français, Português - anonymous function
-
A function that has not been assigned a name. Anonymous functions are usually quite short, and are usually defined where they are used, e.g., as callbacks.
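In Python, anonymous functions are written with lambda. In the sketch below, which uses made-up data, each lambda is defined right where it is passed to another function and is never given a name.

```python
names = ["Zoe", "al", "Bea"]

# Sort case-insensitively; the lambda is the key callback.
print(sorted(names, key=lambda s: s.lower()))  # ['al', 'Bea', 'Zoe']
print(list(map(lambda s: len(s), names)))      # [3, 2, 3]
```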
⊗ Afrikaans, Français - anti join
-
A join that keeps rows from table A whose keys do not match keys in table B.
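A minimal sketch of an anti join in plain Python, assuming each table is a list of dictionaries. The table contents, the column name "key", and the function anti_join are all hypothetical. Data frame libraries provide their own join functions, which are not shown here.

```python
table_a = [{"key": 1, "name": "ant"}, {"key": 2, "name": "bee"}, {"key": 3, "name": "cow"}]
table_b = [{"key": 2, "size": "small"}]

def anti_join(a, b, key):
    """Keep rows of a whose key value does not appear in any row of b."""
    keys_in_b = {row[key] for row in b}
    return [row for row in a if row[key] not in keys_in_b]

print(anti_join(table_a, table_b, "key"))
# [{'key': 1, 'name': 'ant'}, {'key': 3, 'name': 'cow'}]
```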
⊗ Afrikaans - append mode
-
To add data to the end of an existing file instead of overwriting the previous contents of that file. Overwriting is the default, so most programming languages require programs to be explicit about wanting to append instead.
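In Python, the mode passed to open decides between the two behaviours. "w" overwrites the file and "a" appends to it. The sketch below writes to a temporary file whose name, log.txt, is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "log.txt")

with open(path, "w") as f:   # "w" creates or overwrites the file
    f.write("first\n")
with open(path, "a") as f:   # "a" adds to the end of the existing contents
    f.write("second\n")

with open(path) as f:
    print(f.read())          # prints "first" and "second" on separate lines
```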
⊗ Afrikaans - Application Programming Interface (API)
-
A set of functions and procedures provided by one software library or web service through which another application can communicate with it. An API is not the code, the database, or the server: it’s the access point.
⊗ Afrikaans - argument
-
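One of the values passed to a function when it is called: an expression in the comma-separated list of a function call, whose value is what the function actually receives. The term is not a synonym for, and should not be confused with, parameter, which is the name used for that value inside the function's definition.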
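The sketch below marks the difference in Python using a hypothetical function greet. The names in the def line are parameters. The expressions in each call are arguments, and each one is evaluated before the function receives its value.

```python
def greet(name, punctuation="!"):   # name and punctuation are parameters
    return "Hello, " + name + punctuation

# "world" and "?" are arguments: the actual values passed in this call.
print(greet("world", "?"))  # Hello, world?
# An argument can be any expression; "wor" + "ld" is evaluated first.
print(greet("wor" + "ld"))  # Hello, world!
```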
⊗ Afrikaans, Arabic, Español, Português - arithmetic mean
-
See mean.
⊗ Português - ASCII
- A standard way to represent the characters commonly used in English as 7-bit integers; 8-bit extensions such as Latin-1 add characters for other Western European languages. ASCII is now largely superseded by Unicode.
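Python's ord and chr convert between characters and their integer codes, which coincide with ASCII for the first 128 values. The sketch below also checks that ASCII codes fit in 7 bits, and shows that an accented letter is outside ASCII.

```python
print(ord("A"), ord("a"), ord("~"))   # 65 97 126
print(chr(72) + chr(105))             # Hi
print("caf\u00e9".isascii())          # False: é is not an ASCII character
print(all(ord(c) < 128 for c in "plain text"))  # True: every code fits in 7 bits
```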
- assertion
-
A Boolean expression that must be true at a certain point in a program. Assertions may be built into the language (e.g., Python’s assert statement) or provided as functions (e.g., R’s stopifnot). They are often used in testing, but are also put in production code to check that it is behaving correctly. - associative array
- See dictionary.
- asynchronous
-
Not happening at the same time. In programming, an asynchronous operation is one that runs independently of another, or that starts at one time and ends at another.
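A minimal sketch using Python's asyncio, where asyncio.sleep stands in for a slow operation such as a network request. The coroutine names and delays are hypothetical. Both operations start at once and run independently, so the faster one finishes first even though it was listed second.

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a slow operation (hypothetical).
    await asyncio.sleep(delay)
    print("finished", name)
    return name

async def main() -> list:
    # Start both operations; neither waits for the other.
    return await asyncio.gather(fetch("slow", 0.2), fetch("fast", 0.1))

print(asyncio.run(main()))
# finished fast
# finished slow
# ['slow', 'fast']   (gather returns results in argument order)
```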
→ synchronous - attribute
-
A name-value pair associated with an object, used to store metadata about the object such as an array’s dimensions.
⊗ Français - auto-completion
-
A feature that lets the user finish a word or piece of code quickly, typically by pressing the Tab key to list possible completions and then selecting one.
⊗ Français, Português - automatic variable
-
A variable that is automatically given a value in a build rule. For example, Make automatically assigns the name of a rule’s target to the automatic variable $@. Automatic variables are frequently used when writing pattern rules.
→ Makefile - backpropagation
- An algorithm that iteratively adjusts the weights used in a neural network. Backpropagation is often used to implement gradient descent.
- backward-compatible
-
Software that can be used in the same way as earlier versions of itself without problems, so that programs and data written for those versions still work.
⊗ Français, Português - base R
-
The basic functions making up the R language. The base packages can be found in src/library and are not updated outside of R; their version numbers follow R version numbering. Base packages are installed and loaded with R, while priority packages are installed with base R but must be loaded prior to use.
→ Tidyverse
⊗ Português - Bayes' Rule
- See Bayes’ Theorem.
- Bayes' Theorem
-
An equation for calculating the probability that something is true if something related to it is true. If $P(X)$ is the probability that X is true and $P(X \mid Y)$ is the probability that X is true given Y is true, then $P(X \mid Y) = P(Y \mid X) \, P(X) / P(Y)$.
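A worked example with made-up numbers. Suppose 1% of people have a condition (X). A test (Y) detects the condition 90% of the time and gives a false positive 5% of the time. The denominator P(Y) comes from the law of total probability. The result shows that a positive test is much weaker evidence than 90% might suggest.

```python
# Hypothetical probabilities for illustration.
p_x = 0.01              # P(X): prior probability of the condition
p_y_given_x = 0.90      # P(Y | X): test is positive when the condition is present
p_y_given_not_x = 0.05  # P(Y | not X): false-positive rate

# P(Y) by the law of total probability.
p_y = p_y_given_x * p_x + p_y_given_not_x * (1 - p_x)

# Bayes' Theorem: P(X | Y) = P(Y | X) * P(X) / P(Y)
p_x_given_y = p_y_given_x * p_x / p_y
print(round(p_x_given_y, 3))  # 0.154
```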
→ Bayesian network, naive Bayes classifier, prior distribution - Bayesian network
-
A graph that represents the relationships between random variables for a given problem.
→ Bayes' Theorem, Markov Chain, naive Bayes classifier - bias
-
A statistic is biased if it is systematically or consistently different from the parameter it is supposed to estimate.
→ variance, overfitting, classification, systematic error - big data
-
Any data that until recently was too big for most people to work with on a single computer.
→ three Vs - binary expression
-
An expression with two arguments, such as 1 + 2.
→ nullary expression, ternary expression, unary expression - binary large object (BLOB)
- Data that is stored in a database without being interpreted in any way, such as an audio file. The term is also now used to refer to data transferred over a network or stored in a version control repository as uninterpreted bits.
- binary
- A system that can be in one of two possible states. In computing these states are usually represented as 0 and 1, which correspond to false (0) and true (1) in Boolean logic. Computers are built on systems that store 0s and 1s as bits.
- binomial distribution
-
A probability distribution that arises when there are a fixed number of trials, each of which can produce one of two outcomes, and the probability of those outcomes does not change. As the number of trials increases, the binomial distribution approximates a normal distribution.
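The probability of exactly k successes in n trials with success probability p is C(n, k) · p^k · (1 − p)^(n − k). The sketch below implements that formula with math.comb (Python 3.8+). The function name binomial_pmf is our own.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 2 heads in 4 fair coin tosses: C(4, 2) / 2**4 = 6/16.
print(binomial_pmf(2, 4, 0.5))  # 0.375

# The probabilities over all possible k sum to 1.
print(round(sum(binomial_pmf(k, 10, 0.3) for k in range(11)), 10))  # 1.0
```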
→ discrete random variable, histogram - bit
-
A unit of information representing one of two alternatives, such as yes/no or true/false. In computing, a bit is in a state of either 0 or 1.
→ binary, Boolean - block comment
-
A comment that spans multiple lines. Block comments may be marked with special start and end symbols, like /* and */ in C and its descendants, or each line may be prefixed with a marker like #.
⊗ Português - boilerplate
- Standard text that is included in legal contracts, licenses, and so on.
- Boolean
-
Relating to a variable or data type that can have either the logical value true or the value false. Named for George Boole, a 19th-century mathematician. Binary systems, including all digital computers, are built on this foundation of logical operations on the two states true and false, or 1 and 0.
→ truthy, falsy, binary
⊗ Arabic - branch-per-feature workflow
- A common strategy for managing work with Git and other version control systems in which a separate branch is created for work on each new feature or each bug fix and merged when that work is completed. This isolates changes from one another until they are completed.
- branch
- See Git branch.
- breadcrumbs
- A set of supplementary navigational links included in many websites, usually placed at the top of the page. Breadcrumbs show users where the current page lies in the website; the term comes from a fairy tale in which children left a trail of breadcrumbs behind them so that they could find their way home.
- breadth first
-
To go through a nested data structure such as a tree by exploring all of one level, then going on to the next level and so on, or to explore a problem by examining the first step of each possible solution, and then trying the next for each.
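A common way to traverse a tree breadth first is to keep a first-in, first-out queue of nodes still to visit. In the sketch below, the tree is a hypothetical dictionary that maps each node to its children. All of one level is visited before any node on the next.

```python
from collections import deque

tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"], "a1": [], "a2": [], "b1": []}

def breadth_first(tree, start):
    order, queue = [], deque([start])
    while queue:
        node = queue.popleft()    # take the node that has waited longest (FIFO)
        order.append(node)
        queue.extend(tree[node])  # children wait until the current level is done
    return order

print(breadth_first(tree, "root"))  # ['root', 'a', 'b', 'a1', 'a2', 'b1']
```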
→ depth first - bug report
- A collection of files, logs, or related information that describes either an unexpected output of some code or program or an unexpected error or warning. This information is used to help find and fix a bug in the program or code.
- bug tracker
- A system that tracks and manages reported bugs for a software program, to make it easier to address and fix the bugs.
- bug
- A missing or undesirable feature of a piece of software; the digital equivalent of a weed.
- build manager
-
A program that keeps track of how files depend on one another and runs commands to update any files that are out of date. Build managers were invented to compile only those parts of programs that had changed, but are now often used to implement workflows in which plots depend on result files, which in turn depend on raw data files or configuration files.
→ build rule, dependency, Makefile - build recipe
- The part of a build rule that describes how to update something that has fallen out of date.
- build rule
- A specification for a build manager that describes how some files depend on others and what to do if those files are out of date.
- build target
-
The file(s) that a build rule will update if they are out of date compared to their dependencies.
→ Makefile, default target - byte code
- A set of instructions designed to be executed efficiently by an interpreter.
- cache
- Something that stores copies of data so that future requests for it can be satisfied more quickly. The CPU in a computer uses a hardware cache to hold recently-accessed values; many programs rely on a software cache to reduce network traffic and latency. Figuring out when something in a cache is out of date and should be replaced is one of the two hard problems in computer science.
- caching
- To save a copy of some data in a local cache to make future access faster.
- call stack
- A data structure that stores information about the subroutines that are currently active, i.e., that have been called but have not yet returned.
- callback function
-
A function A that is passed to another function B so that B can call it at some later point. Callbacks can be used synchronously, as in generic functions like map that invoke a callback function once for each element in a collection, or asynchronously, as in a client that runs a callback when a response is received in answer to a request. - camel case
-
A style of writing code that involves naming variables and objects with no space, underscore (_), dot (.), or dash (-) between words, with each word capitalized (the first word may or may not be). Examples include CalculateSum and findPattern.
→ kebab case, pothole case - Cascading Style Sheets (CSS)
-
A way to control the appearance of HTML. CSS is typically used to specify fonts, colors, and layout.
⊗ Français - catch (an exception)
-
To accept responsibility for handling an error or other unexpected event. R prefers “handling a condition” to “catching an exception”.
→ condition, handle (condition)
⊗ Arabic - CC-0
- A Creative Commons license that imposes no restrictions whatsoever, thereby putting a work in the public domain.
- CC-BY
- The Creative Commons - Attribution license that requires people to give credit to the author of a work but imposes no other restrictions.
- centroid
-
The center or anchor of a group created by a clustering algorithm.
⊗ Português - character encoding
-
A specification of how characters are stored as bytes. The most commonly-used encoding today is UTF-8.
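The sketch below encodes the same character with two encodings in Python to show how encodings store characters as bytes. The character is written as the escape \u00e9 (é) so the example does not depend on how this file is saved. Decoding with the wrong encoding silently produces different characters.

```python
text = "\u00e9"                    # é
utf8 = text.encode("utf-8")
latin1 = text.encode("latin-1")

print(utf8)                        # b'\xc3\xa9'  two bytes in UTF-8
print(latin1)                      # b'\xe9'      one byte in Latin-1
print(utf8.decode("utf-8") == latin1.decode("latin-1"))  # True: same character
print(utf8.decode("latin-1"))      # Ã©  (wrong encoding, wrong characters)
```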
⊗ Arabic - chi-square test
-
A statistical method for estimating whether two variables in a cross tabulation are related rather than independent. Unlike a normal distribution, a chi-square distribution changes shape depending on the degrees of freedom used to calculate it.
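A minimal sketch of how the chi-square statistic is computed for a 2x2 table of made-up counts. Each cell's expected count, assuming the variables are unrelated, is row total × column total / grand total. The statistic sums (observed − expected)² / expected over all cells. Turning it into a p-value needs a chi-square distribution function, which is not shown here. For one degree of freedom, the commonly used 5% critical value is about 3.84.

```python
observed = [[10, 20],
            [20, 10]]   # hypothetical counts

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Count expected in this cell if the two variables were unrelated.
        expected = row_totals[i] * col_totals[j] / grand_total
        chi_square += (obs - expected) ** 2 / expected

dof = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 3), dof)  # 6.667 1
```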
⊗ Afrikaans - child (in a tree)
- A node in a tree that is below another node (called the parent).
- child class
-
In object-oriented programming, a class derived from another class (called the parent class).
⊗ Español - class
-
In object-oriented programming, a structure that combines data and operations (called methods). The program then uses a constructor to create an object with those properties and methods. Programmers generally put generic or reusable behavior in parent classes and more detailed or specific behavior in child classes.
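A short Python sketch of these ideas, with hypothetical class names. __init__ acts as the constructor, the parent class holds the generic behaviour, and the child class overrides only the part that is specific to it.

```python
class Animal:                                # parent class: generic behaviour
    def __init__(self, name: str) -> None:   # constructor
        self.name = name

    def describe(self) -> str:
        return f"{self.name} makes {self.sound()}"

    def sound(self) -> str:
        return "a noise"

class Dog(Animal):                           # child class: specific behaviour
    def sound(self) -> str:
        return "a bark"

print(Animal("Generic").describe())  # Generic makes a noise
print(Dog("Rex").describe())         # Rex makes a bark
```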
⊗ Español - classification
-
The process of identifying which predefined category an item belongs to, such as deciding whether an email message is spam or not. Many machine learning algorithms perform classification.
→ supervised learning, clustering - client
- Typically, a program such as a web browser that gets data from a server and displays it to, or interacts with, users. The term is used more generally to refer to any program A that makes requests of another program B. A single program can be both a client and a server.
- closure
- A set of variables defined in the same scope whose existence has been preserved after that scope has ended.
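In Python, a nested function that refers to a variable of its enclosing function creates a closure. In the sketch below, which uses a hypothetical counter, count keeps its value after make_counter has returned.

```python
def make_counter():
    count = 0               # lives in make_counter's scope

    def increment():
        nonlocal count      # use the enclosing variable, not a new local one
        count += 1
        return count

    return increment

counter = make_counter()    # make_counter has returned, but count survives
print(counter(), counter(), counter())  # 1 2 3
```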
- clustering
-
The process of dividing data into groups when the groups themselves are not known in advance.
→ centroid, classification, supervised learning, unsupervised learning
⊗ Português - code coverage (in testing)
- How much of a library or program is executed when tests run. This is normally reported as a percentage of lines of code: for example, if 40 out of 50 lines in a file are run during testing, those tests have 80% code coverage.
- code review
- To check a program or a change to a program by inspecting its source code.
- coercion
- See type coercion.
- cognitive load
- The amount of working memory needed to accomplish a set of simultaneous tasks.
- comma-separated values (CSV)
- A text format for tabular data in which each record is one row and fields are separated by commas. There are many minor variations, particularly around quoting of strings.
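Quoting is the reason to use a CSV library instead of splitting lines on commas. In the sketch below, which uses made-up data, one field contains a comma, so it is quoted. Python's csv module keeps that field in one piece.

```python
import csv
import io

data = 'name,city\n"Smith, J.",Oslo\nLee,Lima\n'

rows = list(csv.reader(io.StringIO(data)))
print(rows)  # [['name', 'city'], ['Smith, J.', 'Oslo'], ['Lee', 'Lima']]
```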
- command history
- An automatically-created list of previously-executed commands. Most REPLs, including the Unix shell, record history and allow users to play back recent commands.
- command-line argument
- A filename or control flag given to a command-line program when it is run.
- command-line interface (CLI)
- A user interface that relies solely on text for commands and output, typically running in a shell.
- comment
-
Text written in a script that is not treated as code to be run, but rather as text that describes what the code is doing. These are usually short notes, often beginning with a # (in many programming languages).
⊗ Français, Português - commit message
- A comment attached to a commit that explains what was done and why.
- commit
- As a verb, the act of saving a set of changes to a database or version control repository. As a noun, the changes saved.
- compile
- To translate textual source into another form. Programs in compiled languages are translated into machine instructions for a computer to run, and Markdown is usually translated into HTML for display.
- compiled language
- Originally, a language such as C or Fortran that is translated into machine instructions for execution. Languages such as Java are also compiled before execution, but into byte code instead of machine instructions, while languages like Python are compiled to byte code on the fly.
- compiler
-
An application that translates programs written in some languages into machine instructions or byte code.
⊗ Arabic - Comprehensive R Archive Network (CRAN)
-
A public repository of R packages.
→ base R, Tidyverse - computational linguistics
-
The study or application of computational methods for parsing or understanding human languages. Early approaches were algorithmic; most modern approaches are statistical.
→ natural language processing - computational notebook
- A combination of a document format that allows users to mix prose and code in a single file, and an application that executes that code interactively and in place. The Jupyter Notebook and R Markdown files are both examples of computational notebooks.
- condition
-
An error or other unexpected event that disrupts the normal flow of control.
→ handle (condition) - conditional expression
-
A ternary expression that serves the role of an if/else statement. For example, C and similar languages use the syntax test ? ifTrue : ifFalse to mean “choose the value ifTrue if test is true, or the value ifFalse if it is not”. - confidence interval
-
A range around an estimate that indicates the margin of error, combined with a probability that the actual value falls in that range.
⊗ Afrikaans - console
- A computer terminal where a user may enter commands, or a program that simulates such a device.
- constructor
-
A function that creates an object of a particular class. In the S3 object system, constructors are a convention rather than a requirement.
⊗ Español - continuation prompt
- A prompt that indicates that the command currently being typed is not yet complete and will not be run until it is.
- continuous integration
- A software development practice in which changes are merged frequently, and automatically built and tested, as soon as they become available.
- continuous random variable
-
A variable whose value can be any real value, either within a range or unbounded, such as age or distance.
→ discrete random variable - copy-on-modify
-
The practice of creating a new copy of aliased data whenever there is an attempt to modify it, so that each reference behaves as if it were the only one.
→ aliasing - correlation coefficient
-
A measure of how well correlated two variables are. If the correlation coefficient between X and Y is 1.0, Y increases in exact linear proportion to X, so knowing X allows perfect prediction of Y. If it is -1.0, X also predicts Y perfectly, but an increase in X goes with a decrease in Y. If it is 0.0, there is no linear relationship between X and Y, although Y may still depend on X in a non-linear way.
⊗ Afrikaans, Português - correlation
-
How well two variables agree with each other. Correlation is usually measured by calculating a correlation coefficient, and does not imply causation.
⊗ Afrikaans, Português - covariance
-
How well two variables agree with each other. The correlation coefficient is a normalized measure of covariance.
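A minimal sketch of both measures in plain Python, using made-up data. The sample covariance averages the products of deviations from the means, using an n − 1 denominator. The (Pearson) correlation coefficient divides the covariance by both standard deviations, which normalizes it to the range -1 to 1. The helper names are our own.

```python
import math

def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance normalized by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4]
print(round(covariance(x, [2, 4, 6, 8]), 4))    # 3.3333
print(round(correlation(x, [2, 4, 6, 8]), 10))  # 1.0
print(round(correlation(x, [8, 6, 4, 2]), 10))  # -1.0
```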
⊗ Português - Creative Commons license
-
A set of licenses that can be applied to published work. Each license is formed by concatenating one or more of -BY (Attribution): users must cite the original source; -SA (ShareAlike): users must share their own work under a similar license; -NC (NonCommercial): work may not be used for commercial purposes without the creator’s permission; -ND (NoDerivatives): no derivative works (e.g., translations) can be created without the creator’s permission. Thus, CC-BY-NC means “users must give attribution and cannot use commercially without permission”. The term CC-0 (zero, not letter ‘O’) is sometimes used to mean “no restrictions”, i.e., the work is in the public domain. - cross join
- A join that produces all possible combinations of rows from two tables.
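For two small lists of rows, Python's itertools.product produces exactly the combinations a cross join returns. The sketch below uses made-up data. The result always has (rows in A) × (rows in B) entries.

```python
from itertools import product

colors = ["red", "blue"]
sizes = ["S", "M", "L"]

combos = list(product(colors, sizes))
print(len(combos))   # 6 = 2 rows x 3 rows
print(combos[:3])    # [('red', 'S'), ('red', 'M'), ('red', 'L')]
```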
- cross-validation
-
A technique that divides data into training data and test data. The training data and correct answers are used to find parameters, and the algorithm’s effectiveness is then measured by examining the answers it gives on the test data.
→ machine learning - cryptographic hash function
- A hash function that produces an apparently-random value for any input.
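SHA-256, available in Python's hashlib, is a widely used cryptographic hash function. The sketch below hashes two inputs that differ by one character. The digests have the same length, but they look unrelated to each other, which is the "apparently-random" property described above.

```python
import hashlib

digest_a = hashlib.sha256(b"hello").hexdigest()
digest_b = hashlib.sha256(b"hello!").hexdigest()

print(len(digest_a))   # 64 hex characters = 256 bits
print(digest_a[:16])
print(digest_b[:16])   # one extra character gives an unrelated-looking digest
```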
- current working directory
-
The folder or directory location that the program is operating in. Any action taken by the program occurs relative to this directory.
⊗ Français, Português - data engineer
-
Someone who sets up and runs data analyses. Data engineers are often responsible for installing software, managing databases, generating reports, and archiving results.
→ data scientist, data wrangling - data engineering
-
The pragmatic steps taken to make data usable, such as writing short programs to put mailing addresses in a uniform format.
→ data science - data frame
-
A two-dimensional data structure for storing tabular data in memory. Rows represent records and columns represent variables.
→ tidy data
⊗ Français - data mining
- The use of computers to search for patterns in large datasets. The term data science is now more commonly used.
- data package
- A software package that contains mostly or only data. Data packages make it simpler to distribute data and to use it.
- data science
-
The combination of statistics, programming, and hard work used to extract knowledge from data.
⊗ Português - data scientist
-
Someone who uses programming skills to solve statistical problems.
⊗ Português - data wrangling
- A colloquial name for small-scale data engineering.
- decision tree
-
A tree whose nodes are questions and whose branches eventually lead to a decision or classification.
→ random forests - Decorator pattern
- A design pattern in which a function adds features to another function or a class after its initial definition. Decorators are a feature of Python and can be implemented in most other languages as well.
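A minimal Python decorator, with hypothetical names shout and greet. It wraps an existing function to add behaviour, here upper-casing the result, without editing that function's definition. functools.wraps copies the original name and docstring onto the wrapper.

```python
import functools

def shout(func):
    """Decorator: make func's string result upper case."""
    @functools.wraps(func)          # keep the original name and docstring
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapper

@shout
def greet(name):
    return f"hello, {name}"

print(greet("ada"))      # HELLO, ADA
print(greet.__name__)    # greet
```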
- deep learning
- A family of neural network algorithms that use multiple layers to extract features at successively higher levels.
- default target
- The build target that is used when none is specified explicitly.
- default value
- A value assigned to a function parameter when the caller does not specify a value. Default values are specified as part of the function’s definition.
- defensive programming
- A set of programming practices that assumes mistakes will happen and either reports or corrects them, such as inserting assertions to report situations that are never supposed to occur.
- Delegate pattern
- A design pattern in which an object does most of the work to complete a task, but uses one of a set of other objects to complete some specific parts of the work. Delegation is often used instead of inheritance to customize objects’ behavior.
- dependency
- See prerequisite.
- dependent variable
-
A variable whose value depends on the value of another variable, which is called the independent variable.
⊗ Português - depth first
- To go through a nested data structure such as a tree by going as far as possible down one path, then as far as possible down the next and so on, or to explore a problem by following one solution to its conclusion and then trying the next.
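A recursive sketch of depth-first traversal over a small hypothetical tree, stored as a dictionary that maps each node to its children. Each branch is followed to its end before the next branch is started.

```python
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"], "a1": [], "a2": [], "b1": []}

def depth_first(tree, node, order=None):
    if order is None:
        order = []
    order.append(node)
    for child in tree[node]:
        depth_first(tree, child, order)  # finish this whole branch before the next
    return order

print(depth_first(tree, "root"))  # ['root', 'a', 'a1', 'a2', 'b', 'b1']
```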
- design pattern
-
A recurring pattern in software design that is specific enough to be worth naming, but not so specific that a single best implementation can be provided by a library. For example, data frames and database tables are instances of the same pattern.
→ Iterator pattern, Singleton pattern, Template Method pattern, Visitor pattern - destructuring assignment
- Unpacking values from data structures and assigning them to multiple variables in a single statement.
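A few forms of destructuring assignment in Python, using made-up values. They include plain tuple unpacking, a starred name that collects the remaining items, and unpacking of a nested structure.

```python
point = (3, 4)
x, y = point                       # unpack a tuple
first, *rest = [1, 2, 3, 4]        # the starred name collects the remainder
(a, b), c = (1, 2), 3              # nested structures can be unpacked too
print(x, y, first, rest, a, b, c)  # 3 4 1 [2, 3, 4] 1 2 3
```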
- dictionary
- A data structure that allows items to be looked up by key, sometimes called an associative array. Dictionaries are often implemented using hash tables.
- Digital Object Identifier (DOI)
-
A unique persistent identifier for a book, paper, report, software release, or other digital artefact.
→ ORCID - dimension reduction
-
Reducing the number of dimensions in a dataset, typically by finding the dimensions along which it varies most.
→ principal component analysis - discrete random variable
-
A variable whose value can take on only one of a fixed set of values, such as true or false.
→ continuous random variable - distro
- See software distribution.
- docstring
- Short for “documentation string”, a string appearing at the start of a module, class, or function in Python that automatically becomes that object’s documentation.
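In the sketch below, the function mean is hypothetical. The string literal at the start of its body becomes its __doc__ attribute, which is also the text that Python's built-in help() displays.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)

print(mean.__doc__)  # Return the arithmetic mean of a non-empty sequence of numbers.
```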
- Document Object Model (DOM)
- A standard in-memory representation of HTML and XML. Each element is stored as a node in a tree with a set of named attributes; contained elements are child nodes. Modern programming languages provide many libraries for searching and modifying the DOM.
- documentation generator
- A software tool that extracts specially-formatted comments or docstrings from code and generates cross-referenced developer documentation.
- DOM selector
-
A pattern that identifies nodes in a DOM tree. For example, #alpha matches nodes whose id attribute is “alpha”, while .beta matches nodes whose class attribute includes “beta”.
→ regular expression - domain knowledge
- Understanding of a specific problem domain, e.g., knowledge of transportation logistics.
- double square brackets
-
An index enclosed in [[...]], used to return a single value of the underlying type.
→ single square brackets - double
- Short for “double-precision floating-point number”, meaning a 64-bit numeric value with a fractional part and an exponent.
- down-vote
-
A vote against something.
→ up-vote - dynamic loading
- To import a module into the memory of a program while it is already running. Most interpreted languages use dynamic loading, and provide tools so that programs can find and load modules dynamically to configure themselves.
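Python's importlib.import_module loads a module whose name is only known at run time. In the sketch below, the standard json module stands in for a module that might be named in a configuration file. That configuration-file scenario is an assumption made for illustration.

```python
import importlib

# The module name is just a string, so it could come from configuration.
module_name = "json"
module = importlib.import_module(module_name)
print(module.dumps({"loaded": True}))  # {"loaded": true}
```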
- dynamic lookup
-
To find a function or a property of an object by name while a program is running. For example, instead of getting a specific property of an object using obj.name, a program might use obj[someVariable], where someVariable could hold "name" or some other property name. - dynamic scoping
- To find the value of a variable by looking at what is on the call stack at the moment the lookup is done. Almost all programming languages use lexical scoping instead, since it is more predictable.
- edge
- A connection between two nodes in a graph. An edge may have data associated with it, such as a name or distance.
- element
-
A named component in an HTML or XML document. Elements are usually written <name>…</name>, where “…” represents the content of the element. Elements often have attributes.
→ empty element - Emacs (editor)
- A text editor that is popular among Unix programmers.
- empty element
-
An element of an HTML or XML document that has no children. Empty elements can always be written as <name></name>, but may also be written using the shorthand notation <name/> (with a slash after the name inside the angle brackets). - empty vector
-
A vector that contains no elements. Empty vectors have a type such as logical or character, and are not the same as null.
⊗ Français - environment
- A structure that stores a set of variable names and the values they refer to.
- error (in a test)
- Signalled when something goes wrong in a unit test itself rather than in the system being tested. In this case, we do not know anything about the correctness of the system.
- error handling
- What a program does to detect and correct for errors. Examples include printing a message and using a default configuration if the user-specified configuration cannot be found.
- escape sequence
-
A sequence of characters used to represent some other character that would otherwise have a special meaning. For example, the escape sequence \" is used to represent a double-quote character inside a double-quoted string. - evaluation
-
The process of taking an expression such as 1+2*3/4 and turning it into a single irreducible value. - exception handler
- A piece of code that deals with an exception after it is caught, e.g., by writing a log message or retrying the operation that failed.
- exception
- An object that stores information about an error or other unusual event in a program. One part of a program will create and raise an exception to signal that something unexpected has happened; another part will catch it.
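A minimal Python sketch of the raise-and-catch cycle. The function ratio and its error message are hypothetical. One part of the program creates and raises a ValueError, and the calling code catches it and handles it.

```python
def ratio(numerator: float, denominator: float) -> float:
    if denominator == 0:
        raise ValueError("denominator must be non-zero")  # create and raise
    return numerator / denominator

try:
    ratio(1, 0)
except ValueError as exc:                                  # catch elsewhere
    message = str(exc)
    print("Caught:", message)  # Caught: denominator must be non-zero
```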
- expected result (of test)
-
The value that a piece of software is supposed to produce when tested in a certain way, or the state in which it is supposed to leave the system.
→ actual result (of test) - exploratory programming
- A software development methodology in which requirements emerge or change as the software is being written, often in response to results from early runs.
- export
- To make something visible outside a module so that other parts of a program can import it. In most languages a module must export things explicitly in order to manage name collision.
- fail (a test)
-
A test fails if the actual result does not match the expected result.
→ pass (a test) - false
-
The logical (Boolean) value that is the opposite of “true”. It is used in logic and programming to represent one of the two states of a binary value.
→ truthy, falsy - falsy
-
Evaluating to false in a Boolean context.
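Which values count as falsy depends on the language. The sketch below uses Python's rules, where zero, empty containers, None, and False are falsy. Note that the string "0" and the list [0] are truthy in Python because they are not empty.

```python
values = [0, 0.0, "", [], {}, None, False, 1, "0", [0]]
print([v for v in values if not v])  # [0, 0.0, '', [], {}, None, False]
```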
→ truthy - feature (in data)
- A variable or observable in a dataset.
- feature (in software)
- Some aspect of software that was deliberately designed or built. A bug is an undesired feature.
- feature branch
-
A branch within a Git repository containing commits dedicated to a specific feature, e.g., a bug fix or a new function. This branch can be merged into another branch.
→ master branch - feature engineering
- The process of choosing the variables to be used as inputs to a model. Choosing good features often depends on domain knowledge.
- feature request
- A request to the maintainers or developers of a software program to add a specific functionality (a feature) to that program.
- field
-
A component of a record containing a single value. Every record in a tibble or database table has the same fields.
⊗ Français - filename extension
- The last part of a filename, usually following the ‘.’ symbol. Filename extensions are commonly used to indicate the type of content in the file, though there is usually no guarantee that this is correct.
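Python's pathlib can split a filename into its extension (suffix) and the rest (stem). PurePath is used so that no file needs to exist. For names with several dots, only the last part counts as the suffix.

```python
from pathlib import PurePath

path = PurePath("glossary.yml")
print(path.suffix)  # .yml
print(path.stem)    # glossary

archive = PurePath("archive.tar.gz")
print(archive.suffix, archive.stem)  # .gz archive.tar
```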
- filename stem
-
The part of the filename that does not include the extension. For example, the stem of glossary.yml is glossary. - filesystem
- The part of the operating system that manages how files are stored and retrieved. Also used to refer to all of those files and directories or the specific way they are stored (as in “the Unix filesystem”).
- filter
-
As a verb, to choose a set of records (i.e., rows of a table) based on the values they contain. As a noun, a command-line program that reads lines of text from files or standard input, performs some operation on them (such as filtering), and writes to a file or stdout.
⊗ Français - fixture
- The thing on which a test is run, such as the parameters to the function being tested or the file being processed.
- folder
- Another term for a directory.
- for loop
-
A statement in a program that repeats one or more other statements (the loop body) once for each item in a sequence, such as each number in a range or each element of a list.
→ while loop - fork
-
A copy of one person’s Git repository that lives in another person’s GitHub account. Changes to the content of a fork can be submitted to the upstream repository via a pull request.
→ branch - Frequently Asked Questions (FAQ)
- A curated list of questions commonly asked about a subject, along with answers.
- full identifier (of a commit)
- A unique 160-bit identifier for a commit in a Git repository, usually written as a 40-character hexadecimal string.
- full join
-
A join that returns all rows and all columns from two tables A and B. Where the keys of A and B match, values are combined; where they do not, missing values from either table are filled with null, NA, or some other missing value.
→ left join, right join - fully-qualified name
-
An unambiguous name of the form package::thing.
- functional programming
-
A style of programming in which data is transformed through successive application of functions, rather than by using control structures such as loops.
→ higher-order function, object-oriented programming - generator function
-
A function whose state is automatically saved when it yields a value so that execution can be resumed later. Generator functions are typically used to produce streams of values that can be processed by for loops.
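A minimal Python sketch of a generator function (countdown is a hypothetical example name): each yield hands back one value and saves the function's state, and the next request resumes execution just after that yield.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1, pausing after each value."""
    n = start
    while n > 0:
        yield n   # hand back a value; local state (n) is saved here
        n -= 1    # execution resumes from this line on the next request

gen = countdown(3)
print(next(gen))   # 3
print(next(gen))   # 2 (n was remembered between calls)
for value in gen:  # a for loop consumes whatever values remain
    print(value)   # 1
```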
→ Iterator pattern - generic function
-
A collection of functions with similar purpose, each operating on a different class of data.
⊗ Español, Français, Português - Git branch
-
A named line of development in a Git repository; technically, a movable pointer to the most recent commit on that line. Multiple branches can capture multiple versions of the same repository.
→ feature branch, fork, master branch
⊗ Français, Português - Git clone
-
Copies (and usually downloads) a Git remote repository onto the local computer.
⊗ Français - Git conflict
- A situation in which incompatible or overlapping changes have been made on different branches that are now being merged.
- Git fork
-
To make a new copy of a Git repository on a server, or the copy that is made.
→ Git clone - Git merge
- Merging branches in Git incorporates the development histories of two branches into one. If the same parts of a file have been changed on both branches, a conflict will occur, and it must be resolved before the merge can be completed.
- Git pull
-
Downloads and synchronizes changes between a remote repository and a local repository.
⊗ Français - Git push
-
Uploads and synchronizes changes between a local repository and a remote repository.
⊗ Français - Git remote
- A short name for a remote repository (like a bookmark).
- Git
-
A version control tool to record and manage changes to a project.
⊗ Français, Português - GitHub
-
A cloud-based platform built around Git that allows you to save versions of your project online and collaborate with other Git users.
⊗ Français, Português - global environment
-
The environment that holds top-level definitions in R, e.g., those written directly in the interpreter.
⊗ Français - global installation
-
Installing a package in a location where it can be accessed by all users and projects.
→ local installation
⊗ Français - global variable
-
A variable defined outside any particular function, which is therefore visible to all functions.
→ local variable
⊗ Español, Français - globbing
-
To specify a set of filenames using a simplified form of regular expressions, such as *.dat to mean “all files whose names end in .dat”. The name is derived from “global”.
- GNU Operating System (GNU)
-
“GNU” is an operating system that is free software. GNU is a recursive acronym for “GNU’s Not Unix!”. The GNU operating system consists of GNU packages as well as free software released by third parties.
→ GNU Public License - GNU Public License (GPL)
-
A license (officially the GNU General Public License) that allows people to re-use software as long as they distribute the source of their changes.
→ GNU Operating System - gradient boosting
- A machine learning technique that produces an ensemble of weak prediction models (typically decision trees) in a stepwise fashion.
- gradient descent
-
An optimization algorithm that repeatedly calculates the gradient at the current point, takes a small step in the opposite direction (downhill, so that the objective function decreases), and then recalculates the gradient.
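A minimal sketch in Python, assuming a one-variable function with a known derivative, a fixed learning rate, and a fixed number of steps (all chosen purely for illustration):

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from x0."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)  # small step downhill
    return x

# f(x) = (x - 3)**2 has its minimum at x = 3; its gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(round(minimum, 4))  # 3.0
```

With these settings each step moves x 20% of the remaining distance toward 3, which is why it converges; a learning rate that is too large can overshoot and diverge instead.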
→ backpropagation - graph
-
(1) A plot or a chart that displays data, or (2) a data structure in which nodes are connected to one another by edges.
→ tree - graphical user interface (GUI)
-
A user interface that relies on windows, menus, pointers, and other graphical elements, as opposed to a command-line interface or voice-driven interface.
⊗ Português - group
- To divide data into subsets according to some criteria while leaving records in a single structure.
- handle (condition)
-
To accept responsibility for handling an error or other unexpected event. R prefers “handling a condition” to “catching an exception”.
→ condition, exception - hash function
- A function that turns arbitrary data into a bit array of a fixed size. Hash functions are used to determine where data should be stored in a hash table.
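For example, Python's hashlib module provides SHA-256, whose output is always 256 bits no matter how long the input is. The bucket_for helper below is a hypothetical illustration of how a hash value can pick a slot in a hash table.

```python
import hashlib

# The digest length is fixed: 32 bytes (256 bits) for any input.
short_digest = hashlib.sha256(b"a").digest()
long_digest = hashlib.sha256(b"a" * 10_000).digest()
print(len(short_digest) * 8, len(long_digest) * 8)  # 256 256

def bucket_for(key: str, num_buckets: int) -> int:
    """Hypothetical helper: map a key to one of num_buckets slots."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_buckets

print(bucket_for("apple", 8))  # the same key always lands in the same slot
```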
- hash table
- A data structure that calculates a pseudo-random key for each value and stores the value in that location. Hash tables enable fast lookup for arbitrary data at the cost of extra memory.
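A minimal Python sketch of the idea, assuming one common design: key/value pairs (as in Python's own dict) with collisions handled by separate chaining, i.e., each slot holds a small list. The class name and bucket count are illustrative; most production hash tables also resize as they fill up.

```python
class HashTable:
    """A minimal hash table using separate chaining (illustrative only)."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # Python's built-in hash() chooses which bucket holds this key.
        return self.buckets[hash(key) % len(self.buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))       # colliding keys share a bucket

    def get(self, key):
        for existing, value in self._bucket(key):
            if existing == key:
                return value
        raise KeyError(key)

table = HashTable()
table.put("apple", 3)
table.put("pear", 5)
table.put("apple", 4)
print(table.get("apple"))  # 4
```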
- header row
-
If present, the first row of a CSV file that defines column names (but tragically, not their data types or units).
→ comma-separated values - heterogeneous
-
Having mixed type. For example, a list can contain a mix of numbers, character strings, and values of other types.
→ homogeneous - hexadecimal
- A base-16 number system. Hexadecimal values are usually written using the digits 0-9 and the characters A-F in either upper or lower case. Hexadecimal is often used to represent binary values, since two hexadecimal digits exactly fit one byte.
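A few hexadecimal conversions in Python, using only built-in functions:

```python
value = 0xFF                       # hexadecimal literal
print(value)                       # 255
print(format(200, "02x"))          # c8 (two hex digits for one byte)
print(int("1f", 16))               # 31
print(bytes([0, 15, 255]).hex())   # 000fff (two digits per byte)
```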
- higher-order function
-
A function that operates on other functions. For example, the higher-order function map executes a given function once on each value in a list. Higher-order functions are heavily used in functional programming.
- Hippocratic License
- An ethical software license that allows free use for any purpose that does not contravene the Universal Declaration of Human Rights.
- histogram
- A graphical representation of the distribution of a set of numeric data, usually a vertical bar graph.
- hitchhiker
- Someone who is part of a project but does not actually do any work on it.
- home directory
- A directory that contains a user’s files. Each user on a multi-user computer will have their own home directory; a personal computer will often only have one home directory.
- homogeneous
- Having a single type. For example, a vector must be homogeneous: its values must all be numeric, logical, etc.
- HTTP header
- A key-value pair at the top of an HTTP request or response that carries additional information such as the user’s preferred language or the length of the data being transferred.
- HTTP request
-
A message sent from a client to a server using the HTTP protocol asking for data. A request usually asks for a web page, image, or other data.
→ HTTP response - HTTP response
- A reply sent from a server to a client using the HTTP protocol in response to a request. The response usually contains a web page, image, or data.
- HyperText Markup Language (HTML)
-
The standard markup language used for web pages. HTML is represented in memory using DOM.
→ XML - HyperText Transfer Protocol (HTTP)
- The standard protocol for data transfer on the World-Wide Web. HTTP defines the format of requests and responses, the meanings of standard error codes, and other features.
- import
- To bring things from a module into a program for use. In most languages a program can only import things that the module explicitly exports.
- impostor syndrome
- The false belief that one’s successes are a result of accident or fraud rather than ability.
- in-place operator
-
An operator that updates one of its operands. For example, the expression x += 2 uses the in-place operator += to add 2 to the current value of x and assign the result back to x.
- inner join
- A join that returns combinations of rows from two tables A and B whose keys match.
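A minimal Python sketch, assuming each table is a list of dictionaries that share a key column (the data and the inner_join name are made up for illustration). It compares every pair of rows, which is simple but slow for large tables; databases typically use indexes or hash tables instead.

```python
def inner_join(a, b, key):
    """Combine rows of a and b whose values in column `key` match."""
    result = []
    for row_a in a:
        for row_b in b:
            if row_a[key] == row_b[key]:
                result.append({**row_a, **row_b})  # merge the two rows
    return result

people = [{"id": 1, "name": "Ahmed"}, {"id": 2, "name": "Bea"}]
ages = [{"id": 2, "age": 34}, {"id": 3, "age": 51}]
print(inner_join(people, ages, "id"))  # [{'id': 2, 'name': 'Bea', 'age': 34}]
```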
- instance
- An object of a particular class.
- Integrated Development Environment (IDE)
-
An application that helps programmers develop software. IDEs typically have a built-in editor, a console to execute code immediately, and browsers for exploring data structures in memory and files on disk.
→ read-eval-print loop
⊗ Español - integration test
-
A test that checks whether the parts of a system work properly when put together.
→ unit test - interpreter
- A program that runs other programs interactively.
- interpreted language
- A high-level language that is not executed directly by the computer, but instead is run by an interpreter that translates program instructions into machine commands on the fly.
- interpreter
- A program whose job it is to run programs written in a high-level interpreted language.
- invariant
- Something that is guaranteed to be true at some point in a program. Invariants are often expressed using assertions.
- ISO date format
-
An international standard (ISO 8601) for formatting dates. While the full standard is complex, the most common form is YYYY-MM-DD, i.e., a four-digit year, a two-digit month, and a two-digit day separated by hyphens.
- issue tracking system
- A system similar to a bug tracking system that tracks “issues” associated with a repository, usually in the form of feature requests, bug reports, or other to-do items.
- issue
- A bug report, feature request, or other to-do item associated with a project. Also called a ticket.
- Iterator pattern
-
A design pattern in which a temporary object or generator function produces each value from a collection in turn for processing. This pattern hides the differences between different kinds of data structures so that everything can be processed using loops.
→ Visitor pattern - JavaScript Object Notation (JSON)
-
A way to represent data by combining basic values like numbers and character strings in lists and name/value structures. The acronym stands for “JavaScript Object Notation”; unlike better-defined standards like XML, it is unencumbered by a syntax for comments or ways to define a schema.
→ YAML
⊗ Français - join
-
One of several operations that combine values from two tables.
→ anti join, cross join, full join, inner join, left join, right join, self join - k-means clustering
-
An unsupervised learning algorithm that forms k groups by repeatedly calculating the centroid of the current groups and then reallocating data points to the nearest centroid until the centroids no longer move.
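A minimal one-dimensional sketch in Python, assuming the starting centroids are supplied by the caller (real implementations often pick them at random) and that a group which ends up empty keeps its old centroid:

```python
def k_means_1d(points, centroids, max_iter=100):
    """Cluster 1-D points around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Move each centroid to the mean of its group (empty groups stay put).
        new = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
        if new == centroids:  # centroids no longer move
            break
        centroids = new
    return centroids

print(k_means_1d([1, 2, 3, 10, 11, 12], centroids=[0, 5]))  # [2.0, 11.0]
```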
→ clustering - k-nearest neighbors
- A classification algorithm that classifies data points based on their similarity to nearby neighbors.
- kebab case
-
A naming convention in which the parts of a name are separated with dashes, as in first-second-third.
→ camel case, pothole case - key
- A field or combination of fields whose value(s) uniquely identify a record within a table or dataset. Keys are often used to select specific records and in joins.
- keyword arguments
-
Extra arguments given to a function as key-value pairs.
→ named argument, variable arguments - label (an issue)
-
A short textual tag associated with an issue to categorize it. Common labels include bug and feature request.
- latent variable
- A variable that is not observed directly but instead is inferred from the states or values of other variables.
- LaTeX
-
A software system for document preparation that uses a specialized markup language to define a document structure (e.g. headings), stylise text, insert mathematical equations, and manage citations and cross-references. LaTeX is widely used in academia, in particular for scientific papers and theses in mathematics, physics, engineering and computer science.
⊗ Français - lazy evaluation
- Delaying evaluation of an expression until the value is actually needed (or at least until after the point where it is first encountered).
- left join
-
A join that combines data from two tables A and B. Where keys in table A match keys in table B, fields are concatenated. Where a key in table A does not match a key in table B, columns from table B are filled with null, NA, or some other missing value.
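A minimal Python sketch, assuming each table is a list of dictionaries sharing a key column and using Python's None to stand in for the missing value (the data and the left_join name are illustrative):

```python
def left_join(a, b, key, missing=None):
    """Keep every row of a; fill b's columns with `missing` when no key matches."""
    b_columns = {col for row in b for col in row if col != key}
    result = []
    for row_a in a:
        matches = [row_b for row_b in b if row_b[key] == row_a[key]]
        if matches:
            result.extend({**row_a, **row_b} for row_b in matches)
        else:
            result.append({**row_a, **{col: missing for col in b_columns}})
    return result

people = [{"id": 1, "name": "Ahmed"}, {"id": 2, "name": "Bea"}]
ages = [{"id": 2, "age": 34}, {"id": 3, "age": 51}]
print(left_join(people, ages, "id"))
# [{'id': 1, 'name': 'Ahmed', 'age': None}, {'id': 2, 'name': 'Bea', 'age': 34}]
```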
→ full join, right join - lexical scoping
- To look up the value associated with a name according to the textual structure of a program. Most programming languages use lexical scoping instead of dynamic scoping because the latter is less predictable.
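A short Python illustration: show() finds x by looking at where the function was written, not where it was called from. Under dynamic scoping, the call inside caller() would instead see caller's local x.

```python
x = "global"

def show():
    return x  # resolved in the scope where show() was defined: the global one

def caller():
    x = "local to caller"  # invisible to show(), even though it is "closer" at call time
    return show()

print(caller())  # global
```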
- library
-
A reusable software package, also often called a module.
⊗ Português - license
- A legal document describing how something can be used and by whom.
- lifecycle
- The steps that something is allowed or required to go through. The lifecycle of an object runs from its construction through the operations it can or must perform before it is destroyed; the lifecycle of an issue may be “created”, “assigned”, “in progress”, “ready for review”, and “completed”.
- lift
- How well a model predicts or classifies things, measured as the ratio of the response in the segment identified to the response in the population as a whole. A lift of 1 means the model does no better than chance; higher lift means the model is doing better.
- line comment
-
A comment in a program that spans part of a single line, as opposed to a block comment that may span multiple lines.
⊗ Português - linear regression
-
A method for finding the best straight-line fit between two datasets, typically by minimizing the sum of the squares of the vertical distances between the points and the line.
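A minimal Python sketch of ordinary least squares for one input variable, using the closed-form formulas for slope and intercept (the sample data is made up):

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing the sum of squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```

Python 3.10 and later also provide statistics.linear_regression for the same job.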
→ logistic regression - linter
-
A program that checks for common problems in software, such as violations of indentation rules or variable naming conventions. The name comes from the first tool of its kind, called lint.
- Lisp
- A family of programming languages that represent programs and data as nested lists. Many other programming languages have borrowed ideas from Lisp.
- list comprehension
-
In Python, an expression that builds a new list by transforming (and optionally filtering) the items of an existing collection. For example, [2*x for x in values] creates a new list whose items are the doubles of those in values.
- list
- A vector that can contain values of many different types.
- literate programming
-
A programming paradigm that mixes prose and code.
→ R Markdown
⊗ Français - local installation
-
Placing a package inside a particular project so that it is only accessible within that project.
→ global installation - local variable
-
A variable defined inside a function which is only visible within that function.
→ closure, global variable
⊗ Español - log message
- A single entry in a log of a program’s execution. Log messages are usually highly structured so that data (such as the time or the severity) can be recovered later.
- log
- A record of a program’s execution containing messages written via a logging framework for later inspection.
- logging framework
- A software library that manages internal reporting for programs.
- logging level
-
A setting that controls how much information is generated by a logging framework. Typical logging levels include DEBUG, WARNING, and ERROR.
- logical indexing
- To index a vector or other structure with a vector of Booleans, keeping only the values that correspond to true values. Also referred to as masking.
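In plain Python, itertools.compress keeps the items whose matching selector is true; NumPy arrays and R vectors support the same idea directly with syntax such as values[mask].

```python
from itertools import compress

values = [3, -1, 4, -1, 5]
mask = [v > 0 for v in values]       # [True, False, True, False, True]
print(list(compress(values, mask)))  # [3, 4, 5]
```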
- logistic regression
-
A method for fitting a model to some data that uses logistic (S-shaped) curves instead of straight lines.
→ linear regression - long identifier (of commit)
- See full identifier.
- long option
-
A full-word identifier for a command line argument. While most common flags are a single letter preceded by a dash, such as -v, long options typically use two dashes and a readable name, such as --verbose.
- loop body
- The statement or statements executed by a loop.
- machine learning
- The study or use of algorithms whose performance improves as they are given more data. Machine learning algorithms often use training data to build a model. Their performance is then measured by how well they predict the properties of test data.
- magic number
- An unnamed numerical constant that appears in a program without explanation.
- Make
- The original build manager for Unix, still in daily use after more than forty years.
- Makefile
-
A file containing commands for Make, often actually called Makefile.
- Markdown
-
A markup language with a simple syntax intended as a replacement for HTML. Markdown is often used for README files, and is the basis for R Markdown.
⊗ Français - Markov Chain
-
Any model describing a series of unfortunate events in which the probability of each event depends only on the current state, not on the path taken to reach that state.
→ Bayesian network, Monte Carlo method - markup language
-
A set of rules for annotating text to define its meaning or how it should be displayed. The markup is usually not displayed, but instead controls how the underlying text is interpreted or shown. Markdown and HTML are widely-used markup languages for web pages.
→ LaTeX, XML
⊗ Français - Martha's Rules
- A simple set of rules for making decisions in small groups.
- master branch
-
A dedicated, permanent, central branch which should contain a “ready product”. New features are developed on separate branches to avoid breaking the main code, and are merged into the master branch when they are ready.
→ feature branch - maximum likelihood estimation
- To choose the parameters for a probability distribution in order to maximize the likelihood of obtaining observed data.
- mean absolute error
-
The average of the absolute values of the errors of all predicted values compared with actual values.
→ mean squared error, root mean squared error - mean squared error
-
The average of the squares of all the errors of all predicted values compared with actual values. Squaring makes larger errors count for more, making this a more popular measure than mean absolute error.
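A small Python sketch comparing mean absolute error, mean squared error, and root mean squared error (the errors helper and the data are illustrative). Both prediction sets below have the same mean absolute error, but the one with a single large miss has a much larger squared error:

```python
import math

def errors(predicted, actual):
    """Return (MAE, MSE, RMSE) for paired predicted and actual values."""
    diffs = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    mse = sum(d * d for d in diffs) / len(diffs)
    return mae, mse, math.sqrt(mse)

print(errors([1, 2, 3, 8], [1, 2, 3, 4]))  # (1.0, 4.0, 2.0): one large miss
print(errors([2, 3, 4, 5], [1, 2, 3, 4]))  # (1.0, 1.0, 1.0): many small misses
```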
→ root mean squared error - mean
-
The average value of a dataset, more properly known as the arithmetic mean to distinguish it from the geometric and harmonic means.
→ median, mode
⊗ Português - median
-
A value separating the upper and lower halves of a sorted dataset. The median often gives a better idea of what is typical of the dataset than the mean, which can be influenced by a small number of extreme outliers.
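Python's statistics module shows how a single extreme value pulls the mean far away from the median and mode (the data is made up):

```python
from statistics import mean, median, mode

values = [30, 32, 35, 35, 38, 1000]  # one extreme outlier
print(mean(values))    # 195
print(median(values))  # 35.0
print(mode(values))    # 35
```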
→ mode - merge (Git)
- See Git merge
- method
-
An implementation of a generic function that handles objects of a specific class.
⊗ Español - milestone
- A target that a project is trying to meet, often represented as a set of issues that all have to be resolved by a certain time.
- MIME type
-
A standard way to identify the contents of files on the internet. The term is an acronym of “Multipurpose Internet Mail Extensions”, and MIME types are often inferred from filename extensions, such as .png for PNG-formatted images.
- missing value
- A special value such as null or NA used to indicate the absence of data. Missing values can signal that data was not collected or that the data did not exist in the first place (e.g., the middle name of someone who does not have one).
- MIT License
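- A license that allows people to re-use software with almost no restrictions; the main requirement is that the copyright and license notice be kept with copies of the software.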
- mock object
- A simplified replacement for part of a program whose behavior is easy to control and predict. Mock objects are used in unit tests to simulate databases, web services, and other complex systems.
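A minimal sketch using unittest.mock.Mock from Python's standard library; total_price and get_price are hypothetical names standing in for the code under test and a real database interface:

```python
from unittest.mock import Mock

def total_price(db, item_ids):
    """Sum prices looked up through db.get_price (a hypothetical interface)."""
    return sum(db.get_price(i) for i in item_ids)

# Stand-in for a real database: every price lookup returns 10.
fake_db = Mock()
fake_db.get_price.return_value = 10

print(total_price(fake_db, ["a", "b", "c"]))  # 30
print(fake_db.get_price.call_count)           # 3
```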
- mode
-
The value that occurs most frequently in a dataset.
→ mean, median - model
- A specification of the mathematical relationship between different variables.
- module
- A reusable software package, also often called a library.
- Monte Carlo method
-
Any method or algorithm that relies on artificially-injected randomness.
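A classic illustration, sketched in Python: estimate π by scattering random points over a unit square and counting how many land inside the quarter circle. The generator is seeded so the run is repeatable; the sample count is arbitrary.

```python
import random

def estimate_pi(samples, seed=0):
    """Monte Carlo estimate of pi from random points in the unit square."""
    rng = random.Random(seed)  # seeded pseudo-random number generator
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point lies inside the quarter circle
            inside += 1
    # Quarter-circle area divided by square area is pi / 4.
    return 4 * inside / samples

print(estimate_pi(100_000))  # roughly 3.14; the exact digits depend on the seed
```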
→ Markov Chain - moving average
- The mean of each set of several consecutive values from time series data.
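A minimal Python sketch, assuming a simple (unweighted) moving average over a window of fixed size:

```python
def moving_average(values, window):
    """Mean of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

print(moving_average([1, 2, 3, 4, 5, 6], 3))  # [2.0, 3.0, 4.0, 5.0]
```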
- multi-threaded
- Capable of performing several operations simultaneously. Multi-threaded programs can be faster than single-threaded ones, but are also harder to understand and debug.
- mutation
- Changing data in place, such as modifying an element of an array or adding a record to a database.
- n-gram
- A sequence of $N$ items, typically words in natural language. For example, a trigram is a sequence of three words. N-grams are often used as input in computational linguistics.
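A minimal Python sketch that extracts n-grams from a list of words (splitting on whitespace is a simplifying assumption; real tokenizers are more careful):

```python
def ngrams(words, n):
    """Return every run of n consecutive items as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

sentence = "the cat sat on the mat".split()
print(ngrams(sentence, 3))
# [('the', 'cat', 'sat'), ('cat', 'sat', 'on'), ('sat', 'on', 'the'), ('on', 'the', 'mat')]
```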
- NA
-
A special value used to represent data that is not available.
→ null
⊗ Arabic - naive Bayes classifier
- Any classification algorithm based on Bayes’ Theorem that assumes every feature being classified is independent of every other feature.
- name collision
-
The ambiguity that arises when two or more things in a program that have the same name are active at the same time. Most languages use namespaces to prevent such collisions.
→ call stack, fully-qualified name - named argument
-
A function parameter that is given a value by explicitly naming it in a function call.
→ keyword arguments, variable arguments - namespace
-
A collection of names in a program that exists in isolation from other namespaces. Each function, object, class, or module in a program typically has its own namespace so that references to “X” in one part of a program do not accidentally refer to something called “X” in another part of the program.
→ name collision - Nano (editor)
- A very simple text editor found on most Unix systems.
- natural language processing (NLP)
- See computational linguistics.
- negative selection
- To specify the elements of a vector or other data structure that are not desired by negating their indices.
- neural network
-
One of a large family of algorithms for identifying patterns in data by mimicking the way neurons interact. A neural network consists of one or more layers of nodes, each of which is connected to nodes in the preceding and subsequent layer. If enough of a node’s inputs are active, that node activates as well.
→ deep learning, backpropagation, perceptron - node
- An element of a graph that is connected to other nodes by edges. Nodes typically have data associated with them, such as names or weights.
- non-blocking execution
- To allow a program to continue running while an operation is in progress. For example, many systems support non-blocking execution for file I/O so that the program can continue doing work while it waits for data to be read from or written to the filesystem (which is typically much slower than the CPU).
- normal distribution
- A continuous random distribution with a symmetric bell-curve shape. As datasets get larger, some of their most important statistical properties can be modeled using a normal distribution.
- NoSQL database
-
Any database that does not use the relational model. The awkward name comes from the fact that such databases do not use SQL as a query language.
→ relational database
⊗ Français - null hypothesis
-
The claim that any patterns seen in data are entirely due to chance. Other claims (e.g., “X causes Y”) must be much more likely than the null hypothesis in order to be substantiated.
→ p value
⊗ Português - null
- A special value used to represent a missing object. Null is not the same as NA, and neither is the same as an empty vector.
- nullary expression
-
An “expression” with no arguments, such as the value 3.
→ binary expression, ternary expression, unary expression - object-oriented programming (OOP)
-
A style of programming in which functions and data are bound together in objects that only interact with each other through well-defined interfaces.
⊗ Español - object
-
In object-oriented programming, a structure that contains the data for a specific instance of a class. The operations the object is capable of are defined by the class’s methods.
⊗ Español - objective function
-
A function of one or more variables used to measure or compare the goodness of different solutions in an optimization problem.
→ gradient descent - observation
- A value or property of a specific member of a population.
- off-by-one error
-
A common error in programming in which the program refers to element i of a structure when it should refer to element i-1 or i+1, or processes N elements when it should process N-1 or N+1.
- open license
- A license that permits general re-use, such as the MIT License or GPL for software and CC-BY or CC-0 for data, prose, or other creative outputs.
- open science
- A generic term for making scientific software, data, and publications generally available.
- operating system
- A program that provides a standard interface to whatever hardware it is running on. Theoretically, any program that only interacts with the operating system should run on any computer that operating system runs on.
- optional parameter
- A parameter that does not have to be given a value when a function is called. Most programming languages require programmers to define default values for optional parameters or assign them a special value automatically.
- ORCID
- An Open Researcher and Contributor ID that uniquely and persistently identifies an author of scholarly works. ORCIDs are for people what DOIs are for documents.
- orthogonality
- The ability to use various features of software in any combination. Orthogonal systems tend to be easier to understand, since features can be combined without worrying about unexpected interactions.
- outlier
-
An extreme value that might be a measurement or recording error, or might actually be a rare event. Outliers are sometimes ignored when doing statistics, or handled or visualized separately.
→ overfitting - overfitting
-
Fitting a model so closely to one dataset that it does not generalize to others.
→ outlier - p value
- The probability of obtaining a result at least as strong as the one observed if the null hypothesis is true (i.e., if variation is purely due to chance). The lower the p value, the less plausible it is that chance alone explains the result.
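As a worked example, suppose the null hypothesis is that a coin is fair and we observe 9 heads in 10 flips. The one-sided p value is the probability of 9 or more heads from a fair coin, (C(10, 9) + C(10, 10)) / 2^10 = 11/1024 ≈ 0.011. A Python sketch of the same exact calculation (the function name is illustrative):

```python
from math import comb

def p_value_at_least(heads, flips):
    """One-sided p value: chance of >= `heads` heads in `flips` fair coin tosses."""
    favorable = sum(comb(flips, k) for k in range(heads, flips + 1))
    return favorable / 2 ** flips

print(p_value_at_least(9, 10))  # 0.0107421875
```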
- package manager
- A program that does its best to keep track of the bits and bobs of software installed on a computer and their dependencies on one another.
- package
- A collection of code, data, and documentation that can be distributed and re-used. Also referred to in some languages as a library or module.
- pager
- A program that displays a few lines of text at a time.
- parameter
- A variable whose value is passed into a function when the function is called. Some writers distinguish parameters (the variables) from arguments (the values passed in), but others use the terms in the opposite sense. It is all very confusing.
- parent (in a tree)
- A node in a tree that is above another node (called a child). Every node in a tree except the root node has a single parent.
- parent class
-
In object-oriented programming, the class from which another class (called the child class) is derived.
⊗ Español - parent directory
-
The directory that contains another directory of interest. Going from a directory to its parent, then its parent, and so on eventually leads to the root directory of the filesystem.
→ subdirectory - parse
- To translate the text of a program or web page into a data structure in memory that the program can then manipulate.
- pass (a test)
-
A test passes if the actual result matches the expected result.
→ fail (a test) - patch
- A single file containing a set of changes to a set of files, separated by markers that indicate where each individual change should be applied.
- path (in filesystem)
-
A string that specifies a location in a filesystem. In Unix, the directories in a path are joined using /.
→ absolute path, relative path - pattern rule
- A generic build rule that describes how to update any file whose name matches a pattern. Pattern rules often use automatic variables to represent the actual filenames.
- Peanuts
- An American comic strip by Charles M. Schulz which has inspired the names of R versions.
- perceptron
- The simplest kind of neural network, which approximates a single neuron with N binary inputs by computing a weighted sum of its inputs and firing if that value is zero or greater.
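A minimal Python sketch; the bias term (equivalent to shifting the firing threshold) and the hand-picked weights, which make this perceptron behave like logical AND, are assumptions for illustration:

```python
def perceptron(inputs, weights, bias=0.0):
    """Fire (return 1) if the weighted sum of the inputs plus bias is zero or greater."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total >= 0 else 0

# With these weights, only the input (1, 1) pushes the sum to zero or above.
weights, bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron([a, b], weights, bias))
```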
- permalink
- Short for “permanent link”, a URL that is intended to last forever.
- phony target
- A build target that does not correspond to an actual file. Phony targets are often used to store commonly-used commands in a Makefile.
- pipe (in the Unix shell)
-
The | character, used to make the output of one command the input of the next.
- pipe operator
-
The %>% operator, used to make the output of one function the input of the next.
- pivot table
- A technique for summarizing tabular data in which each cell represents the sum, average, or other function of the subset of the original data identified by the cell’s row and column heading.
- Poisson distribution
- A discrete random distribution that expresses the probability of $N$ events occurring in a fixed time interval if the events occur at a constant rate, independent of the time since the last event.
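The probability of exactly $k$ events when the average rate is $\lambda$ per interval is $\lambda^k e^{-\lambda} / k!$. A small Python sketch (the rate of 2.0 is an arbitrary example):

```python
from math import exp, factorial

def poisson_pmf(k, rate):
    """Probability of exactly k events in an interval averaging `rate` events."""
    return rate ** k * exp(-rate) / factorial(k)

probs = [poisson_pmf(k, 2.0) for k in range(4)]
print([round(p, 3) for p in probs])  # [0.135, 0.271, 0.271, 0.18]
```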
- positional argument
- An argument to a function that gets its value according to its place in the function’s definition, as opposed to a named argument that is explicitly matched by name.
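A short Python illustration (describe is a made-up function): positional arguments are matched by position, named arguments by name.

```python
def describe(name, age, city="unknown"):
    return f"{name}, {age}, {city}"

print(describe("Ahmed", 34))             # positional: matched by position
print(describe(age=34, name="Ahmed"))    # named: matched by name, order irrelevant
print(describe("Bea", 51, city="Oslo"))  # positional and named arguments mixed
```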
- pothole case
-
A naming style that separates the parts of a name with underscores, as in first_second_third.
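A small Python sketch that joins the same name parts in pothole case and kebab case (join_parts is a hypothetical helper). Note that kebab-case names cannot be used as Python identifiers, since - is the subtraction operator; this is one reason Python code favors pothole case.

```python
def join_parts(parts, style):
    """Join name parts in pothole case or kebab case."""
    if style == "pothole":
        return "_".join(parts)
    if style == "kebab":
        return "-".join(parts)
    raise ValueError(f"unknown style: {style}")

parts = ["first", "second", "third"]
print(join_parts(parts, "pothole"))  # first_second_third
print(join_parts(parts, "kebab"))    # first-second-third
```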
→ camel case, kebab case - prerequisite
-
Something that a build target depends on.
→ dependency - principal component analysis (PCA)
-
An algorithm that finds the axis along which data varies most, then the axis that accounts for the largest part of the remaining variation, and so on.
→ dimension reduction - prior distribution
- The probability distribution that is assumed as a starting point when using Bayes’ Theorem and used to construct a more accurate posterior distribution.
- probability distribution
- A mathematical description of all possible outcomes of a random event and the probability of each occurring.
- procedural programming
- A style of programming in which functions operate on data that is passed into them. The term is used in contrast to object-oriented programming.
- process
- An operating system’s representation of a running program. A process typically has some memory, the identity of the user who is running it, and a set of connections to open files.
- product manager
- The person responsible for defining what features a product should have.
- production code
- Software that is delivered to an end user. The term is used to distinguish such code from test code, deployment infrastructure, and everything else that programmers write along the way.
- project manager
- The person responsible for ensuring that a project moves forward.
- prompt
-
The text printed by a REPL or shell that indicates it is ready to accept another command. The default prompt in the Unix shell is usually $, while in Python it is >>>.
→ continuation prompt - protocol
- Any standard specifying how two pieces of software interact. A network protocol such as HTTP defines the messages that clients and servers exchange on the World-Wide Web; object-oriented programs often define protocols for interactions between objects of different classes.
- provenance
- A record of where data originally came from and what was done to process it.
- pseudo-random number generator (PRNG)
-
A function that can generate pseudo-random numbers.
→ seed - pseudo-random number
- A value generated in a repeatable way that resembles the true randomness of the universe well enough to fool merely mortal observers.
- pull indexing
-
Vectorized indexing in which the value at location i in the index vector specifies which element of the source vector is being pulled into that location in the result vector, i.e., result[i] = source[index[i]].
→ push indexing - pull request
-
The request to merge a new feature or correction created on a user’s fork of a Git repository into the upstream repository. The maintainers of the upstream repository are notified of the change, review it, make or suggest changes, and potentially merge it.
→ fork - push indexing
-
Vectorized indexing in which the value at location i in the index vector specifies an element of the result vector that gets the corresponding element of the source vector, i.e., result[index[i]] = source[i]. Push indexing can easily produce gaps and collisions.
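A Python sketch contrasting pull indexing and push indexing on plain lists; the result length for push indexing and the use of None to mark gaps are assumptions for illustration:

```python
source = ["a", "b", "c", "d"]

# Pull indexing: result[i] = source[index[i]]
pull_index = [3, 0, 0]
pulled = [source[j] for j in pull_index]
print(pulled)  # ['d', 'a', 'a']

# Push indexing: result[index[i]] = source[i]
push_index = [2, 0, 2, 4]
pushed = [None] * 5        # None marks gaps that nothing was pushed into
for i, j in enumerate(push_index):
    pushed[j] = source[i]  # a later write to the same slot overwrites (collision)
print(pushed)  # ['b', None, 'c', None, 'd']
```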
→ pull indexing - Python Software Foundation (PSF)
- A non-profit organization that oversees and promotes the development and use of Python.
- Python
- A popular interpreted open-source programming language that relies on indentation to define control structure.
- quantile
- If a set of sorted values is divided into groups of equal size, each group is called a quantile. For example, if there are five groups, each is called a quintile; the bottom quintile contains the lowest 20% of the values, while the top quintile contains the highest 20%.
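A minimal Python sketch that divides sorted values into equal-sized groups, assuming the number of values divides evenly (quantile_groups is a hypothetical helper). Note that many libraries, such as Python's statistics.quantiles, return the cut points between groups rather than the groups themselves.

```python
def quantile_groups(values, n):
    """Split sorted values into n groups of equal size."""
    ordered = sorted(values)
    size, remainder = divmod(len(ordered), n)
    if remainder:
        raise ValueError("values cannot be split into equal-sized groups")
    return [ordered[i * size:(i + 1) * size] for i in range(n)]

quintiles = quantile_groups(range(1, 21), 5)
print(quintiles[0])   # [1, 2, 3, 4]      bottom quintile: lowest 20%
print(quintiles[-1])  # [17, 18, 19, 20]  top quintile: highest 20%
```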
- query string
-
The portion of a URL after the question mark `?` that specifies extra parameters for the HTTP request as name-value pairs.
- quosure
- A data structure containing an unevaluated expression and its environment.
- quoting function
- A function that is passed expressions rather than the values of those expressions.
- R (programming language)
-
A popular open source programming language used primarily for data science.
⊗ Français
- R Consortium
- A group that supports the worldwide community of users, maintainers and developers of R. Its members include leading institutions and companies dedicated to the use, development and growth of R.
- R Foundation
- A non-profit founded by the R development core team providing support for R. It is a member of the R Consortium.
- R Hub
-
A free platform for checking an R package on several different platforms in preparation for the CRAN submission process.
- R Markdown
-
A dialect of Markdown that allows authors to mix prose and code (usually written in R) in a single document. Cf. literate programming.
⊗ Español, Français
- raise (an exception)
-
To signal that something unexpected or unusual has happened in a program by creating an exception and handing it to the error-handling system, which then tries to find a point in the program that will catch it.
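A short illustration in Python, where the keyword is `raise` and the matching handler is `try`/`except`; the function `parse_age` is hypothetical:

```python
def parse_age(text):
    """Convert `text` to a non-negative integer, raising an exception if that is impossible."""
    age = int(text)  # int() itself raises ValueError for text such as "abc"
    if age < 0:
        raise ValueError(f"age cannot be negative: {age}")
    return age

try:
    parse_age("-3")
except ValueError as error:
    print("caught:", error)  # caught: age cannot be negative: -3
```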
→ throw (exception)
- random forests
- An algorithm used for regression or classification that combines many decision trees. Each tree votes for a classification, and the algorithm chooses the classification having the most votes over all the trees in the forest.
- raster image
- An image stored as a matrix of pixels.
- reactive programming
- A style of programming in which actions are triggered by external events.
- reactive variable
- A variable whose value is automatically updated when some other value or values change. Reactive variables are used extensively in Shiny.
- read-eval-print loop (REPL)
-
An interactive program that reads a command typed in by a user, executes it, prints the result, and then waits patiently for the next command. REPLs are often used to explore new ideas or for debugging.
→ Integrated Development Environment
- record
-
A group of related values that are stored together. A record may be represented as a tuple or as a row in a table; in the latter case, every record in the table has the same fields.
⊗ Français
- recursion
-
Calling a function from within a call to that function, or defining a term using a simpler version of the same term.
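The classic example, sketched in Python: a factorial function that calls itself on a simpler version of the same problem until it reaches a base case.

```python
def factorial(n):
    """Compute n! recursively: the call for n relies on a call for the simpler case n - 1."""
    if n <= 1:  # base case: stops the recursion
        return 1
    return n * factorial(n - 1)  # recursive call on a smaller problem

print(factorial(5))  # 120
```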
⊗ Français
- recycle
-
To re-use values from a shorter vector in order to generate a sequence of the same length as a longer one.
⊗ Español
- redirection
- To send a request for a web page or web service to a different page or service.
- refactoring
- Reorganizing software without changing its behavior.
- regression testing
- Testing software to ensure that things which used to work have not been broken.
- regular expression
-
A pattern for matching text, written as text itself. Regular expressions are sometimes called “regexp”, “regex”, or “RE”, and are as powerful as they are cryptic.
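A small example using Python's `re` module; the date pattern `DATE` is a hypothetical illustration, not a complete date validator:

```python
import re

# Matches text shaped like 2023-07-14 and captures the year, month, and day.
DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

text = "Samples collected 2023-07-14 and 2023-08-02."
print(DATE.findall(text))  # [('2023', '07', '14'), ('2023', '08', '02')]
```

The pattern would also accept impossible dates such as `2023-99-99`, which is one reason regular expressions are often paired with further checks.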
⊗ Español
- reinforcement learning
-
Any machine learning algorithm that is not given specific goals to meet, but instead is given feedback on whether or not it is making progress.
→ supervised learning, unsupervised learning
- relational database
-
A database that organizes information into tables, each of which has a fixed set of named fields (shown as columns) and a variable number of records (shown as rows).
→ SQL, table
⊗ Español, Français
- relative error
-
The absolute value of the difference between the actual and correct value divided by the absolute value of the correct value. For example, if the actual value is 9 and the correct value is 10, the relative error is 0.1. Relative error is usually more useful than absolute error.
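The worked example above (actual 9, correct 10) as a Python sketch; the function names are hypothetical:

```python
def absolute_error(actual, correct):
    return abs(actual - correct)

def relative_error(actual, correct):
    """Absolute error scaled by the magnitude of the correct value (which must be non-zero)."""
    return abs(actual - correct) / abs(correct)

print(absolute_error(9, 10))  # 1
print(relative_error(9, 10))  # 0.1
```

An absolute error of 1 means very different things when the correct value is 10 and when it is 1,000,010; the relative error captures that difference.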
⊗ Français, Português
- relative path
-
A path whose destination is interpreted relative to some other location, such as the current working directory. A relative path is the equivalent of giving directions using terms like “straight” and “left”.
→ absolute path
⊗ Français, Português
- relative row number
-
The index of a row in a displayed portion of a table, which may or may not be the same as the absolute row number within the table.
⊗ Português
- remote login
- Starting an interactive session on one computer from another computer, e.g., by using SSH.
- remote repository
- A repository located on another computer. Tools such as Git are designed to synchronize changes between local and remote repositories in order to share work.
- repository
-
A place where a version control system stores the files that make up a project and the metadata that describes their history.
→ Git, GitHub
⊗ Español, Português
- reprex
- A reproducible example. When asking questions about coding problems online or filing issues on GitHub, you should always include a reprex so others can reproduce your problem and help. The reprex package can help!
- reproducible example
- See reprex.
- reproducible research
-
The practice of describing and documenting research results in such a way that another researcher or person can re-run the analysis code on the exact data to obtain the same result.
⊗ Português
- research software engineer (RSE)
- Someone whose primary responsibility is to build the specialized software that other researchers depend on.
- reStructured Text (reST)
-
A plaintext markup format used primarily in Python documentation.
→ Markdown
- revision
- See commit.
- right join
-
A join that combines data from two tables A and B. Where keys in table A match keys in table B, fields are concatenated. Where a key in table B does not match a key in table A, columns from table A are filled with null, NA, or some other missing value. Keys from table A that do not match keys in table B are excluded from the result.
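A plain-Python sketch of a right join (SQL writes it as `RIGHT JOIN`), treating each table as a list of dictionaries. It keeps every row of B; where a B key has no match in A, A's columns are filled with `None`, and unmatched A rows are dropped. The function `right_join` and the sample tables are hypothetical, and the sketch assumes the two tables share no column names other than the key.

```python
def right_join(a, b, key):
    """Right join of lists of dicts `a` and `b` on `key`: every row of `b` is kept."""
    a_by_key = {}
    for row in a:
        a_by_key.setdefault(row[key], []).append(row)
    a_columns = {col for row in a for col in row if col != key}
    result = []
    for b_row in b:
        matches = a_by_key.get(b_row[key])
        if matches:
            for a_row in matches:
                result.append({**a_row, **b_row})  # keys match: fields are concatenated
        else:
            missing = {col: None for col in a_columns}  # no match in A: A's columns become None
            result.append({**missing, **b_row})
    return result

people = [{"id": 1, "name": "Ahmed"}, {"id": 2, "name": "Beatriz"}]
visits = [{"id": 2, "site": "north"}, {"id": 3, "site": "south"}]
for row in right_join(people, visits, "id"):
    print(row)  # id 2 gets Beatriz's name; id 3 gets name None; id 1 is dropped
```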
→ full join, left join
- root (in a tree)
- The node in a tree of which all other nodes are direct or indirect children, or equivalently the only node in the tree that has no parent.
- root directory
-
The directory that contains everything else, directly or indirectly. On Unix, the root directory is written `/` (a bare forward slash).
- root mean squared error (RMSE)
-
The square root of the mean squared error. Like the standard deviation, it is in the same units as the original data.
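A minimal Python sketch of RMSE next to the mean absolute error it is often compared with; the function names and data are hypothetical:

```python
import math

def rmse(predicted, observed):
    """Root mean squared error: the square root of the mean of the squared differences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def mae(predicted, observed):
    """Mean absolute error, for comparison."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

predicted = [4, 6, 2, 3]
observed = [1, 2, 2, 3]
print(rmse(predicted, observed))  # 2.5
print(mae(predicted, observed))   # 1.75
```

Because the differences are squared before averaging, RMSE gives more weight to large errors than the mean absolute error does.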
→ mean absolute error
- rotating file
- A set of files used to store recent information. For example, there might be one file with results for each day of the week, so that results from last Tuesday are overwritten this Tuesday.
- S3
-
A framework for object-oriented programming in R.
⊗ Español
- S4
- A framework for object-oriented programming in R.
- S
-
A language originally developed at Bell Labs for data analysis, statistical modeling, and graphics. R is a dialect of S.
⊗ Español
- sandbox
- A testing environment that is separate from the production system, or an environment that is only allowed to perform a restricted set of operations for security reasons.
- scalar
-
A single value of a particular type, such as 1 or “a”. Scalars do not really exist in R; values that appear to be scalars are actually vectors of unit length.
⊗ Español
- schema
-
A specification of the format of a dataset, including the name, format, and content of each table.
⊗ Français
- scope
- The portion of a program within which a definition can be seen and used. Cf. closure, global variable, and local variable.
- script
- Originally, a program written in a language too usable for “real” programmers to take seriously; the term is now synonymous with program.
- search path
-
The list of directories that a program searches to find something. For example, the Unix shell uses the search path stored in the `PATH` variable when trying to find a program given its name.
- Secure Shell (SSH)
- A program that allows secure access to remote computers.
- seed
- A value used to initialize a pseudo-random number generator.
- select
- To choose entire columns from a table by name or location.
- self join
- A join that combines a table with itself.
- semantic versioning
-
A standard for identifying software releases. In the version identifier `major.minor.patch`, `major` changes when a new version of software is incompatible with old versions, `minor` changes when new features are added to an existing version, and `patch` changes when small bugs are fixed.
- sense vote
-
A preliminary vote used to determine whether further discussion is needed in a meeting.
→ Martha's Rules
- server
- Typically, a program such as a database manager or web server that provides data to a client upon request.
- shebang
-
In Unix, a character sequence such as `#!/usr/bin/env python3` in the first line of a runnable file that tells the operating system what program to use to run that file.
- shell script
- A set of commands for the shell stored in a file so that they can be re-executed. A shell script is effectively a program.
- shell variable
-
A variable set and used in the Unix shell. Commonly-used shell variables include `HOME` (the user’s home directory) and `PATH` (their search path).
- shell
- A command-line interface that allows a user to interact with the operating system, such as Bash (for Unix) or PowerShell (for Windows).
- short circuit test
-
A logical test that only evaluates as many arguments as it needs to. For example, if `A` is false, then most languages never evaluate `B` in the expression `A and B`.
- short identifier (of commit)
- The first few characters of a full identifier. Short identifiers are easy for people to type and say aloud, and are usually unique within a repository’s recent history.
- short option
-
A single-letter identifier for a command line argument, usually preceded by a dash, such as `-v`.
→ long option
- side effect
- A change made by a function while it runs that is visible after the function finishes, such as modifying a global variable or writing to a file. Side effects make programs harder for people to understand, since the effects are not necessarily clear where the function is called.
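A Python sketch contrasting a function with a side effect (it modifies a global variable) and one without; all names are hypothetical:

```python
total = 0  # a global variable

def add_with_side_effect(x):
    """Adds x to the global `total`: the change is still visible after the function returns."""
    global total
    total += x
    return total

def add_pure(running_total, x):
    """No side effect: the result depends only on the arguments, and nothing else changes."""
    return running_total + x

add_with_side_effect(5)
add_with_side_effect(5)
print(total)           # 10: two identical calls changed shared state
print(add_pure(0, 5))  # 5, no matter how often it is called
```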
- signal (a condition)
- A way of indicating that something has gone wrong in a program, or that some other unexpected event has occurred. R prefers “signalling a condition” to “raising an exception”.
- single square brackets
-
An index enclosed in `[...]`, used to select a structure from another structure.
→ double square brackets
- single-threaded
- A model of program execution in which only one thing can happen at a time. Single-threaded execution is easier for people to understand but less efficient than multi-threaded execution.
- Singleton pattern
- A design pattern that creates a singleton object to manage some resource or service, such as a database or cache. In object-oriented programming, the pattern is usually implemented by hiding the constructor of the class in some way so that it can only be called once.
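Python has no private constructors, so this sketch uses a different hiding technique: it overrides `__new__` so that every call returns the same instance. The class `Cache` is hypothetical; a module-level object is another common way to get the same effect in Python.

```python
class Cache:
    """A singleton: every call to Cache() returns the same instance."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.entries = {}  # initialized only when the first instance is created
        return cls._instance

a = Cache()
b = Cache()
a.entries["x"] = 1
print(a is b, b.entries)  # True {'x': 1}
```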
- singleton
-
A set with only one element, or a class with only one instance.
→ Singleton pattern
- slug
-
An abbreviated portion of a page’s URL that uniquely identifies it. In the example `https://www.mysite.com/category/post-name`, the slug is `post-name`.
- snake case
- See pothole case.
- software distribution
-
A set of programs that are built, tested, and distributed as a collection so that they can run together.
→ distro
- source distribution
- A software distribution that includes the source code, typically so that programs can be recompiled on the target computer when they are installed.
- sprint
- A short, intense period of work on a project.
- SQL
-
The language used for writing queries for a relational database. The term was originally an acronym for Structured Query Language.
⊗ Español, Français
- SSH key
- A string of random bits stored in a file that is used to identify a user for SSH. Each SSH key has separate public and private parts; the public part can safely be shared, but if the private part becomes known, the key is compromised.
- stack frame
- A section of the call stack that records details of a single call to a specific function.
- Stack Overflow
- A question-and-answer site popular among programmers.
- stale (in build)
- To be out of date compared to a prerequisite. A build manager’s job is to find and update things that are stale.
- standard deviation
-
How widely values in a dataset differ from the mean. It is calculated as the square root of the variance.
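A worked Python example, assuming the population form of the variance (dividing by the number of values, as in the variance entry in this glossary); the data are made up:

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(values) / len(values)                               # 5.0
variance = sum((v - mean) ** 2 for v in values) / len(values)  # 4.0 (population variance)
std_dev = math.sqrt(variance)                                  # 2.0

print(std_dev)                    # 2.0
print(statistics.pstdev(values))  # 2.0, the standard library's population standard deviation
```

The sample standard deviation, `statistics.stdev`, divides by `n - 1` instead of `n` and so gives a slightly larger value for the same data.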
→ 68-95-99.7 rule
- standard error
-
A predefined communication channel for a process, typically used for error messages.
→ standard input, standard output
- standard input
-
A predefined communication channel for a process, typically used to read input from the keyboard or from the previous process in a pipe.
→ standard error, standard output
- standard normal distribution
- A normal distribution with a mean of 0 and a standard deviation of 1. Values from normal distributions with other parameters can easily be rescaled to be on a standard normal distribution.
- standard output
-
A predefined communication channel for a process, typically used to send output to the screen or to the next process in a pipe.
→ standard error, standard input
- stratified sampling
- Selecting values by dividing the overall population into homogeneous groups and then taking a random sample from each group.
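A minimal sketch in Python that takes the same number of records from every group; the helper `stratified_sample` and the sample data are hypothetical, and drawing in proportion to each group's size is a common alternative:

```python
import random

def stratified_sample(records, group_of, per_group, seed=None):
    """Take `per_group` random records from each group defined by `group_of(record)`."""
    rng = random.Random(seed)
    groups = {}
    for record in records:
        groups.setdefault(group_of(record), []).append(record)
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, per_group))  # simple random sample within each group
    return sample

people = [("north", i) for i in range(10)] + [("south", i) for i in range(4)]
picked = stratified_sample(people, group_of=lambda r: r[0], per_group=2, seed=1)
print(len(picked))  # 4: two from "north" and two from "south"
```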
- stream
- A sequential flow of data, such as the bits arriving across a network connection or the bytes read from a file.
- string interpolation
- The process of inserting text corresponding to specified values into a string, usually to make output human-readable.
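In Python, string interpolation is most often done with f-strings, which can also format the inserted values; the variables here are hypothetical:

```python
name = "Amira"
count = 3
mean = 4.5678

message = f"{name} ran {count} trials with mean {mean:.2f}"
print(message)  # Amira ran 3 trials with mean 4.57
```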
- string
-
A block of text in a program. The term is short for “character string”.
⊗ Español
- student's t-distribution
- See t-distribution.
- subcommand
-
A command that is part of a larger family of commands. For example, `git commit` is a subcommand of Git.
- subdirectory
-
A directory that is below another directory.
→ parent directory
- supervised learning
-
A machine learning algorithm in which a system is taught to classify values given training data containing previously-classified values.
→ unsupervised learning, reinforcement learning
- support vector machine (SVM)
- A supervised learning algorithm that seeks to divide the points in a dataset into two sets so that the empty space between the sets is as wide as possible.
- synchronous
-
To happen at the same time. In programming, a synchronous operation is one that must finish before the program continues with the next operation, so the caller waits for its result.
→ asynchronous
- systematic error
- See bias.
- t-distribution
-
A variation on the normal distribution that is adjusted to account for estimating variance from the sample instead of knowing it in advance.
→ student's t-distribution
- tab completion
- A technique implemented by most REPLs, shells, and programming editors that completes a command, variable name, filename, or other text when the tab key is pressed.
- table
-
A set of records in a relational database or observations in a data frame. Tables are usually displayed as rows (each of which represents one record or observation) and columns (each of which represents a field or variable).
⊗ Français
- tag (in version control)
- A readable label attached to a specific commit so that it can easily be referred to later.
- Template Method pattern
- A design pattern in which a parent class defines an overall sequence of operations by calling abstract methods that child classes must then implement. Each child class then behaves in the same general way, but implements the steps in different ways.
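A Python sketch of the pattern using the standard `abc` module; the classes `Report` and `SalesReport` are hypothetical:

```python
from abc import ABC, abstractmethod

class Report(ABC):
    def render(self):
        """Template method: fixes the order of the steps for every child class."""
        return "\n".join([self.header(), self.body(), "-- end --"])

    @abstractmethod
    def header(self): ...

    @abstractmethod
    def body(self): ...

class SalesReport(Report):
    def header(self):
        return "Sales"

    def body(self):
        return "42 units sold"

print(SalesReport().render())
```

Trying to create a `Report` directly raises `TypeError`, because its abstract methods have no implementations.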
- ternary expression
-
An expression that has three parts. Conditional expressions are the only ternary expressions in most languages.
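In Python the conditional expression has the three parts in the order value-if-true, condition, value-if-false; `describe` is a hypothetical example:

```python
def describe(n):
    # Three parts: "even" (if true), n % 2 == 0 (the condition), "odd" (if false)
    return "even" if n % 2 == 0 else "odd"

print(describe(4), describe(7))  # even odd
```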
→ binary expression, nullary expression, unary expression
- test runner
- A program that finds and runs software tests and reports their results.
- test-driven development (TDD)
- A programming practice in which tests are written before a new feature is added or a bug is fixed in order to clarify the goal.
- three Vs
- The volume, velocity, and variety that distinguish big data.
- throw (exception)
- Another term for raising an exception.
- tibble
-
A modern replacement for R’s data frame, defined and used in the tidyverse, that stores tabular data in columns and rows.
⊗ Español, Français
- ticket
- See issue.
- ticketing system
- See issue tracking system.
- tidy data
-
Tabular data that satisfies three conditions that facilitate initial cleaning, and later exploration and analysis: (1) each variable forms a column, (2) each observation forms a row, and (3) each type of observation unit forms a table.
→ table
⊗ Español
- Tidymodels
-
A collection of R packages for modeling and statistical analysis designed with a shared philosophy.
⊗ Français
- Tidyverse
-
A collection of R packages for operating on tabular data in consistent ways.
⊗ Español, Français, Português
- time series
-
A set of measurements taken at different times, which may or may not be at regular intervals.
→ moving average
- timestamp
- A digital identifier showing the time at which something was created or accessed. Timestamps should use ISO date format for portability.
- tolerance
- How closely the actual result of a test must agree with the expected result in order for the test to pass. Tolerances are usually expressed in terms of relative error.
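A Python sketch using `math.isclose` with a relative tolerance; the values are made up. Note that `math.isclose` scales the tolerance by the larger of the two magnitudes rather than by the expected value alone.

```python
import math

expected = 10.0
actual = 10.0004

# Passes: the relative error is about 4e-05, inside a relative tolerance of 1e-4.
assert math.isclose(actual, expected, rel_tol=1e-4)

# The same result would fail a stricter test.
print(math.isclose(actual, expected, rel_tol=1e-6))  # False
```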
- transitive dependency
- If A depends on B and B depends on C, C is a transitive dependency of A.
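A sketch of finding every direct and transitive dependency by following a dependency mapping until nothing new turns up; `all_dependencies` and the package names are hypothetical:

```python
def all_dependencies(package, depends_on):
    """Return every direct and transitive dependency of `package`."""
    found = set()
    pending = list(depends_on.get(package, []))
    while pending:
        dep = pending.pop()
        if dep not in found:  # this check also prevents infinite loops on cycles
            found.add(dep)
            pending.extend(depends_on.get(dep, []))
    return found

depends_on = {"A": ["B"], "B": ["C"], "C": []}
print(sorted(all_dependencies("A", depends_on)))  # ['B', 'C']: C is a transitive dependency of A
```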
- tree
- A graph in which every node except the root has exactly one parent.
- triage
- To go through the issues associated with a project and decide which are currently priorities. Triage is one of the key responsibilities of a project manager.
- true
-
The logical (Boolean) state opposite to “false”. It is used in logic and programming to represent one of the two values of a binary state.
→ truthy, falsy
- truthy
-
Evaluating to true in a Boolean context.
→ falsy
- tuple
- A value that has multiple parts, such as the three color components of a red-green-blue color specification.
- two hard problems in computer science
- Refers to a quote by Phil Karlton: “There are only two hard problems in computer science: cache invalidation and naming things.” Many variations add a third problem (most often “off-by-one errors”).
- type coercion
-
To convert data from one type to another, e.g., from the integer `4` to the equivalent floating point number `4.0`.
- unary expression
-
An expression with one argument, such as `log 5`.
→ binary expression, nullary expression, ternary expression
- Unicode
- A standard that defines numeric codes for many thousands of characters and symbols. Unicode does not define how those numbers are stored; that is done by standards like UTF-8.
- Uniform Resource Locator (URL)
- A unique address on the World-Wide Web. URLs originally identified web pages, but may also represent datasets or database queries, particularly if they include a query string.
- unit test
-
A test that exercises one function or feature of a piece of software and produces pass, fail, or error.
→ integration test
- unsupervised learning
-
Algorithms that cluster data without knowing in advance what the groups will be.
→ supervised learning, reinforcement learning
- up-vote
-
A vote in favor of something.
→ down-vote
- update operator
- See in-place operator.
- upstream repository
- The remote repository that this repository was derived from. Programmers typically save changes in their own repository and then submit a pull request to the upstream repository where changes from other programmers are also collected.
- UTF-8
- A way to store the numeric codes representing Unicode characters in memory that is backward-compatible with the older ASCII standard.
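A short Python illustration of the split between Unicode (the numbers) and UTF-8 (the bytes). The ASCII letters are stored as the same single bytes ASCII uses, while `é` (code 233) needs two bytes:

```python
text = "caf\u00e9"                     # "café", with é written as its code point to avoid ambiguity
code_points = [ord(ch) for ch in text]  # Unicode defines the numbers...
encoded = text.encode("utf-8")          # ...and UTF-8 defines how they are stored as bytes

print(code_points)                      # [99, 97, 102, 233]
print(list(encoded))                    # [99, 97, 102, 195, 169]
print(encoded.decode("utf-8") == text)  # True
```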
- variable (data)
-
Some attribute of a population that can be measured or observed.
→ continuous random variable, discrete random variable
- variable (program)
-
A name in a program that has some data associated with it. A variable’s value can be changed after definition.
→ constant
⊗ Arabic, Español, Français
- variable arguments
-
In a function, the ability to take any number of arguments. R uses `...` to capture the “extra” arguments.
→ keyword arguments, named argument
⊗ Français
- variance
- How widely values in a dataset differ from the mean. It is calculated as the average of the squared differences between the values and the mean. The standard deviation is often used instead, since it has the same units as the data while the variance is expressed in units squared.
- vector
-
A sequence of values, usually of homogeneous type. Vectors are the fundamental data structure in R; a scalar is just a vector with exactly one element.
⊗ Español
- vectorize
-
To write code so that operations are performed on entire vectors, rather than element-by-element within loops.
⊗ Español
- version control system
-
A system for managing changes made to software during its development.
→ Git
⊗ Español, Français, Português
- vignette
- A long-form guide used to provide details of a package beyond the README.md or function documentation.
- Vim (editor)
- A text editor that is installed by default on many Unix systems. “How do I exit the Vim editor?” is one of the most popular questions on Stack Overflow.
- virtual environment
-
In Python, the `virtualenv` package allows you to create virtual, disposable, Python software environments containing only the packages and versions of packages you want to use for a particular project or task, and to install new packages into the environment without affecting other virtual environments or the system-wide default environment.
- virtual machine
-
A program that pretends to be a computer. This may seem a bit redundant, but VMs are quick to create and start up, and changes made inside the virtual machine are contained within that VM so we can install new packages or run a completely different operating system without affecting the underlying computer.
⊗ Español
- Visitor pattern
-
A design pattern in which the operation to be done is taken to each element of a data structure in turn. It is usually implemented by having a generator “visitor” that knows how to reach the structure’s elements, and which is given a function or method to call for each in turn that carries out the specific operation.
→ Iterator pattern
- walk (a tree)
- To visit each node in a tree in some order, typically depth-first or breadth-first.
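A Python sketch of both orders on a small hypothetical tree stored as a mapping from each node to its children:

```python
from collections import deque

# A hypothetical tree: each node maps to the list of its children.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"], "a1": [], "a2": [], "b1": []}

def depth_first(node):
    """Visit a node, then everything below it, before moving on to its next sibling."""
    yield node
    for child in TREE[node]:
        yield from depth_first(child)

def breadth_first(root):
    """Visit nodes level by level, using a queue of nodes still to visit."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(TREE[node])

print(list(depth_first("root")))    # ['root', 'a', 'a1', 'a2', 'b', 'b1']
print(list(breadth_first("root")))  # ['root', 'a', 'b', 'a1', 'a2', 'b1']
```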
- while loop
-
A statement in a program that repeats one or more other statements (the loop body) as long as a condition is true.
→ for loop
- whitespace
- The space, newline, carriage return, and horizontal and vertical tab characters that take up space but do not create a visible mark. The name comes from their appearance on a printed page in the era of typewriters.
- wildcard
-
A character expression that can match text, such as the `*` in `*.csv` (which matches any filename whose name ends with `.csv`).
- XML
-
A set of rules for defining HTML-like tags and using them to format documents (typically data). XML was popular in the early 2000s, but its complexity led many programmers to adopt JSON instead.
⊗ Español, Français
- YAML
-
Short for “YAML Ain’t Markup Language”, a way to represent nested data using indentation rather than the parentheses and commas of JSON. YAML is often used in configuration files and to define parameters for various flavors of Markdown documents.
⊗ Français