\documentclass[11pt,twoside]{scrartcl}
%\documentclass[11pt,twoside]{article}
%opening
\newcommand{\lecid}{15-316}
\newcommand{\leccourse}{Software Foundations of Security and Privacy}
\newcommand{\lecdate}{} %e.g. {October 21, 2013}
\newcommand{\lecnum}{18}
\newcommand{\lectitle}{Safety and Information Flow on the Web: II}
\newcommand{\lecturer}{Matt Fredrikson}
\newcommand{\lecurl}{https://15316-cmu.github.io/index}
\usepackage{varwidth}
\usepackage{lecnotes}
\usepackage[irlabel]{bugcatch}
\usepackage{tikz}
\usetikzlibrary{automata,shapes,positioning,matrix,shapes.callouts,decorations.text,patterns,decorations.pathreplacing,matrix,arrows,chains,calc}
% \usepackage[bracketinterpret,seqinfers,sidenotecalculus]{logic}
% \newcommand{\I}{\interpretation[const=I]}
% \newcommand{\bebecomes}{\mathrel{::=}}
% \newcommand{\alternative}{~|~}
% \newcommand{\asfml}{F}
% \newcommand{\bsfml}{G}
% \newcommand{\cusfml}{C}
% \def\sqsubseteqftrule{L}%
% \def\rightrule{R}%
\begin{document}
\newcommand{\atrace}{\omega}%
%% the standard interpretation naming conventions
\newcommand{\stdI}{\dTLint[state=\omega]}%
\newcommand{\Ip}{\dTLint[trace=\atrace]}%
\newcommand{\ws}{\omega}\newcommand{\wt}{\nu}%
\newdimen{\linferenceRulehskipamount}
\linferenceRulehskipamount=2em
\linferenceRulevskipamount=0.6em
% \newcommand{\lowt}{\lowsec}
% \newcommand{\hight}{\hisec}
\lstdefinestyle{customc}{
belowcaptionskip=1\baselineskip,
breaklines=true,
language=C,
showstringspaces=false,
numbers=none,
% xleftmargin=1ex,
framexleftmargin=1ex,
% numbersep=5pt,
% numberstyle=\tiny\color{mygray},
basicstyle=\footnotesize\ttfamily,
keywordstyle=\color{blue},
commentstyle=\itshape\color{purple!40!black},
stringstyle=\color{orange},
morekeywords={output,assume,observe,input,bool,then,fun,match,in,val,list,type,of,string,unit,let,bytes,mov,imul,add,sar,shr,function,forall,nat,requires,ensures,method,returns,assert,new,array,modifies,reads,old,predicate,lemma,seq,calc,nan,var,exists,invariant,decreases,datatype,declassify,uint8},
tabsize=2,
deletestring=[b]',
backgroundcolor=\color{gray!15},
frame=tb
}
\lstset{escapechar=@,style=customc}
\maketitle
\thispagestyle{empty}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Introduction}
In the previous lecture we started discussing web applications, and covered a fair bit of ground regarding the platform and conventions. At the very end of the lecture, we hinted at a class of \emph{code injection} vulnerabilities that arise when untrusted inputs are used by the server side to compute responses. Today we will describe these vulnerabilities in greater depth, and continue on to \emph{cross-site scripting} (XSS) and \emph{cross-site request forgery} (CSRF) attacks. Along the way we will describe several best practices that can be used to mitigate such vulnerabilities in practice.
\section{Client-side injection vulnerabilities}
The use of dynamic server-side scripting opens up numerous possibilities for vulnerability. The most common among these are called \emph{client-side injection} vulnerabilities, which occur when request arguments are used to perform potentially unsafe actions on the server.
\begin{figure}
\begin{lstlisting}[language=PHP]
$o';
?>
\end{lstlisting}
\caption{\label{fig:injection1} PHP script with a command injection vulnerability. This example is thanks to David Brumley.}
\end{figure}
Consider the PHP script shown in Figure~\ref{fig:injection1}. This might constitute the server-side component of a web application that allows users to ping a given host from the server, and returns the results of the \verb'ping' command directly back to the user. In order to do this, the PHP code calls \verb'shell_exec' to execute the \verb'ping' utility on an IP address given in the \verb'ip' argument of the URI request. For example, suppose that the web app were located at \nolinkurl{http://freeping.com/php/ping}. Then the following request would return the output of \verb'ping' in an HTML page:
\begin{verbatim}
http://freeping.com/php/ping?ip=28.2.42.10
\end{verbatim}
The given argument is simply concatenated with the string \verb'ping -C 3' before being sent to \verb'shell_exec', and this is where the vulnerability lies. Shell commands support composition with the usual semicolon notation, so if one were to execute the command \verb'ping -C 3 28.2.42.10; ls', then the result would be the output of \verb'ping', followed by a listing of the directory from which the command was executed. A malicious user could exploit this fact by sending the request:
\begin{verbatim}
http://freeping.com/php/ping?ip=28.2.42.10%3b+ls
\end{verbatim}
URI parameters can contain non-alphanumeric symbols like semicolon as long as they are encoded in a particular way. Encodings consist of a percent sign followed by a hexadecimal code corresponding to the symbol. The \verb'+' is decoded as a space, as space characters are not allowed in URI strings.
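To make the decoding step concrete, the following is a minimal Python sketch. Python's standard \verb'urllib.parse' module stands in for the decoding that PHP performs when it populates \verb'$_GET', and the helper name \verb'decode_ip_argument' is hypothetical. It decodes the request above and shows the command string that results from naive concatenation; nothing is executed.
\begin{lstlisting}[language=Python]
from urllib.parse import urlsplit, parse_qs

def decode_ip_argument(url):
    """Return the decoded value of the 'ip' query parameter."""
    query = urlsplit(url).query
    # parse_qs percent-decodes each value and turns '+' into a space.
    return parse_qs(query)["ip"][0]

url = "http://freeping.com/php/ping?ip=28.2.42.10%3b+ls"
decoded = decode_ip_argument(url)
# Naive concatenation, as described in the text (built but never run).
command = "ping -C 3 " + decoded
print(decoded)   # 28.2.42.10; ls
print(command)   # ping -C 3 28.2.42.10; ls
\end{lstlisting}
The decoded argument already contains the semicolon and the second command, so any code that hands the concatenated string to a shell will run both.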
The final result is that when the PHP script is invoked, \verb'$_GET["ip"]' returns the string \verb'28.2.42.10; ls', so the command passed to \verb'shell_exec' is \verb'ping -C 3 28.2.42.10; ls', which will in turn cause the generated HTML to contain the server's current directory listing. While a directory listing might not seem so bad, an enterprising attacker could leverage this to achieve quite a bit more. Sending an encoded argument that results in the command:
\begin{verbatim}
ping -C 3 28.2.42.10; netcat -v -e '/bin/bash' -l -p 31337
\end{verbatim}
will cause a remote shell to open on port 31337, running at the same privilege level as the PHP interpreter.
\subsection{SQL injection}
\emph{Structured Query Language} (SQL) is a domain-specific language that is used to manage and interact with databases. SQL is very commonly used to implement parts of server-side web applications, as it provides a rich set of commands for accessing, aggregating, and modifying the information stored in relational databases, and can be easily used with PHP.
Relational databases are collections of data modeled as one or more \emph{tables}, where each table is organized into columns and rows. A basic SQL query takes the form shown in Equation~\ref{eq:basicsql}, which returns a specified set of columns from the rows of a table that satisfy some Boolean expression.
\begin{equation}
\label{eq:basicsql}
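\keywordfont{SELECT}~\langle\mathit{columns}\rangle~\keywordfont{from}~\langle\mathit{table}\rangle~\keywordfont{where}~\langle\mathit{condition}\rangle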
\end{equation}
Over the years SQL has grown into a rather large language with extensive functionality beyond selection queries, but for the purposes of this lecture it will suffice to understand this one sort of construct.
\begin{figure}
\begin{lstlisting}[language=PHP]
$result';
?>
\end{lstlisting}
\caption{\label{fig:injection2} PHP script with a SQL injection vulnerability. This example is thanks to David Brumley.}
\end{figure}
Many web applications function by taking user input from the client side, and using it to construct a SQL query that will fetch relevant information from a back-end database. For example, the PHP code in Figure~\ref{fig:injection2} reads the client-side parameter \verb'id', and constructs a \keywordfont{SELECT} query from it to look up the names of individuals with a certain user ID. By this point, you can probably guess what the vulnerability is. For example, if the user provided the string \verb'"1 or 1=1;"' as the value of \keywordfont{id}, then the condition used to select rows would contain the tautology \verb'1=1' in a disjunction, and the query would thus return the \verb'firstname' and \verb'lastname' columns of every row.
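The effect of the tautology can be reproduced with a small Python sketch that uses an in-memory SQLite database in place of the back-end store. The \verb'users' table layout, the sample rows, and the helper names \verb'make_db' and \verb'lookup_unsafe' are assumptions made for illustration and are not taken from Figure~\ref{fig:injection2}.
\begin{lstlisting}[language=Python]
import sqlite3

def make_db():
    """Create a throwaway in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (userid INTEGER, firstname TEXT, lastname TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")])
    return conn

def lookup_unsafe(conn, user_id):
    # VULNERABLE: untrusted text is spliced directly into the query string.
    query = ("SELECT firstname, lastname FROM users WHERE userid = "
             + user_id)
    return conn.execute(query).fetchall()

conn = make_db()
print(lookup_unsafe(conn, "1"))          # the single row with userid 1
print(lookup_unsafe(conn, "1 or 1=1;"))  # every row in the table
\end{lstlisting}
Because the input is pasted into the query text, the database cannot distinguish the intended integer from additional SQL syntax supplied by the user.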
SQL injection vulnerabilities are among the most common vulnerabilities on the web today~\cite{owasp10}. Successful exploits often result in leaked sensitive information such as usernames, passwords, and personal data. For example, the famous CardSystems attack from 2005~\cite{cardsystems} exploited a SQL injection vulnerability and resulted in a leak of 40 million unencrypted credit card numbers stored in a relational database. This resulted in CardSystems, a third party responsible for processing the payments of organizations like Visa and Mastercard, going out of business.
\subsection{Mitigation}
At first glance, it seems that the central problem here is that untrusted input was used as an argument to \verb'shell_exec'. But this is indeed unavoidable if we are to implement the necessary functionality for this application. A more nuanced view is that the PHP script blindly passed untrusted input to \verb'shell_exec' without first checking to make sure that it contained only an IP address, and nothing more. This is called \emph{input validation}, and is usually considered to be the best practice for avoiding client-side injection vulnerabilities.
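As a rough illustration of input validation for the ping example, the Python sketch below accepts the argument only if the entire string parses as an IP address. The function name \verb'validated_ping_command' is hypothetical, the \verb'-C 3' flags are copied from the text above, and the standard \verb'ipaddress' module is just one possible validator.
\begin{lstlisting}[language=Python]
import ipaddress

def validated_ping_command(ip_arg):
    """Return an argument list for ping, or raise ValueError."""
    # ip_address raises ValueError unless the whole string is an
    # IPv4 or IPv6 address, so "28.2.42.10; ls" is rejected.
    addr = ipaddress.ip_address(ip_arg)
    # Returning an argument list (rather than one shell string) means a
    # caller can run it without a shell ever interpreting ';'.
    return ["ping", "-C", "3", str(addr)]

print(validated_ping_command("28.2.42.10"))
try:
    validated_ping_command("28.2.42.10; ls")
except ValueError as err:
    print("rejected:", err)
\end{lstlisting}
The sketch only builds the command; a caller would still need to run the returned list without invoking a shell for the second safeguard to apply.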
However, input validation is a nuanced affair. There are multiple types of injection vulnerability; for example, if the PHP script accesses a back-end database using queries that are influenced by untrusted input, then without appropriate validation an attacker might read more of the database than intended, or worse yet, modify it. There is no silver bullet for input validation, and it must be done carefully on a case-by-case basis. Recent versions of PHP and other languages contain functions that assist in validating certain kinds of inputs (e.g., shell commands and database queries), and developers should only use those when dealing with such functionality. There are also static analysis tools that look for information flow between untrusted input and functions with potentially dangerous behavior, and subsequently advise developers on the best course of action for mitigating the potential vulnerability. But these tools are not perfect, and are no substitute for careful defensive programming to avoid injection attacks.
\paragraph{Parameterized queries.}
While injection vulnerabilities represent a very broad class of security issues that can apply to any situation in which untrusted inputs are used to interact with sensitive trusted entities such as databases and command shells, many of the common targets have developed more principled ways of incorporating untrusted input data. One such approach that is widely used to prevent SQL injection is \emph{parameterized queries} (sometimes called \emph{prepared} queries).
The idea behind parameterized queries is that when constructing SQL queries from user input using string operations, information about how the provided input relates to the query semantics is not available. For example, in Figure~\ref{fig:injection2} the user input is intended to represent an integer, which becomes part of an integer equality test within a Boolean expression. But the script treats it like any other string, blindly copying it into the larger SQL query without regard for its intended purpose.
\begin{figure}
\begin{lstlisting}[language=PHP]
prepare("SELECT lastname FROM users WHERE userid = ?");
$params = array($id);
$st->execute($params);
$result = $st->fetch()[0]; // the query selects a single column, at index 0
echo '$result
';
?>
\end{lstlisting}
\caption{\label{fig:paramquery} Example from Figure~\ref{fig:injection2} mitigated with PHP Data Objects, a form of parameterized queries built into PHP5 for safe interactions with back-end data stores.}
\end{figure}
Figure~\ref{fig:paramquery} shows the use of parameterized queries to address the injection vulnerability from the previous example in Figure~\ref{fig:injection2}. The part that is essential to the technique is the \verb'$conn->prepare' statement, which instantiates a SQL query with ``holes'' left for the user-provided parameters (represented by question marks). The language runtime compiles the query without running it, leaving typed arguments for the parameters. The next line prepares the parameters using data passed in from the user, and the \verb'$st->execute' line runs the query with the given parameters. Behind the scenes, the language runtime takes care of sanitizing and type-checking the parameters against the compiled query, and finally running it.
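The same idea can be sketched in Python with the standard \verb'sqlite3' module, which also uses question marks as placeholders. This is an analogue of the PHP Data Objects code and not a translation of it: the table contents and the names \verb'make_db' and \verb'lookup_safe' are assumptions, and SQLite stands in for whatever database the application actually uses.
\begin{lstlisting}[language=Python]
import sqlite3

def make_db():
    """Create a throwaway in-memory database with two sample users."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (userid INTEGER, firstname TEXT, lastname TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(1, "Ada", "Lovelace"), (2, "Alan", "Turing")])
    return conn

def lookup_safe(conn, user_id):
    # The '?' placeholder is bound as a data value after the query has
    # been parsed, so its contents are never interpreted as SQL syntax.
    cur = conn.execute(
        "SELECT lastname FROM users WHERE userid = ?", (user_id,))
    return cur.fetchall()

conn = make_db()
print(lookup_safe(conn, "1"))          # matches the row with userid 1
print(lookup_safe(conn, "1 or 1=1;"))  # matches nothing
\end{lstlisting}
With the parameter bound as data, the string \verb'1 or 1=1;' is compared against \verb'userid' as a single value, so the tautology never becomes part of the condition.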
\section{Cross-site scripting attacks}
So far the injection attacks that we have considered assume a threat model where a malicious user in control of the client side of a web application seeks to exfiltrate or modify data stored on the server. Another form of injection attack resides in the ``opposite'' model, where attacker-controlled information stored on the server compromises the client-side safety of end-users. These are called \emph{cross-site scripting attacks} (abbreviated ``XSS'').
\begin{figure}
\begin{lstlisting}[language=PHP]
prepare("INSERT INTO data VALUES (?, ?)");
$params = array($_GET['var'], $_GET['val']); // one flat array, one entry per placeholder
$st->execute($params);
} else if($action == 'get') {
$st = $conn->prepare("SELECT val FROM data WHERE var = ?");
$params = array($_GET['var']);
$st->execute($params);
echo '$st->fetch()[1]
';
}
?>
\end{lstlisting}
\caption{\label{fig:injection3} PHP script with a cross-site scripting vulnerability.}
\end{figure}
Consider the script shown in Figure~\ref{fig:injection3}, which is more or less an implementation of the task from Lab 0 in PHP for a server with a back-end SQL database. The script first reads from an input parameter \verb'act', which lets the user specify the action of either storing a value in a variable, or retrieving the value of a stored variable. Integers are boring, so let's assume that the intended functionality of the application is to let users associate string values with variable names in the database.
Now consider what happens when the user provides the following input, which we present without URL encoding to make it easier to read.
\begin{verbatim}
act=store&var=x&val=
\end{verbatim}
This will cause the server to store the string \verb'' in the database under variable \verb'x'. If another user subsequently issues the following request to get the value stored in \verb'x':
\begin{verbatim}
act=get&var=x
\end{verbatim}
Then the PHP script will render an HTML page with the \verb'