{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "47d1b74d",
   "metadata": {},
   "source": [
    "## Bulk load the data\n",
    "\n",
    "Run the `%load` command below and provide the inputs to kick off a bulk load."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4720c3a4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%load"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6be77adf",
   "metadata": {},
   "source": [
    "## Validate the data\n",
    "We expect 100 nodes and 118 edges in total."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "af3734bc",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().count()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1515c1d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.E().count()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "31aa6280",
   "metadata": {},
   "source": [
    "We expect the following counts for node and edge types:\n",
    "- LOCATION: 36\n",
    "- PERSON: 24\n",
    "- OBJECT: 25\n",
    "- EVENT: 15\n",
    "- hasAddress: 50 \n",
    "- personOfInterest: 15\n",
    "- evidenceItem: 17\n",
    "- ownedBy: 25\n",
    "- knows: 11"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7f2d23b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().label().groupCount()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1133cebe",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.E().label().groupCount()"
   ]
  },
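  {
   "cell_type": "markdown",
   "id": "3c9a5e17",
   "metadata": {},
   "source": [
    "Edge direction matters for the traversals later in this notebook, which follow incoming edges with `in()`. As an optional check, the query below is a minimal sketch that counts each combination of source label, edge label, and target label. The map keys `from`, `edge`, and `to` are names chosen here for illustration and are not part of the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d41b6f0",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.E().project('from','edge','to')\n",
    "     .by(outV().label()) // label of the vertex the edge leaves\n",
    "     .by(label())        // label of the edge itself\n",
    "     .by(inV().label())  // label of the vertex the edge points to\n",
    "     .groupCount()"
   ]
  },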
  {
   "cell_type": "markdown",
   "id": "b04a7778",
   "metadata": {},
   "source": [
    "## Setting visualizations\n",
    "Run the cell below to set the icon shape and colors through the `%%graph_notebook_vis_options` magic."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "28ff0a8f",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%graph_notebook_vis_options\n",
    "\n",
    "{\n",
    "  \"groups\": {\n",
    "   \"PERSON\": {\n",
    "     \"shape\": \"icon\",\n",
    "     \"icon\": {\n",
    "       \"face\": \"FontAwesome\",\n",
    "       \"code\": \"\\uf007\",\n",
    "       \"color\": \"red\"\n",
    "     }\n",
    "   },\n",
    "   \"LOCATION\": {\n",
    "     \"shape\": \"icon\",\n",
    "     \"icon\": {\n",
    "       \"face\": \"FontAwesome\",\n",
    "       \"code\": \"\\uf124\",\n",
    "       \"color\": \"green\"\n",
    "     }\n",
    "   },\n",
    "   \"OBJECT\": {\n",
    "     \"shape\": \"icon\",\n",
    "     \"icon\": {\n",
    "       \"face\": \"FontAwesome\",\n",
    "       \"code\": \"\\uf042\",\n",
    "       \"color\": \"blue\"\n",
    "     }\n",
    "   },\n",
    "   \"EVENT\": {\n",
    "     \"shape\": \"icon\",\n",
    "     \"icon\": {\n",
    "       \"face\": \"FontAwesome\",\n",
    "       \"code\": \"\\uf133\",\n",
    "       \"color\": \"purple\"\n",
    "     }\n",
    "   }\n",
    " }\n",
    "}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9dbced2d",
   "metadata": {},
   "source": [
    "We also set the following variables, to change which values are visible on our graph visualization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6edd18d",
   "metadata": {},
   "outputs": [],
   "source": [
    "node_label = '{\"PERSON\":\"name\",\"EVENT\":\"type\",\"OBJECT\":\"type\",\"LOCATION\":\"streetAddress\"}'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18203ba5",
   "metadata": {},
   "source": [
    "## Visualizing some simple queries"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "86344f81",
   "metadata": {},
   "source": [
    "First, we visualize the whole graph by dropping in at each vertex and hopping out to the immediate connecting vertex. Run the query below and click on the 'Graph' tab to see a visualization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e96050e",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().bothE().otherV().path()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ed6b705",
   "metadata": {},
   "source": [
    "Next, we find John Doe II and see what entities he's immediately connected to. Run the query below and click on the 'Graph' tab to see a visualization:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f68e28a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin -d $node_label -l 40\n",
    "\n",
    "g.V().has('name','John Doe II').bothE().otherV().path().by(elementMap())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6d556337",
   "metadata": {},
   "source": [
    "### Finding indirect connections to a crime\n",
    "Oftentimes crime is nuanced, and a person may be connected to a crime through another person or object for an unknown number of hops. Let’s see how many people there are that are connected to more than one crime, including connections that may be indirect."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed16034c",
   "metadata": {},
   "source": [
    "First, we'll grab all the paths between a person and a crime, even if the person is connected to the crime from more than one hop away:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb73a446",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().hasLabel('PERSON') // starting at each person\n",
    "     .repeat(in()).until(hasLabel('EVENT')).path() \n",
    "     // traverse along the inward edge to the adjacent \n",
    "     // vertex, until you reach an event"
   ]
  },
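  {
   "cell_type": "markdown",
   "id": "5b0e72c4",
   "metadata": {},
   "source": [
    "The `repeat()` above has no upper bound on the number of hops. That is fine on a small dataset, but if the graph contains cycles (for example, two people who know each other), an unbounded traversal can revisit the same vertices. As an optional, hedged variant, the sketch below adds `simplePath()` and stops after at most four hops. The cap of 4 is an assumption made for illustration, not a property of the data, and the results may differ from the unbounded query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e93f18a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().hasLabel('PERSON')\n",
    "     .repeat(in().simplePath()) // never revisit a vertex within a path\n",
    "     .until(hasLabel('EVENT').or().loops().is(4)) // stop at an event or after 4 hops (assumed cap)\n",
    "     .hasLabel('EVENT') // drop paths that hit the cap without reaching an event\n",
    "     .path()"
   ]
  },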
  {
   "cell_type": "markdown",
   "id": "6106c37f",
   "metadata": {},
   "source": [
    "Scrolling through all the results, you’ll notice that some paths have a direct connection between a person and an event, and some paths have object and/or person vertices between the starting person and event. This is what we want, since a person might be tied to an event via an object they own, or perhaps they know someone who is directly associated to an event (thereby making themselves indirectly associated). But because we only care about how many different events a person is associated with, we’ll need to pull just the person and the event information as pairs from each path:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d0214372",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "g.V().hasLabel('PERSON') \n",
    "     .repeat(in()).until(hasLabel('EVENT')).path()\n",
    "     .local(union(limit(local, 1), tail(local,1)).fold()).dedup()\n",
    "     // for each path that we found, create a pair that consists of the event \n",
    "     // and the person that is tied to the event, whether directly or indirectly;\n",
    "     // also get rid of duplicate pairs via the dedup() step "
   ]
  },
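  {
   "cell_type": "markdown",
   "id": "7f26d0b9",
   "metadata": {},
   "source": [
    "An alternative way to express the same person/event pairing is to label the two steps with `as()` and read them back with `select()`, instead of slicing each path. This is a sketch of that idea, not a replacement for the query above. The step labels `person` and `event` are names introduced here, and the output is one map per pair rather than a two-element list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c48a1e63",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().hasLabel('PERSON').as('person') // remember the starting person\n",
    "     .repeat(in()).until(hasLabel('EVENT')).as('event') // remember the event that was reached\n",
    "     .select('person','event') // one map per person/event path\n",
    "     .dedup() // keep each pair only once"
   ]
  },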
  {
   "cell_type": "markdown",
   "id": "1c23fc97",
   "metadata": {},
   "source": [
    "Now we have a list of person/event pairs, to denote which person is connected to which event, regardless if it’s a direct link to the event or through a number of other people/object connections. Let’s clean this up by grouping the pairs according to which person is represented, and only including people who are associated with more than one event:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f387c23b",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().hasLabel('PERSON')\n",
    "     .repeat(in()).until(hasLabel('EVENT')).path()\n",
    "     .local(union(limit(local,1), tail(local,1)).fold()).dedup()\n",
    "     .group().by(limit(local,1)).unfold() // group each event/person pair by person\n",
    "     .where(select(values).unfold().count().is(gt(1)))\n",
    "     // only keep the people that have appeared in more than one event"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "98e6e509",
   "metadata": {},
   "source": [
    "Now that we’ve grouped the person/event pairs by person, we can take this a step further and adjust the formatting to make it easier to read:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c5d271a6",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V().hasLabel('PERSON')\n",
    "     .repeat(in()).until(hasLabel('EVENT')).path()\n",
    "     .local(union(limit(local,1), tail(local,1)).fold()).dedup()\n",
    "     .group().by(limit(local,1)).unfold()\n",
    "     .where(select(values).unfold().count().is(gt(1)))\n",
    "     .order().by(select(values).count(local),decr) // order the following map by decreasing count\n",
    "                                                   // of unique events\n",
    "     .project('Person','Events') // create a map with keys 'Person' and 'Events'\n",
    "     .by(select(keys)) // sets the value for key 'Person'\n",
    "     .by(select(values).unfold().tail(local,1).fold()) // sets the value for key 'Events'\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "67078a90",
   "metadata": {
    "scrolled": true
   },
   "source": [
    "It looks like Person-205 is associated (either directly and/or indirectly) with five different events. Let’s check this out in our graph:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "cc18be9d",
   "metadata": {},
   "outputs": [],
   "source": [
    "node_labels = '{\"EVENT\":\"offenseType\",\"PERSON\":\"name\",\"OBJECT\":\"type\",\"LOCATION\":\"city\"}'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3984dc06",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin -d $node_labels -l 40\n",
    "\n",
    "g.V('Person-205')\n",
    " .repeat(inE().outV()).until(hasLabel('EVENT')).path().by(elementMap())"
   ]
  },
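  {
   "cell_type": "markdown",
   "id": "2ad97b5e",
   "metadata": {},
   "source": [
    "To cross-check the number of events behind this visualization, the following sketch counts the distinct events reachable from the same vertex. It assumes, as the query above does, that `Person-205` is the vertex ID:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "90b3c7d1",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%gremlin\n",
    "\n",
    "g.V('Person-205')\n",
    " .repeat(in()).until(hasLabel('EVENT'))\n",
    " .dedup() // several paths can end at the same event\n",
    " .count()"
   ]
  },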
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fa003696",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
