File reading in Python/Jupyter

  1. Working with files on CoLab
  2. Finding the input file
  3. Opening local files and text file issues

1. Data files and CoLab

  1. For this example, assume you have a file called "fred.txt" on your laptop containing at least a couple of lines of text that you want to open in Python in CoLab.

  2. Upload the file into the "Colab Notebooks" folder of your Google Drive.

  3. Allow CoLab to mount your Google Drive for file reading and writing. Execute the following in a CoLab cell:

from google.colab import drive
drive.mount('/content/gdrive')

This immediately asks for an authorization code and gives you a link for creating one. Follow the link, copy the code into the space provided, and press Enter. (Newer versions of CoLab may show a pop-up asking you to grant access instead.) If all goes well, CoLab can now see the files in your Google Drive under /content/gdrive.

  4. Connect to the directory containing your uploaded file. To find that directory, import the "os" library and use its "listdir" and "chdir" functions to locate and connect to the Colab Notebooks directory, or wherever you uploaded "fred.txt".

  5. Open the file in Python for reading.
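If you'd rather follow along without uploading anything (for example, in a local Jupyter session), here is a minimal sketch that writes a two-line "fred.txt" into the current directory. The contents are placeholders chosen to match the output shown below; note that on CoLab this writes to the runtime's own disk, not to your Google Drive.

```python
# Write a small two-line sample file so there is something to read.
# The filename and contents are placeholders, not required values.
sample_lines = ['first line', 'second line']

with open('fred.txt', 'w') as fp:
    fp.write('\n'.join(sample_lines) + '\n')

# Read it back to confirm what was written.
with open('fred.txt') as fp:
    print(fp.read(), end='')
```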

In [0]:
from google.colab import drive
drive.mount('/content/gdrive')
Mounted at /content/gdrive
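The mount step can be wrapped so that the same notebook also runs outside CoLab. This is a minimal sketch, not part of the original notebook: the helper name mount_drive_if_colab is hypothetical, and it assumes that the google.colab package can only be imported when running on CoLab.

```python
import os

def mount_drive_if_colab(mount_point='/content/gdrive'):
    """Mount Google Drive when running on CoLab.

    Returns True if the mount point exists afterwards, and False when
    google.colab cannot be imported (i.e. we are not on CoLab).
    """
    try:
        from google.colab import drive  # only importable inside CoLab
    except ImportError:
        return False  # not on CoLab, so there is nothing to mount
    drive.mount(mount_point)
    return os.path.isdir(mount_point)

mounted = mount_drive_if_colab()
print('Drive mounted:', mounted)
```

Outside CoLab the helper simply returns False, so later cells can check the result before touching /content/gdrive.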
In [0]:
import os
# Verify the location of the Colab Notebooks directory by listing its files...
# ...and you should see the "fred.txt" filename in the list
notebook_dir = '/content/gdrive/My Drive/Colab Notebooks'
print(sorted(os.listdir(notebook_dir)))
# Connect to that directory (absolute path, so re-running the cell still works):
os.chdir(notebook_dir)
# Now "fred.txt" can be opened by name; print its contents
with open('fred.txt') as fp:
    print(fp.read())
['AI-HW-1.ipynb', 'BinTree.ipynb', 'CSP-LogicPuzzle2.ipynb', 'File Reading.ipynb', 'Python review - 1.ipynb', 'PythonReview-1 (1).ipynb', 'PythonReview-1.ipynb', 'Untitled.ipynb', 'Untitled0.ipynb', 'Untitled1.ipynb', 'Untitled2.ipynb', 'Untitled3.ipynb', 'Untitled4.ipynb', 'Untitled5.ipynb', 'Untitled6.ipynb', 'Untitled7.ipynb', 'Untitled8.ipynb', 'Untitled9.ipynb', 'dictall.txt', 'fred.txt', 'pre_input (1).txt', 'pre_input.txt']
first line
second line
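Instead of changing directories with os.chdir, you can build the full path with os.path.join and leave the working directory alone. This is a minimal sketch assuming the Drive is mounted at /content/gdrive as above; the helper read_lines and the NOTEBOOK_DIR constant are illustrative names, not part of any library.

```python
import os

# Folder where fred.txt was uploaded (assumed); adjust if it lives elsewhere.
NOTEBOOK_DIR = '/content/gdrive/My Drive/Colab Notebooks'

def read_lines(filename, directory=NOTEBOOK_DIR):
    """Return the lines of directory/filename, without line endings."""
    path = os.path.join(directory, filename)
    with open(path) as fp:
        return fp.read().splitlines()
```

On CoLab, read_lines('fred.txt') should return the lines of fred.txt as a list; pass a different directory as the second argument to read from somewhere else.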

2. Finding the input file

Use the "listdir" and "chdir" functions of the "os" library to display and connect to directories:

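import os

# The directory to inspect; this path assumes the Drive mount shown above
dirname = '/content/gdrive/My Drive/Colab Notebooks'

# os.listdir(dirname) returns the list of filenames in the named directory
filenames = os.listdir(dirname)
# or, for the current dir:
filenames = os.listdir('.')
# os.getcwd() shows which directory is current
print(os.getcwd())

# connect to a directory with
os.chdir(dirname)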
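If you don't know which folder a file ended up in, os.walk can search a whole directory tree rather than one directory at a time. A minimal sketch; find_file is a hypothetical helper, and it stops at the first match it finds.

```python
import os

def find_file(filename, root='.'):
    """Return the path of the first file named `filename` found under
    `root` (subdirectories included), or None if there is no match."""
    for dirpath, dirnames, filenames in os.walk(root):
        if filename in filenames:
            return os.path.join(dirpath, filename)
    return None
```

For example, find_file('fred.txt', '/content/gdrive/My Drive') would search your entire Drive, which may be slow if it holds many files.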

3. Opening local files

  1. It's easiest, of course, if your data files are in the same directory as your Python or notebook file.

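  2. There's a line-ending issue that can arise if your data files come from a Windows system: on Windows, the normal text line ending is "\r\n", not "\n". In Python 3, open() in text mode translates "\r\n" (and a lone "\r") to "\n" by default, so the problem mostly shows up when you read in binary mode or pass newline=''. The str.splitlines() method splits on all of these endings, so it is a safe choice either way:

try:
  with open('fred.txt', 'r') as fp:
    s = fp.read()
  # splitlines() handles "\n", "\r\n" and "\r" alike
  lines = s.splitlines()
except FileNotFoundError:
  print("Dunno where fred.txt is, but it ain't here.")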
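The line-ending behavior can be checked directly. This sketch (assuming Python 3) writes Windows-style line endings as raw bytes to a temporary file, whose name windows.txt is arbitrary, and then reads it back with and without newline translation.

```python
import os
import tempfile

# Write Windows-style line endings as raw bytes so nothing is translated.
path = os.path.join(tempfile.mkdtemp(), 'windows.txt')
with open(path, 'wb') as fp:
    fp.write(b'first line\r\nsecond line\r\n')

# Default text mode (newline=None): '\r\n' is translated to '\n' on read.
with open(path) as fp:
    translated = fp.read()

# newline='' turns translation off, so the '\r\n' pairs survive.
with open(path, newline='') as fp:
    untranslated = fp.read()

print(repr(translated))    # 'first line\nsecond line\n'
print(repr(untranslated))  # 'first line\r\nsecond line\r\n'

# splitlines() produces the same list from either string.
print(translated.splitlines() == untranslated.splitlines())  # True
```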