Assignment 1: Getting Started with Python

Author

Ryan M. Moore, PhD

Published

February 8, 2025

Modified

September 3, 2025

Welcome to your first Python programming assignment! We’ll start with the fundamentals, focusing on basic operations that will serve as building blocks for more complex bioinformatics tasks later on. This assignment builds on concepts from Tutorial 1 and introduces some additional string operations that you’ll find particularly useful in handling biological sequence data.

To make learning easier, each section follows a “learn by example” approach. First, I’ll walk you through a complete example that demonstrates key concepts. Rather than using life science content directly, these examples will use more familiar, silly scenarios to clearly illustrate the problem-solving techniques you’ll need. Then, you’ll tackle a similar practice problem with a life science application where you can apply what you’ve learned. The example problems are specifically designed to parallel the actual assigned problems, so you can use similar problem-solving strategies to complete your work.

Think of this as building a toolkit – we’ll start with basic tools and gradually add more sophisticated ones as we progress.

Suggestions for Code Organization

When you’re first learning to code, it’s helpful to break down your calculations into smaller, clearer steps.

Instead of this condensed approach:

sentence = "kittens love balls of yarn"

vowel_ratio = (
    sentence.count("a")
    + sentence.count("e")
    + sentence.count("i")
    + sentence.count("o")
    + sentence.count("u")
) / len(sentence)

print(vowel_ratio)

0.2692307692307692

Consider breaking it down like this:

sentence = "kittens love balls of yarn"

# Count the number of each vowel
a_count = sentence.count("a")
e_count = sentence.count("e")
i_count = sentence.count("i")
o_count = sentence.count("o")
u_count = sentence.count("u")

# Add up the vowel counts
vowel_count = a_count + e_count + i_count + o_count + u_count

# Get the total length of the sentence
sentence_length = len(sentence)

# Calculate the vowel ratio
vowel_ratio = vowel_count / sentence_length

print(vowel_ratio)

0.2692307692307692

This step-by-step approach is more verbose, but it can provide some advantages, especially as you are learning to code:

It’s easier to debug (you can check each intermediate value)
It’s more readable and self-documenting
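For example, with the step-by-step version you can print each intermediate value and confirm it looks right before trusting the final ratio. This sketch reuses the sentence above; the print labels are just illustrative.

```python
sentence = "kittens love balls of yarn"

# Count each vowel separately so every value can be inspected on its own
a_count = sentence.count("a")
e_count = sentence.count("e")
i_count = sentence.count("i")
o_count = sentence.count("o")
u_count = sentence.count("u")

# Print the intermediate counts to check them by eye
print("a:", a_count, "e:", e_count, "i:", i_count, "o:", o_count, "u:", u_count)
# a: 2 e: 2 i: 1 o: 2 u: 0

# Check the total and the length before dividing
vowel_count = a_count + e_count + i_count + o_count + u_count
sentence_length = len(sentence)
print("vowels:", vowel_count, "length:", sentence_length)
# vowels: 7 length: 26

vowel_ratio = vowel_count / sentence_length
print(vowel_ratio)
```

If the final ratio ever looks wrong, these intermediate prints tell you which step to look at.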

As you gain more experience, you’ll develop an intuition for when to use more concise code versus when to break things down. For now, focus on writing code that you (and others) can easily understand and troubleshoot.

Basic DNA Sequence Analysis

Let’s start with some basic text operations. We’ll learn how to store text in variables, combine pieces of text together, and count specific characters.

One particularly useful feature we’ll explore is something called a “string method”. Basically, these are built-in tools that help you analyze and modify text. For instance, we can use the count string method to “count” how many times a particular character appears in a piece of text.

Example

In this example, we want to create a friendly greeting for someone and then figure out how enthusiastic that greeting is by counting exclamation marks. Let’s break it down into clear steps:

1. Create variables for a greeting (Hello) and a name (Ryan)
2. Join them together with a space and add some exclamation marks (!) to the end
3. Print the result of (2)
4. Count the number of exclamation marks
5. Print the result of (4)

Breaking problems down into small, manageable steps like this is a crucial skill in programming. Just like how you might break down a complex laboratory protocol into distinct steps, programming tasks become much more approachable when divided into smaller pieces.

Solution

Below is a complete solution to the problem we just broke down. I’ve added comments that point back to each step from our list – this helps show how each line of code connects to our original plan. When you write your own solution, you don’t need to include these “step-reference” comments. They’re just here to help you see how the pieces fit together and build that mental connection between planning and implementation.

# Step 1
greeting = "Hello"
name = "Ryan"

# Step 2
message = greeting + " " + name + "!!"

# Step 3
print(message)

# Step 4
exclamation_mark_count = message.count("!")

# Step 5
print(exclamation_mark_count)

Hello Ryan!!
2

You might notice something new here: message.count("!"). This is one of Python’s string methods – basically a built-in tool that can perform specific operations on text. We’ll explore these string methods more thoroughly later, but for now, just know that count() does exactly what its name suggests: it counts how many times a particular character or pattern appears in your text.

Let’s look at some simple examples:

p_count = "apple pie".count("p")
print(p_count)

name = "Juan Carlos"
a_count = name.count("a")
print(a_count)

3
2

When we write "apple pie".count("p"), we’re essentially asking Python “How many times does the letter p appear in the text apple pie?”

The count() method isn’t limited to single characters – it can also count longer patterns. Here’s an example:

silly_string = "ab abab ab"
ab_count = silly_string.count("ab")
print(ab_count)

4
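Two details about count() are worth knowing before you use it on DNA. First, it counts non-overlapping occurrences: after it finds a match, it resumes searching just past the end of that match. Second, it is case-sensitive, so a lowercase letter is not counted when you search for an uppercase one (and vice versa). The strings below are made-up examples.

```python
# count() finds non-overlapping matches: after "aa" at position 0,
# it resumes at position 2, so "aaaa" gives 2, not 3
aa_count = "aaaa".count("aa")
print(aa_count)  # 2

# "ana" starts at positions 1 and 3 of "banana", but those two
# matches overlap, so count() reports only 1
ana_count = "banana".count("ana")
print(ana_count)  # 1

# Matching is case-sensitive: "A" and "a" are different characters
lower_a_count = "Apple".count("a")
upper_a_count = "Apple".count("A")
print(lower_a_count, upper_a_count)  # 0 1
```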

Problem

In this problem, we’ll work with some DNA sequences, joining them together and analyzing the combined sequence to count specific nucleotides and patterns. You could think of this as a simplified version of looking for motifs or counting nucleotide usage.

1. Create variables for three DNA fragments, ATATAACTG, CTATGTAC, and GGTGAGTAT
2. Join (concatenate) the three DNA fragments into one
3. Print the result of (2)
4. Count how many A nucleotides there are in the joined sequence created in (2)
5. Print the result of (4)
6. Count the number of occurrences of the substring AT in the joined sequence created in (2)
7. Print the result of (6)
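When you join strings with +, Python places them end to end with nothing in between, so the length of the result should equal the sum of the pieces’ lengths. Comparing the two is a cheap sanity check. The pieces below are made-up words, not the DNA fragments from the problem.

```python
# Hypothetical pieces (not the DNA fragments from the problem)
part_1 = "kit"
part_2 = "ten"
part_3 = "s"

# + joins strings end to end with no separator
joined = part_1 + part_2 + part_3
print(joined)  # kittens

# The joined length should equal the sum of the pieces' lengths
expected_length = len(part_1) + len(part_2) + len(part_3)
print(len(joined) == expected_length)  # True
```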

Solution

# Put the code for your solution here!

DNA Composition Analysis

DNA sequence analysis often starts with understanding basic nucleotide composition. While simple, these calculations form the foundation for many bioinformatics tasks and can help validate your data quality.

Let’s build on our string operations to analyze DNA sequences by counting bases and calculating GC content.
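As a small illustration of the data-quality point above, you can count each of the four bases and check that they add up to the sequence length. If they don’t, the sequence contains some other character, such as N (commonly used for an unknown base) or a lowercase letter. The sequence below is made up for illustration.

```python
# A made-up sequence containing one "N" (an unknown base)
sequence = "ACGTNACGTA"

# Count each of the four standard bases
a_count = sequence.count("A")
c_count = sequence.count("C")
g_count = sequence.count("G")
t_count = sequence.count("T")
print("A:", a_count, "C:", c_count, "G:", g_count, "T:", t_count)
# A: 3 C: 2 G: 2 T: 2

# If every character is A, C, G, or T, the counts add up to the length
base_total = a_count + c_count + g_count + t_count
sequence_length = len(sequence)
only_acgt = base_total == sequence_length
print("counted:", base_total, "length:", sequence_length, "only ACGT:", only_acgt)
# counted: 9 length: 10 only ACGT: False
```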

Example

Before getting to the DNA string problem, let’s practice working with strings by analyzing the spaces in a sentence. We’ll count the spaces, calculate what percentage of the sentence they represent, and format our output nicely. Here’s the step-by-step breakdown:

1. Create a variable for the sentence: the tiny kitten chomps the pencil and eats the homework
2. Count the number of spaces (" ") in the sentence
3. Print the result of (2) in a nice way using an f-string
   a. Don’t just print out the number; instead, print a nice phrase like the sentence has 9 spaces, or something similar.
4. Calculate the percentage of spaces in the sentence
   a. E.g., if there are 5 spaces and 20 total characters, the percent spaces would be 25%.
5. Print the result of (4) in a nice way using an f-string
   a. The percent should only have one decimal place of precision!
   b. E.g., rather than the sentence has 16.363636363636363% spaces, you should print the sentence has 16.4% spaces.

Solution

To print things in a nice, formatted way, we will use Python’s Formatted String Literals, or “f-strings”.

Let’s take a look:

# Step 1
sentence = "the tiny kitten chomps the pencil and eats the homework"

# Step 2
space_count = sentence.count(" ")

# Step 3
print(f"the sentence has {space_count} spaces")

# Step 4
sentence_length = len(sentence)
space_ratio = space_count / sentence_length
space_percentage = space_ratio * 100

# Step 5
print(f"the sentence has {space_percentage:.1f}% spaces")

the sentence has 9 spaces
the sentence has 16.4% spaces

As with our previous example, we can see how each instruction maps directly to its corresponding code. While this level of detail isn’t always necessary, it’s very helpful during the learning process.

Pay attention to the .1f part of the f-string from Step 5. It controls how many decimal places are printed: .1f means “format the number as a decimal with one digit after the decimal point.” F-strings support many more formatting options like this (see https://docs.python.org/3/tutorial/inputoutput.html#formatted-string-literals) to help you create nicely formatted output.
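Here are a few variations on that format specification, using the space numbers from the example above. The digit after the dot sets how many decimal places are shown, and the value is rounded rather than truncated. The % presentation type multiplies a ratio by 100 and appends a percent sign, so it can format the ratio directly.

```python
space_count = 9
sentence_length = 55
space_ratio = space_count / sentence_length
space_percentage = space_ratio * 100

print(f"{space_percentage:.0f}%")  # 16%
print(f"{space_percentage:.1f}%")  # 16.4%
print(f"{space_percentage:.2f}%")  # 16.36%

# The % type multiplies the ratio by 100 and adds the "%" sign for you
print(f"{space_ratio:.1%}")  # 16.4%
```

Either style works; the explicit `* 100` version makes the arithmetic visible, which can be easier to follow while you are learning.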

Problem

We often need to analyze DNA sequences by counting specific nucleotides and calculating metrics like GC content, which can tell us about properties of the DNA, such as its thermal stability, and sometimes about its origin. Here is the problem statement:

Given a DNA string GGAAGTTTTCCATTTTTAGTAAGAATTGATTT, calculate the GC percentage and display it in a nicely formatted way.

Let’s solve this problem step by step:

1. Create a variable for the DNA string: GGAAGTTTTCCATTTTTAGTAAGAATTGATTT
2. Count the number of Gs in the sequence
3. Count the number of Cs in the sequence
4. Print the result of (2) and (3) in a nice way using an f-string
   a. E.g., something like the sequence has 5 Gs and 10 Cs
5. Calculate the GC percentage of the given DNA string
   a. The GC percentage of a DNA string is the number of Gs plus the number of Cs, divided by the total length of the sequence, multiplied by 100.
6. Print the result of (5) in a nice way using an f-string
   a. The percent should only have one decimal place of precision!
   b. E.g., rather than the GC percentage is 45.66666666666%, you should print the GC percentage is 45.7%.
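Written as a formula, with n_G and n_C as the counts of G and C and L as the sequence length (notation introduced here for convenience), the GC percentage is shown below, along with a worked example on the made-up four-base sequence GGCA.

```latex
\text{GC\%} = \frac{n_G + n_C}{L} \times 100
\qquad \text{e.g., for GGCA: } \frac{2 + 1}{4} \times 100 = 75\%
```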

Solution

# Put the code for your solution here!

Gene Structure Analysis

When working with genetic sequences, we often need to analyze their structure and make decisions based on specific patterns or features we find. In this exercise, we’ll explore how to use Python’s string handling capabilities and conditional (if/else) logic to examine DNA sequences.

Think of it as creating a set of simple rules for your computer to follow, similar to the mental checklist you might use when manually analyzing sequences, but more systematic.

Example

As usual, let’s try a small example that covers the concepts you can use to solve the real problem. We want to write a program that can look at a sentence and tell us if it’s “good” based on two criteria: it needs to be long enough, and it needs proper ending punctuation.

Here are the step-by-step instructions:

1. Create a variable to hold the sentence: My favorite class is Programming!!
2. Create a variable to hold the minimum length threshold of 20 characters
3. Check the criteria for a good sentence:
   a. Determine if the given sentence is long enough
   b. Determine if the given sentence ends with either a period (.) or an exclamation mark (!)
4. Determine if the sentence is “good”
   a. A sentence is “good” if it is long enough (3a) and ends in either a period or an exclamation mark (3b)
5. Use an f-string to print a nice message about whether the sentence is good or not good

Solution

Let’s take a look at one solution to the problem. For reference, I have included comments to connect each line of code back to the original instructions.

# Step 1
sentence = "My favorite class is Programming!!"

# Step 2
minimum_length_threshold = 20

# Step 3a
is_too_short = len(sentence) < minimum_length_threshold

# Step 3b
ends_with_period = sentence.endswith(".")
ends_with_exclamation = sentence.endswith("!")
ends_with_punctuation = ends_with_period or ends_with_exclamation

# Step 4
is_good_sentence = not is_too_short and ends_with_punctuation

# Step 5
if is_good_sentence:
    print(f"the sentence '{sentence}' is good :D")
else:
    print(f"the sentence '{sentence}' is not good D:")

the sentence 'My favorite class is Programming!!' is good :D

Note the use of sentence.endswith(...). This is another one of Python’s string methods. It tells you whether or not a string ends with the given argument. In this case, sentence.endswith(".") checks whether the sentence ends with a period (.). There is also a string method called startswith. Can you guess what it does? (You can find the answer in the Python documentation for str.startswith.)
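Both startswith and endswith also accept a tuple of strings and return True if the string matches any one of them, which can replace a chain of or checks. The sentences below are made-up examples.

```python
sentence = "My favorite class is Programming!!"

# A tuple of suffixes: True if the string ends with ANY of them.
# Same result as sentence.endswith(".") or sentence.endswith("!")
ends_with_punctuation = sentence.endswith((".", "!"))
print(ends_with_punctuation)  # True

# startswith checks the beginning of the string instead of the end
starts_with_my = sentence.startswith("My")
print(starts_with_my)  # True

# startswith also accepts a tuple; neither "I" nor "We" matches here
starts_with_pronoun = sentence.startswith(("I", "We"))
print(starts_with_pronoun)  # False

# A question mark is not in the tuple, so this is False
question_ends = "Is it raining?".endswith((".", "!"))
print(question_ends)  # False
```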

Let’s look at how we can potentially improve our code’s readability. In the original version, we first checked if a sentence was too short (step 3a) and then had to use not to negate that condition in step 4. Here’s an alternative approach:

is_long_enough = len(sentence) >= minimum_length_threshold

# ... other code ...

is_good_sentence = is_long_enough and ends_with_punctuation

By checking if the sentence is ‘long enough’ instead of ‘too short’, we can use the condition directly without negation. This makes the logic more straightforward. Both approaches work fine – choose whichever you find more intuitive and readable.
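One detail in the original Step 4 is worth spelling out: Python applies not before and, so `not is_too_short and ends_with_punctuation` is read as `(not is_too_short) and ends_with_punctuation`. This sketch uses a made-up sentence that is too short but ends with an exclamation mark, and checks that the two versions agree.

```python
# A made-up sentence that is too short but does end with "!"
sentence = "Too short!"
minimum_length_threshold = 20

is_too_short = len(sentence) < minimum_length_threshold
is_long_enough = len(sentence) >= minimum_length_threshold
ends_with_punctuation = sentence.endswith(".") or sentence.endswith("!")

# `not` is applied before `and`, so this means
# (not is_too_short) and ends_with_punctuation
version_with_not = not is_too_short and ends_with_punctuation

# The alternative version, with no negation needed
version_without_not = is_long_enough and ends_with_punctuation

print(version_with_not, version_without_not)  # False False
```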

Problem

Now that we have seen the example, let’s try the real problem!

We often need to identify valid genes within DNA sequences. A “good gene” might need certain characteristics. For this exercise, we’ll say it must be long enough to code for a protein, start with a specific sequence called a start codon, and end with one of several possible stop codons. Let’s write code to check whether a given DNA sequence meets these criteria.

Here are the steps we will follow:

1. Create variables to hold the following data, one variable for each object
   a. start codon: ATG
   b. stop codons:
      i. TAA
      ii. TAG
      iii. TGA
   c. minimum length threshold for a gene: 25
2. Create a variable to hold the gene: ATGCAATTAATTAATTCAGCGTGTAAATTGTAA
3. Check the criteria for a good gene:
   a. Determine if the gene is long enough (i.e., is the gene at least as long as the minimum length threshold given above?)
   b. Determine if the gene starts with the given start codon (hint: use the startswith string method)
   c. Determine if the gene ends with at least one of the three given stop codons (hint: use the endswith string method)
4. Given the conditions calculated in (3), determine if the gene is a “good gene”
   a. A gene is “good” if it is long enough (3a), starts with the start codon (3b), and ends with at least one of the stop codons (3c).
5. Use an f-string to print a nice message about whether the gene is good or not

These steps have you define all of your variables up front and then perform all of the calculations. As you write code for your solution, keep the following questions in mind: Can you think of a better “order” for these steps? Would it be clearer if each variable were defined directly above the expression that uses it? (You don’t have to answer these questions in this assignment; they’re just something to keep in mind as you learn. Eventually, you will develop preferences for this sort of thing.)

Solution

# Put the code for your solution here!

More Complex Gene Analysis

For this last problem, you will need to combine concepts from the previous three problems.

You will be given multiple gene fragments (something like exons, for example) that you will need to join together (concatenate). You will then check some criteria to see whether the result is a good sequence.

This time, you will be given just the general task to accomplish, and you will have to create the step-by-step breakdown. If you get stuck, look back at the previous three problems and see how we broke the problem down into small, actionable steps that you can program.

Problem

Given the three gene fragments ATGCAATTAATGCT, CTGGGTAATTCAGCCC, and GTTGGCGTGTAAATTGTAA, determine if the full DNA sequence resulting from their concatenation forms a “good gene”. In this context, a good gene has at least 40 nucleotides, starts with the start codon ATG, ends with one of the three stop codons (TAA, TAG, or TGA), and has a GC percentage of at least 40% and no more than 60%.

Note: In your solution, you must check for all the criteria, even if you can see that one of the checks would fail.

Note: In your solution, do not rely on variables that you created in earlier code blocks still being in scope here. For example, if you created a variable called stop_codon_1 = "TAA" in the problem above, you must recreate that variable in this code block as well.
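A range condition like “at least 40% and no more than 60%” can be written as two comparisons joined with and, or with Python’s chained comparison syntax, which means the same thing. The value below is made up and is not computed from the gene fragments.

```python
# A made-up value, not computed from the problem's gene fragments
percentage = 52.5

# Two separate comparisons joined with `and`
in_range = percentage >= 40 and percentage <= 60
print(in_range)  # True

# A chained comparison; Python reads it as
# (40 <= percentage) and (percentage <= 60)
in_range_chained = 40 <= percentage <= 60
print(in_range_chained)  # True

# With <=, both ends are inclusive, so exactly 40 and 60 pass
print(40 <= 40 <= 60, 40 <= 60 <= 60)  # True True
```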

Step-By-Step Breakdown

Start by writing a step-by-step breakdown of the problem here. Use the examples above as your guide, and write small, actionable steps that you can then write code to carry out.

(… write the steps here …)

1. First step…
2. Second step…
3. … remaining steps …

Solution

# Put the code for your solution here!

Summary

In this first assignment, we explored fundamental Python programming concepts essential for bioinformatics work. Through progressively challenging exercises, we learned to manipulate strings, perform calculations, and implement logical conditions, all crucial skills for bioinformatics and data analysis. We practiced breaking down complex problems into manageable steps, from basic string operations to more sophisticated gene analysis tasks. While we worked with simple examples here, the same problem-solving approach carries over to real-world bioinformatics problems. These foundational programming techniques will serve as building blocks for the more advanced challenges in future assignments and miniprojects.