-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* enonce rattrapage * last * add exam
- Loading branch information
Showing
4 changed files
with
342 additions
and
1 deletion.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,337 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# 1A - Enoncé 2024\n", | ||
"\n", | ||
"Toutes les questions valent deux points." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Exercice 1 : recherches de motifs\n", | ||
"\n", | ||
"On suppose qu'on a une séquence d'ADN ``\"ttcagttgtg aatgaatgga cgtgccaaat agacgtgccg ccgccgctcg attcgcactt cgtgccaaat\"``. Les caractères `a, t, g, c` sont appelés bases (nucléiques) et le fragment d'ADN est donc une séquence de bases.\n", | ||
"\n", | ||
"On s'intéresse dans ce sujet à la recherche d'un motif, c'est-à-dire d'une sous-séquence de bases, par exemple `ccaa`.\n", | ||
"\n", | ||
"On pourra écrire ``text = \"ttcagttgtg aatgaatgga cgtgccaaat agacgtgccg ccgccgctcg attcgcactt cgtgccaaat\".replace(\" \", \"\")``." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q1: Ecrire une fonction qui retourne la première position du motif en utilisant une boucle et sans la méthode `index`\n", | ||
"\n", | ||
"La fonction retourne la longueur du texte cherché si le motif n'est pas trouvé.\n", | ||
"\n", | ||
"``def recherche_motif(text, motif):``" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"24\n", | ||
"70\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"def recherche_motif(text, motif):\n", | ||
" d = len(motif)\n", | ||
" for i in range(len(text) - d + 1):\n", | ||
" if text[i : i + d] == motif:\n", | ||
" return i\n", | ||
" return len(text)\n", | ||
"\n", | ||
"\n", | ||
"text = \"ttcagttgtg aatgaatgga cgtgccaaat agacgtgccg ccgccgctcg attcgcactt cgtgccaaat\".replace(\n", | ||
" \" \", \"\"\n", | ||
")\n", | ||
"motif = \"ccaa\"\n", | ||
"\n", | ||
"print(recherche_motif(text, motif))\n", | ||
"print(recherche_motif(text, \"aaaaaa\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q2 : quel est, dans le pire des cas, le coût de l'algorithme ?" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q3 : On souhaite maintenant écrire une fonction qui retourne toutes les positions d'un motif au sein d'un texte.\n", | ||
"\n", | ||
"``def positions_motif(text, motif):``" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[24, 64]\n", | ||
"[]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"def positions_motif(text, motif):\n", | ||
" pos = []\n", | ||
" p = recherche_motif(text, motif)\n", | ||
" while p < len(text) and len(pos) < 5:\n", | ||
" pos.append(p)\n", | ||
" p += len(motif) + recherche_motif(text[p + len(motif) :], motif)\n", | ||
" return pos\n", | ||
"\n", | ||
"\n", | ||
"text = \"ttcagttgtg aatgaatgga cgtgccaaat agacgtgccg ccgccgctcg attcgcactt cgtgccaaat\".replace(\n", | ||
" \" \", \"\"\n", | ||
")\n", | ||
"motif = \"ccaa\"\n", | ||
"\n", | ||
"print(positions_motif(text, motif))\n", | ||
"print(positions_motif(text, \"aaaaaaaaaa\"))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q4 : quel est, dans le pire des cas, le coût de l'algorithme ?" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q5 : comment modifier (ou pas) votre fonction pour cet exemple ?\n", | ||
"\n", | ||
"```python\n", | ||
"text = \"cgtgccaaaccaaacc\"\n", | ||
"motif = \"ccaaac\"\n", | ||
"assert positions_motif(text, motif) == [4, 9]\n", | ||
"```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[4]\n", | ||
"[4, 9]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"def positions_motif_chevauchement(text, motif):\n", | ||
" pos = []\n", | ||
" p = recherche_motif(text, motif)\n", | ||
" while p < len(text) and len(pos) < 5:\n", | ||
" pos.append(p)\n", | ||
" p += 1 + recherche_motif(text[p + 1 :], motif)\n", | ||
" return pos\n", | ||
"\n", | ||
"\n", | ||
"text = \"cgtgccaaaccaaacc\"\n", | ||
"motif = \"ccaaac\"\n", | ||
"\n", | ||
"print(positions_motif(text, motif))\n", | ||
"print(positions_motif_chevauchement(text, motif))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q6 : On veut accélérer le processus. On considère un couple de bases et non plus une seule base.\n", | ||
"\n", | ||
"Ecrire une fonction une fonction qui construit tous les couples de bases à partir des bases a,c,g,t." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"['aa', 'ac', 'ag', 'at', 'ca', 'cc', 'cg', 'ct', 'ga', 'gc', 'gg', 'gt', 'ta', 'tc', 'tg', 'tt']\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"def couple_base():\n", | ||
" res = []\n", | ||
" for a in \"acgt\":\n", | ||
" for b in \"acgt\":\n", | ||
" res.append(a + b)\n", | ||
" return res\n", | ||
"\n", | ||
"\n", | ||
"print(couple_base())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q7 : Transformer cette liste en dictionnaire.\n", | ||
"\n", | ||
"Le dictionnaire devra avoir pour clé un couple, pour valeur un nombre entier unique." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"{'aa': 0, 'ac': 1, 'ag': 2, 'at': 3, 'ca': 4, 'cc': 5, 'cg': 6, 'ct': 7, 'ga': 8, 'gc': 9, 'gg': 10, 'gt': 11, 'ta': 12, 'tc': 13, 'tg': 14, 'tt': 15}\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"res = couple_base()\n", | ||
"dico = {couple: i for i, couple in enumerate(res)}\n", | ||
"print(dico)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Q8 : Ecrire une fonction qui découpe une séquence de bases en séquence de couples de bases.\n", | ||
"\n", | ||
"Chaque gène sera représenté par le nombre entier défini à la question précédente. On supposera que la séquence est de longueur paire. On fera de même pour le motif." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"[15, 4, 11, 14, 14, 0, 14, 0, 14, 8, 6, 14, 5, 0, 3, 2, 1, 11, 9, 6, 5, 9, 6, 7, 6, 3, 13, 9, 1, 15, 6, 14, 5, 0, 3]\n", | ||
"[5, 0]\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"def decoupe_sequence(text, dico):\n", | ||
" return [dico[text[i : i + 2]] for i in range(0, len(text), 2)]\n", | ||
"\n", | ||
"\n", | ||
"text = \"ttcagttgtg aatgaatgga cgtgccaaat agacgtgccg ccgccgctcg attcgcactt cgtgccaaat\".replace(\n", | ||
" \" \", \"\"\n", | ||
")\n", | ||
"motif = \"ccaa\"\n", | ||
"text_couple = decoupe_sequence(text, dico)\n", | ||
"print(text_couple)\n", | ||
"motif_couple = decoupe_sequence(motif, dico)\n", | ||
"print(motif_couple)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q9 : appliquer l'une de vos précédente fonction de recherche avec ces deux séquences pour trouver les positions." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/plain": [ | ||
"[24, 64]" | ||
] | ||
}, | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"def positions_motif_couple(text_couple, motif_couple):\n", | ||
" position = positions_motif(text_couple, motif_couple)\n", | ||
" return [p * 2 for p in position]\n", | ||
"\n", | ||
"\n", | ||
"positions_motif_couple(text_couple, motif_couple)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Q10 : coût de la recherche et résultats équivalents\n", | ||
"\n", | ||
"* Quelle recherche est plus rapide ? Combien de fois plus rapide ?\n", | ||
"* Pourquoi ces deux recherches ne sont pas équivalentes ?\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3 (ipykernel)", | ||
"language": "python", | ||
"name": "python3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters