-
Notifications
You must be signed in to change notification settings - Fork 1
/
search.xml
224 lines (106 loc) · 153 KB
/
search.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>Request to Cloud Storage API in Apps Script with OAuth2</title>
<link href="Request-to-Cloud-Storage-API-in-Apps-Script-with-OAuth2/"/>
<url>Request-to-Cloud-Storage-API-in-Apps-Script-with-OAuth2/</url>
<content type="html"><![CDATA[<h2 id="what-we-want-to-do">What we want to do?</h2><ul><li>We want to access the data in Cloud Storage through API from App Script.</li><li>But Google Cloud Storage, unlike <a href="https://developers.google.com/apps-script/reference/spreadsheet">Spreadsheet</a>, is not a built-in services in App Script, so we need to take an extra step to pass the credential.</li><li>From the <a href="https://cloud.google.com/storage/docs/authentication">document</a>, we know that the Cloud Storage uses OAuth2 protocol for API authentication and authorization.</li></ul><h2 id="what-steps-we-should-perform">What steps we should perform?</h2><ul><li><p>Authentication: First we (user/server) make a request for access token as a client.</p><blockquote><p>"Authentication is the process of determining the identity of a client."</p></blockquote></li><li><p>Authorization: Then the App calling Google APIs with the access token</p><blockquote><p>"Authorization is the process of determining what permissions an authenticated identity has on a set of specified resources."</p></blockquote></li></ul><h2 id="what-data-do-our-script-app-want-to-access">What data do our Script App want to access?</h2><ul><li><p>We need to identify our entity (human or non-human) by type of the data we want to access.</p></li><li><p>Here, our App need to access resources stored in Cloud Storage and perform actions on its own, so it holds a <strong>server account credential </strong>, which represents non-human users and follow the <a href="https://cloud.google.com/storage/docs/authentication#oauth-flows">server-centric auth flow</a>.</p></li><li><p>But if your app need to access the end user's data, you need to obtain <a href="https://cloud.google.com/storage/docs/authentication#user_accounts">user account credential</a> which allows the end user signing in to complete authentication.</p></li></ul><h2 id="how-to-create-credentials-and-perform-authentication">How to create credentials and perform authentication?</h2><p>It introduced three options to perform authentication in the <a href="https://cloud.google.com/storage/docs/authentication#gsutilauth">document</a>. If you develop locally or want to deploy with command line, you can select the first two methods with gsutil installed and pass credential (key file) to the environment. And you may also need to install command line tool <a href="https://github.com/google/clasp">clasp</a> helping you creating the deployment. But if you just want to deploy the App Script on the web editor, like I do, then choose the last method.</p><ol type="1"><li><p>gsutil authentication</p><ul><li><p>Use/create a service account, and download the associated private key which contains your service account credential.</p></li><li><p>Use gcloud command to authenticate:</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">gcloud auth activate-service-account --key-file xxx</span><br></pre></td></tr></table></figure></p></li></ul></li><li><p><a href="https://cloud.google.com/storage/docs/reference/libraries">Client library authentication</a></p><ul><li><p>Install the client library for corresponding languages, e.g. Node.js</p><p><figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">npm install --save @google-cloud/storage</span><br></pre></td></tr></table></figure></p></li><li><p>Create a service account in the Cloud Console.</p></li><li><p>Create and download JSON <strong>service key</strong> file.</p></li><li><p>Setting the environment variable in current shell session</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">export</span> GOOGLE_APPLICATION_CREDENTIALS=<span class="string">"KEY_PATH"</span></span><br></pre></td></tr></table></figure></p></li><li><p>Pass the application credentials to the client library</p><p><figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Imports the Google Cloud client library</span></span><br><span class="line"><span class="keyword">const</span> {Storage} = <span class="built_in">require</span>(<span class="string">'@google-cloud/storage'</span>);</span><br><span class="line"><span class="keyword">const</span> storage = <span class="keyword">new</span> Storage();</span><br><span class="line"><span class="keyword">const</span> bucketName = <span class="string">'your-unique-bucket-name'</span>;</span><br><span class="line"></span><br><span class="line"><span class="keyword">async</span> <span class="function"><span class="keyword">function</span> <span class="title">createBucket</span>(<span class="params"></span>) </span>{</span><br><span class="line"> <span class="keyword">await</span> storage.createBucket(bucketName);</span><br><span class="line"> <span class="built_in">console</span>.log(<span class="string">`Bucket <span class="subst">${bucketName}</span> created.`</span>);</span><br><span class="line">}</span><br><span class="line">createBucket().catch(<span class="built_in">console</span>.error);</span><br></pre></td></tr></table></figure></p></li></ul></li><li><p><a href="https://developers.google.com/identity/protocols/oauth2">API authentication under OAuth2 protocol</a></p><ul><li><p>It allows your application to access Google Cloud APIs on behalf of the end user.</p></li><li><p>The first way to perform authentication is making requests to Cloud Storage (XML/JSON) API, which requires a valid OAuth2 access token.</p><p><figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Code.gs</span></span><br><span class="line"><span class="keyword">var</span> url = <span class="string">'https://storage.googleapis.co/BUCKET/OBJECT'</span></span><br><span class="line"><span class="keyword">var</span> resp = UrlFetchApp.fetch(url, {</span><br><span class="line"> <span class="attr">method</span>: <span class="string">"GET"</span>,</span><br><span class="line"> <span class="attr">headers</span>: {</span><br><span class="line"> <span class="attr">Authorization</span>: <span class="string">'Bearer '</span>+ OAUTH2_TOKEN</span><br><span class="line"> },</span><br><span class="line"> <span class="string">'muteHttpExceptions'</span>: <span class="literal">true</span>,</span><br><span class="line">});</span><br></pre></td></tr></table></figure></p></li><li><p>You can generate an access token in the <a href="https://developers.google.com/oauthplayground/">OAuth2 Playground</a></p><ul><li>"Select & authorize APIs": <code>Cloud Storage API v1</code></li><li>Choose the API with right <a href="https://cloud.google.com/storage/docs/authentication#oauth-scopes">scopes</a>: read-only, read-write, full-control... Here we choose <code>https://www.googleapis.com/auth/devstorage.read_write</code>.</li><li>Click <code>Authorize APIs</code></li><li>Click <code>Exchange authorization code for tokens</code>, and copy the "Access token"</li></ul></li><li><p>Cons</p><ul><li>The access token expires in one hour, so we need to refresh the access token frequently.</li><li>We can use a refresh token to ease this inconvenience, but the refresh token also expires in 24 hours.</li></ul></li><li><p>Find the detailed guiding steps introduced by Microsoft in the following article: <a href="https://docs.microsoft.com/en-us/advertising/scripts/examples/authenticating-with-google-services">Authenticating with Google services</a></p></li></ul></li></ol><h2 id="how-to-automatically-fetch-the-access-token">How to automatically fetch the access token?</h2><ul><li><p>We expect to improve the procedures of the third method, hoping the access token can be automatically fetched. The solution is redirecting the Script web app to a clickable consent screen once the access token expires, and use the library "<a href="https://github.com/googleworkspace/apps-script-oauth2">OAuth2</a>" to deal with authentication parts.</p><p><img src="../images/20210512153630363.png" width="80%;" /></p></li><li><p>Add OAuth2</p><ul><li>Click the icon <code>+</code> next to "Libararies" in Script page</li><li>Search the ID of "OAuth2" and add it: "1B7FSrk5Zi6L1rSxxTDgDEUsPzlukDsi4KGuTMorsTQHhGBzBkMun4iDF"</li></ul></li><li><p>Create credentials</p><ul><li>Find your Script's ID by clicking on the menu item "Project Settings".</li><li>Go to Cloud platform console, navigate to "API & Services > Credentials"</li><li>Click on <code>+ CREATE CREDENTIALS</code> > <code>OAuth Client ID</code> > <code>Web Application</code></li><li>Add "https://script.google.com" to "Authorised JavaScript origins"</li><li>Add the following URIs (with {SCRIPT ID} replaced) to "Authorized redirect URIs" <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># Deployment of Apps Script</span></span><br><span class="line">https://script.google.com/macros/d/{SCRIPT ID}/usercallback</span><br><span class="line"><span class="comment"># Test deployment, its ID differs with the upper deployment's</span></span><br><span class="line">https://script.google.com/macros/s/{TEST SCRIPT ID}/usercallback</span><br></pre></td></tr></table></figure></li><li>Get the "Client ID" and "Client Secret" of this OAuth Client</li></ul></li><li><p>Usage of credentials</p><ul><li><p>Back to Script, and paste your "Client ID"&"Client Secret" on the config part of the "Code.gs" file.</p></li><li><p>Like the "OAuth2" lib <a href="https://github.com/googleworkspace/apps-script-oauth2#2-direct-the-user-to-the-authorization-url">said</a>, "Apps Script UI's are not allowed to redirect the user's window to a new URL, so you'll need to present the authorization URL as a link for the user to click. " So modify the doGet() function to present the url of "Consent Screen" once "OAuth2" does not "hasAccess()". <figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">doGet</span>(<span class="params">e</span>) </span>{</span><br><span class="line"> Logger.log(e.parameter);</span><br><span class="line"> <span class="keyword">var</span> storageService = getService();</span><br><span class="line"> <span class="keyword">if</span> (storageService.hasAccess()) {</span><br><span class="line"> <span class="keyword">var</span> html = HtmlService.createTemplateFromFile(<span class="string">'index'</span>);</span><br><span class="line"> <span class="keyword">return</span> html.evaluate();</span><br><span class="line"> } <span class="keyword">else</span> {</span><br><span class="line"> <span class="comment">// Show the clickable authorization url</span></span><br><span class="line"> <span class="keyword">var</span> authorizationUrl = storageService.getAuthorizationUrl();</span><br><span class="line"> <span class="keyword">var</span> template = HtmlService.createTemplate(</span><br><span class="line"> <span class="string">'<a> Click the link ---> </a>'</span>+</span><br><span class="line"> <span class="string">'<a href="<?= authorizationUrl ?>"target="_blank">Authorize</a>'</span>+</span><br><span class="line"> <span class="string">'<p> Refresh this page after you complete the authorization.</p>'</span></span><br><span class="line"> );</span><br><span class="line"> template.authorizationUrl = authorizationUrl;</span><br><span class="line"> Logger.log(<span class="string">'Open the following URL and re-run the script: %s'</span>, authorizationUrl);</span><br><span class="line"> <span class="keyword">return</span> template.evaluate();</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure></p><p><img src="../images/20210512153656860.png" width="90%;" /></p></li></ul></li><li><p>Once you've done the previous steps and seen the "Consent Screen" page, it will say, "Google hasn’t verified this app". Just click on "Continue", or you can follow the official <a href="https://developers.google.com/apps-script/guides/client-verification#requesting_verification">document</a> to deal with it.</p></li><li><p>Check out my program of <a href="https://github.com/Konfido/pdf2audiobook">pdf2audiobook</a> to see the details of implementation.</p></li></ul><h2 id="how-do-we-manage-credentials">How do we manage credentials?</h2><ul><li>"Do not embed secrets related to authentication in source code" (Check out <a href="https://cloud.google.com/docs/authentication/production#best_practices">best practices for managing credentials.</a> )So we need an another way to safely store our CLIENT_ID and CLIENT_SECRET.</li><li>Instead of embedding ID & SECRET directly in the gs code, I could use an input box to pass them in the authorization page. But as they stay in the back-end server side (I assume), so I just gonna bear with it.</li><li>If you know any simpler and safer way to do it, please leave it in the comments.</li></ul>]]></content>
<tags>
<tag> dev </tag>
</tags>
</entry>
<entry>
<title>PDF2Audio - Convert PDFs to Audiobooks with Machine Learning</title>
<link href="Convert-PDFs-to-Audiobooks-with-Machine-Learning/"/>
<url>Convert-PDFs-to-Audiobooks-with-Machine-Learning/</url>
<content type="html"><![CDATA[<h2 id="what-why">What & Why?</h2><ul><li><p>The other day, while I was wondering and searching if it's possible to convert research papers (PDF file) into audiobooks, which then could be well utilized on my way to the Lab or canteen, the following two articles caught my attention.</p><ul><li>Dale Markowitz: <a href="https://daleonai.com/pdf-to-audiobook">Convert PDFs to Audiobooks with Machine Learning</a></li><li>佐藤一憲 Kazunori Sato:<a href="https://cloud.google.com/blog/ja/products/ai-machine-learning/practical-machine-learning-with-automl-series-3">〜AutoMLで実践する〜 ビジネスユーザーのための機械学習入門シリーズ 【第 3 回】 「積ん読」と「体重増」の悩みを AutoML で解決しよう</a></li></ul></li><li><p>Here is an example clip of the generated audio:</p><p><audio style="height: 40px;" src="../images/20210518030009000.mp3" controls="" preload="metadata"></audio></p></li><li><p>The main converting process (from <a href="https://cloud.google.com/blog/ja/products/ai-machine-learning/practical-machine-learning-with-automl-series-3">Kazunori</a>)</p><p><img src="../images/20210430113414677.png" width=90%/></p></li><li><p>The <a href="https://cloud.google.com/blog/ja/products/ai-machine-learning/practical-machine-learning-with-automl-series-3">article</a> and <a href="https://github.com/kazunori279/pdf2audiobook">code</a> of Kazunori Sato are great and helped me a lot in understanding the general method and process, but they unfortunately didn't provide much information on what exactly procedures I should perform on each step. After working on it for a few days, I managed to accomplish this project and archive my initial purpose. I wrote this blog as a guide to elaborate the required operations, hoping it could help you well.</p></li><li><p>And check out my code on <a href="https://github.com/Konfido/pdf2audiobook">GitHub</a>.</p></li></ul><h2 id="what-google-servicesapis-are-we-gonna-use">What Google services/APIs are we gonna use?</h2><ul><li><a href="https://console.cloud.google.com/storage/browser">Cloud Storage</a>: We will create a bucket to retain uploaded PDF and generated MP3 files.</li><li><a href="https://cloud.google.com/functions">Cloud Functions</a>: Function-as-a-Service, used to describe the whole procedures.</li><li><a href="https://cloud.google.com/vision">Vision API</a>: Cloud OCR which extract features (text, positions, size ...) of PDF</li><li><a href="https://cloud.google.com/automl">AutoML Tables</a>:Layout classification (detect and delete unnecessary strings)</li><li><a href="https://cloud.google.com/text-to-speech">Text to Speech</a>: Speech synthesis</li><li><a href="https://developers.google.com/apps-script">Apps Script</a>:Annotation tool, easing the process of labelling dataset.</li></ul><h2 id="what-can-we-finally-achieve">What can we finally achieve?</h2><ul><li><p>Once we upload a PDF file to the bucket, all functions will be triggered sequentially and a MP3 file will be generated in the end.</p></li><li><p>We can also preset different voices for header, caption and the body of a paper, determining their accent, speed and pitch. And all parts will be naturally synthesized and merged together.</p></li></ul><h2 id="how-to-assess-the-final-result">How to assess the final result?</h2><ul><li><p>It's awesome to see that AI can actually help in my routine life.</p></li><li><p>But the Google Vision API definitely has room for improvement. Specifically, the most significant problem is the usability for two-column papers. It only relatively performs well for single-column ones.</p></li><li><p>And I overlooked a big issue that there are a lot of formulas in the papers. It usually take me a long time to understand them, let alone just hear some jumbled characters and symbols of them. So I lowered my expectations, only sought to understand general idea of the paper. And this purpose is achieved.</p></li></ul><h2 id="step-by-step-guide">Step-by-step guide</h2><h3 id="create-a-google-cloud-project">1. Create a Google Cloud project</h3><ul><li><p>Create a new project in <a href="https://console.cloud.google.com/">Cloud Platform console</a></p></li><li><p>Enable billing</p></li><li><p>Enable all APIs we intend to use!!</p><ul><li>Go to <a href="https://console.cloud.google.com/apis/dashboard">API and Services</a></li><li>"Dashboard"<ul><li>Click <code>+ Enable APIS AND SERVICES</code></li><li>Enable: Cloud Function API, Vision API, Cloud Build API, AutoML Tables, App Script API, Text-to-Speech API</li></ul></li><li>"OAuth consent screen"<ul><li>User Type: Make External</li><li>Scope: https://cloud.googleapis.com/auth/devstorage.read_write</li><li>Test User: your google email</li></ul></li><li>"Credentials": Prepare authorization for App Script to access Cloud Storage<ul><li>Click <code>Create credentials</code>: Choose "OAuth Cliend ID", and then "Web application"</li><li>Copy "Client ID" and "Client Secret"</li></ul></li></ul></li></ul><h3 id="create-a-bucket">2. Create a Bucket</h3><ul><li>Create bucket which will be used to store your uploaded pdf and generated audio.<ul><li>Go to <a href="https://console.cloud.google.com/storage">Cloud Storage</a>, click"CREATE BUCKET".</li></ul></li><li>Configs<ul><li>Location: us-central1 (Be same with the following setting of cloud function, or it could go wrong when you perform AutoML.)</li><li>Storage class: "Standard"</li><li>Access control: "Fine-grained"! We need to manipulate objects' permissions in this project.</li></ul></li></ul><h3 id="create-a-cloud-funtion">3. Create a Cloud Funtion</h3><ul><li><p>Go to <a href="https://console.cloud.google.com/functions">Cloud Function</a>, create a function and config as the following.</p></li><li><p>Setting part:</p><ul><li>Region: us-central1</li><li>Trigger type: Cloud Storage</li><li>Event type: Finalise/Create</li><li>Bucket: my_temp_bucket</li><li>Memory > 2GiB (to avoid "out of memory" issues)</li><li>Timeout: 540 seconds (It's time-consuming to merge mp3 fragments.)</li></ul></li><li><p>Codes part:</p><ul><li>Copy the code of "<a href="https://github.com/Konfido/pdf2audiobook/blob/master/functions/app/main.py">main.py</a>" and "requirement.txt" to corresponding places.</li><li>Runtime: "Python 3.7", Entry point: "p2a_gcs_trigger"</li><li>Modify some settings in "main.py" <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">ANOTATION_MODE = <span class="literal">True</span></span><br><span class="line">model_display_name = <span class="string">""</span> <span class="comment"># Leave it blank for now</span></span><br></pre></td></tr></table></figure></li></ul></li></ul><h3 id="create-an-apps-script-project">4. Create an Apps Script project</h3><ul><li><p>Go to <a href="https://script.google.com/home">Apps Script</a>, create a new project.</p></li><li><p>As Kazunori has created such a great <a href="https://github.com/kazunori279/pdf2audiobook/tree/master/functions/app">notation</a> App, we can use it from the beginning, helping us to finish the initial training dataset labeling.</p></li><li><p>I modified Kazunori's code of Cloud Function so that once the Vision API complete the OCR procedure and generate a "xxx-features.csv" file, another function will be triggered to forge a "xxx-labels.csv" file by duplicating "xxx-features.csv" and automatically adding a "label" column to it. By default, any area with more 100 characters in its "text" will be pre-labelled as "body", and all other areas will be pre-labelled as "other".</p></li><li><p>Make API request to Google Cloud Storage</p><ul><li>Kazunori's <a href="https://github.com/kazunori279/pdf2audiobook/blob/b21ccc1bd6e78a472bb01a353d5aa18c0dd1c405/apps-script/do_get.gs#L77">code</a> skips the credential part, but at the same time, it took me a lot effort into figuring out how to pass the credential, modifying the code and even writing another <a href="https://konfido.github.io/Request-to-Cloud-Storage-API-in-Apps-Script-with-OAuth2/">blog</a> to elaborate on it. I doubt that I missed out something that complicated the whole process. (My assumption is that I only modify and deploy the Script in the web editor, but instead, he set up the environment of credential in shell session and used the command line tool <a href="https://github.com/google/clasp">clasp</a> to create Script's deployment.) If you've tries the other method or know the right way to do it, please leave a comment below.</li><li>From <a href="https://cloud.google.com/storage/docs/authentication#oauth-flows">what</a> I know, the Cloud Storage uses OAuth2 protocol for API authentication and authorization. The access token can be found in <a href="https://developers.google.com/oauthplayground/">OAuth Playground</a> which can be directly added to the Kazunori's code. Or you can modify the code, like I did, which will help you simplify the authentication process by providing a clickable "Consent Screen".</li><li>If you want to use my <a href="https://github.com/Konfido/pdf2audiobook/tree/master/apps-script">code</a>, checkout my other <a href="https://konfido.github.io/Request-to-Cloud-Storage-API-in-Apps-Script-with-OAuth2/">blog</a> on setting details. But if you want to use Kazunori's code, it just need a little modification to the "fetch" request in "downloadLabels()" function, which should looks like this: <figure class="highlight javascript"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment">// Code.gs</span></span><br><span class="line"><span class="function"><span class="keyword">function</span> <span class="title">downloadLabels</span>(<span class="params"></span>) </span>{</span><br><span class="line"> ...</span><br><span class="line"> <span class="keyword">var</span> resp = UrlFetchApp.fetch(url, {</span><br><span class="line"> <span class="attr">method</span>: <span class="string">"GET"</span>,</span><br><span class="line"> <span class="attr">headers</span>: {</span><br><span class="line"> <span class="attr">Authorization</span>: <span class="string">'Bearer '</span>+ YOUR_TOKEN,</span><br><span class="line"> }</span><br><span class="line"> <span class="string">'muteHttpExceptions'</span>: <span class="literal">true</span>,</span><br><span class="line"> });</span><br><span class="line"> ...</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li><li>And this token will expire in 1h, so remember to refresh it when you encounter some 401 issues.</li></ul></li></ul><h3 id="create-a-spread-sheet">5. Create a Spread Sheet</h3><ul><li>We need to create a <a href="https://docs.google.com/spreadsheets">Spread Sheet</a> to store all labelled training data.</li><li>Take note of its ID in browser's address bar.</li></ul><h3 id="label-the-training-dataset">6. Label the training dataset</h3><ul><li><p>Once you've done the previous step and copied the code into corresponding places, you can select out one PDF file as our first labelling target and upload it to the bucket. It will generate a "xxx-features.csv" file.</p></li><li><p>Paste the name and other configs of the file to "Code.gs" in Script. <figure class="highlight js"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">var</span> PDF_NAME = <span class="string">'<YOUR PDF FILE NAME>'</span>;</span><br><span class="line"><span class="keyword">var</span> BUCKET_NAME = <span class="string">'<YOUR BUCKET NAME>'</span>;</span><br><span class="line"><span class="keyword">var</span> SHEET_ID = <span class="string">'<YOUR SHEET ID>'</span>;</span><br></pre></td></tr></table></figure></p></li><li><p>Go the test web app we already created, refresh the webpage. And finally we can see the fabulous labelling webpage. Hooray!</p></li><li><p>Click the element with wrong labels on the webpage, loop through four colors until presenting the right one. Note the spreadsheet, on which the corresponding label part will change.</p><ul><li>Others: gray</li><li>Body: Yellow</li><li>Header: red</li><li>Caption: blue</li></ul></li><li><p>Once the labelling is done, switch to another PDF file and start labelling again. (There exists one issue, that I don't why the webpage always fails on the first refresh after modifying the PDF_NAME. So remember to refresh the webpage when you encounter some unreasonable problem.)</p></li><li><p>After we've labeled all the PDF, copy all Spreed Sheet (tab) into one and download the it as a CSV file.</p></li></ul><h3 id="create-an-automl-table">7. Create an AutoML Table</h3><ul><li><p>Go to <a href="https://console.cloud.google.com/automl-tables/introduction">AutoML Table</a></p><ul><li>Click <code>NEW DATA SET</code> > <code>Upload files from your computer</code>: select our just downloaded "all_labels.csv" file.</li><li>Select "label" as target</li><li>Click <code>Train</code>, uncheck "id" in features, train 2h</li><li>Model name: "label_predict"</li></ul></li><li><p>The training will automatically stop once the model start overfitting. (My training dataset has 3500 entries and the training early stops after about 40 minutes.)</p></li><li><p>And the evaluate result seems not bad, and now our model can predict the right labels (body, caption, header, other) from the feature files generated by Vision API.</p><p><img src="../images/20210508171525893.png" width="90%" /></p></li></ul><h3 id="select-an-audio-voice">8. Select an audio voice</h3><ul><li>Go to the page of <a href="https://cloud.google.com/text-to-speech">Cloud Text-to-Speech</a><ul><li>Adjust the options to get your preferred voice.</li><li>Click <code>Show JSON</code>, note the config values.</li></ul></li><li>Inspired by <a href="https://daleonai.com/pdf-to-audiobook">Markowitz</a>, I also modified the Kazunori's code to synthesize different voices for "header", "caption" and "body".</li></ul><h3 id="edit-the-cloud-function">9. Edit the Cloud Function</h3><ul><li><p>Go to the <a href="https://console.cloud.google.com/functions">Cloud Function</a></p><ul><li>Now we should disable the ANNOTATION_MODE, switching to "Speech Mode".</li><li>Add the name of your AutoML (prediction) model and adjust the configuration of your preferred voice. <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line">ANOTATION_MODE = <span class="literal">False</span></span><br><span class="line">model_display_name = <span class="string">"<you-predict-model>"</span></span><br><span class="line">LANGUAGE_CODE = <span class="string">"en-GB"</span></span><br><span class="line">PITCH = {...}</span><br><span class="line">SPEAKING_RATE = {...}</span><br><span class="line">NAME = {...}</span><br></pre></td></tr></table></figure></li></ul></li><li><p>Deploy the Function again.</p></li></ul><h3 id="generate-the-audio">10. Generate the audio</h3><ul><li><p>Upload a new PDF file to your bucket</p></li><li><p>Navigate to "LOGS" tab of the Cloud Function, check out how the process goes, or just wait for a few minutes.</p></li><li><p>And finally, enjoy your audio. Hooray!</p></li></ul><h2 id="deploy-and-debug-with-command-line">Deploy and Debug with command line</h2><p>I list some general steps when you deploy and debug functions with command line. Please refer to the <a href="https://cloud.google.com/vision/docs/setup">document</a> for details.</p><ul><li><p>Set up the environment</p><ul><li>Create service account</li><li>Create a service account key</li><li>Set environment</li><li>Download and install Google Cloud SDK</li><li>Install Vision Client Libraries</li></ul></li><li><p>Register a trigger function to Google Cloud Function</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">cd</span> ./<span class="built_in">functions</span>/app</span><br><span class="line">gcloud <span class="built_in">functions</span> deploy p2a_gcs_trigger --runtime python37 --trigger-bucket <bucket> --memory=2048MB --timeout=540</span><br></pre></td></tr></table></figure></p></li><li><p>Wait a few minutes till the Cloud Function is running.</p></li><li><p><strong>NOTE</strong></p><ul><li>After you modify the of Cloud Function, the new version of deployment may not update at all! Referring to this <a href="https://stackoverflow.com/questions/65674730/google-cloud-function-doesnt-update-on-change-when-using-deployment-manager">QA</a>, I decide to add an extra parameter "version" in the setting, and deliberately change its value on every deployment.</li></ul></li></ul><h2 id="references">References</h2><p>These blogs also helped me a lot finishing this project.</p><ul><li><p><a href="https://docs.microsoft.com/en-us/advertising/scripts/examples/authenticating-with-google-services">Authenticating with Google services</a></p></li><li><p><a href="http://blog.warehouseman.com/2014/10/getting-started-with-google-cloud.html">Getting started with Google Cloud Datastore for Google Apps Script</a></p></li><li><p><a href="https://www.labnol.org/code/20074-upload-files-to-google-cloud-storage">Upload Files to Google Cloud Storage with Google Scripts</a></p></li></ul>]]></content>
<tags>
<tag> project </tag>
<tag> ML </tag>
</tags>
</entry>
<entry>
<title>压缩已存数据的NTFS硬盘,创建 Mac TimeMachine</title>
<link href="Use-NTFS-hard-drive-to-create-Mac-TimeMachine/"/>
<url>Use-NTFS-hard-drive-to-create-Mac-TimeMachine/</url>
<content type="html"><![CDATA[<h2 id="背景">背景</h2><ul><li>硬盘:西数1T硬盘,NTFS格式,已经存储200G数据</li><li>目标:想从硬盘中划出300G空间作为Mac的TimeMachine备份分区,但并没有其他硬盘可以备份已储数据,因此无法全盘格式化后再重新分区。</li><li>查找各种方法后,终于通过以下步骤实现上述目标。</li></ul><h2 id="方法步骤">方法步骤</h2><p>本文操作环境:MBP 256G Mojave,Parallel Win10虚拟机</p><ol type="1"><li><p>压缩硬盘分区</p><ul><li><p>使用Windows自带磁盘管理工具</p></li><li><p>在硬盘主分区上右键,选“压缩卷”,<a href="https://www.iplaysoft.com/tools/partition-calculator/">压缩空间量</a>300G(307204 MB)</p></li><li><p>右键“未分配”的300G,新建简单卷(此时默认为NTFS格式)</p></li></ul></li><li><p>创建EFI(ESP)分区</p><ul><li><p>用于保存系统引导文件,如果略过此步骤直接进行下一步抹除,可能会失败报错:“Mediakit 报告设备上空间不足以执行此操作”</p></li><li><p>Windows自带磁盘管理工具+<a href="http://www.diskgenius.cn/">DiskGenius</a></p></li><li><p>使用Windows磁盘管理工具磁盘压缩出额外300M(大于200M即可)空间</p></li><li><p>使用DiskGenius,点选该300M未分配空间,右键“建立ESP/MSR分区”</p><p><img src="../images/20190930140443835.png" alt="20190930140443835" style="zoom: 50%;" /></p></li><li><p>硬盘预分区完成</p></li></ul></li><li><p>格式化为MacOSX分区</p><ul><li>使用Mac自带磁盘工具</li><li>选择300G的用于TimeMachine分区,选择抹除,格式为“Mac OS 扩展(日志式)”</li><li>打开Time Machine,成功开始备份!</li></ul></li></ol>]]></content>
<tags>
<tag> Mac </tag>
</tags>
</entry>
<entry>
<title>Python encode decode & 字符编码检测</title>
<link href="encoding-and-decoding-in-Python/"/>
<url>encoding-and-decoding-in-Python/</url>
<content type="html"><![CDATA[<h2 id="编码类型">编码类型</h2><ul><li>万国码/字码表:unicode</li><li>编码方法 (将unicode数据编码成byte数据):产生不同的字符集<ul><li>英文:ascii,utf-8</li><li>中文:utf-8,utf-16,gbk</li></ul></li></ul><h2 id="编码转换">编码转换</h2><ul><li><code>encode()</code>: str (unicode) -> bytes</li><li><code>decode()</code>:bytes -> str (unicode)</li><li>不同字符集之间通过Unicode转换:utf-8->unicode->gbk</li></ul><h2 id="note">Note</h2><ul><li><p>Python 2</p><ul><li>默认编码 <strong>ASCII</strong>,自动处理byte到unicode的转换,处理非ASCII时易出错。</li><li>两种字符类型<ul><li><code>unicode</code>:unicode 数据</li><li><code>str</code>:bytes 数据</li></ul></li><li><code>unicode</code>: <code>u'苑'</code> / <code>u'\u2d1'</code><ul><li>encode('utf-8'): <code>\xe8\x8b\x91</code> (<code>str</code>,bytes)</li><li>encode('gbk') : <code>\xd4\xb7</code> (<code>str</code>,bytes)</li></ul></li></ul></li><li><p>Python 3</p><ul><li>默认编码 <strong>Unicode</strong>,不会再对bytes数据自动解码</li><li>两种字符类型<ul><li><code>str</code>:unicode 数据</li><li><code>bytes</code>:bytes 数据</li><li><strong>文本总是Unicode</strong>,由<code>str</code>类型表示,二进制数据由<code>bytes</code>类型表示</li></ul></li><li><code>str</code>:<code>'苑'</code> / <code>\u82d1</code><ul><li>encode('utf-8'):<code>b'\xe8\x8b\x91'</code> (<code>bytes</code>)</li><li>encode('gbk'):<code>'\xd4\xb7'</code> (<code>bytes</code>)</li></ul></li><li>Example <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>c=<span class="string">"編碼abc"</span><span class="comment"># 字符串str为unicode数据,可以encode为bytes</span></span><br><span class="line"><span class="meta">>>> </span>c</span><br><span class="line"><span class="string">'編碼abc'</span></span><br><span class="line"><span class="meta">>>> </span>c.encode(<span class="string">'utf-8'</span>)</span><br><span class="line"><span class="string">b'\xe7\xb7\xa8\xe7\xa2\xbcabc'</span></span><br><span class="line"><span class="meta">>>> </span>c.encode(<span class="string">'utf-16'</span>)</span><br><span class="line"><span class="string">b'\xff\xfe\xe8}\xbcxa\x00b\x00c\x00'</span></span><br><span class="line"><span class="meta">>>> </span>c.encode(<span class="string">'gbk'</span>)</span><br><span class="line"><span class="string">b'\xbe\x8e\xb4aabc'</span></span><br><span class="line"><span class="meta">>>> </span>c.encode(<span class="string">'unicode_escape'</span>)</span><br><span class="line"><span class="string">b'\\u7de8\\u78bcabc'</span></span><br><span class="line"></span><br><span class="line"><span class="meta">>>> </span>d=<span class="string">u'\u7de8\u78bcabc'</span><span class="comment"># d为16进制unicode字符</span></span><br><span class="line"><span class="meta">>>> </span>d<span class="comment"># 可以看出python3中对d与c的处理结果是相同的</span></span><br><span class="line"><span class="string">"編碼abc"</span></span><br><span class="line"><span class="meta">>>> </span>d.encode(<span class="string">'utf-8'</span>)</span><br><span class="line"><span class="string">b'\xe7\xb7\xa8\xe7\xa2\xbcabc'</span></span><br><span class="line"><span class="meta">>>> </span>d.encode(<span class="string">'unicode_escape'</span>)</span><br><span class="line"><span class="string">b'\\u7de8\\u78bcabc'</span></span><br><span class="line"><span class="meta">>>> </span>d.encode(<span class="string">'utf-8'</span>).decode(<span class="string">'latin-1'</span>)<span class="comment"># 解码时出现乱码,可能是编码用错</span></span><br><span class="line"><span class="string">'編碼abc'</span></span><br><span class="line"></span><br><span class="line"><span class="meta">>>> </span><span class="string">'ç\xa0\x81'</span>.encode(<span class="string">'latin-1'</span>).decode(<span class="string">'utf8'</span>) <span class="comment"># 将乱码重新编码再解码</span></span><br><span class="line"><span class="string">'码'</span></span><br></pre></td></tr></table></figure></li></ul></li></ul><h2 id="字符编码检测">字符编码检测</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">import</span> chardet</span><br><span class="line"><span class="keyword">with</span> <span class="built_in">open</span>(file, <span class="string">'rb'</span>) <span class="keyword">as</span> f:</span><br><span class="line"> a = f.read()</span><br><span class="line">chardet.detect(a)</span><br><span class="line"></span><br><span class="line"><span class="comment"># {'encoding': 'Big5', 'confidence': 0.99, 'language': 'Chinese'}</span></span><br></pre></td></tr></table></figure>]]></content>
<tags>
<tag> python </tag>
</tags>
</entry>
<entry>
<title>Hexo 同时使用两种主题</title>
<link href="Use-Two-Themes-in-Hexo/"/>
<url>Use-Two-Themes-in-Hexo/</url>
<content type="html"><![CDATA[<h2 id="context">Context</h2><p>在Hexo主页面的catagory下增加了wiki项,但是想利用一个不同的theme生成另一个网页,放置在子目录下。因此尝试能否在一个仓库内分别渲染生成两个不同的theme。(只是想测试下可行性,其实完全可以建两个hexo仓库分别渲染生成页面,再将wiki页面复制到了主页面的public/wiki/目录下)。</p><h3 id="想法">想法</h3><p>root目录下有两个config文件,默认使用_config.yml,用<code>hexo --config config_wiki.yml g</code>命令渲染wiki页面至<code>/public/wiki/</code>,同时设置点击主页面<code>wiki</code>的category后会自动链接到<code>public/wiki/</code>。</p><h2 id="修改步骤">修改步骤</h2><ol type="1"><li><p>更改原theme(主页面)的config文件,将新增category类别(不同主题设置有区别)。</p><figure class="highlight diff"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">nav:</span><br><span class="line">Wiki: /wiki/</span><br></pre></td></tr></table></figure></li><li><p>为了在修改wiki配置时避免影响到主页面,将root目录下的<code>_config.yml</code>文件复制并重命名为<code>_config_wiki.yml</code>,以下修改均指此文件。</p></li><li><p>安装hexo的<a href="https://github.com/zthxxx/hexo-theme-Wikitten">Wikitten</a>主题,将其作为wiki页面的theme</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">cd your-hexo-directory</span><br><span class="line">git clone https://github.com/zthxxx/hexo-theme-Wikitten.git themes/Wikitten</span><br></pre></td></tr></table></figure></li><li><p>将Wikitten theme目录下clone所得的source和模板移动到root目录下,重命名主题config使其生效。</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">cp -rf themes/Wikitten/_source/* source_wiki/</span><br><span class="line">cp themes/Wikitten/_scaffolds/embed.md scaffolds/embed.md</span><br><span class="line">cp themes/Wikitten/_scaffolds/post.md scaffolds/wiki.md</span><br><span class="line">cp -f themes/Wikitten/_config.yml.example themes/Wikitten/_config.yml</span><br></pre></td></tr></table></figure></li><li><p>修改wikitten主题的config,更改资源文件目录,icon、favicon等。</p></li><li><p>修改<code>_config_wiki.yml</code>设定使wikitten主题生效</p><figure class="highlight diff"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br></pre></td><td class="code"><pre><span class="line"><span class="deletion">- theme: origin</span></span><br><span class="line"><span class="addition">+ theme: Wikitten</span></span><br><span class="line"></span><br><span class="line"><span class="deletion">- per_page: 10</span></span><br><span class="line"></span><br><span class="line"><span class="deletion">- url: https://konfido.github.io/</span></span><br><span class="line"><span class="deletion">- root: /</span></span><br><span class="line"><span class="deletion">- permalink: :year/:month/:day/:title/</span></span><br><span class="line"><span class="addition">+ url: https://konfido.github.io/wiki/</span></span><br><span class="line"><span class="addition">+ root: /wiki/</span></span><br><span class="line"><span class="addition">+ permalink: :urlname/</span></span><br><span class="line"><span class="addition">+ permalink_defaults: :year/:month/:day/:title/</span></span><br><span class="line"></span><br><span class="line"><span class="addition">+ skip_render:</span></span><br><span class="line"><span class="addition">+ - README.md</span></span><br><span class="line"></span><br><span class="line"><span class="addition">+ marked:</span></span><br><span class="line"><span class="addition">+ gfm: true</span></span><br><span class="line"></span><br><span class="line"><span class="addition">+ # 设置单独的source文件夹,用于放置wiki的源文件</span></span><br><span class="line"><span class="deletion">- source_dir: source</span></span><br><span class="line"><span class="addition">+ source_dir: source_wiki</span></span><br><span class="line"></span><br><span class="line"><span class="addition">+ # 设置渲染到`public/wiki/`路径下</span></span><br><span class="line"><span class="deletion">- public_dir: public</span></span><br><span class="line"><span class="addition">+ public_dir: public/wiki</span></span><br><span class="line"></span><br><span class="line"><span class="addition">+ default_layout: wiki</span></span><br></pre></td></tr></table></figure></li></ol><h2 id="使用">使用</h2><ul><li><p>视自己情况在<code>.zshrc</code>或者<code>.bashrc</code>等文件中新增alias,以简化命令的输入。</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">alias hw="hexo --config _config_wiki.yml"</span><br></pre></td></tr></table></figure></li><li><p>新建wiki</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">hw new "xxxx"</span><br></pre></td></tr></table></figure></li><li><p>渲染步骤</p><p>共用db.json(猜測)会导致主页面blog内容与wiki内容混杂,需要在渲染前后进行clean操作。</p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">#</span><span class="bash"> 首先渲染主页面,生成到/public/</span></span><br><span class="line">hexo g</span><br><span class="line"><span class="meta">#</span><span class="bash"> 删除db.json及旧的public/wiki</span></span><br><span class="line">hw clean</span><br><span class="line"><span class="meta">#</span><span class="bash"> 渲染wiki页面,生成到/public/wiki/</span></span><br><span class="line">hw g</span><br><span class="line"><span class="meta">#</span><span class="bash"> 手动删除db.json,MUST!</span></span><br><span class="line">rm db.json</span><br><span class="line"><span class="meta">#</span><span class="bash"> 查看效果</span></span><br><span class="line">hexo s</span><br></pre></td></tr></table></figure></li></ul>]]></content>
<tags>
<tag> hexo </tag>
</tags>
</entry>
<entry>
<title>Shell Script</title>
<link href="shell-script/"/>
<url>shell-script/</url>
<content type="html"><![CDATA[<h2 id="symbols">Symbols</h2><ul><li><p><code>[</code> single bracket</p><ul><li><p><code>if [ condition ]</code> shell builtin, <code>test</code> construct, used to ==evaluate expression==</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> [ <span class="variable">$a</span> == abc ] ; <span class="keyword">then</span> <span class="built_in">echo</span> yes ; <span class="keyword">fi</span></span><br></pre></td></tr></table></figure></p></li><li><p><code>Array[1]=element1</code> Array initializatin</p></li><li><p><code>[a-z]</code> Range of characters within a Regular Expression</p></li></ul></li><li><p><code>[[</code> double bracket</p><ul><li>bash builtin, 与<code>[</code>类似,extended <code>test</code> constrcut</li></ul></li><li><p><code>{}</code> braces</p><ul><li><p><code>${variable}</code> unambiguously ==identify variables==</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> <span class="variable">$VARIBLE1234</span></span><br><span class="line"><span class="built_in">echo</span> <span class="variable">${VARIBLE}</span>1234</span><br></pre></td></tr></table></figure></p></li><li><p><code>${!variable}</code> Indirect variable reference 间接引用</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">NAME=<span class="string">"VARIABLE"</span>; VARIABLE=42; <span class="built_in">echo</span> <span class="variable">${!NAME}</span></span><br><span class="line"><span class="comment"># 42</span></span><br></pre></td></tr></table></figure></p></li><li><p><code>{ command1; command2 }</code> ==command group==</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">{ date; top -b -n1 | head ; } >logfile</span><br><span class="line"><span class="comment"># `date` and `top` output are concatenated</span></span><br></pre></td></tr></table></figure></p></li><li><p>brace expansions, create lists of strings,可以在loop时使用</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> f{oo,ee,a}d<span class="comment"># food feed fad</span></span><br></pre></td></tr></table></figure></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mv error.log{,.OLD}<span class="comment"># 等价于:mv error.log error.log.OLD</span></span><br></pre></td></tr></table></figure></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> {000..2}<span class="comment"># 000 001 002</span></span><br><span class="line"><span class="built_in">echo</span> {00..8..3}<span class="comment"># 00 03 06</span></span><br><span class="line"><span class="built_in">echo</span> {D..T..4} <span class="comment"># D H L P T</span></span><br></pre></td></tr></table></figure></p></li></ul></li><li><p><code>()</code> parentheses</p><ul><li><p><code>( commands1; command2 )</code> ==command group== excuted within a ==subshell== (without affecting the current shell's environment)</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">(<span class="built_in">cd</span> /tmp; <span class="built_in">pwd</span>)</span><br></pre></td></tr></table></figure></p></li><li><p><code>result=$(COMMAND)</code> Command substitution</p></li><li><p><code>Array=(element1 element2)</code> Array initialization</p></li></ul></li><li><p><code>(())</code> double parentheses</p><ul><li><code>(( var = 12 ))</code> Integer arithmetic,==整数计算==</li><li><code>var=$(( 20 + 5 ))</code> 整数计算与赋值</li><li>C-style operation<ul><li><code>((var++))</code> / <code>((var--))</code> / <code>((var0 = var1<12?3:21))</code></li></ul></li></ul></li><li><p><code>#</code>:注释</p></li></ul><h2 id="变量">变量</h2><ul><li><p>格式要求</p><ul><li>定义变量时,变量名不加<code>$</code>符号,使用时加<code>$</code> <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">test</span>=<span class="string">"hello"</span></span><br><span class="line"><span class="built_in">echo</span> <span class="variable">$test</span></span><br></pre></td></tr></table></figure></li><li>变量名和等号之间==不能有空格==</li><li>传递带空格的参数需要加引号:<code>"${VARIABLE}"</code></li><li>单引号内任何字符不视作变量</li><li>将命令结果赋值给变量:反引号 或<code>$()</code><ul><li><code>x=`commands`</code></li><li><code>x=$(commands)</code></li></ul></li></ul></li><li><p>变量的作用域</p><ul><li>Shell脚本中定义的变量是global的,作用域从被定义的地方开始</li><li>Shell==函数==中定义的变量默认是global的,作用域从被调用时定义处开始</li><li>Shell函数内显式定义local变量时,作用域在函数内;但是与global变量重名时,会在函数内暂时屏蔽global变量 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="title">foo</span></span>()</span><br><span class="line">{</span><br><span class="line"><span class="built_in">local</span> x=100</span><br><span class="line"><span class="built_in">echo</span> <span class="variable">$x</span></span><br><span class="line">}</span><br><span class="line">foo <span class="comment"># 100</span></span><br><span class="line"><span class="built_in">echo</span> <span class="variable">$x</span> <span class="comment"># show nothing</span></span><br></pre></td></tr></table></figure></li></ul></li><li><p>判断变量是否为空</p><ul><li><code>${A:+xxx}</code> 变量A已赋值时,其值用xxx (字符串/变量的值) 替换,否则不进行任何替换</li><li><code>${A:-xxx}</code> 变量A==未定义/值为空==时,返回xxx (字符串/变量的值);否则返回变量A的值</li><li><code>${A:=xxx}</code> 变量A未定义/值为空时,将xxx赋值给A,返回xxx;否则返回变量A的值</li></ul></li><li><p>特殊变量</p><table><thead><tr class="header"><th>$0</th><th>当前脚本的文件名</th></tr></thead><tbody><tr class="odd"><td>$n</td><td>传递给脚本或函数的参数。n 是一个数字,表示第几个参数。例如,第一个参数是<code>$1</code>,第二个参数是<code>$2</code>。</td></tr><tr class="even"><td>$#</td><td>传递给脚本或函数的<strong>参数个数</strong>。</td></tr><tr class="odd"><td>$*</td><td>传递给脚本或函数的<strong>所有参数</strong>。被双引号包含时,以<code>"$1","$2" … "$n"</code>的形式输出所有参数。</td></tr><tr class="even"><td>$@</td><td>传递给脚本或函数的所有参数。被双引号包含时,以<code>"$1 $2 … $n"</code>的形式输出所有参数整体。</td></tr><tr class="odd"><td>$?</td><td>上个命令的退出状态,或函数的返回值。</td></tr><tr class="even"><td>$$</td><td>当前Shell进程ID。对于 Shell 脚本,就是这些脚本所在的进程ID。</td></tr><tr class="odd"><td>$!</td><td>后台运行的最后一个进程的ID号</td></tr></tbody></table><ul><li><p><code>[@]</code> 引用数组中的值</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">declare</span> -a testarray=(<span class="string">'box'</span> <span class="string">'cat'</span> <span class="string">'dog'</span>)</span><br><span class="line"><span class="keyword">for</span> item <span class="keyword">in</span> <span class="variable">${testarray[@]}</span>; <span class="keyword">do</span></span><br><span class="line"><span class="built_in">echo</span> <span class="variable">$item</span></span><br><span class="line"><span class="keyword">done</span></span><br><span class="line"><span class="comment">#box</span></span><br><span class="line"><span class="comment">#cat</span></span><br><span class="line"><span class="comment">#dog</span></span><br></pre></td></tr></table></figure></p></li></ul></li></ul><h2 id="数学计算">数学计算</h2><ul><li><p><code>let</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">let</span> `sum=3+5`<span class="comment"># sum <- 8</span></span><br></pre></td></tr></table></figure></p></li><li><p><code>expr</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">sum=`expr 3 - 6` <span class="comment"># 运算符两边空格</span></span><br><span class="line">sum=`expr \( 3 \* \) `<span class="comment"># 转译*)(</span></span><br></pre></td></tr></table></figure></p></li><li><p><code>$(())</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">sum=$((<span class="number">3</span>+<span class="number">5</span>))</span><br></pre></td></tr></table></figure></p></li></ul><h2 id="expression-表达式种类">expression 表达式种类</h2><ul><li><p>文件表达式</p><ul><li><code>-d</code> 如果目录存在:<code>if [ -d .. ]</code></li><li><code>-f</code> 如果文件存在:<code>if [ -f file ]</code></li><li><code>-s</code> 文件存在且非空</li><li><code>-r</code> 文件存在且可读</li><li><code>-w</code> 文件存在且可写</li><li><code>-x</code> 文件存在且可执行</li></ul></li><li><p>==整数变量==表达式</p><ul><li><code>-eq</code> 如果等于:<code>if [ int1 -eq int2 ]</code></li><li><code>-ne</code> 如果不等于</li><li><code>-ge</code> 如果>=</li><li><code>-gt</code> 如果></li><li><code>-le</code> 如果<=</li><li><code>-lt</code> 如果<</li></ul></li><li><p>字符串变量表达式</p><ul><li><code>=</code> 等于<ul><li><code>if [ $a = $b ]</code>,如果string1等于string2</li><li>允许使用赋值号做等号,等号两边==必须留空格==</li></ul></li><li><code>!=</code> 不等于<ul><li><code>if [ $a != $b ]</code> 如果string1不等于string2</li></ul></li><li><code>-z</code> 为空<ul><li><code>if [ -z $string ]</code> 如果string 为空</li></ul></li><li><code>-n</code> 非空<ul><li><code>if [ -n $string ]</code> 等价于 <code>if [ $string]</code> 如果string 非空(非0),返回0(true)</li></ul></li></ul></li><li><p>逻辑表达式</p><ul><li><code>!</code> 逻辑非: <code>if [ ! expr ]</code></li><li><code>-a</code> 逻辑与:<code>if [ expr1 -a expr2 ]</code></li><li><code>-o</code> 逻辑或:<code>if [ expr1 -o expr2 ]</code></li></ul></li></ul><h2 id="字符串处理">字符串处理</h2><ul><li>字符串长度/字符数量:<code>${#A}</code></li></ul><h3 id="匹配">匹配</h3><ul><li><p>简单方法,若 <code>$string</code>中含有<code>$foo</code> 则匹配成功:</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[[ <span class="variable">$string</span> == *<span class="variable">$foo</span>* ]] && <span class="built_in">echo</span> match</span><br></pre></td></tr></table></figure></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">[[ <span class="variable">$string</span> =~ <span class="variable">$foo</span> ]] && <span class="built_in">echo</span> match</span><br></pre></td></tr></table></figure></p></li><li><p>非贪婪匹配,匹配符合通配符的==最短结果==</p><ul><li><code>${A%B}</code> 从右向左匹配B,删除匹配,保留左边字符</li><li><code>${A#B}</code> 从左向右匹配B,删除匹配,保留右边字符</li><li><code>${A/old/new}</code> 用<code>new</code>替换<code>A</code>中匹配的<code>old</code>字符串</li></ul></li><li><p>贪婪匹配,匹配==最长结果==</p><ul><li><code>${A%%B}</code></li><li><code>${A##B}</code></li><li><code>${A//old/new}</code></li></ul></li></ul><h3 id="提取">提取</h3><ul><li><p><code>grep</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># -o,只输出匹配的字符串; -E,使用扩展正则</span></span><br><span class="line"><span class="built_in">echo</span> aaa888 | grep -Eo <span class="string">'\d+'</span></span><br><span class="line"><span class="comment"># 888</span></span><br></pre></td></tr></table></figure></p></li><li><p><code>sed</code></p><ul><li><p>sed没有<code>\d</code> <code>\s</code>等用于匹配的元字符</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> aaa888s | sed -E <span class="string">'s/.*[a-z]([0-9]+).*/\q/'</span><span class="comment"># -E,使用正则</span></span><br><span class="line"><span class="comment"># 888</span></span><br></pre></td></tr></table></figure></p></li></ul></li><li><p><code>awk</code> 的 <code>match()</code>函数</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> aaa888 | awk <span class="string">'match($0,/([0-9]+)/, matches) {print matches[1]}'</span></span><br></pre></td></tr></table></figure></p></li><li><p><code>${A:offset}</code></p><ul><li>从字符串左边,偏移offset个字符,截取到字符串结束</li><li><code>${A:offset:length}</code> 截取length长度</li></ul></li><li><p><code>${A:0-offset}</code></p><ul><li>从字符串右边,偏移offset个字符,截取到字符串结束</li><li><code>${A:0-offset:length}</code></li></ul></li><li><p>String to Array</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">DATES_STRING=<span class="string">'Jun01 Jun02 Jun03 Jun04 Jun05'</span></span><br><span class="line">IFS=<span class="string">' '</span> <span class="built_in">read</span> -a DATES_ARRAY <<< <span class="string">"<span class="variable">$DATES_STRING</span>"</span></span><br></pre></td></tr></table></figure></p></li></ul><h3 id="替换">替换</h3><ul><li><p>普通字符</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> <span class="string">"haha s"</span> | sed <span class="string">'s/ha/Ha/g'</span></span><br></pre></td></tr></table></figure></p></li><li><p>替换<code>\n</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">echo</span> <span class="variable">$s</span> | tr <span class="string">'\n'</span> <span class="string">'\t'</span></span><br></pre></td></tr></table></figure></p></li></ul><h2 id="重定向-输入输出">重定向 输入/输出</h2><table><thead><tr class="header"><th style="text-align: center;">Commands</th><th style="text-align: center;">Description</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">command > file</td><td style="text-align: center;">将输出重定向到 file</td></tr><tr class="even"><td style="text-align: center;">command < file</td><td style="text-align: center;">将输入重定向到 file</td></tr><tr class="odd"><td style="text-align: center;">command >> file</td><td style="text-align: center;">将输出以追加的方式重定向到 file</td></tr><tr class="even"><td style="text-align: center;">n > file</td><td style="text-align: center;">将文件描述符为 n 的文件重定向到 file。</td></tr><tr class="odd"><td style="text-align: center;">n >> file</td><td style="text-align: center;">将文件描述符为 n 的文件以追加的方式重定向到 file。</td></tr><tr class="even"><td style="text-align: center;">n >& m</td><td style="text-align: center;">将输出文件 m 和 n 合并。</td></tr><tr class="odd"><td style="text-align: center;">n <& m</td><td style="text-align: center;">将输入文件 m 和 n 合并。</td></tr><tr class="even"><td style="text-align: center;"><< tag</td><td style="text-align: center;">将开始标记 tag 和结束标记 tag 之间的内容作为输入。</td></tr></tbody></table><ul><li>默认打开的文件描述符:0(标准输入),1(标准输出),2(标准错误)</li><li><code>1></code>:可简写为<code>></code>,标准输出重定向</li><li><code>2>&1</code>:标准错误重定向到标准输出。例:<code>echo hello 1>file.out 2>&1</code>,输出与错误都定向到file.out</li><li><code>1>&2</code>:标准输出重定向到标准错误</li><li><code>&>file</code>:标准输出与标准错误都从定向到file</li></ul><h2 id="流程控制">流程控制</h2><ul><li><p><code>case</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">case</span> <span class="variable">$a</span> <span class="keyword">in</span></span><br><span class="line"> 10) <span class="built_in">echo</span> <span class="string">"1"</span></span><br><span class="line"> ;;</span><br><span class="line"> reload) <span class="built_in">echo</span> <span class="string">"12"</span></span><br><span class="line"> ;;</span><br><span class="line"> *) <span class="built_in">echo</span> <span class="string">"0"</span></span><br><span class="line"> ;;</span><br><span class="line"><span class="keyword">esac</span></span><br></pre></td></tr></table></figure></p></li><li><p><code>if/else</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> [ <span class="variable">$a</span> -eq <span class="variable">$b</span> ]; <span class="keyword">then</span></span><br><span class="line"> <span class="built_in">echo</span> <span class="string">"="</span></span><br><span class="line"><span class="keyword">elif</span> [ <span class="variable">$a</span> -gt <span class="variable">$b</span> ]; <span class="keyword">then</span></span><br><span class="line"> <span class="built_in">echo</span> <span class="string">">"</span></span><br><span class="line"><span class="keyword">else</span></span><br><span class="line"> <span class="built_in">echo</span> <span class="string">"<"</span></span><br><span class="line"><span class="keyword">fi</span></span><br></pre></td></tr></table></figure></p><ul><li><p><code>if</code>语句简写</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># 如果左边表达式为真,则执行右边的语句</span></span><br><span class="line">[ -f <span class="string">"./foo"</span> ] && <span class="built_in">echo</span> <span class="string">"yes"</span></span><br><span class="line"><span class="comment"># 如果左边表达式为假,则执行右边</span></span><br><span class="line">[ -f <span class="string">"./foo"</span>] || <span class="built_in">exit</span> 0 <span class="comment"># 文件不存在就退出</span></span><br></pre></td></tr></table></figure></p></li><li><p>可用来handle exception</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># run fallback_command if a_command fails (returns a non-zero value)</span></span><br><span class="line">a_command || fallback_command</span><br><span class="line"><span class="comment"># execute second_command if a_command is successful (returns 0)</span></span><br><span class="line">a_command && second_command</span><br></pre></td></tr></table></figure></p></li></ul></li><li><p><code>for/do</code></p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> var <span class="keyword">in</span> 1 2 3 4 5</span><br><span class="line"><span class="keyword">do</span> </span><br><span class="line"> <span class="built_in">echo</span> <span class="string">"<span class="variable">$var</span>"</span></span><br><span class="line"><span class="keyword">done</span></span><br></pre></td></tr></table></figure></li><li><p><code>while/do</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">while</span> [ <span class="variable">$int</span> -lt 8 ]</span><br><span class="line"><span class="keyword">do</span> </span><br><span class="line"> <span class="keyword">case</span> <span class="variable">$int</span> <span class="keyword">in</span></span><br><span class="line"> 1|4) <span class="built_in">echo</span> <span class="string">"<span class="variable">$int</span>"</span></span><br><span class="line"> ;;</span><br><span class="line"> 0|2|5|) <span class="built_in">echo</span> <span class="string">"<span class="variable">$int</span>"</span></span><br><span class="line"> <span class="built_in">continue</span></span><br><span class="line"> ;;</span><br><span class="line"> *) <span class="built_in">echo</span> <span class="string">"others"</span></span><br><span class="line"> <span class="built_in">break</span></span><br><span class="line"> ;;</span><br><span class="line"> <span class="keyword">esac</span></span><br><span class="line"><span class="keyword">done</span></span><br></pre></td></tr></table></figure></p></li></ul><h2 id="function">function</h2><ul><li><p>函数名前的"function"可以省略</p></li><li><p>使用 <code>$?</code> 获取函数返回值,且调用函数后需立即获取,否则返回值丢失</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">function</span> <span class="function"><span class="title">foo</span></span>(){</span><br><span class="line"> COMMANDS</span><br><span class="line"> <span class="built_in">return</span> xxx</span><br><span class="line">}</span><br></pre></td></tr></table></figure></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line"><span class="function"><span class="title">foo</span></span>(){</span><br><span class="line"> COMMANDS</span><br><span class="line"> ...</span><br><span class="line"> <span class="built_in">echo</span> <span class="variable">$1</span><span class="comment"># 引用参数</span></span><br><span class="line"> <span class="built_in">return</span> xxx<span class="comment"># 返回值</span></span><br><span class="line">}</span><br><span class="line">foo 13<span class="comment"># 传递参数</span></span><br><span class="line">ehco $?<span class="comment"># 获取函数返回值</span></span><br></pre></td></tr></table></figure></p></li></ul><h2 id="子进程子shell">子进程/子Shell</h2><ul><li><p>子Shelll</p><ul><li><p>只使用 fork() 函数,父进程中的函数、变量(全局变量、局部变量)、文件描述符、别名等在子Shell中仍然有效;子Shell对参数的修改无法传回父Shell。</p></li><li><p>组命令,管道,命令替换会产生子Shell</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">(<span class="built_in">echo</span> <span class="string">"<span class="variable">$SHLVL</span> <span class="variable">$BASH_SUBSHELL</span>"</span>)<span class="comment"># 在子Shell执行的组命令</span></span><br><span class="line">{ command1; command2; command3; }<span class="comment"># 在当前Shell执行的组命令</span></span><br><span class="line"></span><br><span class="line"><span class="built_in">echo</span> <span class="string">"test"</span> | { <span class="built_in">echo</span> <span class="string">"<span class="variable">$SHLVL</span> <span class="variable">$BASH_SUBSHELL</span>"</span>; }<span class="comment"># 管道</span></span><br><span class="line"></span><br><span class="line">var=$(<span class="built_in">echo</span> <span class="string">"<span class="variable">$SHLVL</span> <span class="variable">$BASH_SUBSHELL</span>"</span>)<span class="comment"># 命令替换</span></span><br><span class="line"><span class="built_in">echo</span> <span class="variable">$var</span></span><br></pre></td></tr></table></figure></p></li></ul></li><li><p>子进程</p><ul><li><p>使用 fork() 和 exec() 函数,即通过 fork() 创建子进程后立即调用 exec() 函数加载新的可执行文件,不使用父进程的参数。</p></li><li><p>下列几种方式会创建子进程,运行结束后立即退出子进程</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="comment"># a_script.sh</span></span><br><span class="line">bash ./test.sh</span><br><span class="line">./test.sh</span><br><span class="line">chmod +x ./test.sh</span><br></pre></td></tr></table></figure></p></li></ul></li><li><p><code>&</code> 启动并行进程,加速命令执行</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">PIDARRAY=()</span><br><span class="line"><span class="keyword">for</span> file <span class="keyword">in</span> f1 f2</span><br><span class="line"><span class="keyword">do</span> </span><br><span class="line"> <span class="built_in">echo</span> <span class="variable">$file</span> &<span class="comment"># 命令符&, 将命令置于后台并继续执行脚本</span></span><br><span class="line"> PIDARRAY+=(<span class="string">"$!"</span>)<span class="comment"># $! 保存着最近一个后台进程的PID</span></span><br><span class="line"><span class="keyword">done</span></span><br><span class="line"><span class="built_in">wait</span> <span class="variable">${PIDARRAY[@]}</span><span class="comment"># wait 命令等待这些进程结束</span></span><br></pre></td></tr></table></figure></p></li></ul><h2 id="commands">Commands</h2><ul><li><p><code>printf</code></p><ul><li><p>格式化输出</p><p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash"> <span class="built_in">printf</span> <span class="string">"ID: %-5s NAME: %-10s Avage: %4.2f\n"</span> 1 tom 74.233;</span></span><br><span class="line">ID: 1 NAME: tom Avage: 74.23</span><br></pre></td></tr></table></figure></p></li><li><p>输出重复字符串</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">printf</span> <span class="string">'hello %.0s'</span> {1..6}</span><br></pre></td></tr></table></figure></p></li><li><p><code>--</code> means whatever follows should <strong>not</strong> be interpreted as a command line <em>option</em> to <code>printf</code></p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="built_in">printf</span> -- <span class="string">'--- hello ---'</span></span><br></pre></td></tr></table></figure></p></li></ul></li><li><p><code>xargs</code>: 传递参数的过滤器</p><ul><li><p>列出当前文件夹下每个pdf文件的大小</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">ls *.pdf | xargs -n1 -I{} du -h {} </span><br><span class="line"><span class="comment"># -n1:每行输出一个</span></span><br><span class="line"><span class="comment"># -I{}:substitute occurrences of {} in command with the parsed argument</span></span><br></pre></td></tr></table></figure></p></li></ul></li></ul><h2 id="others">Others</h2><ul><li>Shell <a href="https://segmentfault.com/a/1190000008080537">语法</a>, #TODO</li></ul><h2 id="后台运行"><a href="./shell脚本后台运行">后台运行</a></h2>]]></content>
<tags>
<tag> Handbook </tag>
<tag> dev </tag>
</tags>
</entry>
<entry>
<title>AppleScript</title>
<link href="AppleScript/"/>
<url>AppleScript/</url>
<content type="html"><![CDATA[<h2 id="basic">Basic</h2><ul><li><p>运行:<code>osascript foo.scpt</code></p></li><li><p>注释:</p><ul><li>行注释,可用在行尾:<code>--</code> or <code>#</code></li><li>块注释:<code>(* *)</code></li></ul></li><li><p>数据类型:number,string,list,record (字典)</p><ul><li><p>设置list</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">set L to {<span class="number">1</span>,<span class="number">2</span>,<span class="number">3</span>,<span class="string">"ss"</span>}</span><br><span class="line">set item <span class="number">3</span> of L to <span class="string">"xx"</span></span><br></pre></td></tr></table></figure></p></li><li><p>转换:<code>set n2s to "123" as number</code></p></li></ul></li><li><p>Keywords</p><ul><li><code>me</code> refers to the current script<ul><li><code>my</code> is synonym for <code>of me</code></li></ul></li><li><code>it</code> refers to the current target (A <code>tell</code> statement specifies a default target)<ul><li><code>its</code> is synonym for <code>of it</code></li></ul></li></ul></li></ul><h3 id="流程控制">流程控制</h3><ul><li><p>条件语句</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">if</span> a=<span class="number">0</span> then</span><br><span class="line"> xxx</span><br><span class="line"><span class="keyword">else</span> <span class="keyword">if</span> a=<span class="number">1</span> then</span><br><span class="line"> xxx</span><br><span class="line"><span class="keyword">else</span></span><br><span class="line"> xxx</span><br><span class="line">end <span class="keyword">if</span></span><br></pre></td></tr></table></figure></p></li><li><p>循环语句</p><ul><li><p><code>times</code></p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">repeat <span class="number">10</span> times</span><br><span class="line"> xxx</span><br><span class="line">end repeat</span><br></pre></td></tr></table></figure></li><li><p><code>from .. to ..</code></p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">repeat with n from <span class="number">0</span> to <span class="number">10</span> by <span class="number">1</span></span><br><span class="line"> xxx</span><br><span class="line">end repeat</span><br></pre></td></tr></table></figure></li><li><p><code>until</code></p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">repeat until n > <span class="number">10</span></span><br><span class="line"> xxx</span><br><span class="line"> set n to n +<span class="number">3</span></span><br><span class="line">end repeat</span><br></pre></td></tr></table></figure></li></ul></li><li><p><code>try</code></p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">try</span><br><span class="line"> -- Do something</span><br><span class="line">on error</span><br><span class="line"> -- other thing</span><br><span class="line">end try</span><br></pre></td></tr></table></figure></p></li></ul><h3 id="functionhandler">Function/Handler</h3><ul><li><p><code>on</code> or <code>to</code></p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line">on foo()</span><br><span class="line"> display dialog <span class="string">"Error!"</span></span><br><span class="line"> <span class="keyword">return</span> <span class="string">"xx"</span></span><br><span class="line">end foo</span><br><span class="line"> </span><br><span class="line">foo()</span><br><span class="line">set R to foo()</span><br></pre></td></tr></table></figure></p></li><li><p>带参数</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">to foo(theErrorMessage, theButtons)</span><br><span class="line"> display dialog the theErrorMessage buttons theButtons</span><br><span class="line">end displayError</span><br><span class="line"> </span><br><span class="line">foo(<span class="string">"xxx"</span>,{<span class="string">"ok"</span>,<span class="string">"cancel"</span>})</span><br></pre></td></tr></table></figure></p></li><li><p>交叉参数</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">on foo:E withButtons:B</span><br><span class="line"> display dialog E buttons B</span><br><span class="line">end foo:withButtons:</span><br></pre></td></tr></table></figure></p></li></ul><h3 id="交互">交互</h3><ul><li><p>发声</p><ul><li><code>say "hello world" using "Victoria"</code></li><li><code>beep 2</code></li></ul></li><li><p>dialog 弹窗:</p><ul><li><p><code>display dialog "xxx"</code></p></li><li><p>获取输入</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">display dialog <span class="string">"xxx"</span> <span class="keyword">default</span> answer <span class="string">""</span></span><br><span class="line">set a to text returned of result</span><br></pre></td></tr></table></figure></p></li><li><p>获取点击项</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">set message to <span class="string">"Hello world!"</span></span><br><span class="line">set DWindows to display dialog message buttons {<span class="string">"OK"</span>, <span class="string">"Cancel"</span>}</span><br><span class="line">set selectedResult to button returned of DWindows</span><br><span class="line">Say selectedResult</span><br></pre></td></tr></table></figure></p></li></ul></li><li><p>notification</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">display notification <span class="string">"..."</span> with title <span class="string">"xxx"</span> subtitle <span class="string">"hhh"</span></span><br></pre></td></tr></table></figure></p></li><li><p>列表选择</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">choose from list {<span class="string">"aa"</span>,<span class="string">"bb"</span>,<span class="string">"cc"</span>} with title <span class="string">"Names"</span> ¬</span><br><span class="line"> with prompt <span class="string">"请选择:"</span> <span class="keyword">default</span> items {<span class="string">"bb"</span>} ¬</span><br><span class="line"> with empty selection allowed and multiple selections allowed</span><br></pre></td></tr></table></figure></p></li><li><p>文件/夹选择</p><ul><li><code>choose file of type("txt") with prompt "请选择"</code></li><li><code>choose folder</code></li></ul></li></ul><h3 id="模拟行为">模拟行为</h3><ul><li><p>唤起应用至frontmost:<code>tell application "Chrome" to activate</code></p></li><li><p>延时/s:<code>delay 2</code></p></li><li><p>模型点击:<code>click</code></p><ul><li><p>位置</p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">tell application "System Events"</span><br><span class="line"> click at {123,456}</span><br><span class="line">end tell</span><br></pre></td></tr></table></figure></p></li><li><p>元素</p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br></pre></td><td class="code"><pre><span class="line">tell application "System Events"</span><br><span class="line"> tell menu bar 1 of process "Chrome"</span><br><span class="line"> click menu item "新标签页" of menu "文件" of menu bar item "文件"</span><br><span class="line"> end tell</span><br><span class="line">end tell</span><br></pre></td></tr></table></figure></p></li></ul></li><li><p>键盘输入</p><ul><li><p><code>key code 48 using command down</code>:<code>⌘Tab</code></p><ul><li>List of AppleScript <a href="https://eastmanreference.com/complete-list-of-applescript-key-codes">key codes</a></li></ul></li><li><p><code>keystroke "hello"</code></p></li></ul></li><li><p>剪贴板:<code>clipboard</code></p><ul><li><code>set c to get the clipboard</code></li><li><code>set the clipboard to "xxx"</code></li></ul></li></ul><h2 id="advanced">Advanced</h2><h3 id="applescript中执行shell指令">AppleScript中执行Shell指令</h3><ul><li><p>Basic</p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">do shell script "date +'%T'"</span><br></pre></td></tr></table></figure></p></li><li><p>==传递变量==</p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">set foo to "test"</span><br><span class="line">do shell script "echo " & foo</span><br></pre></td></tr></table></figure></p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">do shell script "echo " & quoted form of foo</span><br></pre></td></tr></table></figure></p></li></ul><h3 id="shell-中执行-applescript">Shell 中执行 AppleScript</h3><ol type="1"><li><p>两种方法</p><ol type="1"><li><p><code>osascript -e "echo \"Hello\"</code></p></li><li><p>命令语句较多时</p><p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">osascript <<EOF</span><br><span class="line"> say "Hello"</span><br><span class="line"> say "World"</span><br><span class="line">EOF</span><br></pre></td></tr></table></figure></p></li></ol></li><li><p>传递shell中的变量</p><ol type="1"><li><p><code>osascript -e "echo \"$foo\" "</code></p></li><li><p>命令语句较多时</p><p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">osascript <<EOF</span><br><span class="line">set a to "Hello"</span><br><span class="line">set b to "$foo"# 传递shell中的变量</span><br><span class="line">EOF</span><br></pre></td></tr></table></figure></p></li></ol></li><li><p>Shell 获取 Applescript 结果</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">finder=$(osascript -e <span class="string">"set dir to POSIX path of (choose folder)"</span>)</span><br></pre></td></tr></table></figure></p></li></ol><h2 id="others">Others</h2><h3 id="tips">Tips</h3><ul><li><p><code>AppleScript's Dictionary</code>: Press <code>⌘⇧O</code> to open AppleScript dictionary of this app.</p></li><li><p><code>Watch Me Do</code>("我做给您看")</p><ul><li>点击AutoMator的录制按钮(红色小圆点)</li><li>实际演示一遍想要完成的操作后,结束录制</li></ul></li><li><p>将自动生成的动作序列拖到下方空白处,可以看到转成的AppleScript细节(uiScript)</p></li><li><p><code>Accessibility Inspector</code></p><ul><li><p>Check the UI element tree, properties, attribute ...</p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">return</span> attribute of textArea</span><br><span class="line"><span class="keyword">return</span> the value of attribute <span class="string">"AXValue"</span> of xxx</span><br></pre></td></tr></table></figure><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">get properties of front window</span><br><span class="line"># {.., name:<span class="string">"foo"</span>, description: <span class="string">"xxx"</span>, ...}</span><br><span class="line">get name of front window</span><br><span class="line"># <span class="string">"foo"</span></span><br></pre></td></tr></table></figure></li></ul></li><li><p>语句过长时,可以使用符号 <code>¬</code> 断开</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">tell process <span class="string">"xxxxx"</span> to set A to a reference ¬</span><br><span class="line"> to (first window whose name of attributes contains <span class="string">".txt"</span>)</span><br></pre></td></tr></table></figure></p></li><li><p>使用语句<code>log variable</code>进行设置断点进行debug</p></li></ul><h3 id="应用">应用</h3><ul><li><p>toggle dock's preferences of "autohide"</p><p><figure class="highlight objc"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br></pre></td><td class="code"><pre><span class="line">tell application <span class="string">"System Events"</span></span><br><span class="line"> tell dock preferences</span><br><span class="line"> <span class="keyword">if</span> autohide is <span class="literal">true</span> then</span><br><span class="line"> set properties to {autohide:<span class="literal">false</span>}</span><br><span class="line"> <span class="keyword">else</span></span><br><span class="line"> set properties to {autohide:<span class="literal">true</span>}</span><br><span class="line"> end <span class="keyword">if</span></span><br><span class="line"> end tell</span><br><span class="line">end tell</span><br></pre></td></tr></table></figure></p></li></ul><h3 id="exaples">Exaples</h3><ul><li><a href="https://github.com/unforswearing/applescript">https://github.com/unforswearing/applescript</a></li></ul><h3 id="documentation">Documentation</h3><ul><li><p>AppleScript Language <a href="https://developer.apple.com/library/archive/documentation/AppleScript/Conceptual/AppleScriptLangGuide/introduction/ASLR_intro.html#/">Guide</a></p></li><li><p>Mac Automation Scripting <a href="https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/index.html#//apple_ref/doc/uid/TP40016239-CH56-SW1">Guide</a></p></li></ul>]]></content>
<tags>
<tag> Handbook </tag>
</tags>
</entry>
<entry>
<title>非root用户安装MongoDB</title>
<link href="Install-MongoDB/"/>
<url>Install-MongoDB/</url>
<content type="html"><![CDATA[<p>在实验室的服务器上安装MongoDB,然而没有sudo权限,只能安装已编译的程序了。 步骤如下:</p><ol type="1"><li><p>去<a href="https://www.mongodb.com/download-center#community">官网</a>查看下载并解压最新版4.0.0的MongoDB包 <span id="more"></span> <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">$ wget https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu1404-4.0.0.tgz</span><br><span class="line">$ tar -zxvf mongodb-linux-x86_64-ubuntu1404-4.0.0.tgz</span><br></pre></td></tr></table></figure></p></li><li><p>官网说需要安装OpenSSL >The binary of this version has been compiled with SSL enabled and dynamically linked. This requires that SSL libraries be installed separately. See here for more information on installing OpenSSL. 安装方法参考:<a href="https://skynineblog.wordpress.com/2016/04/08/linux%E7%B3%BB%E7%BB%9F%E5%AE%89%E8%A3%85-openssl%E4%B8%A4%E7%A7%8D%E6%96%B9%E6%B3%95/">这个</a></p></li><li><p>将路径添加到PATH <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">$ <span class="built_in">export</span> PATH=<span class="string">"/path/to/dir/:<span class="variable">$PATH</span>"</span></span><br><span class="line">$ <span class="built_in">source</span> ~/.bashrc</span><br></pre></td></tr></table></figure></p></li><li><p>在合适位置新建<code>mongod.conf</code>配置文件,主要是设置db和log文件夹,以及<code>fork</code>(后台运行),参考配置: <figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br><span class="line">22</span><br><span class="line">23</span><br><span class="line">24</span><br><span class="line">25</span><br><span class="line">26</span><br><span class="line">27</span><br><span class="line">28</span><br><span class="line">29</span><br><span class="line">30</span><br></pre></td><td class="code"><pre><span class="line"># mongod.conf</span><br><span class="line"></span><br><span class="line"># for documentation of all options, see:</span><br><span class="line"># http://docs.mongodb.org/manual/reference/configuration-options/</span><br><span class="line"></span><br><span class="line"># Where and how to store data.</span><br><span class="line">storage:</span><br><span class="line"> dbPath: /home/duhaihua/mongo/db</span><br><span class="line"> journal:</span><br><span class="line"> enabled: true</span><br><span class="line"></span><br><span class="line"># where to write logging data.</span><br><span class="line">systemLog:</span><br><span class="line"> destination: file</span><br><span class="line"> logAppend: true</span><br><span class="line"> path: /home/duhaihua/mongo/mongod.log</span><br><span class="line"></span><br><span class="line"># network interfaces</span><br><span class="line">net:</span><br><span class="line"> port: 27017</span><br><span class="line"> bindIp: 0.0.0.0</span><br><span class="line"></span><br><span class="line"># how the process runs</span><br><span class="line">processManagement:</span><br><span class="line"> timeZoneInfo: /usr/share/zoneinfo</span><br><span class="line"> fork: true</span><br><span class="line"></span><br><span class="line"># security.</span><br><span class="line">security:</span><br><span class="line"> authorization: enabled</span><br></pre></td></tr></table></figure></p></li><li><p>启动mongod服务 <figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mongod --config ~/mongo/mongod.conf</span><br></pre></td></tr></table></figure> 可以用以下命令查看mongod是否已启动: <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">$</span><span class="bash"> ps -ef|grep mogod</span></span><br></pre></td></tr></table></figure></p></li><li><p>打开mongo shell,使用mongoDB <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">mongo</span><br></pre></td></tr></table></figure></p></li></ol>]]></content>
<tags>
<tag> dev </tag>
<tag> intro </tag>
</tags>
</entry>
<entry>
<title>Markdown 表格生成</title>
<link href="Markdown-Tables/"/>
<url>Markdown-Tables/</url>
<content type="html"><![CDATA[<p>Markdown 对表格的支持不太友好,这里列了几个处理的技巧。</p><h2 id="插入表格">插入表格</h2><p>在Markdown文本中插入表格主要有以下三种方法:</p><ol type="1"><li><p>标准Markdown table语法</p><p><figure class="highlight markdown"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line">| A | B | C |</span><br><span class="line">| :--: | ---: | :--- |</span><br><span class="line">| 1 | 2 | 3 |</span><br><span class="line">| 4 | 5 | 6 |</span><br></pre></td></tr></table></figure></p><p>效果:</p><table><thead><tr class="header"><th style="text-align: center;">A</th><th style="text-align: right;">B</th><th style="text-align: left;">C</th></tr></thead><tbody><tr class="odd"><td style="text-align: center;">1</td><td style="text-align: right;">2</td><td style="text-align: left;">3</td></tr><tr class="even"><td style="text-align: center;">4</td><td style="text-align: right;">5</td><td style="text-align: left;">6</td></tr></tbody></table></li><li><p>嵌入html代码</p><p><figure class="highlight html"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="tag"><<span class="name">table</span> <span class="attr">style</span>=<span class="string">"text-align:center"</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">tr</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">td</span>></span>A<span class="tag"></<span class="name">td</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">td</span>></span>B<span class="tag"></<span class="name">td</span>></span></span><br><span class="line"> <span class="tag"></<span class="name">tr</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">tr</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">td</span>></span>C<span class="tag"></<span class="name">td</span>></span></span><br><span class="line"> <span class="tag"><<span class="name">td</span>></span>D<span class="tag"></<span class="name">td</span>></span></span><br><span class="line"> <span class="tag"></<span class="name">tr</span>></span></span><br><span class="line"><span class="tag"></<span class="name">table</span>></span></span><br></pre></td></tr></table></figure></p><p>效果:</p><table style="text-align:center"><tr><td><p>A</p></td><td><p>B</p></td></tr><tr><td><p>C</p></td><td><p>D</p></td></tr></table></li><li><p>Latex 语法</p><ul><li><p>Typora等编辑器可借由MathJax插入数学公式,其中<code>\array</code>可用来插入表格</p></li><li><p>在Hexo等博客框架中也可以利用MathJax插入Latex,不过渲染工具要换为Pandoc</p><p><figure class="highlight bash"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">brew install pandoc</span><br><span class="line">npm uninstall hexo-renderer-marked --save</span><br><span class="line">npm install hexo-renderer-pandoc --save</span><br></pre></td></tr></table></figure></p></li></ul><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line">$$</span><br><span class="line">\begin{array}{ccc} % c:center, r:right, l:left</span><br><span class="line">\hline</span><br><span class="line">姓名& 学号& 性别\\</span><br><span class="line">\hline</span><br><span class="line">Steve Jobs& 001& Male\\</span><br><span class="line">Bill Gates& 002& Female\\</span><br><span class="line">\hline</span><br><span class="line">\end{array}</span><br><span class="line">$$</span><br></pre></td></tr></table></figure></p><p>效果: <span class="math display">\[ \begin{array}{ccc} % c:center, r:right, l:left \hline 姓名& 学号& 性别\\ \hline Steve Jobs& 001& Male\\ Bill Gates& 002& Female\\ \hline \end{array} \]</span></p></li></ol><h2 id="跨行跨列表格-columnspanrowspan">跨行/跨列表格 columnspan/rowspan</h2><ol type="1"><li><p>html语法</p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br><span class="line">20</span><br><span class="line">21</span><br></pre></td><td class="code"><pre><span class="line"><table style="text-align:center"></span><br><span class="line"> <tr></span><br><span class="line"> <td colspan='2' rowspan='2'></td></span><br><span class="line"> <td colspan='2' >真实类别<br>Truth</td></span><br><span class="line"> </tr></span><br><span class="line"> <tr></span><br><span class="line"> <td>1 (Event)</td></span><br><span class="line"> <td>0 (Non-Event)</td></span><br><span class="line"> </tr></span><br><span class="line"> <tr></span><br><span class="line"> <td rowspan='2'>预测类别<br>Prediction</td></span><br><span class="line"> <td>1<br>Positive</td></span><br><span class="line"> <td>True<br>Positive</td></span><br><span class="line"> <td>False<br>Positive</td></span><br><span class="line"> </tr></span><br><span class="line"> <tr></span><br><span class="line"> <td>0<br>Negative</td></span><br><span class="line"> <td>False<br>Negative</td></span><br><span class="line"> <td>True<br>Negative</td></span><br><span class="line"> </tr></span><br><span class="line"></table></span><br></pre></td></tr></table></figure></p><p>效果:</p><table style="text-align:center" class="table-bordered table-striped table-condensed"><tr><td colspan="2" rowspan="2"><p>Example</p></td><td colspan="2"><p>真实类别<br>Truth</p></td></tr><tr><td><p>1 (Event)</p></td><td><p>0 (Non-Event)</p></td></tr><tr><td rowspan="2"><p>预测类别<br>Prediction</p></td><td><p>1<br>Positive</p></td><td><p>True<br>Positive</p></td><td><p>False<br>Positive</p></td></tr><tr><td><p>0<br>Negative</p></td><td><p>False<br>Negative</p></td><td><p>True<br>Negative</p></td></tr></table></li><li><p>Latex语法:</p><ul><li><p>使用array嵌套排列</p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br></pre></td><td class="code"><pre><span class="line">$$</span><br><span class="line">\begin{array}{c|c}</span><br><span class="line">a^2-b^2 & </span><br><span class="line">\begin{array}{ccc} 1+i & 1-i & \frac{1}{\sqrt{2}}</span><br><span class="line">\end{array} \\\hline </span><br><span class="line">\begin{array}{cc}a-b & a+b</span><br><span class="line">\end{array} & </span><br><span class="line"></span><br><span class="line">\begin{array}{c|c}</span><br><span class="line"> A & \begin{array}{ccc} a & b & c \end{array}\\</span><br><span class="line"> \hline</span><br><span class="line"> \begin{array}{cc} X & Y \end{array} & Z</span><br><span class="line">\end{array}</span><br><span class="line"></span><br><span class="line">\end{array}</span><br><span class="line">$$</span><br></pre></td></tr></table></figure></p><p>效果: <span class="math display">\[ \begin{array}{c|c} a^2-b^2 & \begin{array}{ccc} 1+i & 1-i & \frac{1}{\sqrt{2}} \end{array} \\\hline \begin{array}{cc}a-b & a+b \end{array} & \begin{array}{c|c} A & \begin{array}{ccc} a & b & c \end{array}\\ \hline \begin{array}{cc} X & Y \end{array} & Z \end{array} \end{array} \]</span></p></li><li><p>使用<code>rlap</code></p><p><figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br></pre></td><td class="code"><pre><span class="line">\begin{array}{r|lll}</span><br><span class="line">& \rlap{\text{number of foo}} \\</span><br><span class="line">\text{number of bar} & 0 & 1 & 2 \\</span><br><span class="line">\hline</span><br><span class="line">0 & 0.125 & 0.250 & 0.168 \\</span><br><span class="line">1 & 0.125 & 0.250 & 0.168 \\</span><br><span class="line">2 & 0.125 & 0.250 & 0.168</span><br><span class="line">\end{array}</span><br></pre></td></tr></table></figure></p><p>效果: <span class="math display">\[ \begin{array}{r|lll} & \rlap{\text{number of foo}} \\ \text{number of bar} & 0 & 1 & 2 \\ \hline 0 & 0.125 & 0.250 & 0.168 \\ 1 & 0.125 & 0.250 & 0.168 \\ 2 & 0.125 & 0.250 & 0.168 \end{array} \]</span></p></li></ul></li></ol><h2 id="快速插入">快速插入</h2><ul><li>多种格式的表格的生成与相互转换<ul><li><a href="http://tablesgenerator.com/markdown_tables">http://tablesgenerator.com</a></li><li><a href="https://tableconvert.com/">https://tableconvert.com</a> (Thanks <span class="citation" data-cites="*">@*</span>*ayayo** for the recommendation.)</li><li><a href="http://pressbin.com/tools/excel_to_html_table/index.html">http://pressbin.com/tools/excel_to_html_table/index.html</a></li></ul></li></ul><h2 id="其他">其他</h2><ul><li>表格内换行:插入<code><br></code>换行。</li><li>表格样式调整与优化:参考-><a href="http://moxfive.xyz/2016/03/04/markdown-table-style/">这篇</a></li></ul>]]></content>
<tags>
<tag> Productivity </tag>
</tags>
</entry>
<entry>
<title>pandas 数据处理</title>
<link href="data-processing-with-pandas/"/>
<url>data-processing-with-pandas/</url>
<content type="html"><![CDATA[<h2 id="数据选取-pandas.dateframe">数据选取 pandas.DateFrame</h2><h3 id="读取">读取</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> a = pd.read_csv(<span class="string">'datasets/results.csv'</span>)</span></span><br><span class="line"><span class="meta">></span><span class="bash">>> a = a.drop([1,3],axis=0)</span></span><br><span class="line"><span class="meta">></span><span class="bash">>> a</span></span><br><span class="line"> date home_team away_team home_score away_score</span><br><span class="line">0 2018-06-03 Andorra Cape Verde 0 0</span><br><span class="line">2 2018-06-03 Zimbabwe Botswana 1 1</span><br><span class="line">4 2018-06-04 Serbia Chile 0 1</span><br><span class="line">5 2018-06-04 Slovakia Morocco 1 2</span><br><span class="line">6 2018-06-04 Armenia Moldova 0 0</span><br><span class="line">7 2018-06-04 India Kenya 3 0</span><br><span class="line">8 2018-06-05 Russia Turkey 1 1</span><br><span class="line">9 2018-06-05 Romania Finland 2 0</span><br></pre></td></tr></table></figure><h3 id="生成转成dataframe">生成(转成DataFrame)</h3><ul><li><code>df = pd.DataFrame([[1,1],[2,2]])</code></li></ul><h3 id="统计">统计</h3><ul><li>列数:<code>df.shape[1]</code></li><li>行数:<code>df.shape[0]</code></li><li>相同的个数:<code>df['id'].value_counts()</code></li></ul><h3 id="行选择">行选择</h3><ul><li><code>df.head()/df.tail()</code> 展示前/后5行样例 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> a.head()</span></span><br><span class="line"> date home_team away_team home_score away_score</span><br><span class="line">0 2018-06-03 Andorra Cape Verde 0 0</span><br><span class="line">2 2018-06-03 Zimbabwe Botswana 1 1</span><br><span class="line">4 2018-06-04 Serbia Chile 0 1</span><br><span class="line">5 2018-06-04 Slovakia Morocco 1 2</span><br><span class="line">6 2018-06-04 Armenia Moldova 0 0</span><br></pre></td></tr></table></figure></li><li><code>slicing</code> 切片 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> a[1:3]</span></span><br><span class="line"> date home_team away_team home_score away_score</span><br><span class="line">2 2018-06-03 Zimbabwe Botswana 1 1</span><br><span class="line">4 2018-06-04 Serbia Chile 0 1</span><br></pre></td></tr></table></figure></li><li><code>loc[ ]</code> 按数据标注的index选取,选取的结果包含下界 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> a.loc[5:6]</span></span><br><span class="line"> date home_team away_team home_score away_score</span><br><span class="line">5 2018-06-04 Slovakia Morocco 1 2</span><br><span class="line">6 2018-06-04 Armenia Moldova 0 0</span><br></pre></td></tr></table></figure></li><li><code>iloc[ ]</code> 按数据从0开始的实际index选取,且选取结果不包含下界。 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> a.iloc[3:5]</span></span><br><span class="line"> date home_team away_team home_score away_score</span><br><span class="line">5 2018-06-04 Slovakia Morocco 1 2</span><br><span class="line">6 2018-06-04 Armenia Moldova 0 0</span><br></pre></td></tr></table></figure></li></ul><h3 id="列选择">列选择</h3><ul><li>按列名选取 <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> a[[<span class="string">'date'</span>,<span class="string">'home_team'</span>]]</span></span><br><span class="line"> date home_team</span><br><span class="line">0 2018-06-03 Andorra</span><br><span class="line">2 2018-06-03 Zimbabwe</span><br><span class="line">4 2018-06-04 Serbia</span><br><span class="line">5 2018-06-04 Slovakia</span><br><span class="line">6 2018-06-04 Armenia</span><br><span class="line">7 2018-06-04 India</span><br><span class="line">8 2018-06-05 Russia</span><br><span class="line">9 2018-06-05 Romania</span><br></pre></td></tr></table></figure></li></ul><h3 id="删除列">删除列</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">df.drop([<span class="string">'B'</span>, <span class="string">'C'</span>], axis=<span class="number">1</span>)</span><br><span class="line">df.drop(columns=[<span class="string">'B'</span>, <span class="string">'C'</span>])</span><br></pre></td></tr></table></figure><h3 id="删除行">删除行</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">df.drop([<span class="number">0</span>, <span class="number">1</span>]) <span class="comment"># 默认axis=0</span></span><br><span class="line">df.drop(index=[<span class="number">0</span>, <span class="number">1</span>])</span><br></pre></td></tr></table></figure><h3 id="选取数据">选取数据</h3><h4 id="按条件选取">按条件选取</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>df[df[<span class="string">'ip_address'</span>]==<span class="string">"213.234.238.52"</span>]</span><br><span class="line"> ip_addressportprotocal countryanonymousresponse</span><br><span class="line"><span class="number">0</span> <span class="number">213.234</span><span class="number">.238</span><span class="number">.52</span><span class="number">8080.0</span>HTTPS RussiaTransparent<span class="number">3407</span> ms</span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>a.loc[a[<span class="string">'date'</span>]><span class="string">'2018-06-04'</span>]</span><br><span class="line"> date home_team away_team home_score away_score</span><br><span class="line"><span class="number">8</span> <span class="number">2018</span>-06-05 Russia Turkey <span class="number">1</span> <span class="number">1</span></span><br><span class="line"><span class="number">9</span> <span class="number">2018</span>-06-05 Romania Finland <span class="number">2</span> <span class="number">0</span></span><br></pre></td></tr></table></figure><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>b = a.loc[(a[<span class="string">'date'</span>]==<span class="string">'2018-06-04'</span>)&(a[<span class="string">'home_team'</span>]==<span class="string">'Slovakia'</span>)]</span><br><span class="line"><span class="meta">>>> </span>b</span><br><span class="line"> date home_team away_team home_score away_score</span><br><span class="line"><span class="number">5</span> <span class="number">2018</span>-06-04 Slovakia Morocco <span class="number">1</span> <span class="number">2</span></span><br></pre></td></tr></table></figure><h4 id="由具体值选取">由具体值选取</h4><ul><li><code>.values</code> <figure class="highlight plaintext"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line">>>> b['away_team'].values[0]</span><br><span class="line">'Morocco'</span><br></pre></td></tr></table></figure></li><li><code>loc[]</code> <figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> a.loc[4,<span class="string">'date'</span>]</span></span><br><span class="line">'2018-06-04'</span><br></pre></td></tr></table></figure></li></ul><h2 id="数据清洗">数据清洗</h2><h3 id="重复值">重复值</h3><ul><li>查找重复值:<code>data.duplicated()</code></li><li>删除重复值:<code>data.drop_duplicates()</code></li></ul><h3 id="空值">空值</h3><ul><li>Note: 在pandas中空值显示为NaN</li><li>空值:<code>data.isnull()</code></li><li>非空值:<ul><li>dataframe: <code>df.notna()</code></li><li>float: <code>np.isnan()</code></li></ul></li><li>空值数量:<code>data['column'].isnull().value_counts()</code></li><li>处理空值:<ul><li>填充:<ul><li>填充0:<code>data.fillna(0)</code></li><li>用字典对不同列填充:<code>data.fillna({'col1':0,'col2':1})</code></li></ul></li><li>删除:<ul><li>删除全部为NaN的行:<code>data.dropna(axis=0, how='all'))</code></li><li>删除在列名为"age"和"sex"中有NaN的行<code>df.dropna(subset=["age", "sex"])</code></li><li>删除任何含有NaN的列:<code>data.dropna(axis=1, how='any')</code></li></ul></li></ul></li></ul><h3 id="空格">空格</h3><ul><li>用null替代空格<ul><li><code>df["VIN"]=df["VIN"].apply(lambda x: np.NaN if str(x).isspace() else x)</code></li></ul></li><li>去除数据间的空格:<ul><li><code>a['column'] = a['column'].map(str.strip)</code></li><li><code>a['column'] = a['column'].map(str.lstrip)</code></li><li><code>a['column'] = a['column'].map(str.rstrip)</code></li></ul></li></ul><h3 id="大小写">大小写</h3><ul><li>全部大写:<code>a['column'] = a['column'].map(str.upper)</code></li><li>全部小写:<code>a['column'] = a['column'].map(str.lower)</code></li><li>首字母大写:<code>a['column'] = a['column'].map(str.title)</code></li></ul><h3 id="极值异常值">极值/异常值</h3><ul><li>检查数据情况:<code>data.describe().astype(np.int64).T</code></li><li>替换异常值:<code>data.replace([max, min],data['col'].mean())</code></li></ul><h3 id="数据格式">数据格式</h3><ul><li>数值类型转换:<code>data['col1'] = data['col1'].astype(np.int64)</code></li><li>日期格式:<code>data['time'] = pd.to_datetime(data['time'])</code></li></ul><h2 id="数据处理">数据处理</h2><h3 id="统计-1">统计</h3><ul><li>字符数:<code>string.count(" ")</code></li><li><code>pd.value_counts(x)</code></li></ul><h3 id="排序">排序</h3><ul><li><code>df.sort_values(by = "a", ascending = False)</code></li></ul><h3 id="新增赋值数据">新增/赋值数据</h3><ul><li>新增/赋值行:<code>df.loc['raw'] = value</code></li><li>新增/赋值列:<code>df['column'] = value</code></li><li>单个数据<ul><li><code>df.loc[index,'column'] = value</code></li><li><code>df.insert(idx, 'column', value)</code></li></ul></li></ul><h3 id="column重命名">Column重命名</h3><ul><li><code>df.rename(columns={'col':'new_col'},inplace=True)</code></li></ul><h3 id="类型转换">类型转换</h3><ul><li><p>Str to Float: <code>pd.to_numeric(df)</code>/<code>df.apply(pd.to_numeric)</code></p></li><li><p>DataFrame to List:首先使用np.array()函数把DataFrame转化为np.ndarray(),再利用tolist()函数把np.ndarray()转为list</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">data = pd.read_csv(<span class="string">"a.csv"</span>)</span><br><span class="line">data_array = np.array(data)</span><br><span class="line">data_list = data_array.tolist()</span><br></pre></td></tr></table></figure></li></ul><h3 id="运算">运算</h3><ul><li><p>单列运算:<code>map()</code>函数,用于<strong>Series对象</strong>(或DataFrame对象的一列),接收函数作为或字典对象作为参数,返回经过函数或字典映射处理后的值。例:对a['text']列中的每一项进行分句:</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">a[<span class="string">'text'</span>] = a[<span class="string">'text'</span>].<span class="built_in">map</span>(<span class="keyword">lambda</span> x:sent_tokenize(x))</span><br></pre></td></tr></table></figure></li><li><p>多列运算:<code>apply()</code>函数,用于<strong>DataFrame对象</strong>。col3 = col1 + col2</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">a[<span class="string">'col3'</span>] = a.apply(<span class="keyword">lambda</span> x: x[<span class="string">'col1'</span>] + x[<span class="string">'col2'</span>], axis=<span class="number">1</span>)</span><br></pre></td></tr></table></figure></li><li><p>平均值:mean(axis=1)</p><ul><li><strong>axis</strong>: along which the means are computed. <code>axis=0</code> means along index, <code>axis=1</code> means along colunms.</li></ul></li></ul><h3 id="遍历列">遍历列</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br></pre></td><td class="code"><pre><span class="line"><span class="keyword">for</span> index, row <span class="keyword">in</span> df.iterrows():</span><br><span class="line"> <span class="built_in">print</span>(row[<span class="string">"c1"</span>], row[<span class="string">"c2"</span>])</span><br></pre></td></tr></table></figure><h3 id="分组-groupby">分组 groupby</h3><ul><li><p>使用<code>cut()</code>函数</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br></pre></td><td class="code"><pre><span class="line">interval = [<span class="number">0</span>, <span class="number">5</span>, <span class="number">10</span>, <span class="number">15</span>, <span class="number">20</span>] <span class="comment">#根据数值大小分区间</span></span><br><span class="line">group_name = [<span class="string">'A'</span>, <span class="string">'B'</span>, <span class="string">'C'</span>, <span class="string">'D'</span>, <span class="string">'E'</span>]</span><br><span class="line">data[<span class="string">'categories'</span>] = pd.cut(data[<span class="string">'col2'</span>], interval, labels = group_names)</span><br></pre></td></tr></table></figure></li><li><p>使用<code>groupby()</code>,<code>transform()</code>函数</p><ul><li>分组后每组的size: <code>df.groupby('type').size()</code></li><li>按A列分组,获取<strong>其他列</strong>的均值:<code>df.groupby('A').mean()</code></li><li>根据<code>type</code>列将数据分组后,对组内<code>col</code>的值进行求和。 <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>df.groupby(<span class="string">'type'</span>)[<span class="string">"col"</span>].<span class="built_in">sum</span>()</span><br><span class="line"><span class="built_in">type</span></span><br><span class="line"><span class="number">1</span> <span class="number">12.0</span></span><br><span class="line"><span class="number">2</span> <span class="number">23.1</span></span><br><span class="line"><span class="number">3</span> <span class="number">34.2</span></span><br><span class="line">Name: col, dtype: float64</span><br></pre></td></tr></table></figure></li><li>根据<code>type</code>列将数据分组后,对组内<code>col</code>的值进行多种计算。 <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">df.groupby(<span class="string">'type'</span>)[<span class="string">'col'</span>].agg({<span class="string">'sum'</span>,<span class="string">'size'</span>})</span><br></pre></td></tr></table></figure></li><li>添加一列数据,其按类显示每行数据<code>col</code>占总数的百分比 <figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">a[<span class="string">'percentage'</span>] = a.groupby(<span class="string">'type'</span>)[<span class="string">'col'</span>].transform(<span class="keyword">lambda</span> x: x / x.<span class="built_in">sum</span>())</span><br></pre></td></tr></table></figure></li></ul></li></ul><h3 id="合并-merge">合并 <a href="https://pandas-docs.github.io/pandas-docs-travis/user_guide/merging.html#brief-primer-on-merge-methods-relational-algebra">merge</a></h3><ul><li><p>Merge on one key:<code>pd.merge(left,right)</code></p><p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>left</span><br><span class="line"> A B key</span><br><span class="line"><span class="number">0</span> A0 B0 K0</span><br><span class="line"><span class="number">1</span> A1 B1 K1</span><br><span class="line"><span class="number">2</span> A2 B2 K2</span><br><span class="line"><span class="number">3</span> A3 B3 K3</span><br><span class="line"><span class="meta">>>> </span>right</span><br><span class="line"> C D key</span><br><span class="line"><span class="number">0</span> C0 D0 K0</span><br><span class="line"><span class="number">1</span> C1 D1 K1</span><br><span class="line"><span class="number">2</span> C2 D2 K2</span><br><span class="line"><span class="number">3</span> C3 D3 K3</span><br><span class="line"><span class="meta">>>> </span>pd.merge(left,right)</span><br><span class="line"> A B key C D</span><br><span class="line"><span class="number">0</span> A0 B0 K0 C0 D0</span><br><span class="line"><span class="number">1</span> A1 B1 K1 C1 D1</span><br><span class="line"><span class="number">2</span> A2 B2 K2 C2 D2</span><br><span class="line"><span class="number">3</span> A3 B3 K3 C3 D3</span><br></pre></td></tr></table></figure></p></li><li><p>Merge on two keys:</p><p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br><span class="line">13</span><br><span class="line">14</span><br><span class="line">15</span><br><span class="line">16</span><br><span class="line">17</span><br><span class="line">18</span><br><span class="line">19</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">>>> </span>left</span><br><span class="line"> A B key1 key2</span><br><span class="line"><span class="number">0</span> A0 B0 K0 K0</span><br><span class="line"><span class="number">1</span> A1 B1 K1 K1</span><br><span class="line"><span class="number">2</span> A2 B2 K2 K2</span><br><span class="line"><span class="number">3</span> A3 B3 K3 K3</span><br><span class="line"></span><br><span class="line"><span class="meta">>>> </span>right</span><br><span class="line"> C D key1 key3</span><br><span class="line"><span class="number">0</span> C0 D0 K0 K0</span><br><span class="line"><span class="number">1</span> C1 D1 K1 K1</span><br><span class="line"><span class="number">2</span> C2 D2 K2 K2</span><br><span class="line"><span class="number">3</span> C3 D3 K3 K4</span><br><span class="line"></span><br><span class="line"><span class="meta">>>> </span>pd.merge(left, right, on = <span class="string">'key2'</span>,suffixes=[<span class="string">'_left'</span>,<span class="string">'_right'</span>])<span class="comment">#默认合并方式how="inner"</span></span><br><span class="line"> A B key1_left key2 C D key1_right</span><br><span class="line"><span class="number">0</span> A0 B0 K0 K0 C0 D0 K0</span><br><span class="line"><span class="number">1</span> A1 B1 K1 K1 C1 D1 K1</span><br><span class="line"><span class="number">2</span> A2 B2 K2 K2 C2 D2 K2</span><br></pre></td></tr></table></figure></p></li><li><p>若合并的两个对象的列名不同,可以单独指定: <code>pd.merge(left, right, left_on='key1',right_on='key2')</code></p><p><figure class="highlight shell"><table><tr><td class="gutter"><pre><span class="line">1</span><br><span class="line">2</span><br><span class="line">3</span><br><span class="line">4</span><br><span class="line">5</span><br><span class="line">6</span><br><span class="line">7</span><br><span class="line">8</span><br><span class="line">9</span><br><span class="line">10</span><br><span class="line">11</span><br><span class="line">12</span><br></pre></td><td class="code"><pre><span class="line"><span class="meta">></span><span class="bash">>> pd.merge(left, right, on=[<span class="string">'key1'</span>,<span class="string">'key2'</span>])</span></span><br><span class="line"> A B key1 key2 C D</span><br><span class="line">0 A0 B0 K0 K0 C0 D0</span><br><span class="line">1 A1 B1 K1 K1 C1 D1</span><br><span class="line">2 A2 B2 K2 K2 C2 D2</span><br><span class="line"><span class="meta"> </span></span><br><span class="line"><span class="meta">></span><span class="bash">>> pd.merge(left, right, on = [<span class="string">'key1'</span>, <span class="string">'key2'</span>], how = <span class="string">'left'</span>, indicator = True) <span class="comment">#indicator展示合并方式</span></span></span><br><span class="line"> A B key1 key2 C D _merge</span><br><span class="line">0 A0 B0 K0 K0 C0 D0 both</span><br><span class="line">1 A1 B1 K1 K1 C1 D1 both</span><br><span class="line">2 A2 B2 K2 K2 C2 D2 both</span><br><span class="line">3 A3 B3 K3 K3 NaN NaN left_only</span><br></pre></td></tr></table></figure></p></li><li><p>right on index合并: <code>pd.merge(left,right,right_index=True,left_on=['key'])</code></p></li></ul><h3 id="导出数据-csv">导出数据 csv</h3><ul><li><code>df.to_csv('data.csv')</code></li><li><code>df.to_csv('data.csv',header=0,index=0)</code>不保留列名,索引</li></ul><h2 id="作图plot">作图plot</h2><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">pd.plot(*args, **kwargs)</span><br></pre></td></tr></table></figure><p>重要参数</p><figure class="highlight python"><table><tr><td class="gutter"><pre><span class="line">1</span><br></pre></td><td class="code"><pre><span class="line">DataFrame.plot(x=<span class="literal">None</span>, y=<span class="literal">None</span>, kind=<span class="string">'line'</span>, ax=<span class="literal">None</span>, subplots=<span class="literal">False</span>, sharex=<span class="literal">None</span>, sharey=<span class="literal">False</span>, layout=<span class="literal">None</span>,figsize=<span class="literal">None</span>, use_index=<span class="literal">True</span>, title=<span class="literal">None</span>, grid=<span class="literal">None</span>, legend=<span class="literal">True</span>, style=<span class="literal">None</span>, logx=<span class="literal">False</span>, logy=<span class="literal">False</span>, loglog=<span class="literal">False</span>, xticks=<span class="literal">None</span>, yticks=<span class="literal">None</span>, xlim=<span class="literal">None</span>, ylim=<span class="literal">None</span>, rot=<span class="literal">None</span>,xerr=<span class="literal">None</span>,secondary_y=<span class="literal">False</span>, sort_columns=<span class="literal">False</span>, **kwds)</span><br></pre></td></tr></table></figure><ul><li>logx / logy:设置坐标轴刻度是否取对数,boolean</li><li>xlim / ylim:设置坐标轴范围,2-tuple/list</li><li>linestyle:可取:":", "--", "-.", "-"</li></ul><h2 id="references">References</h2><ul><li><a href="http://noahsnail.com/2017/04/29/2017-4-29-pandas%E7%9A%84%E5%9F%BA%E6%9C%AC%E7%94%A8%E6%B3%95(%E4%B8%83)%E2%80%94%E2%80%94%E5%90%88%E5%B9%B6%E6%95%B0%E6%8D%AEmerge/">pandas的基本用法(七)——合并数据merge</a></li><li><a href="http://bluewhale.cc/2016-08-21/python-data-cleaning.html">使用python进行数据清洗</a></li><li><a href="https://www.cnblogs.com/lemonbit/p/6810972.html">Pandas分组运算(groupby)修炼</a></li><li><a href="https://blog.csdn.net/brucewong0516/article/details/80524442">详解pandas.DataFrame.plot()</a></li></ul>]]></content>
<tags>
<tag> python </tag>
</tags>
</entry>
<entry>
<title>Win10笔记本触控板重生</title>
<link href="Rebirth-of-Touchpad-on-Win10-Laptop/"/>
<url>Rebirth-of-Touchpad-on-Win10-Laptop/</url>
<content type="html"><![CDATA[<p>改善Windows10笔记本触控板体验的方法!!</p><h2 id="背景">背景</h2><p>一直以来,我都无比羡慕MacBook超强大的触控板,其可以脱离鼠标进行各种高效的操作。随着对笔记本使用的增加,我越来越忍受不了手里这台联想笔记本的“智障式”触摸板,并让我十分的好奇:“为什么Windows的触控板会如此难用?” 然而,这两天查找了<a href="Win10%E7%AC%94%E8%AE%B0%E6%9C%AC%E8%A7%A6%E6%8E%A7%E6%9D%BF%E9%87%8D%E7%94%9F.md#%E5%8F%82%E8%80%83%E6%BA%90">各种资料</a>,惊喜的发现这个问题是可以被解决的!</p><span id="more"></span><h2 id="精确式触控precision-touchpad">精确式触控(Precision Touchpad)</h2><blockquote><p>长久以来,在PC上触摸板的体验一直不尽人意。微软也一度试图解决这个问题,但PC厂商数量众多,产品也良莠不齐。面对这种状况,微软联合英特尔、义隆电子和新思科技提出“精确式触控板”概念。精确式触控板支持多点触控,并直接由Windows操作系统控制而非第三方驱动程序。相比与传统触摸板,精确式触摸板能够给用户提供更好的交互体验。</p></blockquote><h2 id="我的设备信息">我的设备信息</h2><ul><li>5年老机联想Y500</li><li>操作系统:Windows 10 (Cumulative Update, Version 1706, x64-based Systems)</li><li>触控板厂商:Elan</li><li>触控板驱动: 联想适配,但2014年后不再更新。</li></ul><h2 id="操作步骤">操作步骤</h2><ol type="1"><li>确定触控板厂商:Elan/Synaptics。 我的是Elan,其他Synaptics触控板可以参考这篇-><a href="https://zhuanlan.zhihu.com/p/28888470">文章</a></li><li>下载精确式触控驱动。 去微软官方的<a href="https://www.catalog.update.microsoft.com/Home.aspx">Microsoft update catalog</a>上搜索<code>elan wdf</code>,下载最新版驱动。 <strong>这里有几个细节</strong>:</li></ol><ul><li>该驱动随Win10更新而更新,尽量下载最新版。</li><li>当时我直觉认为列表最上面的是最新版,然后直接点选了第一条的<code>Elan - Other hardware - ELAN Input Device For WDF</code>,信息如下: <img src="../images/20180618110805000.png" alt="Screenshot from 2018-06-18 11-08-05" /> 然后,按之后步骤更新了驱动,是正常有效的。但是后来才发现此驱动更新日期为<code>2016/9/30</code>,不是最新版,但也不影响使用(摊手.jpg)...且效果提升已然很好,所以没有试其他的,你们可以自己试下其他的驱动。</li></ul><ol start="3" type="1"><li><p>将下载的驱动放到任意一个空的文件夹,并解压。</p></li><li><p>双击dpinst.exe运行。 <a href="https://zhuanlan.zhihu.com/p/28888470">参考这篇</a>是建议手动查找驱动然后更新的方法,比较麻烦,而且试的时候我的笔记本一直报错,索性选择直接运行解压后的dpinst.exe文件,发现是这样也是可以的。 <strong>更新:</strong> 昨天更新了win10系统至1803,触控板的驱动失效了,需要重新装,发现直接运行dpinst.exe并没有效果,于是按照<a href="https://zhuanlan.zhihu.com/p/28888470">参考文章</a>手动更新后才生效,方法: >打开设备管理器,选择自己的触摸板设备,右键选择更新驱动。在弹出的窗口中选择“浏览我的计算机以查找驱动程序”,之后选则“让我从计算机上的可用驱动程序列表中选取”,再在弹出的窗口中选择<code>从磁盘安装</code>,选择刚刚才解压目录中的<code>AutoRun</code>(我点了<code>ETD.inf</code>),一路next,之后重启就大功告成了。</p></li><li><p>重启电脑后生效,系统设置中出现精确式触控的设置选项: <img src="../images/20180618110805001.png" alt="multi" /></p></li><li><p>享受飞一般效率提升吧!!</p></li><li><p><p><span style="background-color:rgb(241, 239, 8);color:rgb(255, 15, 0)">注意:</span></p></p></li></ol><ul><li>此方法在我的电脑(联想Y500)上是可行的,其余厂商如Thinkpad、Dell、神州、微星等等在<a href="https://zhuanlan.zhihu.com/p/28888470">这篇文章</a>中都有成功案例。但是不保证每台机器都有效,也有评论说按此方法更新驱动后出现各种问题,请自己斟酌。</li><li>不想冒风险重装的,也可以试着通过修改注册表的方式更改设置:<ul><li><code>Win + r</code>打开运行,输入<code>regedit</code>进入注册表页面。</li><li>找到Elan驱动的注册表位置<code>[HKEY_CURRENT_USER\Software\Elantech]</code>,参考以下两篇文章操作: <a href="http://tieba.baidu.com/p/4099196263">修改注册表键值扩充y50触摸板功能</a> <a href="https://blog.csdn.net/IKQMKSQM/article/details/73470032">关于elan触摸板实现三指点击的方法</a></li><li>修改完成后,重启才能生效。</li><li>如果失败,或者没用可以再改回来~</li></ul></li><li>安装完成后,一定记得重启电脑使更改生效。我弄完后没重启,发现触控板虽然可以识别,但是点击确认只能按压,无法通过触摸完成,然后绕了很多弯路都没解决,最后重启了才变正常(当然,也可能是因为没安装最新版驱动的原因)。</li><li>发现三指或四指有时有误判的情况,可以适当的将手指略分开一点,误判会减少许多。</li><li>新装的驱动注册表位置: <code>[HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\PrecisionTouchPad]</code>,可以自己继续更改探索~ ## 参考源</li><li><a href="https://zhuanlan.zhihu.com/p/28888470">为Synaptics与Elan驱动的触控板安装微软精确式触控板驱动</a></li><li><a href="http://drivers.softpedia.com/get/KEYBOARD-and-MOUSE/Elantech/ELAN-Input-Device-for-WDF-Driver-16-21-13-3-for-Windows-10-Anniversary-Update-64-bit.shtml">ELAN Input Device for WDF Driver 16.21.13.3 for Windows 10 Anniversary Update 64-bit</a></li><li><a href="https://answers.microsoft.com/en-us/windows/forum/windows8_1-hardware/two-finger-right-click-does-not-work/c7fa3239-0c53-4e7b-87d8-57170293513d">Two finger right click does not work</a></li><li><a href="http://tieba.baidu.com/p/4099196263">修改注册表键值扩充y50触摸板功能</a></li><li><a href="https://blog.csdn.net/IKQMKSQM/article/details/73470032">关于elan触摸板实现三指点击的方法</a></li></ul>]]></content>
<tags>
<tag> Productivity </tag>
<tag> Windows </tag>
</tags>
</entry>
</search>